Performance of in operator using list and set

I had a use case where I had to check which elements of a list of words were present in another list of words, so I decided to use the in operator. Just for future reference, I tried the following:

# common code for all tests
base_list = [...]
query_list = [...]
  1. Pretty simple method:
    for word in query_list:
        if word in base_list:
            pass  # do something with word

    For a list of 4284 elements against a list of 107 elements, it took 9 seconds. Using plain lists, this method is the most straightforward of all, and also the slowest one, because every in test on a list scans it element by element.

  2. Sorting things:
    base_list.sort()
    for word in query_list:
        if word in base_list:
            pass  # do something with word

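    After sorting the list, guess what? Yep, nothing changed: the same 9 seconds. The in operator on a list still scans from the first element and never takes advantage of the ordering.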

  3. What about sets?
    bs = set(base_list)
    for word in query_list:
        if word in bs:
            pass  # do something with word

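    Using sets, this is a different story: 0.6 seconds for the same amount of data, because a set lookup is a hash-table probe (constant time on average) instead of a scan. But… if this could be achieved by turning one of the lists into a set, what if…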

  4. Using more sets:
    bs = set(base_list)
    qs = set(query_list)
    solution = bs.intersection(qs)

    0.02 seconds.
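To reproduce the comparison of the four approaches on your own machine, here is a minimal timing harness using the standard timeit module. The word lists are hypothetical random data sized like the post's test (4284 base words, 107 queries), and the function names are made up for illustration; absolute times will differ from the numbers above.

```python
import random
import string
import timeit

# Hypothetical synthetic data sized like the post's test.
random.seed(0)

def random_word(length=8):
    return "".join(random.choices(string.ascii_lowercase, k=length))

base_list = [random_word() for _ in range(4284)]
# About half of the queries are taken from base_list so some lookups succeed.
query_list = random.sample(base_list, 53) + [random_word() for _ in range(54)]
sorted_base = sorted(base_list)

def plain_list():  # approach 1: linear scan of a list per query
    return [w for w in query_list if w in base_list]

def sorted_list():  # approach 2: still a linear scan, order is ignored
    return [w for w in query_list if w in sorted_base]

def one_set():  # approach 3: hash lookups against one set
    bs = set(base_list)
    return [w for w in query_list if w in bs]

def two_sets():  # approach 4: set intersection
    bs = set(base_list)
    qs = set(query_list)
    return bs.intersection(qs)

for fn in (plain_list, sorted_list, one_set, two_sets):
    seconds = timeit.timeit(fn, number=10)
    print(f"{fn.__name__:>11}: {seconds:.4f} s for 10 runs")
```

Note that approach 4 returns a set, so the order of query_list and any duplicate words are lost. If you need them, the comprehension in one_set keeps both while still using set lookups.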

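Well, as you can see, sets are great for membership tests: going from the plain list loop to a set intersection took the same check from 9 seconds down to 0.02 seconds.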
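Sorting in approach 2 did not help because the in operator ignores the order. It only pays off with a lookup that exploits it, such as binary search with the standard bisect module (O(log n) per lookup instead of O(n)). A minimal sketch; sorted_contains and the sample words are hypothetical, not part of the original test:

```python
import bisect

def sorted_contains(sorted_list, word):
    """Binary search for word in an already-sorted list."""
    # bisect_left returns the leftmost index where word could be inserted
    i = bisect.bisect_left(sorted_list, word)
    return i < len(sorted_list) and sorted_list[i] == word

# Hypothetical sample data standing in for base_list and query_list
base_list = ["pear", "apple", "fig", "kiwi", "banana"]
query_list = ["fig", "grape", "apple", "plum"]

base_list.sort()  # required: bisect assumes the list is sorted
found = [word for word in query_list if sorted_contains(base_list, word)]
print(found)  # ['fig', 'apple']
```

For repeated membership tests a set is still simpler and usually faster, but a sorted list with bisect is an option when you need the list kept in order anyway.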
