Nice Python Gems

EDIT (9/18/2011): There is another way of having a default value in Python dictionaries: the defaultdict class in the collections module (available since Python 2.5).  Here are the docs.

I was checking my old bookmarks and I stumbled upon these two blog posts: Gems of Python by Eric Florenzano and Python gems of my own by Eric Holscher.

Did you know about setdefault for dicts? I’ve found myself more than once using a dict as a multimap and I always felt that there must be a better way of doing it than this:

dct = {}
items = ['anne', 'david', 'kevin', 'eric', 'anthony', 'andrew']
for name in items:
    if name[0] not in dct:
        dct[name[0]] = []
    dct[name[0]].append(name)

And I was right. From Eric Florenzano’s post:

dct = {}
items = ['anne', 'david', 'kevin', 'eric', 'anthony', 'andrew']
for name in items:
    dct.setdefault(name[0], []).append(name)

I knew it…
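The defaultdict approach mentioned in the edit note at the top could look roughly like this. It is a minimal sketch reusing the same sample names; nothing here comes from either of the linked posts.

```python
from collections import defaultdict

items = ['anne', 'david', 'kevin', 'eric', 'anthony', 'andrew']

# defaultdict(list) creates an empty list the first time a key is accessed,
# so there is no need for an explicit membership check or setdefault call.
dct = defaultdict(list)
for name in items:
    dct[name[0]].append(name)

print(dict(dct))
```

One difference worth keeping in mind: merely reading a missing key from a defaultdict inserts it, while setdefault only touches the dict where you call it.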

Create a tarball from a specific git commit

I have this code I’m working on that needs to be deployed remotely (Fabric makes this SO easy), and I’m using git as my version control system. I was creating tarballs for this by hand, trying to match what git had in the tree as closely as I could (ignoring the same files, etc.), so it was kind of repetitive and boring. Well, git to the rescue!!!


git archive --format=tar HEAD | gzip > myproject.tar.gz

And you have a nice clean gzipped tarball of your code as it is at HEAD, without tears :)

Git is awesome!
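If the tarball is built from a Python deploy script instead of the shell, the same pipeline can be sketched with the standard library. The function names, the default output filename and the repo_dir parameter are my own placeholders, not part of git or Fabric.

```python
import gzip
import subprocess


def archive_command(ref="HEAD"):
    """Build the same git invocation used in the shell one-liner."""
    return ["git", "archive", "--format=tar", ref]


def create_tarball(ref="HEAD", output="myproject.tar.gz", repo_dir="."):
    """Write a gzipped tarball of the tree at `ref` to `output`."""
    # check=True raises CalledProcessError if git fails (e.g. unknown ref).
    result = subprocess.run(
        archive_command(ref), cwd=repo_dir, check=True, stdout=subprocess.PIPE
    )
    # Compress in Python instead of piping through the gzip binary.
    with gzip.open(output, "wb") as fh:
        fh.write(result.stdout)
    return output
```

Calling create_tarball("HEAD") from inside a repository should produce the same kind of archive as the one-liner. Note that this sketch holds the whole tar stream in memory, which is fine for a small project but not for a huge tree, and that a relative output path is resolved against the script's working directory, not repo_dir.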

Performance of in operator using list and set

I had this use case where I had to check which elements of a list of words were available in another list of words, so I decided to use the in operator. For future reference, I tried the following:

# common code for all tests
base_list = [...]
query_list = [...]
  1. Pretty simple method:
    for word in query_list:
      if word in base_list:
        pass  # do something

    For a list of 4284 elements against a list of 107, it took 9 seconds. Using plain lists, this method is the most straightforward of all, and also the slowest.

  2. Sorting things:
    base_list.sort()
    for word in query_list:
      if word in base_list:
        pass  # do something

    After sorting the list, guess what? Yep, nothing changed: same 9 seconds. The in operator still scans a list linearly, sorted or not.

  3. What about sets?
    bs = set(base_list)
    for word in query_list:
      if word in bs:
        pass  # do something

    With sets it’s another story: 0.6 seconds for the same amount of data. But… if this could be achieved by turning one of the lists into a set, what if…

  4. Using more sets
    bs = set(base_list)
    qs = set(query_list)
    solution = bs.intersection(qs)

    0.02 seconds.

Well, as you can see, sets are great.
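To reproduce the comparison, something like the following sketch with timeit should do. The word lists are synthetic random strings (the original data is not shown here), the sizes 4284 and 107 are assigned to base_list and query_list as a guess, and all function names are made up for the example. I also added a bisect variant to show what it takes for sorting to actually pay off, which was not part of the original test.

```python
import bisect
import random
import string
import timeit

rng = random.Random(0)


def make_words(n):
    """Return n random lowercase 6-letter strings (synthetic test data)."""
    return ["".join(rng.choices(string.ascii_lowercase, k=6)) for _ in range(n)]


base_list = make_words(4284)
# Mix known hits with random words so the result is not empty.
query_list = rng.sample(base_list, 50) + make_words(57)


def with_list():
    # Method 1: linear scan of base_list for every query word.
    return [w for w in query_list if w in base_list]


def with_sorted_bisect():
    # Sorting only helps if the lookup uses binary search.
    sorted_base = sorted(base_list)
    hits = []
    for w in query_list:
        i = bisect.bisect_left(sorted_base, w)
        if i < len(sorted_base) and sorted_base[i] == w:
            hits.append(w)
    return hits


def with_set():
    # Method 3: hash lookup, constant time on average per word.
    bs = set(base_list)
    return [w for w in query_list if w in bs]


def with_intersection():
    # Method 4: let the set do all the work.
    return set(base_list).intersection(query_list)


if __name__ == "__main__":
    for fn in (with_list, with_sorted_bisect, with_set, with_intersection):
        seconds = timeit.timeit(fn, number=10)
        print(f"{fn.__name__}: {seconds:.4f}s for 10 runs")
```

Absolute numbers will differ from the ones above depending on the machine and the data, but the ordering between the list scan and the set-based versions should hold.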