Highlights

Subscribe

Creative Commons License
This weblog is licensed under a Creative Commons License.
Powered by
Movable Type 4.25

« Internet underground | Main | Roma »

Nested dictionaries in python

Python's defaultdict is perfect for making nested dictionaries -- especially useful if you're doing any kind of work with json or nosql. It provides a dict which returns a default value when a key isn't found. Set that default value an empty dict, and you have a convenient dict of dicts:


>>> from collections import defaultdict
>>> foo = defaultdict(dict)
>>> foo['x']
{}

But it breaks down when you go more than one layer deep:


>>> foo['x']['y']
Traceback (most recent call last):
  File "", line 1, in 
KeyError: 'y'

You can get another layer by passing in a defaultdict of dicts as the default:


>>> bar = defaultdict(lambda: defaultdict(dict))
>>> bar['x']['y']
{}

But suppose you want deeply-nesting dictionaries. This means you can refer as deeply into the hierarchy as you want, without needing to check whether the intermediate dictionaries have already been created. You do need to be sure that intervening levels aren't anything other than a recursive defaultdict, mind. But if you know you're going to have your content filed away inside, say, quadruple-nested dicts, this isn't necessarily a problem.

One approach would be to extend the method above, with lambdas inside lambdas:


>>> baz = defaultdict(lambda: defaultdict(lambda:defaultdict(dict)))
>>> baz[1][2][3]
{}
>>> baz[1][2][3][4]
Traceback (most recent call last):
  File "", line 1, in 
KeyError: 4
>>> 

It's marginally more readable if we use partial rather than lambda:


>>> thud = defaultdict(partial(defaultdict, partial(defaultdict, dict)))
>>> thud[1][2][3]
{}

But still pretty ugly, and non-extending. Want infinite nesting instead? You can do it with a recursive function:


>>> def infinite_defaultdict():
...     return defaultdict(infinite_defaultdict)
... 
>>> spam = infinite_defaultdict() #defaultdict(infinite_defaultdict) is equivalent
>>> spam['x']['y']['z']['l']['m']
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {})

This works fine. The repr output is annoyingly convoluted, though:


>>> spam = infinite_defaultdict()
>>> spam['x']['y']['z']['l']['m']
defaultdict(, {})
>>> spam
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'x': 
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'y': 
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'z': 
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'l': 
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'m': 
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {})})})})})})

A cleaner way of achieving the same effect is to ignore defaultdict entirely, and make a direct subclass of dict. This is based on Peter Norvig's original implementation of defaultdict:


>>> class NestedDict(dict):
...     def __getitem__(self, key):
...         if key in self: return self.get(key)
...         return self.setdefault(key, NestedDict())


>>> eggs = NestedDict()
>>> eggs[1][2][3][4][5]
{}
>>> eggs
{1: {2: {3: {4: {5: {}}}}}}

Comments

You made my day with this post. I'm continously working with defaultdicts and was missing something like the NestedDict that you describe here. I didn't put much thought on it but now I read it here I see it's easy to implement. NestedDictionary should be included in defaultdict module.

you know __missing__, do you?

Your example references `RecursiveDict` within your `NestedDict` implementation. @a's `__missing__` suggestion works like a charm btw.

Thanks, Angelo! Fixed.