May 10, 2011

ad-hoc webserver from the shell

Here is a neat trick to make the current directory hierarchy available online:

$ cd /tmp
$ python -m SimpleHTTPServer
Serving HTTP on port 8000 ...
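On Python 3 the module has moved: SimpleHTTPServer became http.server, so the one-liner is `python3 -m http.server`. If you'd rather start the server from a script, something like the sketch below should do. The helper name serve_directory is made up for illustration, and binding to 127.0.0.1 only is my assumption (the command-line version listens on all interfaces by default). It needs Python 3.7 or later.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=0):
    """Serve `directory` over HTTP from a background thread.

    port=0 asks the OS for any free port; the port actually chosen
    is available as server.server_address[1].
    """
    # Bind the handler to the directory we want to expose.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Assumption: listen on localhost only, not on every interface.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call server.shutdown() followed by server.server_close() when you're done with it.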

July 15, 2010

Nested dictionaries in python

Python's defaultdict is perfect for making nested dictionaries -- especially useful if you're doing any kind of work with JSON or NoSQL. It provides a dict which returns a default value when a key isn't found. Set that default value to an empty dict, and you have a convenient dict of dicts:

>>> from collections import defaultdict
>>> foo = defaultdict(dict)
>>> foo['x']
{}

But it breaks down when you go more than one layer deep:

>>> foo['x']['y']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'y'

You can get another layer by passing in a defaultdict of dicts as the default:

>>> bar = defaultdict(lambda: defaultdict(dict))
>>> bar['x']['y']
{}

But suppose you want deeply-nesting dictionaries. This means you can refer as deeply into the hierarchy as you want, without needing to check whether the intermediate dictionaries have already been created. You do need to be sure that intervening levels aren't anything other than a recursive defaultdict, mind. But if you know you're going to have your content filed away inside, say, quadruple-nested dicts, this isn't necessarily a problem.

One approach would be to extend the method above, with lambdas inside lambdas:

>>> baz = defaultdict(lambda: defaultdict(lambda:defaultdict(dict)))
>>> baz[1][2][3]
{}
>>> baz[1][2][3][4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 4

It's marginally more readable if we use partial rather than lambda:

>>> from functools import partial
>>> thud = defaultdict(partial(defaultdict, partial(defaultdict, dict)))
>>> thud[1][2][3]
{}

But still pretty ugly, and non-extending. Want infinite nesting instead? You can do it with a recursive function:

>>> def infinite_defaultdict():
...     return defaultdict(infinite_defaultdict)
>>> spam = infinite_defaultdict() #defaultdict(infinite_defaultdict) is equivalent
>>> spam['x']['y']['z']['l']['m']
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {})

This works fine. The repr output is annoyingly convoluted, though:

>>> spam = infinite_defaultdict()
>>> spam['x']['y']['z']['l']['m']
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {})
>>> spam
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'x': 
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'y': 
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'z': 
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'l': 
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {'m': 
defaultdict(<function infinite_defaultdict at 0x7fe4fb0c9de8>, {})})})})})})
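If all you want is to look at the structure, one workaround is to copy it into ordinary dicts before printing. This is only a sketch: to_plain_dict is a name I've made up, and it assumes the values are either nested dicts or leaves that can be left alone.

```python
from collections import defaultdict


def infinite_defaultdict():
    return defaultdict(infinite_defaultdict)


def to_plain_dict(d):
    """Recursively copy nested dict-likes into ordinary dicts."""
    if isinstance(d, dict):
        return {key: to_plain_dict(value) for key, value in d.items()}
    # Anything that isn't a dict is treated as a leaf and returned as-is.
    return d


spam = infinite_defaultdict()
spam['x']['y']['z']['l']['m']  # touching the keys creates them
print(to_plain_dict(spam))
# prints {'x': {'y': {'z': {'l': {'m': {}}}}}}
```

The copy is a snapshot: looking up a missing key in the result raises KeyError as usual.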

A cleaner way of achieving the same effect is to ignore defaultdict entirely, and make a direct subclass of dict. This is based on Peter Norvig's original implementation of defaultdict:

>>> class NestedDict(dict):
...     def __getitem__(self, key):
...         if key in self: return self.get(key)
...         return self.setdefault(key, NestedDict())

>>> eggs = NestedDict()
>>> eggs[1][2][3][4][5]
{}
>>> eggs
{1: {2: {3: {4: {5: {}}}}}}
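A variation on the same idea, in case it's useful: dict subclasses can define a __missing__ method, which the normal lookup calls whenever a key isn't there (Python 2.5 onwards). That avoids overriding __getitem__ at all. The class name MissingNestedDict is mine, and this is a sketch rather than a drop-in replacement.

```python
class MissingNestedDict(dict):
    """A dict whose missing keys spring into existence as nested dicts."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when `key` is absent.
        child = self[key] = type(self)()
        return child


ham = MissingNestedDict()
ham[1][2][3]
print(ham)
# prints {1: {2: {3: {}}}}
```

As with the other versions, merely reading ham[key] creates the key. Membership tests and .get() don't go through __missing__, so they leave the dict untouched.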

December 26, 2008

Opening up a tax haven

Panama is still one of the biggest and most important tax havens. As well as its absurd tax regime, its corporate disclosure regime means it is very difficult to get information about Panamanian companies.

Or rather, it was. Panama recently put online their company registry. You can now retrieve the names of the current directors of every Panamanian company, as well as all the company's filings themselves (minutes of company meetings, details of shareholdings, ownership, certificates of incorporation etc. etc.).

Nice, but you can only search by the name of the company. If you want to find somebody who is dodging tax or doing something else dubious, you really need to search by director's name.

This tool fixes that problem. I've scraped all 600,000 company records, going back 30 years, and indexed by directors.
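The indexing step is essentially an inverted index: instead of company -> directors, you store director -> companies. A minimal sketch of the idea follows. The record layout, the function name and the sample data are all invented for illustration and are not the real registry format or the code behind the tool.

```python
from collections import defaultdict


def index_by_director(companies):
    """Map each normalised director name to the set of companies they sit on."""
    index = defaultdict(set)
    for company in companies:
        for director in company["directors"]:
            # Normalise case and whitespace so small variations still match.
            index[director.strip().lower()].add(company["name"])
    return index


# Hypothetical records, purely for illustration.
records = [
    {"name": "Example Holdings S.A.", "directors": ["A. Director", "B. Director"]},
    {"name": "Sample Shipping Inc.", "directors": ["a. director "]},
]
index = index_by_director(records)
print(sorted(index["a. director"]))
# prints ['Example Holdings S.A.', 'Sample Shipping Inc.']
```

Real names would need much messier matching (transliteration, middle names, titles), which this sketch ignores.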

Now you can, for instance, look up recently-arrested arms dealer Monzer al-Kassar, and you find a couple of companies. Looking through the records, you find the company's current treasurer is Hans-Ulrich Ming, chairman of Swiss firm Pax Anlage. Previous directors include Enrico Ravano, president of Contship, the Italian company that controls the Calabrian port of Gioia Tauro. A Feb 2008 report for the Italian parliament accused Ravano of complicity in cocaine smuggling by the Calabrian mafia through Gioia Tauro - the report cited Italian estimates that 80% of all Europe's cocaine is smuggled through Gioia Tauro. Ravano's connection to al Kassar could help to stand up accusations (which al Kassar has always denied) that al Kassar was involved in drug trafficking as well as weapons trafficking; and helps to undermine Ravano's recent denials that he's had anything to do with any trafficking of any sort. [This set of connections was in fact found manually, by Global Witness, and was part of the inspiration to build the search]

Or take Nadhmi Auchi: Iraqi-British billionaire, companion of Saddam Hussein in the '50s, convicted of fraud in France (but appealing). I've not yet looked through the records of companies held by him and his friends - but there are plenty of records there, doubtless including some interesting connections.

And there are plenty more interesting names to look up. Most satisfyingly, it's already proving useful in figuring out the activities of various currently-active arms dealers...

Want the raw data? Here is a database dump.

November 27, 2006

Westminster's map

[Update: I finally got round to adding legends to the maps]

Which countries get talked about in parliament? With data from They Work For You, I've put together these maps of where MPs like to talk about. Here's the number of mentions a country has had in parliament recently, adjusted for population:

<- Few mentions_________________Many mentions->

Looking at this, I'm actually surprised at how globally-minded Parliament is. Sudan (pop. 34.2 million) gets 2,302 mentions; Germany (pop. 82.5 million) has only 3,695 mentions in parliament.
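To make the comparison concrete, here is the arithmetic for those two countries. I'm assuming "adjusted for population" means mentions per million inhabitants; the function name is just for illustration.

```python
def mentions_per_million(mentions, population_millions):
    """Mentions in parliament per million inhabitants."""
    return mentions / population_millions


sudan = mentions_per_million(2302, 34.2)
germany = mentions_per_million(3695, 82.5)
print(round(sudan, 1), round(germany, 1))
# prints 67.3 44.8
```

On that measure Sudan gets roughly one and a half times as much attention per head as Germany.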

Far from being ignored, Africa actually gets mentioned well beyond its economic importance to the UK. South America, on the other hand, is basically ignored.

Then there's the size bias: small countries get more mentions than big ones, once you adjust for population. Look at Mongolia: Westminster, it seems, finds Mongolians immensely more important than Chinese. The bias can partly be discounted as a problem with measurement: parliament is prone to lists of foreign relations and trade issues, for instance, which mention every country regardless of how small it is. Also, it's possible MPs talk about areas within China or India, which I wouldn't have picked up on.
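The sub-national problem is easy to see in a toy version of the counting. The sketch below is not the code used for the maps: the function, the alias table and the sample sentences are all hypothetical. It just shows how counting country names alone undercounts a country that is often discussed by region.

```python
import re
from collections import Counter


def count_mentions(speeches, aliases):
    """Count whole-word matches of any alias, credited to the alias's country."""
    counts = Counter()
    for country, names in aliases.items():
        pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, names)))
        for speech in speeches:
            counts[country] += len(pattern.findall(speech))
    return counts


# Hypothetical sample text and alias tables.
speeches = ["Trade with China and Guangdong province.", "Floods in Kerala."]
names_only = {"China": ["China"], "India": ["India"]}
with_regions = {"China": ["China", "Guangdong"], "India": ["India", "Kerala"]}

print(count_mentions(speeches, names_only)["China"],
      count_mentions(speeches, with_regions)["China"])
# prints 1 2
```

With country names only, the speech about Kerala doesn't count towards India at all.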

But there's more to it: larger countries really do get short-changed in the attention we give them. China has a population perhaps 150 times larger than that of Bolivia - but we don't hear anything like 150 times as much news from China. We're all biased by imagining a world made up of nations, and giving the same weight to nations of all sizes. Small islands got discussed an incredible amount - particularly places in the news, like Tuvalu and the Pitcairns, but others as well.


November 7, 2006

Memes: toxic in China

Remember the Free Hugs meme? Somebody in Australia started hugging people in the streets, it spread to Russia, Italy, Taiwan, Korea, Poland, and pretty much the rest of the world.

Then, some people in Shanghai tried it - and were promptly arrested.

Shanghai Free Hugs

Before the arrest, presumably

The huggers were released after a couple of hours, but still: a big 'meh!' to the Chinese police.

[cross-post from livejournal]

October 3, 2006

Conference reloaded

How can you develop a service without sharing a language with your users?

Holed up in Budapest, my head too messed up to do any proper work (eep! the doom she is a-coming!), I've been listening to danah Boyd's keynote at the blogtalk conference that's just winding up in Vienna.

She touches on the fact that the creators of Orkut don't have the faintest idea what their Portuguese or Hindi-speaking users are doing. I'd always vaguely assumed that there would be a fair few Portuguese-speakers within the Orkut development team, for instance. But obviously not.

It'd be a nice little project for a journalist or an anthropologist, to work out how much the developers of these sites know about their users.