Open Data

July 9th, 2010 § 0 comments § permalink

We’re in the midst of a data explosion. Then again, we’re always in the midst of a data explosion. It’s been developing, wave by wave, since the first Sumerian scribe pushed his wedge into clay. Maybe it feels different this time; maybe it’s always felt different.

The past two centuries saw the gradual triumph of ordered data collection: the regimented and expensive process of the census, the time-motion study, the economic indicator. The province of powerful behemoths (government, military, corporate or the omnipresent RAND corporation), such projects were rigorously planned at the top, then executed by a small army of functionaries.

In the last 15 years, something has changed. Quantitative change, initially: more data, faster computers, easier transmission of information. But also a change in quality. Now we’ve moved into the era of data as by-product. Our clicks and our purchases are tracked because watching us is cheap and easy, not as part of a pre-planned technocratic project. Such cheapness brings us into the age of data abundance, and we’re only beginning to appreciate the consequences and the possibilities.

Enter the Open Data movement. Bubbling with geekish idealism, this is a loose grouping of campaigners trying to prize large datasets out of government and corporate hands, bringing them into the agora. Knowledge here may be measured in SQL dumps, linked data and gigabytes of official transcripts, but the idealism fits into the standard pattern: the Truth will set you free.

Freebase

July 9th, 2010 § 0 comments § permalink

getting country population data from freebase:


from freebase.api import HTTPMetawebSession, MetawebError
mss = HTTPMetawebSession('www.freebase.com')
list(mss.mqlread([{'name': None, 'type': '/location/country', '/location/country/iso31661alpha2' : None, '/location/statistical_region/population' : [{'number': None}] }]))
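As far as I understand MQL, results mirror the shape of the query, so each country should come back as a dict carrying a list of population entries. A minimal sketch of flattening that into a lookup table, assuming rows shaped like the query above. The sample rows and figures are invented placeholders, and `population_by_country` is a hypothetical helper, not part of the freebase library:

```python
ISO_KEY = '/location/country/iso31661alpha2'
POP_KEY = '/location/statistical_region/population'

def population_by_country(rows):
    """Map ISO code -> list of population figures, skipping empty entries."""
    table = {}
    for row in rows:
        numbers = [p['number'] for p in row.get(POP_KEY, [])
                   if p.get('number') is not None]
        table[row[ISO_KEY]] = numbers
    return table

# Invented sample rows, standing in for list(mss.mqlread(...))
sample_rows = [
    {'name': 'Exampleland', 'type': '/location/country', ISO_KEY: 'XA',
     POP_KEY: [{'number': 1000}, {'number': 1200}]},
    {'name': 'Placeholderia', 'type': '/location/country', ISO_KEY: 'XB',
     POP_KEY: []},
]
print(population_by_country(sample_rows))
```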

GANTT

July 8th, 2010 § 0 comments § permalink

Embedded in a project that’s floundering a little as it expands beyond the size that the devs can keep in their heads. So, looking for some relatively lightweight way of visualizing the moving parts and the work that needs to be done. And, as every other time I’ve looked in this area, finding most solutions to be too feature-light, too complicated, or sometimes both.

First are the project scheduling systems. Whatever they focus on, it’s hard to think of them except as tools for generating GANTT charts. I can imagine these being useful for, say, a big construction project with complex interdependencies of people and machines. For coding, not so much. Particularly not Taskjuggler, which seems to delight in being non-user-friendly. That is, it is complicated and does a bad job of explaining itself, but then tries to use this as evidence of how sophisticated it is. I ran away before finding out; complexity is not what I want!

Gnome planner is quite possibly much inferior for large projects, but at least lets me add a task without hours grepping through the docs. If I ever need a gantt chart, I’ll certainly head there rather than taskjuggler. I honestly believe that coding extra features into planner as required would be easier than making sense of taskjuggler.

So, I think I’ll do without!

MongoDB

July 8th, 2010 § 0 comments § permalink

MongoDB (and nosql generally) is an appealing idea. The words written about it, though, are problematic: too much hype, too little documentation. That’ll change soon; we’re over the peak of the nosql hype cycle, into the trough. People are looking at the nosql systems they’ve eagerly implemented in recent months, noticing that they won’t solve every problem imaginable. For now, though, every blogpost with mongodb instructions is prefaced with grumbles about the lack of information.

So, I spent a ridiculous amount of time figuring out how to do grouping. Have a bunch of download logs, want to break them down by country.
The simplest way I could find of doing this is:

db.loglines.group({'cond': {}, initial: {count: 0}, reduce: function(doc, out){out.count++; if(out[doc.country] == undefined){out[doc.country] = 0;}; out[doc.country] += 1;}});

Or, the version in pymongo:


> reduce_func = """function(doc, out){
out.total++;
if(out[doc.country] == undefined){
out[doc.country] = 0;};
out[doc.country] += 1;};
"""

> l.group(key = {},
condition = {},
initial = {'total':0},
reduce = reduce_func)
[{
u'AE': 215.0,
u'AG': 23.0,
u'AM': 140.0,
u'AN': 58.0,
u'AO': 56.0,
...
u'total' : 87901
}]

[apologies for formatting; I’ve not really figured out how to edit js within a python repl]
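To make the logic of that reduce function concrete, here is the same tally as a pure-Python sketch. It runs over a hypothetical in-memory list of log documents rather than a Mongo collection, and `tally_by_country` is a name made up for illustration:

```python
from collections import Counter

def tally_by_country(docs):
    """Count documents per country, plus a running total, like the reduce function."""
    out = Counter()
    for doc in docs:
        out['total'] += 1          # mirrors out.total++
        out[doc['country']] += 1   # Counter starts missing keys at 0
    return dict(out)

# Made-up log documents; only the 'country' field matters here
loglines = [{'country': 'AE'}, {'country': 'AM'}, {'country': 'AE'}]
print(tally_by_country(loglines))
```

As in the original, the total shares a dict with the per-country counts, so a country code that happened to be spelled 'total' would collide with it.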

BP oil spill

July 8th, 2010 § 0 comments § permalink

I often avoid certain news stories: not because they’re unimportant, but because I doubt I’ll learn much by discovering them in the day-by-day dribble of the daily press.

The BP Oil Spill is one: I’m not going to bother with short articles on it, but I’d really love to follow the long ones. I’ve idly watched the speculation ramp up to biblical proportions, but have no idea how to interpret it.

[no content here, as you can see, just a stick in the ground to note how shameful it is that I know nothing about this]

Protected: Does reality pass the flowchart test?

July 8th, 2010 § permalink

more gaga

July 6th, 2010 § 0 comments § permalink

More on Alejandro:
– Bad Romance may have been similarly intricate
– The dreamscape reminds me strongly of Gaiman, although that probably means no more than that Gaiman’s been on my mind lately
– Everybody seems to have seen the religious elements as a homage to Madonna, with “Like a Prayer”. Fair enough, but it surely also has some connection to Derek Jarman’s video for the Pet Shop Boys’ It’s a Sin

GaGa

July 6th, 2010 § 0 comments § permalink

It’s taken a long while, but I’m now, finally, a convert to the church of gaga. It’s all Alejandro’s fault, and more particularly the video’s. It’s another epic 8-minute piece, which means there’s plenty of time to develop a good many themes. She’s doing what I like best: not making a syllogism with her music, but layering loosely-connected themes so that, if you clap your hands and try to believe, you’ll be able to weave your own meaning out of it.

It’s somehow very European, but drawn from disparate sources within that; Gaga surely deserves some EU subsidy for semiotic integration. The setting is mystical and unspecific, but in a cold and German fashion. Gaga appears as Dream or an Ice Queen, or maybe as Narnia’s White Witch. But this isn’t Narnia, with children and a christ-like lion. It’s Weimar, a collapsing world where introspectively melodramatic romance must take the place of morality. It’s intense and fearful, slightly frigid, physicality replaced by power. Even the male dance troupe are desexualised; after entering with a haka-like swagger, they retreat into stylised weirdness.

So far, we’re in a familiar aesthetic, one which runs from Rammstein to Bauhaus through the entire spectrum of goth. The equivocation between sex and violence is likewise familiar, though rebel chic rarely gets as far as a semi-automatic bra. It’s the hispanic eurodisco elements that take us away. Our tragic ice queen seems about to start singing ‘numa numa ey’. Teutonic tragic Romance meets the Romance culture — in accent, if not in much else.

debugging python regexes

July 5th, 2010 § 0 comments § permalink

Neat trick from stackoverflow: the re.DEBUG flag for python regexes:


> re.compile('a(b+)a', re.DEBUG)
literal 97
subpattern 1
  max_repeat 1 65535
    literal 98
literal 97
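The debug dump goes to stdout, so it can also be captured as a string for logging or comparison. A small sketch, where `debug_dump` is a hypothetical helper name. The exact text varies between Python versions (newer ones print the node names in capitals and may append extra detail), so treat the format as unstable:

```python
import contextlib
import io
import re

def debug_dump(pattern):
    """Return the parse tree that re.DEBUG prints while compiling `pattern`."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, re.DEBUG)
    return buffer.getvalue()

dump = debug_dump('a(b+)a')
print(dump)
```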

documentaries

July 5th, 2010 § 0 comments § permalink

Spent a chunk of the weekend with a clique of Australian travellers and party animals, who turned out to have a sweet and counter-intuitive affection for watching documentaries. Also, chess. On the documentaries, they turned me onto this giant list of documentaries to watch online.

Patch: vi-style scrolling for the comix image viewer

July 3rd, 2010 § 0 comments § permalink

Comix is my favourite no-bloat viewer for collections of images: not just for comics, but also for paging through a directory full of graphs or photographs. But needing to slip to the arrow keys for navigation is an irritation: hence this quick little patch to enable h,j,k,l scrolling. [I later realised that the h conflicts with another keyboard shortcut. So it goes]. This is in lieu of a large patch, which I’ll probably never write, allowing shortcut keys to be set from a config file.

Here on github — the first time I’ve used github in this way, and an impressively painless experience. I’m now itching to hack on other code that’s hosted there. [also, there’s probably a way of getting automatic github updates posted here, or to facebook, or something]

Patch: blogger post from stdin for googlecl

July 3rd, 2010 § 0 comments § permalink

Going to start hijacking this blog, to record/link to patches I submit to various open-source projects. As with everything else on here, it’s mainly to ensure I can find these little snippets a few months later.

So, to start, something intended for this blog itself. A patch to the google commandline tools enabling the “google blogger post” command to post content read from stdin (adding to the current options of supplying a string or a filename). Usage is the traditional ‘-‘ in place of a filename.

This enables two pieces of functionality I’d find very useful:
A) filter content through other programs. e.g. using markdown to HTMLify my content:
$ markdown post.txt | google blogger post -
B) make a blogpost from within vim, by selecting my post content and piping it to googlecl
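The ‘-’ convention itself is easy to sketch. This is not the actual googlecl patch, just an illustration of the idea: `read_post_content` is a hypothetical helper that reads stdin for ‘-’, reads the file if the argument is an existing path, and otherwise treats the argument as the content itself.

```python
import io
import os
import sys

def read_post_content(arg, stdin=None):
    """Resolve a command-line argument to post content.

    '-' means read standard input; an existing path means read that file;
    anything else is treated as the literal content.
    """
    if arg == '-':
        stream = sys.stdin if stdin is None else stdin
        return stream.read()
    if os.path.isfile(arg):
        with open(arg) as handle:
            return handle.read()
    return arg

# Simulate piped input with an in-memory stream
print(read_post_content('-', stdin=io.StringIO('<p>hello</p>')))
```

The `stdin` parameter exists only so the function can be exercised without a real pipe.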

tail wagging the dog

July 3rd, 2010 § 0 comments § permalink

AP, via Wired:

“This year, the Pentagon will employ 27,000 people just for recruitment, advertising and public relations — almost as many as the total 30,000-person work force in the
State Department.”

LSE Podcasts

July 3rd, 2010 § 0 comments § permalink

The LSE seem to have had an unusually interesting set of speakers lately. Not sure if it’s an end-of-term twist away from serious economics towards the more accessible stuff. Žižek, Clay Shirky and Andrew Ross Sorkin, all in the space of a day; what a treat!

New post

July 2nd, 2010 § 0 comments § permalink

What is the ø in infinite thøught? Merely the philosophical counterpart to the Heavy Metal Umlaut? Or are we in the equally-depressing land of subtle and pointless theoretical in-jokes?

New post

July 2nd, 2010 § 0 comments § permalink

mysql file output is efficient — but needs the FILE permission, which mysql turns off by default for most users:

mysql> select foo into outfile '/tmp/bar.txt' from sometable group by foo;

New post

July 2nd, 2010 § 0 comments § permalink

While loving both The Parallax View and All the President’s Men, I’d somehow never realised that they had a forgotten sibling. Klute is the first member of what came to be known as director Alan Pakula’s political paranoia trilogy. One to watch.

finding and editing

July 1st, 2010 § 0 comments § permalink

Search for files containing some text, open them in vim (one per tab)

 grep -l foo ./* | xargs vim -p

Alternatively, to get a single-line list that can be edited and then copy-pasted to a command-line:

grep -l foo ./* | xargs echo

There are more heavy-duty ways of removing lines in output listed here, but I see little reason for using them.
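For use inside a script, roughly the same thing can be done in Python. A sketch only: `files_containing` and `as_single_line` are made-up names, the match is a plain substring rather than a grep regex, and the demo files are throwaway examples.

```python
import os
import shlex
import tempfile

def files_containing(text, directory='.'):
    """Return sorted paths of regular files in `directory` whose contents include `text`."""
    matches = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, errors='ignore') as handle:
            if text in handle.read():
                matches.append(path)
    return matches

def as_single_line(paths):
    """Join paths into one shell-safe line, ready to paste into a command."""
    return ' '.join(shlex.quote(p) for p in paths)

# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name, body in [('a.txt', 'foo here'), ('b.txt', 'nothing'), ('c d.txt', 'more foo')]:
        with open(os.path.join(tmp, name), 'w') as handle:
            handle.write(body)
    found = files_containing('foo', tmp)
    print(as_single_line(found))
```

Quoting each path means filenames with spaces survive the copy-paste, which the plain xargs echo version does not guarantee.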

Where am I?

You are currently viewing the archives for July, 2010 at Dan O'Huiginn.