February 28, 2009

OED, again

A little more on the OED. The idea of creating a publicly-accessible version has obviously been floating around for a few years. As well it might: not only would an open OED be fantastically useful, but there's a certain justice in bringing it back to the community. As Kragen Sitaker writes, the original OED

is one of the earliest instances of what are now called "pro-am" or "commons-based peer production" projects. From 1857 to 1928, thousands of readers collected examples of uses of words their dictionaries didn't define; they mailed these examples on slips of paper to a small number of editors, who undertook to collate them into a dictionary.

Kragen's attempt to liberate the OED was the most effective: not only did he get one set of the OED scanned, he also cooked up some code making it possible to look up individual words. Alas, his system is now offline - such is the fate of one-man projects. Rufus Pollock's attempt to revive it, within the framework of the Open Knowledge Foundation, seems not to have got anywhere.

More ambitious are the Distributed Proofreaders, a group who take OCR'ed books, edit and correct them by hand, and pass them on to Project Gutenberg. They've been contemplating the idea of tackling the OED for some time now. But it's a pretty daunting project - both in scale, and in the complexity of the typography - and every attempt seems to peter out.

Which is all a bit of a disappointment. I'm not quite foolhardy enough to launch myself into digitising the OED just yet, but there must be at least some prospect of making those scans slightly more user-friendly.

The Oxford English Dictionary, free

[update: Here is a very rough interface, which will be improved whenever I next have some free time]

Using the OED online costs £200/year, which is silly. Fortunately the first edition is out of copyright, and available at the Internet Archive. Unfortunately, it's a bit tricky to find the right volume in a format that doesn't expect you to download 200MB to look up a word. Djvu seems the best option; you need to install a browser plugin first, but then you can look at individual pages quite easily. Here are links to each volume:

A-B, C, D-E, F-G (pdf only), H-K, L, M-N, O-P (flip-book only), Q-R, S-SH, SI-SU (flip-book only), SV-TH, TI-U, V-Z (flip-book only)
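
Picking the right volume for a word is mechanical enough to script. What follows is only a rough sketch, assuming the letter ranges in the list above are accurate; the name `volume_for` is made up for illustration, and it returns the volume label rather than any link:

```python
from bisect import bisect_right

# Letter ranges copied from the volume list above (one label per scanned volume).
VOLUMES = ["A-B", "C", "D-E", "F-G", "H-K", "L", "M-N", "O-P",
           "Q-R", "S-SH", "SI-SU", "SV-TH", "TI-U", "V-Z"]

# First prefix covered by each volume, e.g. "SI" for "SI-SU".
# These are already in alphabetical order, which bisect relies on.
STARTS = [label.split("-")[0] for label in VOLUMES]


def volume_for(word):
    """Return the label of the volume that should contain `word`."""
    # Keep only the letters A-Z, so hyphens and apostrophes are ignored.
    key = "".join(ch for ch in word.upper() if "A" <= ch <= "Z")
    if not key:
        raise ValueError("no letters A-Z in %r" % (word,))
    # The right volume is the last one whose starting prefix is <= the word.
    return VOLUMES[bisect_right(STARTS, key) - 1]
```

So `volume_for("ship")` gives "S-SH" and `volume_for("time")` gives "TI-U". Finding the right page within a volume is a harder problem, and this does nothing for it.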

Other formats are at these links (yes, there are two separate scans, one from the University of Toronto and another from Kragen Sitaker):

  • Volume 1, A-B: Sitaker
  • Volume 2, C: Sitaker
  • Volume 3, D-E: Toronto (partial), Complete?
  • Volume 4, F-G: Sitaker, Toronto (no djvu for either)
  • Volume 5, H-K: Sitaker
  • Volume 6A, L: Sitaker
  • Volume 6B, M-N: Sitaker
  • Volume 7, O-P: Toronto (flip-book only), Unlabelled (flipbook/pdf only)
  • Volume 8A, Q-R: Sitaker
  • Volume 8B, S-SH: Sitaker
  • Volume 9A, SH-SU: Sitaker
  • Volume 9B, SV-TH: Sitaker, Toronto
  • Volume 10A, TI-U: Sitaker, Toronto
  • Volume 10B, V-Z: Toronto, Sitaker