May 16th, 2011 § § permalink
Oddly, there seems to be no mode function in the Python standard library. It feels like something that should have an optimized C version squirreled away somewhere. ‘Mode’ is too ambiguous to be easily searchable, alas. Anyway, here’s a pure-Python version:
from collections import defaultdict

def mode(iterable):
    counts = defaultdict(int)
    for item in iterable:
        counts[item] += 1
    return max(counts, key=counts.get)
Should be reasonably fast (for pure Python), though it could eat up a lot of memory on an iterable containing large items, since the dict keeps every distinct item alive until the function returns.
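Later Python versions do ship helpers for this: collections.Counter (2.7 and 3.1 onward) and statistics.mode (3.4 onward). Here is a minimal sketch of the same idea built on Counter; the name mode_counter is made up for illustration, and how ties are broken is left to Counter.

```python
from collections import Counter

def mode_counter(iterable):
    """Return one most common item; raise ValueError on empty input."""
    counts = Counter(iterable)
    if not counts:
        raise ValueError("mode_counter() arg is an empty iterable")
    # most_common(1) returns a one-element list of (item, count) pairs
    return counts.most_common(1)[0][0]

print(mode_counter("abracadabra"))  # a
```

The memory caveat is unchanged: Counter is still a dict holding every distinct item.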
October 16th, 2010 § § permalink
A Python gotcha I never knew of before:
Python 2.6.5 (release26-maint, Aug 20 2010, 17:50:24)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 0 is 0 + 0
True
>>> 256 is 256 + 0
True
>>> 257 is 257 + 0
False
>>>
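The likely explanation is CPython's small-integer cache: ints from -5 through 256 are preallocated and shared, so `is` happens to succeed for them, while larger values may or may not be the same object. That is an implementation detail, so the sketch below only asserts what is guaranteed: compare values with `==`. The variable names are arbitrary.

```python
# Built at runtime so the compiler cannot fold or share the constants
a = int("257")
b = int("256") + 1

print(a == b)  # True: value equality is always safe for ints
print(a is b)  # identity: implementation-dependent, don't rely on it
```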
July 8th, 2010 § § permalink
MongoDB (and NoSQL generally) is an appealing idea. The words written about it, though, are problematic: too much hype, too little documentation. That’ll change soon; we’re over the peak of the NoSQL hype cycle, into the trough. People are looking at the NoSQL systems they’ve eagerly implemented in recent months, and noticing that they won’t solve every problem imaginable. For now, though, every blog post with MongoDB instructions is prefaced with grumbles about the lack of information.
So, I spent a ridiculous amount of time figuring out how to do grouping. I have a bunch of download logs and want to break them down by country.
The simplest way I could find of doing this is:
db.loglines.group({
    'cond': {},
    initial: {count: 0},
    reduce: function(doc, out) {
        out.count++;
        if (out[doc.country] == undefined) {
            out[doc.country] = 0;
        };
        out[doc.country] += 1;
    }
});
Or, the version in pymongo:
> reduce_func = """function(doc, out){
    out.total++;
    if(out[doc.country] == undefined){
        out[doc.country] = 0;
    };
    out[doc.country] += 1;
};
"""
> l.group(key = {},
          condition = {},
          initial = {'total':0},
          reduce = reduce_func)
[{
u'AE': 215.0,
u'AG': 23.0,
u'AM': 140.0,
u'AN': 58.0,
u'AO': 56.0,
...
u'total': 87901
}]
[apologies for formatting; I’ve not really figured out how to edit js within a python repl]
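To make the reduce function's logic concrete, here is a rough pure-Python sketch of the same tally run over a plain list of dicts instead of a collection. The function name and the sample documents are made up for illustration; only the `country` field and the `total` counter come from the code above.

```python
def tally_by_country(docs):
    """Mimic the reduce function: a running total plus one counter per country."""
    out = {"total": 0}  # plays the role of `initial`
    for doc in docs:
        out["total"] += 1
        country = doc["country"]
        # Same effect as the `== undefined` check followed by `+= 1`
        out[country] = out.get(country, 0) + 1
    return out

# Made-up sample documents, not real log data
sample = [{"country": "AE"}, {"country": "AG"}, {"country": "AE"}]
print(tally_by_country(sample))
```

As far as I know, newer MongoDB releases have dropped the group command, and the usual replacement is an aggregation pipeline with a `$group` stage keyed on the country field.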
July 5th, 2010 § § permalink
Neat trick from Stack Overflow: the re.DEBUG flag for Python regexes, which prints the parsed pattern when it is compiled:
> re.compile('a(b+)a', re.DEBUG)
literal 97
subpattern 1
  max_repeat 1 65535
    literal 98
literal 97
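The same trick as a standalone script, in case it's useful. The dump is printed to stdout at compile time, and its exact format differs between Python versions, so no expected dump is shown here; the test string "abba" is just an example.

```python
import re

# re.DEBUG prints the parse tree of the pattern while compiling it
pattern = re.compile(r"a(b+)a", re.DEBUG)

# The compiled pattern still works as usual
match = pattern.match("abba")
print(match.group(1))  # bb
```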