I have a series of functions that I apply to each record in a dataset. Each one generates a new field, which I store in the record's dictionary (the records, or "documents", are stored in MongoDB). I split the functions up because they are basically unrelated, and I tie them back together by passing them as a list to a function that runs every operation on every record and adds the results.
What irks me is that the way I'm going about it seems fairly inelegant: I semi-duplicate names, among other things.
    def _midline_length(blob):
        '''Generate a midline sequence for *blob*'''
        return 42

    midline_length = {
        'func': _midline_length,
        'key': 'calc_seq_midlen'}  #: Midline sequence key/function pair.
Lots of these...
    do_calcs = [midline_length, ]  # all the functions ...
Then called like:
    for record in mongo_collection.find():
        for calc in do_calcs:
            record[calc['key']] = calc['func'](record)  # add new data to record
        # update record in DB
Splitting up the keys like this makes it easier to remove all the calculated fields in the database (pointless after everything is set, but while developing the code and methodology it's handy).
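To show what I mean by removing the calculated fields, here is a minimal sketch that collects the keys from do_calcs into a single MongoDB $unset update. The helper name build_unset_spec is made up for illustration, and the commented-out call assumes PyMongo 3 or later.

```python
def _midline_length(blob):
    """Placeholder calculation, as in the snippet above."""
    return 42


midline_length = {'func': _midline_length, 'key': 'calc_seq_midlen'}
do_calcs = [midline_length]


def build_unset_spec(calcs):
    """Build an update document that removes every calculated field.

    build_unset_spec is a hypothetical helper, not part of any library.
    """
    # $unset ignores the value given for each field; '' is just a filler.
    return {'$unset': {calc['key']: '' for calc in calcs}}


spec = build_unset_spec(do_calcs)
print(spec)  # {'$unset': {'calc_seq_midlen': ''}}

# Assuming PyMongo 3+, this could then be applied to every document with:
# mongo_collection.update_many({}, spec)
```

Because the keys live alongside the functions, the list of fields to drop never has to be maintained separately.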
I had the thought to maybe use classes, but it seems more like an abuse:
    class midline_length(object):
        key = 'calc_seq_midlen'

        @staticmethod
        def __call__(blob):
            return 42
I could then make a list of instances (do_calcs = [midline_length(), ...]) and run through it, calling each one or pulling out its key member. Alternatively, it seems I can add arbitrary attributes to functions: def myfunc(): and then myfunc.key = 'mykey'. That seems even worse. Better ideas?
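To make the two alternatives concrete, here is a rough sketch of how each would plug into the same loop. The records list is only an in-memory stand-in for the MongoDB cursor, and myfunc and 'mykey' are placeholder names.

```python
# Variant 1: a class carrying the key, with a static __call__.
class midline_length(object):
    key = 'calc_seq_midlen'

    @staticmethod
    def __call__(blob):
        return 42


# Variant 2: a plain function with the key attached as an attribute.
def myfunc(blob):
    return 42


myfunc.key = 'mykey'

# Stand-in for mongo_collection.find(); the real records come from MongoDB.
records = [{'_id': 1}, {'_id': 2}]

# Both variants expose the same interface: callable, with a .key attribute.
do_calcs = [midline_length(), myfunc]

for record in records:
    for calc in do_calcs:
        record[calc.key] = calc(record)  # add new data to record

print(records[0])
```

Either way the loop body shrinks to calc.key and calc(record), but the key still has to be spelled out next to each calculation.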