Metadata for Functions | Python Decorator

Often I find myself wanting more information about the Python functions I’m running, whether to debug, log, or time their execution. Each of these is a relatively well-defined problem (debugging excepted). Unfortunately, none of the tools I found in my research make it easy to see a function’s input, output, elapsed time, errors, warnings, etc. in one simple interface. So I wrote a simple Python decorator compatible with Python 2.7+ (and probably earlier versions), including Python 3.

What does the meta_func decorator actually do?

For every call of the decorated function, it stores all arguments (both positional and keyword), error information (with the option to catch errors instead of raising them), warnings, timing data (start time, end time and elapsed time), and the return value in a standard Python dictionary.
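The post doesn’t reproduce the decorator’s source, so what follows is only a minimal Python 3 sketch of the idea, not the author’s implementation. The raise_errors parameter, the dictionary key names, and the choice to have the wrapped function return the metadata dictionary instead of its bare result are all assumptions of this sketch.

```python
import functools
import sys
import time
import warnings


def meta_func(raise_errors=True):
    """Hypothetical sketch: make each call return a dict of metadata about itself."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            meta = {
                "args": args,            # positional arguments, stored raw
                "kwargs": kwargs,        # keyword arguments, stored raw
                "return_value": None,
                "error_info": None,      # sys.exc_info() tuple if the call failed
                "warnings": [],          # warnings.WarningMessage objects
                "time_started": time.time(),
            }
            # Record every warning raised during the call instead of printing it.
            with warnings.catch_warnings(record=True) as caught:
                warnings.simplefilter("always")
                try:
                    meta["return_value"] = func(*args, **kwargs)
                except Exception:
                    meta["error_info"] = sys.exc_info()
                    if raise_errors:
                        raise  # re-raise; the metadata for this call is discarded
                finally:
                    meta["time_ended"] = time.time()
                    meta["time_elapsed"] = meta["time_ended"] - meta["time_started"]
                    meta["warnings"] = list(caught)
            return meta
        return wrapper
    return decorator


# Usage: catch errors instead of raising them, then inspect the records.
@meta_func(raise_errors=False)
def divide(a, b):
    if b == 0:
        warnings.warn("dividing by zero")
    return a / b


ok = divide(6, b=3)
bad = divide(1, 0)
print(ok["return_value"], ok["args"], ok["kwargs"])  # 2.0 (6,) {'b': 3}
print(bad["error_info"][0].__name__, len(bad["warnings"]))  # ZeroDivisionError 1
```

Returning the dictionary keeps everything about one call together; the trade-off is that callers read return_value instead of using the result directly.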

What’s the point of tracking all this [meta]data?

Debugging, logging, timing… The use cases are nearly endless, because the decorator captures much of what happens during a call in one easily interpreted structure.

Important Notes

Expect this decorator to add noticeable overhead to every call it wraps, since it captures arguments, timing, warnings and errors on each invocation.

Arguments (positional and keyword), return values, warnings and exceptions are stored in their raw form, so any transformation (such as turning errors and tracebacks into strings) has to happen in post-processing.

The error_info field holds the tuple returned by sys.exc_info(), i.e. the exception’s type, value and traceback.
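Because error_info is kept raw, turning it into readable text is a post-processing step. Here is a minimal sketch using the standard library’s traceback module; the helper name stringify_error_info is hypothetical, and the try/except below simply produces a sample sys.exc_info() tuple rather than going through the decorator.

```python
import sys
import traceback


def stringify_error_info(error_info):
    """Turn a sys.exc_info() tuple into a formatted traceback string (or None)."""
    if error_info is None:
        return None
    exc_type, exc_value, exc_tb = error_info
    return "".join(traceback.format_exception(exc_type, exc_value, exc_tb))


# Produce a sample raw error_info tuple like the one a metadata record would hold.
try:
    int("not a number")
except ValueError:
    error_info = sys.exc_info()

text = stringify_error_info(error_info)
print(text.splitlines()[-1])  # ValueError: invalid literal for int() with base 10: 'not a number'
```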

GitHub Repo

The Power of Hadoop in under 10 lines!

Okay, okay, I may have oversold it a bit, but here are fewer than 10 lines of bash that resemble (if you squint really hard) Hadoop/MapReduce.

code_to_run=$1 in_file=$2 out_file=$3
split -d -a 5 -l 100000 "$in_file" "${in_file}_" && \
ls "${in_file}_"* | xargs -P 8 -I {} "$code_to_run" {} {}.out && \
cat "${in_file}_"*.out > "$out_file" && \
rm "${in_file}_"*

What will this do?
It takes 3 arguments:

  • code_to_run is the path to an executable that takes an input file path and an output file path as its two arguments
  • in_file is the path to a single input file
  • out_file is the path to the single output file
split -d -a 5 -l 100000 "$in_file" "${in_file}_"

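Split in_file into 100,000-line chunks, each named after in_file plus an underscore and a five-digit numeric suffix (-d gives numeric suffixes, -a 5 sets their length). For example, if in_file is “file.tsv”, the temporary files are file.tsv_00000, file.tsv_00001, and so on.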
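To make the chunking and naming scheme concrete, here is a rough pure-Python approximation of what split does in this step. The function name split_lines is hypothetical, and unlike split it reads the file in text mode.

```python
import os
import tempfile


def split_lines(in_file, lines_per_chunk=100000, suffix_len=5):
    """Roughly mimic `split -d -a 5 -l N in_file in_file_` and return the chunk paths."""
    paths = []
    out = None
    with open(in_file) as src:
        for i, line in enumerate(src):
            # Start a new chunk every lines_per_chunk lines.
            if i % lines_per_chunk == 0:
                if out is not None:
                    out.close()
                path = f"{in_file}_{len(paths):0{suffix_len}d}"  # e.g. file.tsv_00000
                out = open(path, "w")
                paths.append(path)
            out.write(line)
    if out is not None:
        out.close()
    return paths


# Usage: 25 lines split into chunks of 10 gives 3 files (10, 10 and 5 lines).
workdir = tempfile.mkdtemp()
in_file = os.path.join(workdir, "file.tsv")
with open(in_file, "w") as fh:
    fh.writelines(f"row{n}\n" for n in range(25))

paths = split_lines(in_file, lines_per_chunk=10)
print([os.path.basename(p) for p in paths])  # ['file.tsv_00000', 'file.tsv_00001', 'file.tsv_00002']
```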

ls "${in_file}_"* | xargs -P 8 -I {} "$code_to_run" {} {}.out

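List all numbered temporary input chunks and pipe them to xargs, which runs up to 8 copies of your code_to_run executable in parallel (-P 8). For each chunk, xargs substitutes the chunk’s name into the command, so the executable receives the input chunk as its first argument and the output chunk’s name (the input name plus .out) as its second.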
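A Python analogue of the xargs -P 8 step might look like the sketch below. run_chunks and the stand-in upper-casing command are hypothetical; only the convention that code_to_run takes an input path and an output path comes from the pipeline above. Threads are enough here because each chunk runs in its own subprocess, so the real work happens outside Python.

```python
import concurrent.futures
import subprocess
import sys
import tempfile
from pathlib import Path


def run_chunks(command, chunk_paths, max_workers=8):
    """Run `command CHUNK CHUNK.out` for each chunk, at most max_workers at a time.

    Each chunk runs in its own subprocess (like xargs -P 8); the threads only wait.
    check=True turns a non-zero exit status into CalledProcessError.
    """
    def run_one(chunk):
        out_path = chunk + ".out"
        subprocess.run(command + [chunk, out_path], check=True)
        return out_path

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order and re-raises a failed chunk's exception.
        return list(pool.map(run_one, chunk_paths))


# Usage: a stand-in code_to_run that upper-cases its input file into its output file.
upper = [sys.executable, "-c",
         "import sys; from pathlib import Path; "
         "Path(sys.argv[2]).write_text(Path(sys.argv[1]).read_text().upper())"]
workdir = Path(tempfile.mkdtemp())
chunks = []
for n in range(3):
    chunk = workdir / f"file.tsv_{n:05d}"
    chunk.write_text(f"chunk {n}\n")
    chunks.append(str(chunk))

outputs = run_chunks(upper, chunks)
print([Path(p).read_text() for p in outputs])  # ['CHUNK 0\n', 'CHUNK 1\n', 'CHUNK 2\n']
```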

cat "${in_file}_"*.out > "$out_file"

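Then concatenate the output chunks into a single out_file. Because the numeric suffixes are zero-padded, the shell expands the glob in chunk order, so the results come out in the same order as the original input.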
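The ordering relies on sorted file names matching chunk order. As an illustration, here is a hypothetical Python helper, concat_outputs, that mirrors this step by sorting the matching names before copying the files byte for byte.

```python
import glob
import os
import shutil
import tempfile


def concat_outputs(in_file, out_file):
    """Concatenate in_file_*.out into out_file in sorted order; return the parts used."""
    # Zero-padded suffixes mean sorted file names follow chunk order.
    parts = sorted(glob.glob(glob.escape(in_file) + "_*.out"))
    with open(out_file, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)  # byte-for-byte copy, like cat
    return parts


# Usage: output chunks created out of order still merge in chunk order.
workdir = tempfile.mkdtemp()
in_file = os.path.join(workdir, "file.tsv")
for n in (2, 0, 1):
    with open(f"{in_file}_{n:05d}.out", "w") as fh:
        fh.write(f"result {n}\n")

merged = os.path.join(workdir, "merged.tsv")
concat_outputs(in_file, merged)
with open(merged) as fh:
    text = fh.read()
print(repr(text))  # 'result 0\nresult 1\nresult 2\n'
```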

rm "${in_file}_"*

Clean up (i.e., remove) all temporary files: the glob matches both the input chunks and their .out files, so both are deleted.

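For data safety, each line ends with “&&” (the trailing backslash just continues the command onto the next line), so each command runs only if the previous one succeeded, i.e. exited with status 0. xargs exits with a non-zero status if any run of code_to_run fails, so a failed chunk stops the chain before the outputs are concatenated or the temporary files are deleted.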