Converting CSVs with Headers to AVRO

I recently wanted to quickly convert some CSV files to AVRO, because logging changes meant we were now receiving AVRO logs instead of CSV. I wanted some AVRO files to test my logic on and to get more familiar with the format. After looking around for a while and trying a few… Continue reading


Metadata for Functions | Python Decorator

Often I find myself wanting more information about the Python functions I’m running, whether it’s because I want to debug, log or even time their completion. All of these are relatively well-defined problems (debugging excepted). Unfortunately, as far as my research shows, no tool makes it easy enough to truly see the input, output, time elapsed, errors, warnings,… Continue reading
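
As a rough illustration of the idea in this excerpt, here is a minimal sketch of a decorator that records a call's input, output, elapsed time, errors and warnings. It is not the code from the full post: the names `with_metadata` and `divide`, and the shape of the returned dictionary, are assumptions made up for this example.

```python
import functools
import time
import warnings


def with_metadata(func):
    """Hypothetical decorator: each call returns a dict describing what happened."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        record = {
            "function": func.__name__,
            "args": args,
            "kwargs": kwargs,
            "output": None,
            "error": None,
            "warnings": [],
        }
        start = time.perf_counter()
        # Capture warnings raised during the call instead of printing them.
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            try:
                record["output"] = func(*args, **kwargs)
            except Exception as exc:
                # Store the error rather than re-raising it (a design choice
                # of this sketch, not necessarily what you want in production).
                record["error"] = repr(exc)
        record["elapsed_seconds"] = time.perf_counter() - start
        record["warnings"] = [str(w.message) for w in caught]
        return record

    return wrapper


@with_metadata
def divide(a, b):
    # Illustrative function only.
    if b == 1:
        warnings.warn("dividing by one is a no-op")
    return a / b
```

Calling `divide(6, 3)` returns the metadata dictionary rather than the bare result, so the original return value lives under the `"output"` key. Whether to swallow exceptions like this or re-raise them after recording is a judgment call that depends on how the metadata is consumed.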


The Power of Hadoop in under 10 lines!

Okay, okay, I may have oversold it a bit, but here are fewer than 10 lines of bash that resemble (if you squint really hard) Hadoop/MapReduce.

    code_to_run=$1
    in_file=$2
    out_file=$3
    split -d -a 5 -l 100000 $in_file $in_file"_" && \
    ls $in_file"_"* | xargs -P8 -n1 -I file $code_to_run file file.out && \
    cat $in_file"_"*.out > $out_file…

Continue reading
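
The same split, process in parallel, then concatenate pattern can be sketched in Python. This is only an analogy to the shell lines in the excerpt, not a translation of them: it works on in-memory lines with a thread pool instead of launching an external program per chunk file, and the names `chunked`, `split_map_concat` and `process_chunk` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(lines, size):
    """Yield successive lists of at most `size` items (the "split" step)."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def split_map_concat(lines, process_chunk, chunk_size=100000, workers=8):
    """Run `process_chunk` on each chunk in parallel, then join the results.

    `process_chunk` takes a list of lines and returns a list of lines.
    The defaults mirror the 100000-line chunks and 8 workers in the excerpt.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so the concatenated
        # output keeps the original chunk order.
        results = pool.map(process_chunk, chunked(lines, chunk_size))
        return [line for chunk in results for line in chunk]
```

For example, `split_map_concat(["a", "b", "c"], lambda c: [s.upper() for s in c], chunk_size=2)` processes two chunks and returns the upper-cased lines in their original order. Threads suit I/O-bound work; for CPU-heavy pure-Python processing a process pool would likely be the better fit.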


ELI5 – The Human analogy to Multi-threading (Concurrency) and how it can go wrong

Ever feel like your brain is processing multiple things at once? An example might be when you’re reading a book and listening to music. While you may not realize it, your brain is processing both what you’re reading and what you’re hearing simultaneously. A computer operates similarly; it is constantly working in the background doing… Continue reading