Converting CSVs with Headers to AVRO

Recently I wanted to quickly convert some CSV files to AVRO due to recent logging changes that meant we were receiving AVRO logs instead of CSV. I wanted to have some AVRO files to test my logic on and to get more familiar with AVRO. After looking around for a while and trying a few different CSV to AVRO conversion utilities, none of them actually worked as expected. I just wanted a simple conversion, why was this so hard to find? The closest I came to finding a working CSV2AVRO utility was avroutils.

Unfortunately avroutils hasn’t been updated in 5 years and the code was very restrictive since it offered a command line argument for “headers.” However regardless of whether that argument was passed, it used the [in my opinion, flawed] utility “csv.Sniffer” in Python to confirm there was a header. For me, this meant even though I was explicitly telling it my file had a header it still did not convert due to the Sniffer claiming there was no header.

To reiterate, I really just wanted a simple utility to convert CSV to AVRO if for no other reason than to some hands-on experience with the AVRO format. Why was I having such a hard time?

So… I wrote my own [very basic] utility to handle CSV to AVRO conversions. It does no validation (re: it assumes you know what you’re doing) and is hard-coded to treat all fields as strings. Remember, this was primarily a learning exercise!

Without further ado, I present to you csv2avro

Questions/Comments/Bugs/Requests/etc are encouraged!

Leave a Reply

Your email address will not be published. Required fields are marked *