Recently, logging changes meant we were receiving AVRO logs instead of CSV, so I wanted to quickly convert some CSV files to AVRO. I wanted some AVRO files to test my logic on and to get more familiar with the format. After searching for a while and trying a few different CSV-to-AVRO conversion utilities, I found that none of them worked as expected. I just wanted a simple conversion; why was this so hard to find? The closest I came to a working CSV-to-AVRO utility was avroutils.

Unfortunately, avroutils hasn’t been updated in 5 years, and the code was very restrictive. It offered a command-line argument for “headers,” but regardless of whether that argument was passed, it used Python’s [in my opinion, flawed] “csv.Sniffer” utility to confirm there was a header. For me, this meant that even though I was explicitly telling it my file had a header, it refused to convert the file because the Sniffer claimed there was no header.
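To see how `csv.Sniffer` can miss a header that is obviously there, here is a small standard-library sketch (the sample data is made up for illustration). In CPython’s implementation, `has_header()` treats the first row as a candidate header and looks at the rows below it: a column only gets a vote if every sampled value parses as a number, or every value has the same string length. Columns that fail this are dropped. A file whose columns are all free-form text of varying length leaves nothing to vote, so the answer is “no header.”

```python
import csv

sniffer = csv.Sniffer()

# Hypothetical sample: an obvious header, but every column is free-form text
# whose length varies from row to row, so no column gives the heuristic a signal.
text_only = "name,city\nalice,paris\nbob,london\ncarol,tokyo\n"

# Same shape, but the second column is numeric, so "age" stands out as a label.
with_number = "name,age\nalice,30\nbob,25\ncarol,41\n"

text_only_has_header = sniffer.has_header(text_only)      # False
with_number_has_header = sniffer.has_header(with_number)  # True
```

This is why an explicit “my file has a header” flag should override the Sniffer rather than be double-checked by it.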

To reiterate, I really just wanted a simple utility to convert CSV to AVRO, if for no other reason than to get some hands-on experience with the AVRO format. Why was I having such a hard time?

So… I wrote my own [very basic] utility to handle CSV-to-AVRO conversions. It does no validation (i.e., it assumes you know what you’re doing) and is hard-coded to treat all fields as strings. Remember, this was primarily a learning exercise!

Without further ado, I present to you csv2avro

Questions/Comments/Bugs/Requests/etc. are encouraged!



  1. When I run this code, I get an error: TypeError: write() argument must be str, not bytes.
    I'm using the same code and files as yours. Can you tell me why this is happening?

    1. Probably related to your Python version (e.g., 2.7 versus 3.6). Hopefully you got it resolved.
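For readers hitting the same error: on Python 3, that message typically appears when bytes (such as Avro output) are written to a text-mode stream, for example a file opened with `"w"` or `sys.stdout`. Python 2 accepted bytes there, which fits the version explanation in the reply. We can’t confirm that’s exactly what happened in this case. The sketch below uses a hypothetical helper, `write_avro_bytes`, to show the difference: a text-mode stream rejects bytes, while a binary one accepts them. The usual fix is opening the output with `"wb"`, or writing to `sys.stdout.buffer` instead of `sys.stdout`.

```python
import io


def write_avro_bytes(stream, payload: bytes) -> bool:
    """Try to write raw Avro bytes; return False if the stream only accepts str."""
    try:
        stream.write(payload)
    except TypeError:
        # Python 3 text streams (open(path, "w"), sys.stdout) reject bytes.
        return False
    return True


payload = b"Obj\x01"  # the 4-byte magic that starts every Avro container file

# A text-mode stream, like a file opened with "w": the bytes are rejected.
text_ok = write_avro_bytes(io.TextIOWrapper(io.BytesIO()), payload)
# A binary-mode stream, like a file opened with "wb": the bytes are accepted.
binary_ok = write_avro_bytes(io.BytesIO(), payload)
```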
