October 2017 – Garren's [Big] Data Blog

Spark File Format Showdown – CSV vs JSON vs Parquet

Posted by Garren on 2017/10/09

Apache Spark supports many different data sources, such as the ubiquitous Comma-Separated Values (CSV) format and the web-API-friendly JavaScript Object Notation (JSON) format. Another common format, used primarily for big data analytics, is Apache Parquet. Parquet is a fast columnar data format that you can read more about in two of my…
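
To make the row-versus-column distinction in the teaser above concrete, here is a minimal sketch in plain Python (standard library only, no Spark). It is not Spark's reader or Parquet's actual on-disk encoding. The sample records and the helper names `to_csv`, `to_json_lines`, and `to_columns` are hypothetical, invented purely for illustration.

```python
import csv
import io
import json

# Hypothetical sample records, invented for illustration only.
records = [
    {"id": 1, "city": "Austin", "temp_c": 31.5},
    {"id": 2, "city": "Boston", "temp_c": 18.0},
    {"id": 3, "city": "Denver", "temp_c": 24.2},
]


def to_csv(rows):
    """Row-oriented text: one header line, then one delimited line per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json_lines(rows):
    """Row-oriented text: one self-describing JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def to_columns(rows):
    """Column-oriented layout: all values of a field are stored together.

    This only mimics the idea behind a columnar format; real Parquet files
    add encoding, compression, and metadata that are not modeled here.
    """
    return {name: [row[name] for row in rows] for name in rows[0]}


csv_text = to_csv(records)
json_text = to_json_lines(records)
columns = to_columns(records)

# Pulling one field out of row-oriented text means parsing every full row,
# and CSV carries no types, so the values must be converted by hand.
temps_from_csv = [float(row["temp_c"]) for row in csv.DictReader(io.StringIO(csv_text))]

# In a columnar layout the same field is already a single contiguous list.
temps_from_columns = columns["temp_c"]

print(csv_text)
print(json_text)
print(temps_from_columns)
```

The sketch suggests why analytical queries that touch only a few fields tend to favor a columnar layout: the CSV and JSON paths have to scan whole records, while the column path reads just the field it needs.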

Apache Spark Best Practices, CSV, JSON, Parquet, s3, spark