{"id":114,"date":"2017-04-08T19:21:34","date_gmt":"2017-04-09T03:21:34","guid":{"rendered":"http:\/\/garrens.com\/blog\/?p=114"},"modified":"2018-03-02T20:52:13","modified_gmt":"2018-03-03T04:52:13","slug":"getting-started-and-tips-for-using-apache-parquet-with-apache-spark-2-x","status":"publish","type":"post","link":"https:\/\/garrens.com\/blog\/2017\/04\/08\/getting-started-and-tips-for-using-apache-parquet-with-apache-spark-2-x\/","title":{"rendered":"Tips for using Apache Parquet with Spark 2.x"},"content":{"rendered":"<p><strong>What is Apache Parquet?<\/strong><br \/>\nIt is a compressible, binary, columnar data format used in the Hadoop ecosystem. We&#8217;ll discuss it primarily in the context of the Hadoop Distributed File System (HDFS) and Spark 2.x.<\/p>\n<p><strong>What role does it fill?<\/strong><br \/>\nIt is a fast, efficient data format well suited to scalable big data analytics.<\/p>\n<p><strong>Optimization Tips<\/strong><\/p>\n<ul>\n<li>Aim for around 1GB Parquet output files, but experiment with other sizes for your use case and cluster setup (<a href=\"https:\/\/forums.databricks.com\/questions\/101\/what-is-an-optimal-size-for-file-partitions-using.html\">source<\/a>)<\/li>\n<li>Ideally, store files on HDFS at sizes of at least the HDFS block size (default 128MB)<\/li>\n<li>Storing Parquet files on S3 is also possible (side note: if you want Presto SQL-like queries on demand at low cost, use Amazon Athena, which charges based on data read)<\/li>\n<li>Use snappy compression if storage space is not a concern, since snappy-compressed Parquet files remain splittable; for what should be a relatively small performance hit but much better compression, use gzip (<a href=\"http:\/\/boristyukin.com\/is-snappy-compressed-parquet-file-splittable\/\">source<\/a>)<\/li>\n<\/ul>\n<p><!--more--><\/p>\n<p><strong>More information: <\/strong><br \/>\n<a href=\"https:\/\/parquet.apache.org\/\">https:\/\/parquet.apache.org\/<\/a><br \/>\n<a 
href=\"http:\/\/stackoverflow.com\/questions\/42918663\/is-it-better-to-have-one-large-parquet-file-or-lots-of-smaller-parquet-files\/42919015#42919015\">http:\/\/stackoverflow.com\/questions\/42918663\/is-it-better-to-have-one-large-parquet-file-or-lots-of-smaller-parquet-files\/42919015#42919015<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>What is Apache Parquet? It is a compressible, binary, columnar data format used in the Hadoop ecosystem. We&#8217;ll discuss it primarily in the context of the Hadoop Distributed File System (HDFS) and Spark 2.x. What role does it fill? It is a fast, efficient data format well suited to scalable big data analytics. Optimization&hellip; <a href=\"https:\/\/garrens.com\/blog\/2017\/04\/08\/getting-started-and-tips-for-using-apache-parquet-with-apache-spark-2-x\/\" title=\"Read More\" class=\"read-more\">Continue reading<span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[22],"tags":[19,2],"_links":{"self":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts\/114"}],"collection":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/comments?post=114"}],"version-history":[{"count":11,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts\/114\/revisions"}],"predecessor-version":[{"id":125,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts\/114\/revisions\/125"}],"wp:attachment":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/media?parent=114"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/catego
ries?post=114"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/tags?post=114"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}