Deprecated: Creation of dynamic property wpdb::$categories is deprecated in /home/garrens3/public_html/blog/wp-includes/wp-db.php on line 760

Deprecated: Creation of dynamic property wpdb::$post2cat is deprecated in /home/garrens3/public_html/blog/wp-includes/wp-db.php on line 760

Deprecated: Creation of dynamic property wpdb::$link2cat is deprecated in /home/garrens3/public_html/blog/wp-includes/wp-db.php on line 760

Deprecated: Using ${var} in strings is deprecated, use {$var} instead in /home/garrens3/public_html/blog/wp-includes/comment-template.php on line 1747

Deprecated: Optional parameter $term_id declared before required parameter $meta_value is implicitly treated as a required parameter in /home/garrens3/public_html/blog/wp-content/plugins/advanced-code-editor/advanced-code-editor.php on line 1927

Deprecated: Optional parameter $term_id declared before required parameter $meta_value is implicitly treated as a required parameter in /home/garrens3/public_html/blog/wp-content/plugins/advanced-code-editor/advanced-code-editor.php on line 1941

Deprecated: Optional parameter $term_id declared before required parameter $meta_key is implicitly treated as a required parameter in /home/garrens3/public_html/blog/wp-content/plugins/advanced-code-editor/advanced-code-editor.php on line 1956

Deprecated: Optional parameter $term_id declared before required parameter $key is implicitly treated as a required parameter in /home/garrens3/public_html/blog/wp-content/plugins/advanced-code-editor/advanced-code-editor.php on line 1970

Deprecated: Automatic conversion of false to array is deprecated in /home/garrens3/public_html/blog/wp-content/plugins/loginizer/init.php on line 250

Deprecated: Automatic conversion of false to array is deprecated in /home/garrens3/public_html/blog/wp-content/plugins/loginizer/init.php on line 265

Deprecated: Creation of dynamic property WP_Block_Type::$skip_inner_blocks is deprecated in /home/garrens3/public_html/blog/wp-includes/class-wp-block-type.php on line 391

Deprecated: Creation of dynamic property WP_Block_Type::$skip_inner_blocks is deprecated in /home/garrens3/public_html/blog/wp-includes/class-wp-block-type.php on line 391

Deprecated: Return type of Requests_Cookie_Jar::offsetExists($key) should either be compatible with ArrayAccess::offsetExists(mixed $offset): bool, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /home/garrens3/public_html/blog/wp-includes/Requests/Cookie/Jar.php on line 63

Deprecated: Return type of Requests_Cookie_Jar::offsetGet($key) should either be compatible with ArrayAccess::offsetGet(mixed $offset): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /home/garrens3/public_html/blog/wp-includes/Requests/Cookie/Jar.php on line 73

Deprecated: Return type of Requests_Cookie_Jar::offsetSet($key, $value) should either be compatible with ArrayAccess::offsetSet(mixed $offset, mixed $value): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /home/garrens3/public_html/blog/wp-includes/Requests/Cookie/Jar.php on line 89

Deprecated: Return type of Requests_Cookie_Jar::offsetUnset($key) should either be compatible with ArrayAccess::offsetUnset(mixed $offset): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /home/garrens3/public_html/blog/wp-includes/Requests/Cookie/Jar.php on line 102

Deprecated: Return type of Requests_Cookie_Jar::getIterator() should either be compatible with IteratorAggregate::getIterator(): Traversable, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /home/garrens3/public_html/blog/wp-includes/Requests/Cookie/Jar.php on line 111

Deprecated: Return type of Requests_Utility_CaseInsensitiveDictionary::offsetExists($key) should either be compatible with ArrayAccess::offsetExists(mixed $offset): bool, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /home/garrens3/public_html/blog/wp-includes/Requests/Utility/CaseInsensitiveDictionary.php on line 40

Deprecated: Return type of Requests_Utility_CaseInsensitiveDictionary::offsetGet($key) should either be compatible with ArrayAccess::offsetGet(mixed $offset): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /home/garrens3/public_html/blog/wp-includes/Requests/Utility/CaseInsensitiveDictionary.php on line 51

Deprecated: Return type of Requests_Utility_CaseInsensitiveDictionary::offsetSet($key, $value) should either be compatible with ArrayAccess::offsetSet(mixed $offset, mixed $value): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /home/garrens3/public_html/blog/wp-includes/Requests/Utility/CaseInsensitiveDictionary.php on line 68

Deprecated: Return type of Requests_Utility_CaseInsensitiveDictionary::offsetUnset($key) should either be compatible with ArrayAccess::offsetUnset(mixed $offset): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /home/garrens3/public_html/blog/wp-includes/Requests/Utility/CaseInsensitiveDictionary.php on line 82

Deprecated: Return type of Requests_Utility_CaseInsensitiveDictionary::getIterator() should either be compatible with IteratorAggregate::getIterator(): Traversable, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /home/garrens3/public_html/blog/wp-includes/Requests/Utility/CaseInsensitiveDictionary.php on line 91

Deprecated: Creation of dynamic property WP_Term::$object_id is deprecated in /home/garrens3/public_html/blog/wp-includes/class-wp-term-query.php on line 1118

Deprecated: Creation of dynamic property WP_Term::$object_id is deprecated in /home/garrens3/public_html/blog/wp-includes/class-wp-term-query.php on line 1118

Deprecated: Creation of dynamic property WP_Term::$object_id is deprecated in /home/garrens3/public_html/blog/wp-includes/class-wp-term-query.php on line 1118

Deprecated: Creation of dynamic property WP_Term::$object_id is deprecated in /home/garrens3/public_html/blog/wp-includes/class-wp-term-query.php on line 1118

Deprecated: Creation of dynamic property WP_Term::$object_id is deprecated in /home/garrens3/public_html/blog/wp-includes/class-wp-term-query.php on line 1118

Deprecated: Creation of dynamic property WP_Term::$object_id is deprecated in /home/garrens3/public_html/blog/wp-includes/class-wp-term-query.php on line 1118

Deprecated: Creation of dynamic property WP_Term::$object_id is deprecated in /home/garrens3/public_html/blog/wp-includes/class-wp-term-query.php on line 1118

Deprecated: Creation of dynamic property WP_Term::$object_id is deprecated in /home/garrens3/public_html/blog/wp-includes/class-wp-term-query.php on line 1118

Deprecated: Creation of dynamic property WP_Term::$object_id is deprecated in /home/garrens3/public_html/blog/wp-includes/class-wp-term-query.php on line 1118

Deprecated: Creation of dynamic property WP_Term::$object_id is deprecated in /home/garrens3/public_html/blog/wp-includes/class-wp-term-query.php on line 1118

Deprecated: Creation of dynamic property WP_Term::$object_id is deprecated in /home/garrens3/public_html/blog/wp-includes/class-wp-term-query.php on line 1118

Deprecated: Creation of dynamic property WP_Term::$object_id is deprecated in /home/garrens3/public_html/blog/wp-includes/class-wp-term-query.php on line 1118

Deprecated: Creation of dynamic property WP_Term::$object_id is deprecated in /home/garrens3/public_html/blog/wp-includes/class-wp-term-query.php on line 1118

Warning: Cannot modify header information - headers already sent by (output started at /home/garrens3/public_html/blog/wp-includes/Requests/Cookie/Jar.php:15) in /home/garrens3/public_html/blog/wp-includes/feed-rss2-comments.php on line 8
Comments for Garren's [Big] Data Blog https://garrens.com/blog Discussing data, big, small and in between Sat, 11 Apr 2026 04:48:42 +0000 hourly 1 https://wordpress.org/?v=6.0.11 Comment on Spark File Format Showdown – CSV vs JSON vs Parquet by on cloud running shoes women's https://garrens.com/blog/2017/10/09/spark-file-format-showdown-csv-vs-json-vs-parquet/#comment-261 Sat, 11 Apr 2026 04:48:42 +0000 http://garrens.com/blog/?p=169#comment-261 Great effort, well appreciated.

]]>
Comment on Big data [Spark] and its small files problem by Mohammad https://garrens.com/blog/2017/11/04/big-data-spark-and-its-small-files-problem/#comment-254 Tue, 08 Dec 2020 19:23:38 +0000 http://garrens.com/blog/?p=181#comment-254 In reply to Garren.

Oh do you mean to simulate a stream? That’s a pretty cool idea. My approach, which worked, is a little different. I used python’s zipfile module. Since I had the files stored locally I was able to use python to append json files together to make bigger files of at least 128MB and at most 1GB. So far it seems to have worked. I went from having ~2M mini json files to 28 appropriately-sized json files (~300MB each). Your blog was super helpful, thanks!

]]>
Comment on Big data [Spark] and its small files problem by Garren https://garrens.com/blog/2017/11/04/big-data-spark-and-its-small-files-problem/#comment-253 Tue, 08 Dec 2020 00:46:26 +0000 http://garrens.com/blog/?p=181#comment-253 In reply to Mohammad.

Mohammad, if you already have an army of tiny files, you can do a batch conversion to parquet by reading in chunks of the data set (e.g. 1 hour partition at a time of a 24 hour day)

]]>
Comment on Big data [Spark] and its small files problem by Mohammad https://garrens.com/blog/2017/11/04/big-data-spark-and-its-small-files-problem/#comment-252 Mon, 07 Dec 2020 23:10:10 +0000 http://garrens.com/blog/?p=181#comment-252 What if you already have an army of tiny json files (~10KB) in S3. Is there a way to preliminarily convert those files to parquet in S3?

]]>
Comment on Converting CSVs with Headers to AVRO by Garren https://garrens.com/blog/2015/03/21/converting-csvs-with-headers-to-avro/#comment-251 Wed, 17 Jun 2020 02:49:07 +0000 http://garrens.com/blog/?p=87#comment-251 In reply to Manpreet Shuann.

Probably related to your python version (e.g. 2.7 versus 3.6). Hopefully you got it resolved.

]]>