Big data [Spark] and its small files problem
Often we log data in JSON, CSV or other text format to Amazon’s S3 as compressed files. This pattern is a) accessible and b) infinitely scalable by nature of being in S3 as common text files. However, there are some subtle but critical caveats that come with this pattern that can cause quite a bit… Continue reading
Deprecated: Creation of dynamic property WP_Term::$cat_ID is deprecated in /home/garrens3/public_html/blog/wp-includes/category.php on line 378
Deprecated: Creation of dynamic property WP_Term::$category_count is deprecated in /home/garrens3/public_html/blog/wp-includes/category.php on line 379
Deprecated: Creation of dynamic property WP_Term::$category_description is deprecated in /home/garrens3/public_html/blog/wp-includes/category.php on line 380
Deprecated: Creation of dynamic property WP_Term::$cat_name is deprecated in /home/garrens3/public_html/blog/wp-includes/category.php on line 381
Deprecated: Creation of dynamic property WP_Term::$category_nicename is deprecated in /home/garrens3/public_html/blog/wp-includes/category.php on line 382
Deprecated: Creation of dynamic property WP_Term::$category_parent is deprecated in /home/garrens3/public_html/blog/wp-includes/category.php on line 383
Categories
Apache Spark
6 Comments on Big data [Spark] and its small files problem