Garren's [Big] Data Blog – Page 8 – Discussing data, big, small and in between

Pseudo-Normalized Database Engine Concept

Posted by Garren on 2014/12/14

In relational databases such as MySQL, Oracle, and SQL Server, the two most common schools of thought are normalized and denormalized database design. Essentially, normalized design entails grouping similar dimensions into a single table, such as the ubiquitous orders, customers, and products tables. A normalized design might have an orders table with order_id,… Continue reading→
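
As a rough illustration of the normalized layout the excerpt describes, the sketch below builds separate customers, products, and orders tables in an in-memory SQLite database and joins them back into one flat result. Apart from order_id and the three table names, everything here (the other column names, the sample rows, the function names) is an assumption made for the example, not something taken from the post.

```python
import sqlite3

def build_normalized_db():
    """Create hypothetical customers, products, and orders tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(customer_id),
            product_id  INTEGER REFERENCES products(product_id),
            quantity    INTEGER
        );
        -- Sample rows are invented purely for illustration.
        INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO products  VALUES (10, 'Widget'), (20, 'Gadget');
        INSERT INTO orders    VALUES (100, 1, 10, 3), (101, 2, 20, 1);
        """
    )
    return conn

def denormalized_rows(conn):
    """Join the normalized tables into one flat, denormalized-style result."""
    query = """
        SELECT o.order_id, c.name, p.name, o.quantity
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id
        JOIN products  p ON p.product_id  = o.product_id
        ORDER BY o.order_id
    """
    return conn.execute(query).fetchall()

if __name__ == "__main__":
    for row in denormalized_rows(build_normalized_db()):
        print(row)
```

Each customer and product name is stored once and referenced by key from orders; a denormalized design would instead store the joined rows directly and skip the joins at query time.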

Apache Pig chokes on many small files

Posted by Garren on 2014/12/08

I have had the displeasure of using multiple versions of Apache Pig (0.9, 0.11, 0.12, 0.13 and 0.14) in different capacities. Why was it so unpleasant, you ask? My scripts were running quickly and efficiently on Pig 0.9.2. I was using globs in my LOAD statement (e.g. "a = LOAD '/files/*/*type_v4*.lzo'") to find tens to hundreds… Continue reading→

Split metadata size exceeded 10000000

Posted by Garren on 2014/12/08

“java.io.IOException: Split metadata size exceeded 10000000.” was the error I got when trying to process ~20TB of highly compressed logs (~100TB uncompressed) on my 64-node Amazon EMR cluster. Naturally, I found some good resources recommending a quick fix: modifying the mapred-site.xml file in /home/hadoop/conf/. Warning: By setting this configuration to -1, you are… Continue reading→
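
The excerpt does not name the setting being changed. As far as I know, the limit behind this error is mapreduce.job.split.metainfo.maxsize on Hadoop 2 (mapreduce.jobtracker.split.metainfo.maxsize on Hadoop 1), with a default of 10000000, so treat the property name below as an assumption and check it against your Hadoop version. The sketch only generates the property element that would go inside the configuration block of mapred-site.xml; the function names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumed property name (Hadoop 2 naming); the post itself does not name it.
SPLIT_METAINFO_PROPERTY = "mapreduce.job.split.metainfo.maxsize"

def split_metainfo_property(max_size):
    """Build a <property> element for mapred-site.xml with the given limit."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = SPLIT_METAINFO_PROPERTY
    ET.SubElement(prop, "value").text = str(max_size)
    return prop

def render(prop):
    """Serialize the element to a string for pasting inside <configuration>."""
    return ET.tostring(prop, encoding="unicode")

if __name__ == "__main__":
    # -1 disables the size check; pass a larger positive number to raise it instead.
    print(render(split_metainfo_property(-1)))
```

Passing a larger positive value instead of -1 raises the cap while keeping a limit in place, which may be preferable given the warning in the post.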

Now using S4CMD

Posted by Garren on 2014/12/08

S3CMD’s distinct lack of multi-threading led me to hunt for alternatives. I tried many, including s3-multipart (great when I did use it), s3funnel and s3cp, but none quite fit the bill of supporting the key features I found important:
1) Listing, downloading, uploading, etc. of files and “folders”
2) Multi-threaded
3) Synchronization handled so… Continue reading→
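
To show why multi-threading matters for this kind of tool, here is a minimal sketch of fanning transfers out over a thread pool. It is not how S4CMD is implemented: `fake_transfer` is a hypothetical stand-in for a real upload or download call, and the key names and worker count are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_transfer(key):
    """Hypothetical stand-in for one S3 transfer; sleeps to mimic network latency."""
    time.sleep(0.05)
    return key, "ok"

def transfer_all(keys, workers=8):
    """Run the transfers concurrently and return a {key: status} mapping."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields (key, status) pairs in the same order as the input keys.
        return dict(pool.map(fake_transfer, keys))

if __name__ == "__main__":
    keys = [f"logs/part-{i:05d}.lzo" for i in range(16)]
    start = time.perf_counter()
    results = transfer_all(keys)
    elapsed = time.perf_counter() - start
    print(f"{len(results)} transfers in {elapsed:.2f}s")
```

Because each transfer spends most of its time waiting on the network, running several at once in threads cuts wall-clock time roughly in proportion to the worker count, up to whatever the bandwidth allows.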
