spark – Page 2 – Garren's [Big] Data Blog

Spark File Format Showdown – CSV vs JSON vs Parquet

Posted by Garren on 2017/10/09

Apache Spark supports many different data sources, such as the ubiquitous Comma-Separated Values (CSV) format and the web-API-friendly JavaScript Object Notation (JSON) format. A common format used primarily for big data analytics is Apache Parquet. Parquet is a fast columnar data format that you can read more about in two of my…
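To make "columnar" more concrete, here is a toy sketch in plain Python. It is not Spark and not the real Parquet file format. It stores the same hypothetical records row by row, as CSV and as JSON Lines (the one-object-per-line layout Spark's JSON reader expects by default), and column by column as a dict of lists, which is the basic idea behind columnar formats such as Parquet. Averaging one field means parsing every field of every row in the CSV, but touching only one list in the columnar layout.

```python
import csv
import io
import json

# Hypothetical sample records, made up for illustration.
records = [
    {"id": 1, "city": "Seattle", "temp": 55},
    {"id": 2, "city": "Bellevue", "temp": 57},
    {"id": 3, "city": "Redmond", "temp": 54},
]


def to_csv(rows):
    """Row-oriented text: a header line, then one line holding every field per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json_lines(rows):
    """Row-oriented JSON Lines: one self-describing JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)


def to_columnar(rows):
    """Column-oriented layout: one list of values per field."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def avg_temp_from_csv(csv_text):
    # Every line is split into all of its fields, even though only "temp" is needed.
    temps = [int(row["temp"]) for row in csv.DictReader(io.StringIO(csv_text))]
    return sum(temps) / len(temps)


def avg_temp_from_columns(columns):
    # Only the "temp" list is read; "id" and "city" are never touched.
    temps = columns["temp"]
    return sum(temps) / len(temps)


csv_text = to_csv(records)
columns = to_columnar(records)
print(avg_temp_from_csv(csv_text), avg_temp_from_columns(columns))
```

Real Parquet files add per-column encoding, compression, and metadata on top of this column-wise idea.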

Category: Apache Spark | Tags: Best Practices, CSV, JSON, Parquet, s3, spark

Using Spark Efficiently | Understanding Spark Event 7/29/17

Posted by Garren on 2017/07/29

This page is dedicated to resources related to the 7/29/17 Understanding Spark event presentation in Bellevue, WA.

Slides

Great [FREE!] resources on all things Spark:
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
https://spark.apache.org/docs/latest/sql-programming-guide.html

Databricks was founded by the original creators of Spark and is currently the largest contributor to Apache Spark. As such, they are a phenomenal resource for information and…

Category: Apache Spark | Tags: Best Practices, spark

Switching between Scala and Python on Spark tips

Posted by Garren on 2017/06/27

Switching between Scala and Python on Spark is relatively straightforward, but a few differences can cause some minor frustration. Here are some of the little things I’ve run into and how to adjust for them.

The PySpark shell does not support code completion (autocomplete) by default. Why? PySpark uses the basic Python interpreter…
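The excerpt stops before the post's own fix, so the following is not necessarily what the full post recommends. As a minimal sketch, assuming the interpreter behind the shell ships the standard `readline` module, one common workaround in a plain Python REPL is to turn on `rlcompleter`-based tab completion:

```python
# Paste into the PySpark shell (or any plain Python REPL) to enable tab completion.
# Assumption: this is one common workaround, not necessarily the original post's fix.
import rlcompleter  # importing it registers rlcompleter's completer with readline, if available

try:
    import readline  # GNU readline or libedit; missing on some platforms (e.g. stock Windows)
except ImportError:
    readline = None

if readline is not None:
    if "libedit" in (readline.__doc__ or ""):
        # Builds linked against libedit (common on macOS) use editline's binding syntax.
        readline.parse_and_bind("bind ^I rl_complete")
    else:
        # GNU readline: bind the Tab key to completion.
        readline.parse_and_bind("tab: complete")
```

Afterwards, pressing Tab after a partial name such as `spark.rea` should list matching attributes of the `spark` session object.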

Category: Apache Spark | Tags: Best Practices, Python, Scala, spark

Real Time Big Data analytics: Parquet (and Spark) + bonus

Posted by Garren on 2017/06/26

Apache Spark and Parquet (SParquet) are a match made in scalable data analytics and delivery heaven. Spark brings a wide-ranging, powerful computing platform to the equation, while Parquet offers a data format that is purpose-built for high-speed big data analytics. If this sounds like fluffy marketing talk, resist the temptation to close this tab,…

Category: Apache Spark | Tags: aws, Best Practices, Cloudera, Impala, Parquet, spark