January 2018 – Garren's [Big] Data Blog

Intro to PySpark Workshop 2018-01-24

Posted by Garren on 2018/01/24

In this Intro to PySpark Workshop, there are five main points:
1. About Apache Spark
2. Sample PySpark Application walkthrough with explanations
3. Custom built Jupyter Azure Notebook to interactively demonstrate fundamental PySpark concepts
4. Python-specific Spark advice
5. Curated resources to learn more

Slides PDF Version: Intro to PySpark Workshop

Q&A Options: Twitter: #PySparkWorkshop

Sample app

from pyspark.sql import… Continue reading→

Apache Spark IPython Notebook, Jupyter, PySpark, Python, spark, Workshop

Scaling Python for Data Science using Spark

Posted by Garren on 2018/01/06

Python is the de facto language of Data Science & Engineering. (IMHO R is grand for statisticians, but Python is for the rest of us.) As a prominent language in the field, it only makes sense that Apache Spark supports it with Python-specific APIs. Spark makes it so easy to use Python that it… Continue reading→

Apache Spark Best Practices, Data Science, PySpark, Python, spark