June 2018 – Garren's [Big] Data Blog

Avoiding Performance Potholes: Scaling Python for Data Science using Spark @ Spark + AI Summit

Posted by Garren on 2018/06/05

Python is the de facto language of data science and engineering, which affords it an outsized community of users. However, when many data scientists and engineers come to Spark with a Python background, unexpected performance potholes can stand in the way of progress. These “Performance Potholes” include PySpark’s ease of integration with existing packages (e.g.…
