Avoiding Performance Potholes: Scaling Python for Data Science using Spark @ Spark + AI Summit

Python is the de facto language of data science and engineering, which affords it an outsized community of users. However, when many data scientists and engineers come to Spark with a Python background, unexpected performance potholes can stand in the way of progress. These “Performance Potholes” include PySpark’s ease of integration with existing packages (e.g.…