{"id":248,"date":"2018-06-05T14:02:45","date_gmt":"2018-06-05T22:02:45","guid":{"rendered":"http:\/\/garrens.com\/blog\/?p=248"},"modified":"2020-01-24T08:29:54","modified_gmt":"2020-01-24T16:29:54","slug":"avoiding-performance-potholes-scaling-python-for-data-science-using-spark-spark-ai-summit","status":"publish","type":"post","link":"https:\/\/garrens.com\/blog\/2018\/06\/05\/avoiding-performance-potholes-scaling-python-for-data-science-using-spark-spark-ai-summit\/","title":{"rendered":"Avoiding Performance Potholes: Scaling Python for Data Science using Spark @ Spark + AI Summit"},"content":{"rendered":"<div class=\"session-content\">\n<p>Python is the de facto language of data science and engineering, which affords it an outsized community of users. However, when many data scientists and engineers come to Spark with a Python background, unexpected performance potholes can stand in the way of progress. These \u201cPerformance Potholes\u201d include PySpark\u2019s ease of integration with existing packages (e.g. Pandas, SciPy, Scikit Learn, etc), using Python UDFs, and utilizing the RDD APIs instead of Spark SQL DataFrames without understanding the implications. Additionally, Spark 2.3 changes the game even further with vectorized UDFs. 
In this talk, we will discuss:<\/p>\n<p>\u2013 How PySpark works broadly (&amp; why it matters)<br \/>\n\u2013 Integrating popular Python packages with Spark<br \/>\n\u2013 Python UDFs (how to [not] use them)<br \/>\n\u2013 RDDs vs Spark SQL DataFrames<br \/>\n\u2013 Spark 2.3 Vectorized UDFs<\/p>\n<p>Session hashtag: <a href=\"https:\/\/twitter.com\/search?q=%23Py9SAIS&amp;src=typd\">#Py9SAIS<\/a><\/p>\n<p>Download full slides <a href=\"http:\/\/garrens.com\/files\/Performance%20Potholes.pdf\">here<\/a><\/p>\n<p><a href=\"https:\/\/databricks.com\/session\/avoiding-performance-potholes-scaling-python-for-data-science-using-apche-spark\">Spark + AI Summit session page with video<\/a><\/p>\n<\/div>\n<blockquote class=\"wp-embedded-content\" data-secret=\"G68UbyOH8T\"><p><a href=\"http:\/\/garrens.com\/blog\/2018\/01\/06\/scaling-python-for-data-science-using-spark\/\">Scaling Python for Data Science using Spark<\/a><\/p><\/blockquote>\n<p><iframe title=\"&#8220;Scaling Python for Data Science using Spark&#8221; &#8212; Garren&#039;s [Big] Data Blog\" class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; clip: rect(1px, 1px, 1px, 1px);\" src=\"http:\/\/garrens.com\/blog\/2018\/01\/06\/scaling-python-for-data-science-using-spark\/embed\/#?secret=G68UbyOH8T\" data-secret=\"G68UbyOH8T\" width=\"600\" height=\"338\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Python is the de facto language of data science and engineering, which affords it an outsized community of users. However, when many data scientists and engineers come to Spark with a Python background, unexpected performance potholes can stand in the way of progress. 
These \u201cPerformance Potholes\u201d include PySpark\u2019s ease of integration with existing packages (e.g.&hellip; <a href=\"https:\/\/garrens.com\/blog\/2018\/06\/05\/avoiding-performance-potholes-scaling-python-for-data-science-using-spark-spark-ai-summit\/\" title=\"Read More\" class=\"read-more\">Continue reading<span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":true,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts\/248"}],"collection":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/comments?post=248"}],"version-history":[{"count":2,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts\/248\/revisions"}],"predecessor-version":[{"id":253,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts\/248\/revisions\/253"}],"wp:attachment":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/media?parent=248"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/categories?post=248"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/tags?post=248"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}