{"id":150,"date":"2017-06-27T10:36:12","date_gmt":"2017-06-27T18:36:12","guid":{"rendered":"http:\/\/garrens.com\/blog\/?p=150"},"modified":"2018-03-02T20:51:20","modified_gmt":"2018-03-03T04:51:20","slug":"switching-between-scala-and-python-on-spark-tips","status":"publish","type":"post","link":"https:\/\/garrens.com\/blog\/2017\/06\/27\/switching-between-scala-and-python-on-spark-tips\/","title":{"rendered":"Switching between Scala and Python on Spark tips"},"content":{"rendered":"<p>Switching between Scala and Python on Spark is relatively straightforward, but there are a few differences that can cause some minor frustration. Here are some of the little things I&#8217;ve run into and how to adjust for them.<\/p>\n<ul>\n<li>PySpark Shell does <em>not<\/em> support code completion (autocomplete) by default.<\/li>\n<\/ul>\n<p>Why? PySpark uses the basic Python interpreter REPL, so you get the same REPL you&#8217;d get by calling <code>python<\/code> at the command line.<\/p>\n<p>Fix: Use the IPython REPL by setting the environment variable <code>PYSPARK_PYTHON=ipython3<\/code> before the pyspark command. If the executors need a plain Python interpreter (for example, on a cluster), setting <code>PYSPARK_DRIVER_PYTHON=ipython3<\/code> instead changes only the driver&#8217;s interpreter.<\/p>\n<p>Before:<br \/>\n<code>pyspark<\/code><\/p>\n<p><a href=\"http:\/\/garrens.com\/blog\/wp-content\/uploads\/2017\/06\/Screen-Shot-2017-06-27-at-10.28.06-AM.png\"><img loading=\"lazy\" class=\"alignnone size-medium wp-image-151\" src=\"http:\/\/garrens.com\/blog\/wp-content\/uploads\/2017\/06\/Screen-Shot-2017-06-27-at-10.28.06-AM-300x115.png\" alt=\"\" width=\"300\" height=\"115\" srcset=\"https:\/\/garrens.com\/blog\/wp-content\/uploads\/2017\/06\/Screen-Shot-2017-06-27-at-10.28.06-AM-300x115.png 300w, https:\/\/garrens.com\/blog\/wp-content\/uploads\/2017\/06\/Screen-Shot-2017-06-27-at-10.28.06-AM-768x295.png 768w, https:\/\/garrens.com\/blog\/wp-content\/uploads\/2017\/06\/Screen-Shot-2017-06-27-at-10.28.06-AM.png 900w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>After:<br \/>\n<code>PYSPARK_PYTHON=ipython3 pyspark<\/code><\/p>\n<p><a 
href=\"http:\/\/garrens.com\/blog\/wp-content\/uploads\/2017\/06\/Screen-Shot-2017-06-27-at-10.27.23-AM.png\"><img loading=\"lazy\" class=\"alignnone size-medium wp-image-152\" src=\"http:\/\/garrens.com\/blog\/wp-content\/uploads\/2017\/06\/Screen-Shot-2017-06-27-at-10.27.23-AM-300x140.png\" alt=\"\" width=\"300\" height=\"140\" srcset=\"https:\/\/garrens.com\/blog\/wp-content\/uploads\/2017\/06\/Screen-Shot-2017-06-27-at-10.27.23-AM-300x140.png 300w, https:\/\/garrens.com\/blog\/wp-content\/uploads\/2017\/06\/Screen-Shot-2017-06-27-at-10.27.23-AM-768x359.png 768w, https:\/\/garrens.com\/blog\/wp-content\/uploads\/2017\/06\/Screen-Shot-2017-06-27-at-10.27.23-AM-1024x479.png 1024w, https:\/\/garrens.com\/blog\/wp-content\/uploads\/2017\/06\/Screen-Shot-2017-06-27-at-10.27.23-AM.png 1026w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<ul>\n<li>val and var are not Python keywords!<\/li>\n<\/ul>\n<p>This is silly, but I regularly catch myself trying to declare variables in Python in the <code>val df = spark.read...<\/code> style.<\/p>\n<p>Before:<br \/>\n<code>&gt;&gt;&gt; val df = spark.range(100)<br \/>\nFile \"&lt;stdin&gt;\", line 1<br \/>\nval df = spark.range(100)<br \/>\n^<br \/>\nSyntaxError: invalid syntax<\/code><\/p>\n<p>After:<br \/>\n<code>&gt;&gt;&gt; df = spark.range(100)<\/code><\/p>\n<ul>\n<li>It&#8217;s print, not println<\/li>\n<\/ul>\n<p>Just like the val\/var conundrum, println is not defined in Python, but the built-in print function is!<\/p>\n<p>Before:<br \/>\n<code>In [5]: df.foreach(println)<br \/>\n---------------------------------------------------------------------------<br \/>\nNameError\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Traceback (most recent call last)<br \/>\n&lt;ipython-input-5-3d51e5dc3e2b&gt; in &lt;module&gt;()<br \/>\n----&gt; 1 df.foreach(println)<\/code><\/p>\n<p>NameError: name 
&#8216;println&#8217; is not defined<\/p>\n<p>After:<br \/>\n<code>In [6]: df.foreach(print)<br \/>\nRow(id=3)<br \/>\nRow(id=4)<br \/>\nRow(id=2)<br \/>\nRow(id=1)<br \/>\nRow(id=0)<\/code><\/p>\n<ul>\n<li>All function calls need parentheses in Python<\/li>\n<\/ul>\n<p>Yep, this is one of those frustrating gifts that just keeps on giving [pain].<\/p>\n<p>Scala:<br \/>\n<code>scala&gt; df.groupBy(\"element\").count.collect.foreach(println)<br \/>\n[bar,1]<br \/>\n[qux,1]<br \/>\n[foo,1]<br \/>\n[baz,1]<\/code><\/p>\n<p>Python<br \/>\nBefore:<br \/>\n<code>In [15]: df.groupBy(\"element\").count.collect.foreach(print)<br \/>\n---------------------------------------------------------------------------<br \/>\nAttributeError                            Traceback (most recent call last)<br \/>\nin ()<br \/>\n----&gt; 1 df.groupBy(\"element\").count.collect.foreach(print)<\/code><\/p>\n<p>AttributeError: &#8216;function&#8217; object has no attribute &#8216;collect&#8217;<\/p>\n<p>After:<br \/>\n<code>In [17]: df = spark.createDataFrame([(1,\"foo\"), (2, \"bar\"), (3, \"baz\"), (4, \"qux\")]).toDF(\"time\", \"element\")<br \/>\nIn [18]: df.groupBy(\"element\").count().foreach(print)<br \/>\nRow(element='bar', count=1)<br \/>\nRow(element='qux', count=1)<br \/>\nRow(element='foo', count=1)<br \/>\nRow(element='baz', count=1)<\/code><\/p>\n<ul>\n<li>Quotes!<\/li>\n<\/ul>\n<p>Python allows both single (&#39;) quotes and double (&quot;) quotes for strings. 
Scala is stricter: double quotes delimit strings, paired single quotes delimit a single Char, and a lone leading single quote creates a Symbol.<\/p>\n<p>Scala<br \/>\n<code>scala&gt; 'f<br \/>\nres7: Symbol = 'f<\/code><\/p>\n<p><code>scala&gt; 'f'<br \/>\nres6: Char = f<\/code><\/p>\n<p><code>scala&gt; 'foo'<br \/>\n&lt;console&gt;:1: error: unclosed character literal<br \/>\n'foo'<\/code><\/p>\n<p><code>scala&gt; \"foo\" == 'foo'<br \/>\n&lt;console&gt;:1: error: unclosed character literal<br \/>\n\"foo\" == 'foo'<\/code><\/p>\n<p>Python<br \/>\n<code>In [19]: \"foo\" == 'foo'<br \/>\nOut[19]: True<\/code><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Switching between Scala and Python on Spark is relatively straightforward, but there are a few differences that can cause some minor frustration. Here are some of the little things I&#8217;ve run into and how to adjust for them. PySpark Shell does not support code completion (autocomplete) by default. Why? PySpark uses the basic Python interpreter&hellip; <a href=\"https:\/\/garrens.com\/blog\/2017\/06\/27\/switching-between-scala-and-python-on-spark-tips\/\" title=\"Read More\" class=\"read-more\">Continue reading<span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[22],"tags":[17,14,23,2],"_links":{"self":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts\/150"}],"collection":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/comments?post=150"}],"version-history":[{"count":2,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts\/150\/revisions"}],"predecessor-version":[{"id":154,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts\/150\/revisions\/154"}],"wp:attachment":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/media?parent=150"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/categories?post=150"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/tags?post=150"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}