{"id":158,"date":"2017-07-29T08:30:43","date_gmt":"2017-07-29T16:30:43","guid":{"rendered":"http:\/\/garrens.com\/blog\/?p=158"},"modified":"2018-03-02T20:50:49","modified_gmt":"2018-03-03T04:50:49","slug":"using-spark-efficiently-understanding-spark-event-72917","status":"publish","type":"post","link":"https:\/\/garrens.com\/blog\/2017\/07\/29\/using-spark-efficiently-understanding-spark-event-72917\/","title":{"rendered":"Using Spark Efficiently | Understanding Spark Event 7\/29\/17"},"content":{"rendered":"<p>This page is dedicated to resources related to the 7\/29\/17 Understanding Spark event presentation in Bellevue, WA.<\/p>\n<p><a href=\"http:\/\/garrens.com\/blog\/wp-content\/uploads\/2017\/07\/Understanding-Spark-2017-07-29-copy.pdf\">Slides<\/a><\/p>\n<p>Great [FREE!] resources on all things Spark:<br \/>\n<a href=\"https:\/\/jaceklaskowski.gitbooks.io\/mastering-apache-spark\/\">https:\/\/jaceklaskowski.gitbooks.io\/mastering-apache-spark\/<\/a><br \/>\n<a href=\"https:\/\/spark.apache.org\/docs\/latest\/sql-programming-guide.html\">https:\/\/spark.apache.org\/docs\/latest\/sql-programming-guide.html<\/a><\/p>\n<p>Databricks was founded by the original creators of Spark and is currently the largest contributor to Apache Spark. 
As such, they are a phenomenal resource for information and services relating to Spark.<\/p>\n<p>Datasets: <a href=\"https:\/\/databricks.com\/blog\/2016\/01\/04\/introducing-apache-spark-datasets.html\">https:\/\/databricks.com\/blog\/2016\/01\/04\/introducing-apache-spark-datasets.html<\/a><br \/>\nCatalyst: <a href=\"https:\/\/www.slideshare.net\/databricks\/a-deep-dive-into-spark-sqls-catalyst-optimizer-with-yin-huai\">https:\/\/www.slideshare.net\/databricks\/a-deep-dive-into-spark-sqls-catalyst-optimizer-with-yin-huai<\/a><br \/>\n<a href=\"https:\/\/de.slideshare.net\/SparkSummit\/deep-dive-into-catalyst-apache-spark-20s-optimizer-63071120\">https:\/\/de.slideshare.net\/SparkSummit\/deep-dive-into-catalyst-apache-spark-20s-optimizer-63071120<\/a><br \/>\nTungsten: <a href=\"https:\/\/databricks.com\/blog\/2015\/04\/28\/project-tungsten-bringing-spark-closer-to-bare-metal.html\">https:\/\/databricks.com\/blog\/2015\/04\/28\/project-tungsten-bringing-spark-closer-to-bare-metal.html<\/a><br \/>\nMatrix: <a href=\"https:\/\/databricks.com\/blog\/2016\/07\/14\/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html\">https:\/\/databricks.com\/blog\/2016\/07\/14\/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html<\/a><\/p>\n<p>Personally curated examples:<\/p>\n<p><strong>Create mock typed object data<\/strong><br \/>\n<code>import org.apache.spark.sql.functions._<\/code><\/p>\n<p>case class CountryGDP(countryCode : String, countryName : String, Year : String, gdp: Double, language : Option[String])<br \/>\nval objects = Seq[CountryGDP](<br \/>\nCountryGDP(&quot;USA&quot;, &quot;&#39;Murica&quot;, &quot;2014&quot;, 17393103000000d, None),<br \/>\nCountryGDP(&quot;USA&quot;, &quot;&#39;Murica&quot;, &quot;2015&quot;, 18036648000000d, None),<br \/>\nCountryGDP(&quot;USA&quot;, &quot;&#39;Murica&quot;, &quot;2016&quot;, 18569100000000d, None),<br \/>\nCountryGDP(&quot;CHE&quot;, &quot;Switzerland&quot;, 
&quot;2014&quot;, 702705544908.583, None),<br \/>\nCountryGDP(&quot;CHE&quot;, &quot;Switzerland&quot;, &quot;2015&quot;, 670789928809.882, None),<br \/>\nCountryGDP(&quot;CHE&quot;, &quot;Switzerland&quot;, &quot;2016&quot;, 659827235193.83, None)<br \/>\n)<\/p>\n<p><strong>Strongly typed Datasets<\/strong><br \/>\n<code>val objectsDS = spark.createDataset(objects)<\/code><\/p>\n<p>\/\/ typed objects are type-checked at compile time (great for development in IDEs!)<br \/>\nval countriesWithLanguages = objectsDS.map(o =&gt; {<br \/>\nval lang = o.countryCode match {<br \/>\ncase &quot;USA&quot; =&gt; Some(&quot;English&quot;)<br \/>\ncase &quot;CHE&quot; =&gt; Some(&quot;Schweizerdeutsch&quot;)<br \/>\ncase _ =&gt; Some(&quot;Simlish&quot;)<br \/>\n}<br \/>\no.copy(language = lang)<br \/>\n})<\/p>\n<p><strong>Creating DataFrame and using UDF to transform<\/strong><br \/>\n<code>val rowsDF = spark.createDataFrame(objects)<\/code><\/p>\n<p>def getLang(countryCode: String): Option[String] = {<br \/>\ncountryCode match {<br \/>\ncase &quot;USA&quot; =&gt; Some(&quot;English&quot;)<br \/>\ncase &quot;CHE&quot; =&gt; Some(&quot;Schweizerdeutsch&quot;)<br \/>\ncase _ =&gt; Some(&quot;Simlish&quot;)<br \/>\n}<br \/>\n}<br \/>\nval gl = spark.udf.register(&quot;getLang&quot;, getLang _)<\/p>\n<p>\/\/ String-based column lookups are resolved at runtime<br \/>\nval rowsDFWithLanguage = rowsDF.withColumn(&quot;language&quot;, gl($&quot;countryCode&quot;))<\/p>\n<p>Event link: <a href=\"https:\/\/www.eventbrite.com\/e\/understanding-spark-tickets-35440866586#\">https:\/\/www.eventbrite.com\/e\/understanding-spark-tickets-35440866586#<\/a><\/p>\n<p>Video recording is here: <a href=\"https:\/\/livestream.com\/metis\/events\/7597562\">https:\/\/livestream.com\/metis\/events\/7597562<\/a><\/p>\n<p>Learn more about <a 
href=\"http:\/\/garrens.com\/blog\/2017\/06\/26\/real-time-big-data-analytics-parquet-and-spark-bonus\/\">parquet<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This page is dedicated to resources related to the 7\/29\/17 Understanding Spark event presentation in Bellevue, WA. Slides Great [FREE!] resources on all things Spark: https:\/\/jaceklaskowski.gitbooks.io\/mastering-apache-spark\/ https:\/\/spark.apache.org\/docs\/latest\/sql-programming-guide.html Databricks was founded by the original creators of Spark and is currently the largest contributor to Apache Spark. As such, they are a phenomenal resource for information and&hellip; <a href=\"https:\/\/garrens.com\/blog\/2017\/07\/29\/using-spark-efficiently-understanding-spark-event-72917\/\" title=\"Read More\" class=\"read-more\">Continue reading<span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[22],"tags":[17,2],"_links":{"self":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts\/158"}],"collection":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/comments?post=158"}],"version-history":[{"count":7,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts\/158\/revisions"}],"predecessor-version":[{"id":225,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/posts\/158\/revisions\/225"}],"wp:attachment":[{"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/media?parent=158"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/categories?post=158"},{"taxonomy":"post_tag","embeddable":true,"href"
:"https:\/\/garrens.com\/blog\/wp-json\/wp\/v2\/tags?post=158"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}