Connecting Apache Spark to External Data sources (e.g. Redshift, S3, MySQL)

Pre-requisites AWS S3 Hadoop AWS Jar AWS Java SDK Jar * Note: These AWS jars should not be necessary if you’re using Amazon EMR. Amazon Redshift JDBC Driver Spark-Redshift package * * The Spark-redshift package provided by Databricks is critical particularly if you wish to WRITE to Redshift, because it does bulk file operations instead… Continue reading