Windows
- Install a JDK (Java Development Kit) from http://www.oracle.com/technetwork/java/javase/downloads/index.html . Keep track of where you installed the JDK; you’ll need that later.
- Download a pre-built version of Apache Spark from https://spark.apache.org/downloads.html
- Extract the Spark archive, and copy its contents into a C:\spark directory that you create. You should end up with directories like c:\spark\bin, c:\spark\conf, etc. (If necessary, download and install WinRAR from http://www.rarlab.com/download.htm so you can extract the .tgz file you downloaded.)
- Download winutils.exe from https://sundog-spark.s3.amazonaws.com/winutils.exe and move it into a C:\winutils\bin folder that you’ve created. (Note: this is a 64-bit application. If you are on a 32-bit version of Windows, you’ll need to search for a 32-bit build of winutils.exe for Hadoop.)
- Open the c:\spark\conf folder, and make sure “File Name Extensions” is checked in the “View” tab of Windows Explorer. Rename the log4j.properties.template file to log4j.properties. Edit this file (using WordPad or something similar) and change the error level from INFO to ERROR for log4j.rootCategory.
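After the edit, the relevant line should look something like this (assuming your template matches the default one Spark ships with; whatever appenders appear after the comma in your file should be left alone):

```properties
log4j.rootCategory=ERROR, console
```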
- Right-click your Windows menu, select Control Panel, System and Security, and then System. Click on “Advanced System Settings” and then the “Environment Variables” button.
- Add the following new USER variables:
- SPARK_HOME c:\spark
- JAVA_HOME (the path you installed the JDK to in step 1, for example, C:\Program Files\Java\jdk1.8.0_101)
- HADOOP_HOME c:\winutils
- Add the following paths to your PATH user variable:
- %SPARK_HOME%\bin
- %JAVA_HOME%\bin
- Close the environment variable screen and the control panels.
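You can sanity-check the variables afterward from a Python prompt. This little checker is just an illustrative helper (not part of the course materials); on MacOS or Linux you’d drop HADOOP_HOME from the list:

```python
import os

# Variables the Windows steps above create; HADOOP_HOME is Windows-only.
REQUIRED_VARS = ["SPARK_HOME", "JAVA_HOME", "HADOOP_HOME"]

def missing_vars(environ=os.environ, required=REQUIRED_VARS):
    """Return the names of any required variables that are unset or empty."""
    return [name for name in required if not environ.get(name)]

if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All set!")
```

Keep in mind that only newly opened command prompts (or a freshly restarted Canopy) will see the new variables; windows that were already open won’t.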
- Install the latest Enthought Canopy from https://store.enthought.com/downloads/#default
MacOS
- Install Apache Spark using Homebrew.
- Install Homebrew if you don’t have it already by entering this from a terminal prompt: /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
- Enter brew install apache-spark
- Create a log4j.properties file via
- cd /usr/local/Cellar/apache-spark/2.0.0/libexec/conf
- cp log4j.properties.template log4j.properties
- (substitute 2.0.0 with the version of Spark you actually installed)
- Edit the log4j.properties file and change the log level from INFO to ERROR on log4j.rootCategory.
- Install the latest Enthought Canopy from https://store.enthought.com/downloads/#default
Linux
- Install Java, Scala, and Spark according to the particulars of your specific OS. A good starting point is http://www.tutorialspoint.com/apache_spark/apache_spark_installation.htm (but be sure to install Spark 2.0 or newer)
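If you’d rather install Spark by hand than follow the tutorial’s package-based approach, the usual pattern looks roughly like this. The tarball name below is just an example; use whichever 2.x release you actually downloaded from https://spark.apache.org/downloads.html, and /opt/spark is an arbitrary install location:

```
# Assumes a JDK is already installed and the Spark tarball is in the
# current directory.
tar xzf spark-2.0.0-bin-hadoop2.7.tgz
sudo mv spark-2.0.0-bin-hadoop2.7 /opt/spark

# Make spark-submit and pyspark available in new shells:
echo 'export SPARK_HOME=/opt/spark' >> ~/.bashrc
echo 'export PATH="$PATH:$SPARK_HOME/bin"' >> ~/.bashrc
```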
- Install the latest Enthought Canopy from https://store.enthought.com/downloads/#default
You’ve got everything set up! Hooray!
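To verify the whole stack, open a terminal (or command prompt) and start the pyspark shell; the session below is a sketch of what success looks like (the shell pre-defines sc for you):

```
> pyspark
...
>>> sc.parallelize(range(100)).count()
100
>>> quit()
```

If the shell starts up and the count comes back as 100, then Spark, Java, and Python are all wired together correctly.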
