Apache Spark 2.4.5: Installation on Ubuntu on AWS.
- Download Spark 2.4.5, pre-built for Hadoop 2.7, from the Apache Spark downloads page (https://spark.apache.org/downloads.html).
- Unpack the archive.
- tar -xvf spark-2.4.5-bin-hadoop2.7.tgz
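If you'd rather fetch the tarball from the command line, the 2.4.5 release is also kept on the Apache archive server (URL assumed from the standard archive layout):
# Fetch the 2.4.5 binary release directly
wget https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz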
- Move the resulting folder and create a symbolic link so that you can have multiple versions of Spark installed.
- sudo mv spark-2.4.5-bin-hadoop2.7 /usr/local
- sudo ln -s /usr/local/spark-2.4.5-bin-hadoop2.7/ /usr/local/spark
- cd /usr/local/spark/
- Also add SPARK_HOME to your environment:
- export SPARK_HOME=/usr/local/spark
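The export above only lasts for the current session. A minimal way to persist it, assuming the default bash shell on Ubuntu, is to append it to ~/.bashrc along with the Spark bin and sbin directories on the PATH:
# Persist SPARK_HOME and put the Spark scripts on the PATH (bash assumed)
echo 'export SPARK_HOME=/usr/local/spark' >> ~/.bashrc
echo 'export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH' >> ~/.bashrc
source ~/.bashrc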
- Start a standalone master server. At this point you can browse to http://localhost:8080/ to view the master's status.
- $SPARK_HOME/sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-osboxes-org.apache.spark.deploy.master.Master-1-osboxes.out
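As a quick sanity check without a browser, you can curl the web UI. Note that on AWS you'll need to open port 8080 in the instance's security group, and browse to the instance's public DNS rather than localhost, to view it remotely:
# The master web UI should answer on port 8080
curl -s http://localhost:8080/ | grep -o '<title>.*</title>'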
Start a slave (worker) process, pointing it at the master URL:
$SPARK_HOME/sbin/start-slave.sh spark://osboxes:7077
To get this to work, make these entries in /etc/hosts:
127.0.0.1 localhost
127.0.1.1 osboxes
Check the worker log to confirm it registered with the master:
vi /usr/local/spark/logs/spark-osboxes-org.apache.spark.deploy.worker.Worker-1-osboxes.out
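By default the worker claims all cores and most of the memory on the machine. If the instance is shared, you can cap it with the worker flags from the standalone documentation, for example:
# Optional: limit the worker to 2 cores and 2 GB of memory
$SPARK_HOME/sbin/start-slave.sh spark://osboxes:7077 --cores 2 --memory 2G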
Test out the Spark shell. You’ll note that this exposes the native Scala interface to Spark.
$SPARK_HOME/bin/spark-shell
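For a quick non-interactive smoke test against the standalone cluster, you can pipe a one-liner into the shell (a sketch; the master URL assumes the osboxes hostname used above):
# Should print 5050.0 and exit
echo 'println(sc.parallelize(1 to 100).sum())' | $SPARK_HOME/bin/spark-shell --master spark://osboxes:7077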
- To use PySpark instead, launch the Python shell:
$SPARK_HOME/bin/pyspark
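To confirm the cluster end to end, you can also submit one of the Python examples bundled with the distribution:
# Estimate pi on the standalone cluster with the bundled example
$SPARK_HOME/bin/spark-submit --master spark://osboxes:7077 $SPARK_HOME/examples/src/main/python/pi.py 10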
To stop the slave:
$SPARK_HOME/sbin/stop-slave.sh
To stop the master:
$SPARK_HOME/sbin/stop-master.sh
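Alternatively, a single script stops the master together with any workers listed in conf/slaves (which defaults to localhost):
$SPARK_HOME/sbin/stop-all.sh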