Apache Spark 2.4.5: Installation on Ubuntu on AWS.
- Download Spark 2.4.5, pre-built for Hadoop 2.7, from the Apache Spark downloads page (https://spark.apache.org/downloads.html).
- Unpack the archive.
- tar -xvf spark-2.4.5-bin-hadoop2.7.tgz
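If you'd rather fetch the tarball from the command line, the 2.4.5 release is also kept on the Apache archive server (URL assumed from the standard archive layout):
# Fetch the 2.4.5 binary release directly
wget https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz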
- Move the resulting folder and create a symbolic link so that you can have multiple versions of Spark installed.
- sudo mv spark-2.4.5-bin-hadoop2.7 /usr/local
- sudo ln -s /usr/local/spark-2.4.5-bin-hadoop2.7/ /usr/local/spark
- cd /usr/local/spark/
- Also add SPARK_HOME to your environment:
- export SPARK_HOME=/usr/local/spark
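The export above only lasts for the current session. A minimal way to persist it, assuming the default bash shell on Ubuntu, is to append it to ~/.bashrc along with the Spark bin and sbin directories on the PATH:
# Persist SPARK_HOME and put the Spark scripts on the PATH (bash assumed)
echo 'export SPARK_HOME=/usr/local/spark' >> ~/.bashrc
echo 'export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH' >> ~/.bashrc
source ~/.bashrc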
- Start a standalone master server. At this point you can browse to http://localhost:8080/ to view the master's status.
- $SPARK_HOME/sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-osboxes-org.apache.spark.deploy.master.Master-1-osboxes.out
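As a quick sanity check without a browser, you can curl the web UI. Note that on AWS you'll need to open port 8080 in the instance's security group, and browse to the instance's public DNS rather than localhost, to view it remotely:
# The master web UI should answer on port 8080
curl -s http://localhost:8080/ | grep -o '<title>.*</title>'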
Start a slave (worker) process, pointing it at the master URL:
$SPARK_HOME/sbin/start-slave.sh spark://osboxes:7077
To get this to work, make these entries in /etc/hosts:
127.0.0.1 localhost
127.0.1.1 osboxes
Check the worker log to confirm it registered with the master:
vi /usr/local/spark/logs/spark-osboxes-org.apache.spark.deploy.worker.Worker-1-osboxes.out
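By default the worker claims all cores and most of the memory on the machine. If the instance is shared, you can cap it with the worker flags from the standalone documentation, for example:
# Optional: limit the worker to 2 cores and 2 GB of memory
$SPARK_HOME/sbin/start-slave.sh spark://osboxes:7077 --cores 2 --memory 2G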
Test out the Spark shell. You’ll note that this exposes the native Scala interface to Spark.
$SPARK_HOME/bin/spark-shell
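For a quick non-interactive smoke test against the standalone cluster, you can pipe a one-liner into the shell (a sketch; the master URL assumes the osboxes hostname used above):
# Should print 5050.0 and exit
echo 'println(sc.parallelize(1 to 100).sum())' | $SPARK_HOME/bin/spark-shell --master spark://osboxes:7077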
- To use PySpark instead, launch the Python shell:
$SPARK_HOME/bin/pyspark
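To confirm the cluster end to end, you can also submit one of the Python examples bundled with the distribution:
# Estimate pi on the standalone cluster with the bundled example
$SPARK_HOME/bin/spark-submit --master spark://osboxes:7077 $SPARK_HOME/examples/src/main/python/pi.py 10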
To stop the slave:
$SPARK_HOME/sbin/stop-slave.sh
To stop the master:
$SPARK_HOME/sbin/stop-master.sh
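Alternatively, a single script stops the master together with any workers listed in conf/slaves (which defaults to localhost):
$SPARK_HOME/sbin/stop-all.sh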