Sunday, March 15, 2020

Apache Spark 2.4.5: Installation on Ubuntu on AWS.



  • Download the latest release of Spark here (for an older release such as 2.4.5, see the archive note after this list).
  • Unpack the archive:
    tar -xvf spark-2.4.5-bin-hadoop2.7.tgz
  • Move the resulting folder and create a symbolic link so that you can have multiple versions of Spark installed:
    sudo mv spark-2.4.5-bin-hadoop2.7 /usr/local
    sudo ln -s /usr/local/spark-2.4.5-bin-hadoop2.7/ /usr/local/spark
    cd /usr/local/spark
  • Also add SPARK_HOME to your environment (the sketch after this list shows one way to make it persistent):
    export SPARK_HOME=/usr/local/spark
  • Start a standalone master server. At this point you can browse to http://localhost:8080/ to view the master's status:
    $SPARK_HOME/sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-osboxes-org.apache.spark.deploy.master.Master-1-osboxes.out 
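Two optional additions to the steps above: older releases such as 2.4.5 are kept on the Apache archive (the exact URL below is an assumption worth checking against archive.apache.org), and SPARK_HOME can be made to survive new shell sessions by appending it to ~/.bashrc. Neither is required by the steps above.

    # Assumption: 2.4.5 has dropped off the main download page; the Apache archive keeps old releases.
    wget https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz

    # Optional: persist SPARK_HOME (and put the Spark scripts on PATH) for future sessions.
    echo 'export SPARK_HOME=/usr/local/spark' >> ~/.bashrc
    echo 'export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH' >> ~/.bashrc
    source ~/.bashrc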




Start a Slave Process

$SPARK_HOME/sbin/start-slave.sh spark://osboxes:7077

To get this to work, make entries in /etc/hosts such as:

127.0.0.1       localhost
127.0.1.1       osboxes
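Because the master URL uses the machine's hostname, it is worth confirming that the name resolves before starting the worker. A quick check (osboxes is just the hostname used in this walkthrough; substitute your own):

    # Both should print an address for the hostname used in the master URL.
    getent hosts osboxes
    ping -c 1 osboxes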

To check the logs:

vi /usr/local/spark/logs/spark-osboxes-org.apache.spark.deploy.worker.Worker-1-osboxes.out 
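If the worker came up cleanly, its log should contain a "Successfully registered with master" line and the worker should be listed on the master UI at http://localhost:8080/. A further end-to-end check, sketched below, is to submit the bundled SparkPi example against the standalone master; the exact examples jar name depends on the build, hence the wildcard.

    # Look for the registration message in the worker log.
    grep "Successfully registered with master" /usr/local/spark/logs/spark-osboxes-org.apache.spark.deploy.worker.Worker-1-osboxes.out

    # Run the bundled SparkPi example on the standalone cluster.
    $SPARK_HOME/bin/spark-submit \
      --master spark://osboxes:7077 \
      --class org.apache.spark.examples.SparkPi \
      $SPARK_HOME/examples/jars/spark-examples_*.jar 100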

Test out the Spark shell. You’ll note that this exposes the native Scala interface to Spark. 
    $SPARK_HOME/bin/spark-shell
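By default the shell runs against a local master; passing --master points it at the standalone cluster started above. Piping a one-line expression through it makes a quick, non-interactive smoke test (again, osboxes is just the hostname from this walkthrough):

    # Count a small range on the cluster and exit; 'spark' is the SparkSession the shell creates.
    echo 'println(spark.range(1000).count())' | $SPARK_HOME/bin/spark-shell --master spark://osboxes:7077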



  • To use PySpark:
$SPARK_HOME/bin/pyspark
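The same --master flag applies to PySpark. For a scripted check, the binary distribution also ships Python examples; the pi.py path below is assumed from the standard layout of the prebuilt tarball.

    # Interactive PySpark against the standalone master.
    $SPARK_HOME/bin/pyspark --master spark://osboxes:7077

    # Or submit one of the bundled Python examples as a smoke test.
    $SPARK_HOME/bin/spark-submit --master spark://osboxes:7077 \
      $SPARK_HOME/examples/src/main/python/pi.py 10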




To stop the slave:

$SPARK_HOME/sbin/stop-slave.sh

To stop the master:

$SPARK_HOME/sbin/stop-master.sh
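If the worker hostnames are listed in $SPARK_HOME/conf/slaves, the bundled helper scripts can stop (or start) the master and all workers in one go:

    # Stop or start the whole standalone cluster defined in conf/slaves.
    $SPARK_HOME/sbin/stop-all.sh
    $SPARK_HOME/sbin/start-all.sh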
