
Sunday, March 15, 2020

Apache Spark 2.4.5: Installation on Ubuntu on AWS.



  • Download the latest release of Spark here.
  • Unpack the archive.
  •  tar -xvf spark-2.4.5-bin-hadoop2.7.tgz
  •  Move the resulting folder and create a symbolic link so that you can have multiple versions of Spark installed.
  • sudo mv spark-2.4.5-bin-hadoop2.7 /usr/local 
  • sudo ln -s /usr/local/spark-2.4.5-bin-hadoop2.7/ /usr/local/spark
  • cd /usr/local/spark/
  • Also add SPARK_HOME to your environment (a note on making this permanent follows below).
  • export SPARK_HOME=/usr/local/spark 
  • Start a standalone master server. At this point you can browse to http://localhost:8080/ to view the master's status.
  • $SPARK_HOME/sbin/start-master.sh 
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-osboxes-org.apache.spark.deploy.master.Master-1-osboxes.out 
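
To make SPARK_HOME permanent across shell sessions, a minimal sketch (assuming the paths used above) is to append it to ~/.bashrc:

echo 'export SPARK_HOME=/usr/local/spark' >> ~/.bashrc
echo 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' >> ~/.bashrc
source ~/.bashrc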




Start a Slave Process

$SPARK_HOME/sbin/start-slave.sh spark://osboxes:7077

To get this to work, add the following entries in /etc/hosts:


127.0.0.1       localhost
127.0.1.1       osboxes
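
If you want to limit what this worker offers to the cluster, the script also accepts core and memory options (a sketch, assuming the same master URL as above):

$SPARK_HOME/sbin/start-slave.sh spark://osboxes:7077 --cores 2 --memory 2g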

To check the logs:

vi /usr/local/spark/logs/spark-osboxes-org.apache.spark.deploy.worker.Worker-1-osboxes.out 

Test out the Spark shell. You’ll note that this exposes the native Scala interface to Spark. 
    $SPARK_HOME/bin/spark-shell
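
By default the shell runs Spark locally. To attach it to the standalone master started above, pass the master URL (assuming the spark://osboxes:7077 URL from earlier):

    $SPARK_HOME/bin/spark-shell --master spark://osboxes:7077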



  • To use PySpark
$SPARK_HOME/bin/pyspark
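
PySpark accepts the same --master option if you want it to run against the standalone cluster instead of locally (assuming the master URL from above):

$SPARK_HOME/bin/pyspark --master spark://osboxes:7077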




To stop the slave:
$SPARK_HOME/sbin/stop-slave.sh

To stop the master:

$SPARK_HOME/sbin/stop-master.sh
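
To stop the master and any local workers in one go (assuming the default standalone setup), there is also:

$SPARK_HOME/sbin/stop-all.sh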























Friday, May 5, 2017

Installation of Apache Spark on Windows 10

Apache Spark is an open-source cluster-computing framework. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance.

Please follow the instructions below to install Apache Spark on Windows 10.

Prerequisites:

Please ensure that you have installed JDK 1.8 or above in your environment.

Steps:

Installation of Scala 2.12.2
  • Scala can be downloaded from here.
  • The download will give you a .msi file. Follow the installer's instructions and install Scala.
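
Once the installer finishes, you can verify the Scala installation from a command prompt (this assumes the installer put scala on your PATH):

scala -version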






















Installation of Spark


  • Spark can be downloaded from here.
  • I am choosing version 2.1.1, prebuilt for Hadoop. Please note that I shall be running this without Hadoop.






















  • Extract the tar file into a folder called c:\Spark.
  • The contents of the extracted folder will look like this:





Download Winutils


  • Download Winutils from this link: 64 bits
  • Create a folder c:\Spark\Winutils\bin and copy winutils.exe there.
  • The folder structure will look like this:


















Setup Environment Variables


  • The following environment variables will need to be set up:
    • JAVA_HOME: C:\jdk1.8.0_91
    • SCALA_HOME: C:\Program Files (x86)\scala\bin
    • _JAVA_OPTIONS: -Xms128m -Xmx256m
    • HADOOP_HOME: C:\Spark\WinUtils
    • SPARK_HOME: C:\Spark
  • Create a folder c:\tmp\hive and give it read/write/execute permissions for all users
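
One way to grant those permissions is with the winutils.exe downloaded earlier, run from an elevated command prompt (a sketch, assuming the Winutils path above):

C:\Spark\WinUtils\bin\winutils.exe chmod -R 777 C:\tmp\hive
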
Test Spark Environment

  • Navigate to SPARK_HOME\bin and execute the command spark-shell
You should be ready to use Spark.
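
As a further smoke test, one of the bundled examples can be run from a command prompt (a sketch, assuming SPARK_HOME is set as above):

%SPARK_HOME%\bin\run-example SparkPi 10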





