
Sunday, March 15, 2020

Apache Spark 2.4.5: Installation on Ubuntu on AWS.



  • Download the latest release of Spark here.
  • Unpack the archive.
  •  tar -xvf spark-2.4.5-bin-hadoop2.7.tgz
  •  Move the resulting folder and create a symbolic link so that you can have multiple versions of Spark installed.
  • sudo mv spark-2.4.5-bin-hadoop2.7 /usr/local 
  • sudo ln -s /usr/local/spark-2.4.5-bin-hadoop2.7/ /usr/local/spark
  • cd /usr/local/spark/
  • Also add SPARK_HOME to your environment (a note on making this permanent follows below).
  • export SPARK_HOME=/usr/local/spark 
  • Start a standalone master server. At this point you can browse to http://localhost:8080/ to view the master's status.
  • $SPARK_HOME/sbin/start-master.sh 
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-osboxes-org.apache.spark.deploy.master.Master-1-osboxes.out 
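
To make SPARK_HOME permanent across shell sessions, a minimal sketch (assuming the paths used above) is to append it to ~/.bashrc:

echo 'export SPARK_HOME=/usr/local/spark' >> ~/.bashrc
echo 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' >> ~/.bashrc
source ~/.bashrc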




Start a Slave Process

$SPARK_HOME/sbin/start-slave.sh spark://osboxes:7077

To get this to work, add the following entries in /etc/hosts:


127.0.0.1       localhost
127.0.1.1       osboxes
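
If you want to limit what this worker offers to the cluster, the script also accepts core and memory options (a sketch, assuming the same master URL as above):

$SPARK_HOME/sbin/start-slave.sh spark://osboxes:7077 --cores 2 --memory 2g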

To check the logs:

vi /usr/local/spark/logs/spark-osboxes-org.apache.spark.deploy.worker.Worker-1-osboxes.out 

Test out the Spark shell. You’ll note that this exposes the native Scala interface to Spark. 
    $SPARK_HOME/bin/spark-shell
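
By default the shell runs Spark locally. To attach it to the standalone master started above, pass the master URL (assuming the spark://osboxes:7077 URL from earlier):

    $SPARK_HOME/bin/spark-shell --master spark://osboxes:7077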



  • To use PySpark
$SPARK_HOME/bin/pyspark
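
PySpark accepts the same --master option if you want it to run against the standalone cluster instead of locally (assuming the master URL from above):

$SPARK_HOME/bin/pyspark --master spark://osboxes:7077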




To stop the slave:
$SPARK_HOME/sbin/stop-slave.sh

To stop the master:

$SPARK_HOME/sbin/stop-master.sh
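
To stop the master and any local workers in one go (assuming the default standalone setup), there is also:

$SPARK_HOME/sbin/stop-all.sh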























Friday, May 5, 2017

Installation of Apache Spark on Windows 10

Apache Spark is an open-source cluster-computing framework. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance.

Please follow the instructions below to install Apache Spark on Windows 10.

Prerequisites:

Please ensure that you have installed JDK 1.8 or above in your environment.

Steps:

Installation of Scala 2.12.2
  • Scala can be downloaded from here.
  • The download will give you a .msi file. Follow the installer's instructions and install Scala.
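
Once the installer finishes, you can verify the Scala installation from a command prompt (this assumes the installer put scala on your PATH):

scala -version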






















Installation of Spark


  • Spark can be downloaded from here.
  • I am choosing version 2.1.1, prebuilt for Hadoop. Please note that I shall be running this without Hadoop.






















  • Extract the tar file into a folder called c:\Spark.
  • The contents of the extracted folder will look like this:





Download Winutils


  • Download Winutils from this link: 64 bits
  • Create a folder c:\Spark\Winutils\bin and copy winutils.exe there.
  • The folder structure will look like this:


















Setup Environment Variables


  • The following environment variables will need to be set up:
    • JAVA_HOME: C:\jdk1.8.0_91
    • SCALA_HOME: C:\Program Files (x86)\scala\bin
    • _JAVA_OPTIONS: -Xms128m -Xmx256m
    • HADOOP_HOME: C:\Spark\WinUtils
    • SPARK_HOME: C:\Spark
  • Create a folder c:\tmp\hive and give it read/write/execute permissions for all users
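
One way to grant those permissions is with the winutils.exe downloaded earlier, run from an elevated command prompt (a sketch, assuming the Winutils path above):

C:\Spark\WinUtils\bin\winutils.exe chmod -R 777 C:\tmp\hive
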
Test Spark Environment

  • Navigate to SPARK_HOME\bin and execute the command spark-shell
You should be ready to use Spark.
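
As a further smoke test, one of the bundled examples can be run from a command prompt (a sketch, assuming SPARK_HOME is set as above):

%SPARK_HOME%\bin\run-example SparkPi 10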





