Tuesday, February 27, 2018

Standalone Application in Spark

Spark can be run interactively as well as in a standalone program. The major difference between the interactive shell and a standalone application is that in a standalone application we need to define the SparkContext ourselves, whereas in the interactive shell it is already available through the sc variable.
We will learn Spark through its Python implementation. In Python, you simply write applications as Python scripts, but you must run them using the bin/spark-submit script included in Spark. The spark-submit script includes the Spark dependencies for us in Python and sets up the environment for Spark's Python API to function.
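For example, assuming the initialization code below is saved in a file named count.py (a hypothetical file name, not from the original post), it would be launched like this:

bin/spark-submit count.py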

Initialization in a Standalone Program:-

from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("Count").setMaster("local")
sc = SparkContext(conf=conf)

Line 1 imports the Spark classes we need from the Python API.
In Line 2, we build a SparkConf: setAppName("Count") gives the program a name that identifies it on the cluster, and setMaster tells Spark how to connect to a cluster. local is a special value that runs Spark on one thread on the local machine, without connecting to a cluster.
In Line 3, we initialize the SparkContext and assign it to the sc variable.
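
To see these lines in context, here is a minimal sketch of a complete standalone word-count application built on this SparkContext. The input path input.txt is only a placeholder:

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("Count").setMaster("local")
sc = SparkContext(conf=conf)

# input.txt is a hypothetical input file path
lines = sc.textFile("input.txt")
# Split each line into words, pair each word with 1, then add up the 1s per word
counts = lines.flatMap(lambda line: line.split()) \
              .map(lambda word: (word, 1)) \
              .reduceByKey(lambda a, b: a + b)

# collect() brings the results back to the driver; fine for small outputs
for word, count in counts.collect():
    print(word, count)

sc.stop()

Saved as count.py, this script would be run with bin/spark-submit count.py as described above.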
