Tuesday, February 27, 2018

Standalone Application in Spark

Spark can be run interactively as well as in a standalone program. The major difference between the interactive shell and a standalone application is that in a standalone application we need to define the SparkContext ourselves, whereas in the interactive shell it is already available through the sc variable.
We will learn Spark through its Python implementation. In Python, you simply write applications as Python scripts, but you must run them using the bin/spark-submit script included in Spark. The spark-submit script includes the Spark dependencies for us in Python and sets up the environment for Spark's Python API to function.
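For example, assuming the initialization code below is saved in a file named count.py (a hypothetical file name, not from the original post), it would be launched like this:

bin/spark-submit count.py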

Initialization in a Standalone Program:-

from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("Count").setMaster("local")
sc = SparkContext(conf=conf)

Line 1 imports the Spark classes we need from the Python API.
In Line 2, we build a SparkConf: setAppName("Count") gives the program a name that identifies it on the cluster, and setMaster tells Spark how to connect to a cluster. local is a special value that runs Spark on one thread on the local machine, without connecting to a cluster.
In Line 3, we initialize the SparkContext and assign it to the sc variable.
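
To see these lines in context, here is a minimal sketch of a complete standalone word-count application built on this SparkContext. The input path input.txt is only a placeholder:

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("Count").setMaster("local")
sc = SparkContext(conf=conf)

# input.txt is a hypothetical input file path
lines = sc.textFile("input.txt")
# Split each line into words, pair each word with 1, then add up the 1s per word
counts = lines.flatMap(lambda line: line.split()) \
              .map(lambda word: (word, 1)) \
              .reduceByKey(lambda a, b: a + b)

# collect() brings the results back to the driver; fine for small outputs
for word, count in counts.collect():
    print(word, count)

sc.stop()

Saved as count.py, this script would be run with bin/spark-submit count.py as described above.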
