Tuesday, February 27, 2018

Apache Spark Core

In Spark, the architecture is mainly divided between the Driver and the Executor nodes. The Driver node is where the main program of the application runs. The Driver takes the main program, distributes the data sets across the worker nodes, and also tells the worker nodes which operations they are supposed to perform.
In layman's terms, the driver is the manager and the executors are the developers. The driver distributes the resources and the tasks to be performed by each developer.
The driver program accesses Spark through a SparkContext object, which is connected to a computing cluster.
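
As a minimal sketch of what such a driver program looks like (the application name, master URL, and input path below are illustrative assumptions, not values from this post), in Scala it would be something like:

import org.apache.spark.{SparkConf, SparkContext}

object WordCountDriver {
  def main(args: Array[String]): Unit = {
    // The SparkConf carries the application name and the cluster to connect to.
    val conf = new SparkConf()
      .setAppName("word-count-example")
      .setMaster("local[*]")            // run locally using all available cores

    // The SparkContext is the driver's handle to the cluster.
    val sc = new SparkContext(conf)

    // Work defined here is split into tasks and shipped to the executors.
    val counts = sc.textFile("input.txt")     // hypothetical input file
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)

    sc.stop()
  }
}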


In the Spark shell, you are already connected to a SparkContext through the sc variable.
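
For example, a short spark-shell session might look like the following (the object hash shown is just illustrative):

$ spark-shell
scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@7a8b9c

scala> sc.parallelize(1 to 100).sum()
res1: Double = 5050.0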

If we run our program on a local machine, everything runs in a single JVM on that machine. But when we run the same program on a cluster, different parts of the program run on different nodes of the cluster.
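As a rough sketch, that choice is usually made through the master URL passed to the SparkConf; the cluster host and port below are placeholders, not real values:

import org.apache.spark.SparkConf

// Same driver code; only the master URL decides where it runs.
// Local mode: the driver and executors run inside a single JVM on this machine.
val localConf = new SparkConf().setAppName("demo").setMaster("local[*]")

// Cluster mode: the driver connects to a cluster manager, and the tasks
// run on executors spread across the worker nodes.
val clusterConf = new SparkConf().setAppName("demo").setMaster("spark://master-host:7077")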
