Saturday, June 9, 2018

Spark - Pair RDD

Spark has a special kind of RDD called a pair RDD. In other words, an RDD whose elements are key/value pairs is called a pair RDD. Each element of a pair RDD holds two data items that are associated with each other, which makes it somewhat similar to a dictionary in Python or a Map in Scala.
       
 
Creating Pair RDD :-

We can easily convert an RDD into a pair RDD using the map function. Each element of the resulting RDD is a key/value pair.
We have created a sample data set and placed it in the HDFS home directory :-
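The original post showed the sample file as a screenshot. The file name (sample.txt) and the contents below are only an illustration of what such a data set might look like: one record per line, with a department name and a count separated by a space.

sample.txt
sales 10
hr 4
sales 25
it 7
hr 9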

 

Then we need to invoke the Spark shell. If we are using Python as the programming language, we start it with the pyspark command.
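For example, on a node where Spark is installed, the shell can be started like this (the exact command may vary depending on how Spark is set up on the cluster):

$ pyspark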

Python code :-
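The code in the original post was embedded as an image. Below is a minimal sketch of what it might have looked like; the file name sample.txt and the way each line is split into a key and a value are assumptions, not the author's exact code. Note that in the pyspark shell the SparkContext is already available as sc.

# read the sample file from the HDFS home directory
lines = sc.textFile("sample.txt")

# map each line into a (key, value) pair: department name and its count
pairs = lines.map(lambda line: (line.split(" ")[0], int(line.split(" ")[1])))

# inspect the pair RDD
pairs.collect()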




Output :-  The output looks somewhat like this.
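The original output screenshot is not available. With the hypothetical sample data above, collect() on the pair RDD would return something along these lines:

[('sales', 10), ('hr', 4), ('sales', 25), ('it', 7), ('hr', 9)]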



Once the pair RDD has been created, we can apply many transformations to it, such as reduceByKey() and groupByKey().
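Continuing with the hypothetical pair RDD from the sketch above, these two transformations could be used like this (the result ordering shown in the comments is only indicative):

# sum the values for each key
totals = pairs.reduceByKey(lambda a, b: a + b)
totals.collect()
# e.g. [('sales', 35), ('hr', 13), ('it', 7)]

# group all values belonging to the same key into a list
grouped = pairs.groupByKey().mapValues(list)
grouped.collect()
# e.g. [('sales', [10, 25]), ('hr', [4, 9]), ('it', [7])]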






