Spark has a special kind of RDD called a pair RDD: an RDD whose elements are key-value pairs. The two items in each pair are associated with each other. It is somewhat similar to a dictionary in Python or a Map in Scala.

Creating a Pair RDD :-
We can easily convert an RDD into a pair RDD using the map function. The resulting RDD contains the data as key,value pairs.
We have created a sample data set and placed it in the HDFS home directory :-

Then we invoke the Spark shell with the pyspark command, since we are using Python as the programming language.
Python code :-

Output :- The output looks something like this.
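Since the original code listing is not reproduced above, here is a minimal sketch of the idea. The sample lines, the comma-separated format, and the file name sample.txt are assumptions for illustration; inside the pyspark shell, `sc` is the SparkContext it provides.

```python
# Hypothetical sample lines, as they might appear in the HDFS file
lines = ["apple,1", "banana,2", "apple,3"]

def to_pair(line):
    # Split each line into a (key, value) tuple on the first comma
    key, value = line.split(",", 1)
    return (key, int(value))

# Locally, the built-in map() mimics what rdd.map(to_pair) does on the cluster
pairs = list(map(to_pair, lines))
# pairs -> [('apple', 1), ('banana', 2), ('apple', 3)]

# In the pyspark shell, the equivalent pair RDD would be built like this:
# rdd = sc.textFile("sample.txt")       # file name is an assumption
# pair_rdd = rdd.map(to_pair)
# pair_rdd.collect()                    # returns the same list of tuples
```

Each element of the resulting RDD is a two-item tuple, which is exactly what Spark treats as a key-value pair.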

Once the pair RDD has been created, we can apply many transformations to it, such as reduceByKey() and groupByKey().
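As a sketch of what reduceByKey() does, the loop below merges the values of each key locally, using hypothetical sample pairs; the equivalent pyspark calls are shown in comments.

```python
# Hypothetical pair data, as produced by a pair RDD
pairs = [('apple', 1), ('banana', 2), ('apple', 3)]

# reduceByKey(lambda a, b: a + b) combines all values that share a key.
# Locally, that reduction looks like this:
sums = {}
for key, value in pairs:
    sums[key] = sums.get(key, 0) + value
# sums -> {'apple': 4, 'banana': 2}

# In the pyspark shell the same result comes from:
# pair_rdd = sc.parallelize(pairs)
# pair_rdd.reduceByKey(lambda a, b: a + b).collect()
```

groupByKey() is similar but collects all values for a key into a sequence instead of reducing them, so reduceByKey() is usually preferred when an aggregate is all that is needed.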