Spark has a special way to count the unique values in an RDD using the countByValue() operation.
countByValue() is an action. It returns the count of each unique value in the RDD as a map of (value, count) pairs to the driver.
We should call this function only on a small dataset, because it brings the complete result back into the driver and can cause performance issues or even an out-of-memory error.
Example: We have a sample data file country.txt containing the names of different countries. We need to count the occurrence of each country.
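In PySpark this would be a one-liner, `sc.textFile("country.txt").countByValue()`, which needs a running SparkContext. The sketch below shows the same semantics in plain Python using collections.Counter; the file name country.txt and its sample contents are assumptions for illustration:

```python
from collections import Counter

# Assumed sample lines, as they might appear in country.txt
lines = ["India", "USA", "India", "Germany", "USA", "India"]

# In PySpark the equivalent would be:
#   counts = sc.textFile("country.txt").countByValue()
# countByValue() is an action: it returns a dict of value -> count
# to the driver, which is why it suits only small datasets.
counts = Counter(lines)

for country, count in counts.items():
    print(country, count)
```

The result maps each country name to how many times it appears, e.g. India appears 3 times and USA 2 times in the assumed sample.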