Tuesday, February 19, 2019

Spark - countByValue

Spark provides the countByValue() operation to count how many times each unique value occurs in an RDD.

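countByValue is an action.
It returns each unique value in the RDD together with the number of times it occurs, as a map of (value, count) pairs.
We should call this function only when the number of distinct values is small, because the whole result map is loaded into the driver's memory; an RDD with many distinct values can lead to memory and performance issues.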
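
When the number of distinct values may be too large for the driver, one common alternative is to keep the counts distributed. The sketch below assumes PySpark, and the helper name count_by_value_distributed is made up for illustration. It pairs every value with 1 and sums the pairs per key with reduceByKey, which is a transformation and returns another RDD instead of a driver-side map.

```python
from operator import add


def count_by_value_distributed(rdd):
    """Return an RDD of (value, count) pairs.

    Hypothetical helper: unlike countByValue(), nothing is pulled
    into the driver until an action is called on the result.
    """
    return rdd.map(lambda value: (value, 1)).reduceByKey(add)
```

The resulting RDD can then be written out with saveAsTextFile(), or sampled with take(10), without ever holding the full set of counts in the driver.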

Example :- We have a sample data file, country.txt, containing the names of different countries. We need to count the occurrences of each country.
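
The code for this example is not shown here, so the following is only a minimal sketch of how it could look in PySpark. It assumes country.txt holds one country name per line. The function names count_countries and run, the application name, and the local[*] master are illustrative choices, not taken from the post.

```python
def count_countries(lines):
    """Count occurrences of each country in an RDD of text lines.

    Assumes one country name per line; blank lines are skipped.
    """
    return (lines
            .map(lambda name: name.strip())   # drop surrounding whitespace
            .filter(lambda name: name)        # ignore empty lines
            .countByValue())                  # action: dict of value -> count on the driver


def run(path="country.txt"):
    # Imported here so the module can be loaded without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "CountByValueExample")
    try:
        counts = count_countries(sc.textFile(path))
        for country, occurrences in sorted(counts.items()):
            print(country, occurrences)
    finally:
        sc.stop()
```

Calling run() from a script launched with spark-submit (or from a Python session with PySpark installed) prints one line per country with its count.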

Output :-
