Tuesday, February 19, 2019

Spark - countByValue


Spark provides the countByValue() operation to count how many times each unique value occurs in an RDD.

countByValue() is an action.
It returns the count of each unique value in the RDD as a map of (value, count) pairs.
We should call this function only when the result is small, because it brings the complete result back to the driver and can lead to performance issues or out-of-memory errors.

Example :- We have a sample file country.txt containing the names of different countries, and we need to count the occurrence of each country, as shown in the sketch below.
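A minimal PySpark sketch of how this might look. The file path, the SparkContext configuration, and the sample country names are assumptions for illustration only:

from pyspark import SparkConf, SparkContext

# Set up a local SparkContext (configuration assumed for illustration)
conf = SparkConf().setMaster("local[*]").setAppName("CountByValueExample")
sc = SparkContext(conf=conf)

# country.txt is assumed to contain one country name per line, e.g.:
#   India
#   USA
#   India
#   Japan
countries = sc.textFile("country.txt")

# countByValue() is an action: it returns a dictionary mapping
# each unique value to the number of times it occurs in the RDD
counts = countries.countByValue()

for country, count in counts.items():
    print(country, count)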


Output :-
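Assuming the hypothetical file contents shown in the sketch above, the printed result would be:

India 2
USA 1
Japan 1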




