Spark DataFrames 2
In the previous blog post, we went through the basics of DataFrames and created a DataFrame from the sample test.csv file. In this post, we will see how to perform different operations on DataFrames.
We will walk through the complete program and look at the different operations that can be performed on a DataFrame.
We will start the program by reading the csv file and displaying the number of rows in the DataFrame.
After submitting the Spark job, we get the output below.
Displaying the number of columns and their names.
The describe operation is used to calculate summary statistics (count, mean, stddev, min and max) for the columns of a DataFrame. If we don't specify any column names, it calculates summary statistics for all numerical and string columns present in the DataFrame.
Selecting specific columns in the DataFrame.
Specific columns in a DataFrame can be selected by calling select on the DataFrame and specifying the required columns.
Displaying the statistics of a specific column.