A DataFrame is a collection of data organized into rows and named columns, much like a table in SQL. A row can hold values of different data types (heterogeneous), whereas every value in a column has the same data type (homogeneous).
DataFrames carry a schema, so Spark knows the name and data type of each column. This helps Spark optimize its execution plan.
The schema also helps with slicing and dicing the data, particularly when the data source has a lot of columns and we need only a few of them.
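To make the "few columns out of many" idea concrete, here is a minimal sketch in Scala. It assumes Spark 2.x or later with the spark-sql library on the classpath and a local run; the data and the column names (name, age, city, email) are made up purely for illustration.

```scala
import org.apache.spark.sql.SparkSession

object SelectColumnsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SelectColumnsExample")
      .master("local[*]") // assumption: local run for experimentation
      .getOrCreate()
    import spark.implicits._ // enables toDF on local collections

    // Hypothetical dataset; the column names are invented for this sketch.
    val df = Seq(
      ("Asha", 31, "Pune", "asha@example.com"),
      ("Ravi", 27, "Delhi", "ravi@example.com")
    ).toDF("name", "age", "city", "email")

    // Keep only the two columns we care about.
    val slim = df.select("name", "city")
    slim.show()

    spark.stop()
  }
}
```

Because Spark knows the column names up front, a misspelled column in select fails at analysis time instead of silently returning nothing.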
DataFrames can be created from various kinds of data sources, such as CSV files, JSON files, and Hive tables.
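As one example of a non-CSV source, the sketch below reads JSON. It assumes Spark 2.x or later in local mode, and the sample file and its fields are hypothetical. Note that spark.read.json by default expects one JSON object per line rather than a single pretty-printed document.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object ReadJsonExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReadJsonExample")
      .master("local[*]") // assumption: local run for experimentation
      .getOrCreate()

    // Hypothetical sample data: one JSON object per line.
    val lines = Seq(
      """{"name":"Asha","age":31}""",
      """{"name":"Ravi","age":27}"""
    ).mkString("\n")
    val path = Files.createTempFile("people", ".json")
    Files.write(path, lines.getBytes("UTF-8"))

    // The schema is derived from the JSON fields.
    val df = spark.read.json(path.toString)
    df.show()

    spark.stop()
  }
}
```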
Let us read a CSV file and create a DataFrame from it.
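A minimal sketch of reading a CSV in Scala could look like the following. It assumes Spark 2.x or later (where SparkSession and spark.read.csv are available) and a local run. The sample file is written to a temporary location only so the example is self-contained, and its contents are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object ReadCsvExample {
  def main(args: Array[String]): Unit = {
    // On a cluster the master is usually supplied by spark-submit instead.
    val spark = SparkSession.builder()
      .appName("ReadCsvExample")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical sample data written to a temporary file.
    val path = Files.createTempFile("people", ".csv")
    Files.write(path, "name,age,city\nAsha,31,Pune\nRavi,27,Delhi\n".getBytes("UTF-8"))

    // header: treat the first line as column names.
    // inferSchema: scan the data to guess column types (otherwise every column is a string).
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path.toString)

    df.show()

    spark.stop()
  }
}
```

In spark-shell a session named spark already exists, so only the spark.read part is needed there.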
We can then check the schema of the DataFrame, which tells us the name and data type of each column.
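Checking the schema can be sketched as below, again assuming Spark 2.x or later in local mode and a hypothetical sample file. printSchema writes a tree of column names and types to the console, while df.schema gives the same information as an object we can inspect in code.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object PrintSchemaExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PrintSchemaExample")
      .master("local[*]") // assumption: local run for experimentation
      .getOrCreate()

    // Hypothetical sample data written to a temporary file.
    val path = Files.createTempFile("people", ".csv")
    Files.write(path, "name,age,city\nAsha,31,Pune\nRavi,27,Delhi\n".getBytes("UTF-8"))

    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path.toString)

    // Print each column with its inferred type and nullability.
    df.printSchema()

    // The same information, available programmatically.
    df.schema.fields.foreach(f => println(s"${f.name}: ${f.dataType.simpleString}"))

    spark.stop()
  }
}
```

With inferSchema turned on, the age column here should come out as an integer; without it, all three columns would be read as strings.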
Let us try some other things, like counting the rows of the data.
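Counting can be sketched as follows, under the same assumptions as before (Spark 2.x or later, local mode, hypothetical sample file). count is an action, so this is the point where Spark actually runs a job over the data.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object CountRowsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CountRowsExample")
      .master("local[*]") // assumption: local run for experimentation
      .getOrCreate()

    // Hypothetical sample data: a header line plus two records.
    val path = Files.createTempFile("people", ".csv")
    Files.write(path, "name,age,city\nAsha,31,Pune\nRavi,27,Delhi\n".getBytes("UTF-8"))

    val df = spark.read
      .option("header", "true")
      .csv(path.toString)

    // count() returns the number of rows as a Long; the header line is not counted.
    val rows: Long = df.count()
    println(s"Number of rows: $rows")

    spark.stop()
  }
}
```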
We will go through some more DataFrame commands in the next blog post.
Thanks !!!