Data is Future: HIVE -Serde(CSV)

Saturday, April 14, 2018

HIVE -Serde(CSV)

Hive can process data in a wide range of formats, and it does so with the help of a SerDe, which is short for Serializer/Deserializer.

To read semi-structured data such as JSON or XML, Hive needs to understand how to process that format. SerDe was introduced to solve this.

A SerDe mainly performs two functions:

1.) Reading data from a table.
2.) Writing the data back to HDFS.

The deserializer takes the binary or string representation of a record and converts it into a Java object that Hive can manipulate.
The serializer takes the Java object and converts it back into a format that can be written to HDFS.

A SerDe can be downloaded from a Hadoop distribution vendor (such as Cloudera or Hortonworks).
The JAR file needs to be placed in $HIVE_HOME/lib, and the required SerDe needs to be registered with Hive.
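As a rough sketch of the registration step, a SerDe JAR can also be added from inside a Hive session. The JAR path below is a hypothetical placeholder, not a file named in this post, and a JAR added this way is only visible to the current session.

```sql
-- Hypothetical path: replace with the location of the SerDe JAR you downloaded.
ADD JAR /path/to/your-serde.jar;

-- Show the JARs registered in this session to confirm it was picked up.
LIST JARS;
```

JARs placed in $HIVE_HOME/lib are loaded when Hive starts, so this step is mainly useful when you cannot change the Hive installation.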
Let us take an example to understand SerDe better.
We have a CSV file, serdefile.csv, containing the data below.

We need to put this file into HDFS using the put command.
hadoop fs -put <source> <destination>
We need to create an external table to read the data.
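The table definition is not shown here, so the following is only a sketch of what such a statement could look like. It assumes the OpenCSVSerde class that ships with Hive 0.14 and later, which may not be the SerDe originally used. The table name, column names, and HDFS directory are all hypothetical.

```sql
-- Hypothetical table, columns, and location: adjust them to match serdefile.csv.
CREATE EXTERNAL TABLE IF NOT EXISTS serde_csv_demo (
  id   STRING,
  name STRING,
  city STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",   -- field delimiter
  "quoteChar"     = "\"",  -- character that wraps quoted fields
  "escapeChar"    = "\\"   -- character that escapes special characters
)
STORED AS TEXTFILE
-- LOCATION must be the HDFS directory that contains the file, not the file itself.
LOCATION '/user/hive/serde_demo/';
```

Note that OpenCSVSerde exposes every column as STRING, whatever type is declared, so numeric columns need a CAST when queried.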

We can now run a basic SELECT query against the above table.
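A minimal query might look like the following sketch. The table name serde_csv_demo is a hypothetical placeholder, so substitute the name of the external table you created.

```sql
-- Hypothetical table name; LIMIT keeps the output small while checking the parse.
SELECT * FROM serde_csv_demo LIMIT 10;
```

If the columns come back split correctly, the SerDe is deserializing the CSV rows as expected.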
