Saturday, April 14, 2018

Hive - SerDe (CSV)

Hive can process data stored in many different formats, and it does so with the help of SerDe, which stands for Serializer/Deserializer.

To read semi-structured data such as JSON or XML, Hive needs to know how to interpret that format. SerDe solves this problem.

A SerDe performs two main functions:

1.) Reading data from a table (deserialization).
2.) Writing data back to HDFS (serialization).


The deserializer takes the binary or string representation of a record and converts it into a Java object that Hive can manipulate.
The serializer takes a Java object and converts it back into a format that can be written to HDFS.

A SerDe can be downloaded from a Hadoop distribution vendor such as Cloudera or Hortonworks.
The JAR file needs to be placed in $HIVE_HOME/lib, and the required SerDe needs to be registered with Hive.
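As a rough sketch of the registration step, a SerDe JAR can also be loaded from inside a Hive session with ADD JAR. The path and file name below are placeholders chosen for illustration, not the actual location of any vendor's JAR.

```sql
-- Hypothetical path: replace it with the real location of the SerDe JAR.
ADD JAR /path/to/csv-serde.jar;

-- Show the JARs registered in the current session to confirm it was added.
LIST JARS;
```

ADD JAR only affects the current session, whereas a JAR placed in $HIVE_HOME/lib is picked up every time Hive starts.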
Let us take an example to understand SerDe better.
We have a CSV file, serdefile.csv, containing the data below.
 
We need to copy this file into HDFS using the put command.
hadoop fs -put <source> <destination>
We need to create an external table to read the data in this file.
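A minimal sketch of such a table, assuming the OpenCSVSerde class that ships with Hive 0.14 and later, might look like the following. The table name, the column names, and the HDFS directory are all hypothetical, since the actual file contents are not reproduced here.

```sql
-- Sketch only: serde_demo, col1..col3, and the LOCATION path are assumptions.
-- LOCATION must be the HDFS directory that holds serdefile.csv, not the file itself.
CREATE EXTERNAL TABLE serde_demo (
  col1 STRING,
  col2 STRING,
  col3 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE
LOCATION '/user/hive/serde_demo';
```

OpenCSVSerde exposes every column as STRING regardless of the declared type, so numeric values need an explicit CAST in queries. Because the table is external, dropping it removes only the metadata and leaves the file in HDFS.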
 
We can now run a basic SELECT query against the above table.
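For illustration, a query of this kind could look like the following, where serde_demo is a hypothetical name standing in for the external table defined over serdefile.csv.

```sql
-- serde_demo is a placeholder; use the name of your own external table.
-- LIMIT keeps the output small while checking that the rows parse correctly.
SELECT * FROM serde_demo LIMIT 10;
```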
