Monday, April 9, 2018

HIVE - Partitioned Tables

Partition in Hive is mainly  done to balance the load horizontally.Normally ,when we fire a query it will check for all the data and produce the result.This process is quite cumbersome and involve a lot of map reduce job.The job usually take too long to execute.
In order to reduce such a huge  effort , we need to partition the table based  on some column.

Below is the query to create a table with partition on the date column :

















The syntax for loading the data into the partition table is bit different from the normal table which we used in our earlier tutorials.
 

Once the table is created and we loaded the data , we will find the sub directory created in the user/hive/warehouse location.

Each sub directory corresponds to a particular partition.

If we fire a query with like select * from hivepart where year=2017.
The following query search for the data in the only sub directory that belongs to the year=2017 This will reduce a lot of map reduce operation and can be effective way of improving the performance.

STRICT MODE:-
By Default ,HIVE runs in a STRICT mode .The STRICT mode does not allow to fire a query on a partition table witjout where clause on partition table.

We can change the mode by giving "nonstrict".



In case if we fire a query then we get the semantic error .

How to check the partition in the table:

 

Dynamic Partitioning :- The Dynamic partitioning is used to eliminates the process of hard coding the partition value in a table.
But for the data to be loaded , we need a statging table.we cannot load the data from the directory .
First we will create a staging table.

create table hivestag1(col1 int, year string)
row format delimited
fields terminated by ','
lines terminated by '\n';


Then , we need to create the final table.

create table hivefin(col1 int)
partitioned by (year string)
row format delimited
fields terminated by ','
lines terminated by '\n';

Finally , we need to load the data from the staging table into the final table.


insert overwrite table  hivefin
partition(year)
select * from hivestag;
Partition for the Hivefin Table:


We need to check the below property :-

set hive.mapred.mode=nonstrict;
hive.exec.max.dynamic.partitions;
hive.exec.max.dynamic.partitions.pernode;

No comments:

Post a Comment

Hadoop - What is a Job in Hadoop ?

In the field of computer science , a job just means a piece of program and the same rule applies to the Hadoop ecosystem as wel...