Wednesday, March 6, 2019

Oozie - Introduction

Oozie is designed to run multi-stage Hadoop jobs as a single job. It is a scheduler system to run and manage Hadoop jobs in a distributed environment. It was developed by Yahoo and later open-sourced, moving to Apache around 2010-11. Oozie uses the concept of a directed acyclic graph (DAG) to coordinate multi-stage jobs: each action starts only after the actions it depends on have finished, so the output of one action can feed the next.
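To make the DAG idea concrete, here is a minimal Python sketch of dependency-ordered execution. It is not Oozie code (real Oozie workflows are defined in XML), and the action names below are made up for illustration. It only shows how a DAG determines which action may run after which.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical actions. Each key maps to the set of actions it depends on.
workflow = {
    "pig_clean": {"mapreduce_ingest"},
    "hive_clean": {"mapreduce_ingest"},
    "hive_report": {"pig_clean", "hive_clean"},
}


def run_order(dag):
    """Return the actions in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(workflow)
print(order)
```

The ingest step always comes first and the report step always comes last. The two cleaning steps do not depend on each other, so either may come before the other.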

Why do we need Oozie?


When we work on the Hadoop framework with a large data set and multiple data transformations, it is not easy to handle all of these things in a single job. We need multiple processes such as MapReduce, Hive, Pig, etc., and we want them to run together as a single job. Oozie has the capability to handle all these processes in a single place.

How did the name “Oozie” come into the picture?


The team of engineers who built Oozie wanted a name that was somehow related to elephants, as Hadoop was named after a toy elephant. The engineers were looking for a name for someone who controls the elephant. The Indian word for an elephant keeper is “mahout”, which was already taken by Apache Mahout. Then they found the Burmese word for an elephant keeper, “Oozie”.


Where does Oozie sit in the Hadoop framework?


What are the common types of jobs in Oozie?

Some of the common job types in Oozie are listed below:

    • Oozie jobs running on demand are called workflow jobs.
    • Oozie jobs running periodically are called coordinator jobs.
    • A bundle job is a collection of coordinator jobs managed as a single job.
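The relationship between these three job types can be sketched as a small Python model. This is only a conceptual illustration, not Oozie's API. The class names, fields, and sample values are all assumptions made up for this example, and real Oozie jobs are defined in XML files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkflowJob:
    """Runs on demand: a named set of actions (hypothetical model)."""
    name: str
    actions: List[str]


@dataclass
class CoordinatorJob:
    """Runs a workflow periodically (hypothetical model)."""
    name: str
    workflow: WorkflowJob
    frequency_minutes: int


@dataclass
class BundleJob:
    """Groups coordinator jobs so they are managed as one job."""
    name: str
    coordinators: List[CoordinatorJob] = field(default_factory=list)

    def workflow_names(self) -> List[str]:
        # List the workflows that this bundle ends up running.
        return [c.workflow.name for c in self.coordinators]


# Illustrative values only.
clean = WorkflowJob("clean_logs", ["pig_clean", "hive_load"])
report = WorkflowJob("daily_report", ["hive_report"])
bundle = BundleJob(
    "nightly_bundle",
    [
        CoordinatorJob("clean_hourly", clean, frequency_minutes=60),
        CoordinatorJob("report_daily", report, frequency_minutes=1440),
    ],
)
print(bundle.workflow_names())
```

Read from the bottom up: a bundle holds coordinators, each coordinator schedules one workflow, and each workflow holds the actions that do the work.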

We will learn about all these jobs in detail in our next blog post.
