Tuesday, March 6, 2018

Hadoop - An Introduction

Hadoop is an open-source, Java-based framework for writing and running distributed applications that process large amounts of data.

What makes Hadoop different from a typical distributed system:

  • Hadoop runs on large clusters of commodity hardware, and also on cloud platforms such as Amazon EC2.
  • Because Hadoop runs on commodity hardware, failures are frequent; Hadoop is designed to handle them gracefully.
  • It is highly scalable: it can handle growing data volumes simply by adding more nodes to the cluster.
  • Hadoop lets users write simple parallel code (see the sketch below).
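
To give a flavour of that "simple parallel code", here is a minimal sketch of the classic word-count job written against the Hadoop MapReduce Java API (the class name and the input/output paths are illustrative, not part of this post). The map step runs in parallel over blocks of the input and emits (word, 1) pairs; the reduce step sums the counts for each word.

// Minimal word-count sketch using the Hadoop MapReduce Java API.
// Class and path names are examples only.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map step: runs in parallel on blocks of the input, emitting (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce step: sums the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. an HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Notice that the code never mentions threads, sockets, or machine addresses; the framework handles the distribution and fault tolerance.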

 


Hadoop's simplicity and accessibility have given it an edge over existing technologies.

Hadoop has two main components:

Storage: it can store huge amounts of structured or unstructured data.

Processing: it provides a parallel processing framework.

The storage part is handled by HDFS (the Hadoop Distributed File System), while processing is managed by MapReduce.
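
On the storage side, applications normally read and write HDFS through the org.apache.hadoop.fs.FileSystem API. The short sketch below (the file paths are made up for illustration) copies a local file into HDFS and lists a directory; HDFS transparently splits the file into blocks and replicates them across DataNodes.

// Small sketch of talking to HDFS through the FileSystem API.
// The paths used here are examples only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS (the NameNode address) from core-site.xml.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Copy a local file into HDFS; the file is stored as replicated blocks.
    fs.copyFromLocalFile(new Path("/tmp/sample.txt"),
                         new Path("/user/data/sample.txt"));

    // List what is stored under an HDFS directory.
    for (FileStatus status : fs.listStatus(new Path("/user/data"))) {
      System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
    }

    fs.close();
  }
}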



 
