Tuesday, March 6, 2018

Hadoop - An Introduction

Hadoop is an open-source, Java-based framework for writing and running distributed applications that process large amounts of data.

What makes Hadoop different from a typical distributed system:

  • Hadoop runs on large clusters of commodity hardware, and also on cloud platforms such as Amazon EC2.
  • Because Hadoop runs on commodity hardware, failures are frequent; Hadoop is designed to handle them gracefully.
  • It is highly scalable: it can handle growing data volumes simply by adding more nodes to the cluster.
  • Hadoop lets users write simple parallel code (see the sketch below).
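
To give a flavour of that "simple parallel code", here is a minimal sketch of the classic word-count job written against the Hadoop MapReduce Java API (the class name and the input/output paths are illustrative, not part of this post). The map step runs in parallel over blocks of the input and emits (word, 1) pairs; the reduce step sums the counts for each word.

// Minimal word-count sketch using the Hadoop MapReduce Java API.
// Class and path names are examples only.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map step: runs in parallel on blocks of the input, emitting (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce step: sums the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. an HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Notice that the code never mentions threads, sockets, or machine addresses; the framework handles the distribution and fault tolerance.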

 


Hadoop's simplicity and accessibility have given it an edge over existing technologies.

Hadoop has two main components:

Storage: it can store huge amounts of structured or unstructured data.

Processing: it provides a parallel processing framework.

The storage part is handled by HDFS (the Hadoop Distributed File System), while processing is managed by MapReduce.
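
On the storage side, applications normally read and write HDFS through the org.apache.hadoop.fs.FileSystem API. The short sketch below (the file paths are made up for illustration) copies a local file into HDFS and lists a directory; HDFS transparently splits the file into blocks and replicates them across DataNodes.

// Small sketch of talking to HDFS through the FileSystem API.
// The paths used here are examples only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS (the NameNode address) from core-site.xml.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Copy a local file into HDFS; the file is stored as replicated blocks.
    fs.copyFromLocalFile(new Path("/tmp/sample.txt"),
                         new Path("/user/data/sample.txt"));

    // List what is stored under an HDFS directory.
    for (FileStatus status : fs.listStatus(new Path("/user/data"))) {
      System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
    }

    fs.close();
  }
}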



 
