Sunday, October 21, 2018

Flume - Architecture




Flume is a highly reliable tool for aggregating and transporting large amounts of streaming data, such as log files and events, from various sources to a centralized data store.

Flume chiefly consists of three components :-

Source 
Channel  
Sink


Source: It accepts data from the incoming stream and stores it in the channel.
Channel: Normally, data is read from the source faster than it can be written to the destination, so we need a buffer to absorb the difference between the read and write speeds. A buffer acts as intermediary storage that temporarily holds the data being transferred and so helps prevent data loss. In the same way, the channel acts as local, temporary storage between the source of the data and the persistent data in HDFS.
Sink: The last component, the sink, collects the data from the channel and writes (commits) it permanently to HDFS.
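
The way the three components hand data to one another can be sketched in a few lines of Python. This is only an illustration of the idea, not Flume's own code: the function names (`source`, `sink`, `run_pipeline`), the `None` end-of-stream marker, and the use of a plain list in place of HDFS are all assumptions made for this sketch.

```python
import queue
import threading

def source(events, channel):
    """Source: accept incoming events and put them on the channel."""
    for event in events:
        channel.put(event)   # blocks while the channel is full
    channel.put(None)        # end-of-stream marker (an assumption of this sketch)

def sink(channel, store):
    """Sink: drain the channel and write each event to the final store."""
    while True:
        event = channel.get()
        if event is None:
            break
        store.append(event)  # stands in for a write to HDFS

def run_pipeline(events, capacity=2):
    """Wire a source and a sink together through a bounded channel."""
    channel = queue.Queue(maxsize=capacity)  # the buffer between read and write
    store = []
    producer = threading.Thread(target=source, args=(events, channel))
    consumer = threading.Thread(target=sink, args=(channel, store))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return store
```

Because the queue is bounded, a fast source simply waits when the channel is full instead of dropping events, which is the buffering role described above.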

Advantages of Flume :-

  • It is reliable, scalable, fault tolerant, and customizable for different sources and sinks.
  • Flume maintains a steady flow of data between read and write operations.
  • Flume feeds online streaming data from various sources, such as network traffic, social media, email messages, and log files, into HDFS.
  • It supports several kinds of data flow, such as multi-hop, fan-in, and fan-out.

Disadvantages of Flume :-

  • Flume has a complex topology.
  • It does not support data replication.
  • It does not guarantee 100% unique message delivery (duplicate messages might arrive at any time).
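
To see how duplicates can arise, consider a delivery that is retried until it is acknowledged. The Python sketch below is a simplified model, not Flume's implementation: `deliver`, `deduplicate`, the `fail_first_commit` flag, and the `(event_id, body)` event shape are all hypothetical names chosen for illustration. It assumes the write succeeds but the first commit is lost, so the same batch is sent again.

```python
def deliver(batch, destination, fail_first_commit=True):
    """Retry a batch until its commit succeeds (at-least-once delivery)."""
    attempts = 0
    while True:
        attempts += 1
        destination.extend(batch)  # the write itself goes through
        if fail_first_commit and attempts == 1:
            continue               # commit lost: the batch is sent again
        return attempts

def deduplicate(events):
    """Drop repeated events downstream, keyed on a hypothetical unique id."""
    seen = set()
    unique = []
    for event_id, body in events:
        if event_id not in seen:
            seen.add(event_id)
            unique.append((event_id, body))
    return unique
```

If exact-once results matter, one common approach is to attach a unique id to each event and filter repeats downstream, as `deduplicate` does here.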
