Flume is a highly reliable tool for aggregating and transporting large amounts of streaming data, such as log files and events, from various sources to a centralized data store.
Flume chiefly consists of three components :-
- Source
- Channel
- Sink
Channel: Normally, data is read faster than it can be written, so a buffer is needed to absorb the difference between read and write speeds. The buffer acts as intermediary storage that temporarily holds the data being transferred and therefore helps prevent data loss. The channel plays this role in Flume: it is the local, temporary storage between the source of the data and the persistent data in HDFS.
Sink: The last component, the sink, collects the data from the channel and writes (commits) it permanently to HDFS.
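The buffering role of the channel can be illustrated with a small sketch. This is not Flume code: it is a plain Python simulation, and the names `run_pipeline`, `capacity`, and `stored` are hypothetical, chosen only for illustration. A bounded queue stands in for the channel, and a list stands in for HDFS.

```python
import queue
import threading

def run_pipeline(events, capacity=3):
    """Move events from a source to a sink through a bounded channel."""
    channel = queue.Queue(maxsize=capacity)  # the channel: a bounded in-memory buffer
    stored = []                              # stands in for HDFS in this sketch
    done = object()                          # sentinel marking the end of the stream

    def source():
        for event in events:
            channel.put(event)   # blocks while the channel is full
        channel.put(done)

    def sink():
        while True:
            event = channel.get()  # blocks while the channel is empty
            if event is done:
                break
            stored.append(event)   # "commit" the event to permanent storage

    threads = [threading.Thread(target=source), threading.Thread(target=sink)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stored

print(run_pipeline(["log-1", "log-2", "log-3", "log-4", "log-5"], capacity=2))
```

Because the queue is bounded, a fast source simply waits when the channel is full instead of dropping events, which is the speed-matching behaviour described above.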
Advantages of Flume :-
- It is reliable, scalable, fault tolerant, and customizable for different sources and sinks.
- Flume provides a steady flow of data between read and write operations.
- Flume feeds online streaming data from various sources, such as network traffic, social media, email messages, and log files, into HDFS.
- It supports multiple data flows, such as multi-hop, fan-in, and fan-out.
Disadvantages of Flume :-
- Flume has a complex topology.
- It does not support data replication.
- It does not guarantee 100% unique message delivery (duplicate messages might arrive at any time).
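Since duplicates can arrive, a consumer of the delivered data may want to filter them out. The sketch below shows one common way to do that in Python. It assumes each event carries a unique identifier, which is an assumption made for this example and not something Flume adds by default; the function name `deduplicate` and the `(event_id, body)` layout are likewise hypothetical.

```python
def deduplicate(events):
    """Keep only the first occurrence of each event ID, preserving order."""
    seen = set()     # IDs already delivered
    unique = []
    for event_id, body in events:
        if event_id in seen:
            continue             # a redelivered event: skip it
        seen.add(event_id)
        unique.append((event_id, body))
    return unique

delivered = [(1, "login"), (2, "click"), (1, "login"), (3, "logout")]
print(deduplicate(delivered))
```

Keeping every seen ID in memory is only practical for small streams; for larger volumes the set would need to be bounded or stored externally.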