#bigdata 30e - Apache Flume and Sqoop

[Image credits: Apache Foundation]

To capture data or move it into Hadoop, the Hadoop ecosystem offers two tools: Apache Flume and Apache Sqoop.

1 — APACHE FLUME

Flume is free software, originally developed by Cloudera and later donated to the Apache Foundation, which now manages the project.

Flume allows streaming data from multiple sources to be ingested ("injected") into Hadoop clusters and written to HDFS.
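As a sketch of how this works in practice, a Flume agent is described in a properties file that wires a source, a channel, and a sink together. The agent name, file paths, and HDFS URL below are illustrative assumptions, not values from the article:

```properties
# Hypothetical single-agent Flume configuration; "a1" and all paths are placeholders
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: tail an application log file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events into date-partitioned HDFS directories
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/logs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```

The source/channel/sink split is the core Flume design: the channel decouples data producers from HDFS writes, so a slow sink does not immediately block collection.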

[Image credits: Apache Foundation]

Flume is often used to collect log files from clusters distributed worldwide, with the data stored in HDFS for later analysis or even analyzed in near real time.

2 — APACHE SQOOP

Sqoop is open-source software designed to transfer data between relational database systems and Hadoop.

[Image credits: Apache Foundation]

In Data Warehouse environments, Sqoop is used to extract structured data for analysis in Hadoop.
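For illustration, such an extraction is typically run with Sqoop's `import` command. The JDBC URL, credentials, table, and target directory below are placeholder assumptions:

```
# Hypothetical import of a MySQL table into HDFS; the JDBC URL, user,
# table, and target directory are all placeholders.
# -P prompts interactively for the database password.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /warehouse/orders \
  --num-mappers 4
```

Sqoop translates this into parallel MapReduce tasks (`--num-mappers`), so large tables are copied in parallel rather than through a single connection.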

CURIOSITIES

  1. One advantage of Flume is that the captured data can be stored directly in HBase or HDFS.
  2. Flume is widely used to import large volumes of event data produced on social networks such as Facebook and Twitter, and on e-commerce sites such as Amazon.
  3. Sqoop works with relational databases such as Teradata, Netezza, Oracle, MySQL, and PostgreSQL.
  4. Sqoop helps offload ETL (Extract, Transform, Load) workloads from the Data Warehouse to Hadoop in an efficient, low-cost way.
  5. Sqoop can also perform the reverse task, exporting data from Hadoop into a relational database.
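The reverse transfer in item 5 can be sketched with Sqoop's `export` command; again, the connection details and names are hypothetical:

```
# Hypothetical export of HDFS data back into a relational table; the JDBC
# URL, user, table, and export directory are all placeholders.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table daily_summary \
  --export-dir /warehouse/daily_summary
```

This is the common pattern of processing data cheaply in Hadoop and then pushing the summarized results back into the warehouse for reporting.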

More information about this article

Article selected from the eBook “Big Data for Executives and Market Professionals.”

eBook in English: Amazon or Apple Store

eBook in Portuguese: Amazon or Apple Store


More articles by José Antonio Ribeiro Neto
