#bigdata 30e — Apache Flume and Sqoop
José Antonio Ribeiro Neto
Author, Artificial Intelligence and Data Science Researcher
To capture data or move it into Hadoop, the Hadoop ecosystem provides two tools: FLUME and SQOOP.
1 — APACHE FLUME
Flume is free software, originally developed by Cloudera and later donated to the Apache Software Foundation.
Flume allows streaming data, coming from multiple locations, to be “injected” or moved into Hadoop clusters and written into HDFS.
Flume is commonly used to collect log files from servers spread across worldwide networks of clusters, with the data stored in HDFS for later, or even real-time, analysis.
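The log-collection flow described above is defined in a Flume agent configuration file, which wires a source, a channel, and a sink together. The sketch below is a minimal, hypothetical example: the agent name, file paths, and NameNode address are placeholders you would adapt to your environment.

```properties
# Hypothetical agent "agent1": tails a local log file and writes events to HDFS.
agent1.sources = tail-src
agent1.channels = mem-ch
agent1.sinks = hdfs-sink

# Source: tail a local application log (exec source)
agent1.sources.tail-src.type = exec
agent1.sources.tail-src.command = tail -F /var/log/app/access.log
agent1.sources.tail-src.channels = mem-ch

# Channel: buffer events in memory between source and sink
agent1.channels.mem-ch.type = memory
agent1.channels.mem-ch.capacity = 10000

# Sink: write the buffered events into HDFS, partitioned by date
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/logs/%Y-%m-%d
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink.channel = mem-ch
```

An agent defined this way would be started with `flume-ng agent --conf-file <file> --name agent1`.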
2 — APACHE SQOOP
Sqoop is open source software designed to transfer data between relational databases and Hadoop.
It is used in data warehousing to extract structured data for analysis in Hadoop.
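A typical extraction like the one described is a single `sqoop import` command, which reads a relational table over JDBC and writes it to HDFS in parallel. This is a sketch with hypothetical connection details (database host, schema, table, and user are placeholders), not a command tied to any real system.

```shell
# Import the "orders" table from a hypothetical MySQL database into HDFS.
sqoop import \
  --connect jdbc:mysql://dbserver:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  --num-mappers 4
```

`--num-mappers` controls how many parallel map tasks split the table, and `-P` prompts for the password instead of placing it on the command line.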
CURIOSITIES
- One of the advantages of Flume is that the captured data can be stored directly into HBase or HDFS.
- Flume is widely used to import large volumes of event data produced on social networks like Facebook and Twitter, and on e-commerce sites such as Amazon.
- Sqoop works with relational databases such as Teradata, Netezza, Oracle, MySQL, and Postgres.
- Sqoop helps offload ETL (Extract, Transform, Load) tasks from the Data Warehouse to Hadoop in an efficient, low-cost way.
- Sqoop can also perform the reverse task, exporting data from Hadoop back into a relational database.
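The reverse direction mentioned in the last point is handled by `sqoop export`, which pushes HDFS files into an existing relational table. Again, the connection string, table, and directory below are hypothetical placeholders.

```shell
# Export an HDFS directory of CSV records into a hypothetical MySQL table.
# The target table "order_summary" must already exist in the database.
sqoop export \
  --connect jdbc:mysql://dbserver:3306/sales \
  --username etl_user -P \
  --table order_summary \
  --export-dir /user/hadoop/order_summary \
  --input-fields-terminated-by ','
```

`--input-fields-terminated-by` tells Sqoop how the HDFS records are delimited so it can map fields to table columns.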
More information about this article
Article selected from the eBook “Big Data for Executives and Market Professionals.”
eBook in English: Amazon or Apple Store
eBook in Portuguese: Amazon or Apple Store