Hadoop Ecosystem Applications

In this article, we will briefly discuss some commonly used applications of the Hadoop Ecosystem, organised by the following stages of the big data lifecycle:

  1. Ingest Data
  2. Store Data
  3. Analyse Data
  4. Access Data


Ingest Data:

FLUME

  • Collects, aggregates and transfers large volumes of streaming data (such as log files) into Hadoop
  • Has a simple and flexible architecture based on streaming data flows
  • Uses a simple extensible data model that allows for online analytic application
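A Flume agent is typically wired together in a properties file. The sketch below is illustrative only (the agent name a1 and component names r1, c1, k1 are arbitrary); it connects a netcat source to an HDFS sink through an in-memory channel:

```properties
# flume.conf — hypothetical single-agent pipeline: netcat source -> memory channel -> HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

The agent would then be started with `flume-ng agent --name a1 --conf-file flume.conf`.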

SQOOP

  • Designed to transfer data between relational database systems and Hadoop
  • Accesses the database to understand the schema of the data
  • Generates a MapReduce application to import or export the data
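To make this concrete, here is a hedged sketch of a Sqoop import and export (the host, database, table and directory names are hypothetical). Sqoop reads the table's schema over JDBC and runs a MapReduce job split across the given number of mappers:

```shell
# Hypothetical: import the `employees` table from MySQL into HDFS
sqoop import \
  --connect jdbc:mysql://dbhost/company \
  --username analyst -P \
  --table employees \
  --target-dir /user/analyst/employees \
  --num-mappers 4

# Export works the same way in reverse, from HDFS back into a table
sqoop export \
  --connect jdbc:mysql://dbhost/company \
  --username analyst -P \
  --table daily_summary \
  --export-dir /user/analyst/summary
```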


Store Data:

HBASE

  • A non-relational database that runs on top of HDFS
  • Provides real-time read and write access to data
  • Keeps data sorted and indexed by row key, allowing fast random access
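For illustration, the HBase shell shows the row-key-based access model directly (the table and column names below are hypothetical):

```ruby
create 'users', 'info'                           # table with one column family
put 'users', 'row1', 'info:name', 'Alice'        # cell addressed by row key + column
get 'users', 'row1'                              # random read by row key
scan 'users', {STARTROW => 'row1', LIMIT => 10}  # range scan over sorted row keys
```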

CASSANDRA

  • A scalable, NoSQL database designed to have no single point of failure
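As a minimal CQL sketch (keyspace and table names are hypothetical), the "no single point of failure" property comes from replicating each row across multiple nodes:

```sql
-- Replicating every row to three nodes removes any single point of failure
CREATE KEYSPACE IF NOT EXISTS metrics
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TABLE metrics.readings (
  sensor_id text,
  ts        timestamp,
  value     double,
  PRIMARY KEY (sensor_id, ts)
);
```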


Analyse Data:

PIG

  • Analyses large amounts of data
  • Operates on the client side of a cluster
  • Uses Pig Latin, a procedural data-flow language
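The procedural style means each statement names an intermediate dataset that flows into the next step. A short hypothetical Pig Latin script (file path and field names are illustrative):

```pig
-- Hypothetical input: tab-separated access logs
logs    = LOAD '/data/access_logs' USING PigStorage('\t')
          AS (user:chararray, url:chararray, bytes:long);
grouped = GROUP logs BY user;
totals  = FOREACH grouped GENERATE group AS user, SUM(logs.bytes) AS total_bytes;
DUMP totals;
```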

HIVE

  • Used for creating reports
  • Operates on the server side of a Cluster
  • Uses HiveQL, a declarative, SQL-like language
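In contrast to Pig's step-by-step style, a Hive report states only the result wanted and Hive plans the execution on the cluster. A hedged HiveQL sketch (table, columns and location are hypothetical):

```sql
-- Hypothetical external table over files already in HDFS
CREATE EXTERNAL TABLE sales (item STRING, region STRING, amount DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/data/sales';

-- Declarative report: describe the result, not the steps
SELECT region, SUM(amount) AS revenue
FROM sales
GROUP BY region;
```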


Access Data:

IMPALA

  • A scalable, massively parallel SQL engine for interactive queries on Hadoop data
  • Uses familiar SQL syntax, so no MapReduce programming is required
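As a small illustration, Impala can query a table already defined in the Hive metastore (here a hypothetical `sales` table) straight from the command line:

```shell
# Hypothetical: interactive, low-latency query via impala-shell
impala-shell -q "SELECT region, SUM(amount) FROM sales GROUP BY region;"
```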

HUE

  • It stands for Hadoop User Experience
  • Allows you to upload, browse and query data
  • Runs Pig jobs and workflows
  • Provides editors for several query languages, such as Hive and MySQL

