Processing frameworks for Hadoop
Hadoop has become the de facto platform for storing and processing large amounts of data and has found widespread applications. In the Hadoop ecosystem, you can store your data in one of the storage managers (for example, HDFS, HBase, or Solr) and then use a processing framework to process the stored data. Hadoop first shipped with only one processing framework: MapReduce. Today, there are many other open source tools in the Hadoop ecosystem that can be used to process data in Hadoop; a few common tools include the Apache projects Hive, Pig, Spark, Cascading, Crunch, Tez, and Drill, along with Impala and Presto. Some of these frameworks are built on top of each other. For example, you can write queries in Hive that can run on MapReduce or Tez. Another example, currently under development, is the ability to run Hive queries on Spark.
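As a concrete illustration of this layering, a Hive session can pick the engine a query compiles to via the `hive.execution.engine` property (a real Hive setting; the table and column names below are hypothetical, for illustration only):

```sql
-- Choose the engine Hive compiles the query to: mr (MapReduce),
-- tez, or -- experimentally, at the time of writing -- spark.
SET hive.execution.engine=tez;

-- The same HiveQL runs unchanged regardless of the engine chosen above.
-- "page_views" is a hypothetical table used only for illustration.
SELECT url, COUNT(*) AS hits
FROM page_views
GROUP BY url;
```

The point is that the query itself is engine-agnostic: Hive translates it into jobs for whichever underlying framework is configured.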
Amidst all of these options, two key questions arise for Hadoop users:

- Which processing frameworks are most commonly used?
- How do I choose which framework(s) to use for my specific use case?

This post will help you answer both of these questions, giving you enough context to make an educated decision about the best processing framework for your specific use case.
Written by Mark Grover.
Read full article at https://oreil.ly/1zZdEGG
Comment (10 years ago): I think "YARN framework" would be a better name. I think the picture refers to YARN 2.0.