Processing frameworks for Hadoop
Hadoop has become the de facto platform for storing and processing large amounts of data and has found widespread applications. In the Hadoop ecosystem, you can store your data in one of the storage managers (for example, HDFS, HBase, or Solr) and then use a processing framework to process the stored data. Hadoop first shipped with only one processing framework: MapReduce. Today, there are many other open source tools in the Hadoop ecosystem that can be used to process data in Hadoop; a few common tools include the Apache projects Hive, Pig, Spark, Cascading, Crunch, Tez, and Drill, along with Impala and Presto. Some of these frameworks are built on top of each other. For example, you can write queries in Hive that can run on MapReduce or Tez. Another example, currently under development, is the ability to run Hive queries on Spark.
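As a concrete illustration of this layering, a Hive session can pick the engine a query compiles to via the `hive.execution.engine` property (a real Hive setting; the table and column names below are hypothetical, for illustration only):

```sql
-- Choose the engine Hive compiles the query to: mr (MapReduce),
-- tez, or -- experimentally, at the time of writing -- spark.
SET hive.execution.engine=tez;

-- The same HiveQL runs unchanged regardless of the engine chosen above.
-- "page_views" is a hypothetical table used only for illustration.
SELECT url, COUNT(*) AS hits
FROM page_views
GROUP BY url;
```

The point is that the query itself is engine-agnostic: Hive translates it into jobs for whichever underlying framework is configured.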
Amidst all of these options, two key questions arise for Hadoop users:

- Which processing frameworks are most commonly used?
- How do I choose which framework(s) to use for my specific use case?

This post will help you answer both of these questions, giving you enough context to make an educated decision about the best processing framework for your specific use case.
Written by Mark Grover.
Read full article at https://oreil.ly/1zZdEGG
Comment (10 years ago): I think "YARN framework" would be a better name. I think the picture refers to YARN 2.0.