HBase MapReduce Integration
Malini Shukla
What is MapReduce?
MapReduce was designed to solve the problem of processing terabytes of data and more in a scalable way. The challenge is to build a system whose performance increases linearly with the number of physical machines added; providing a framework for exactly that is the main purpose of MapReduce.
Let’s revise HBase Architecture
MapReduce follows a divide-and-conquer approach: it splits the data located on a distributed file system into chunks, so that the available servers can each access their chunks and process them as fast as they can. With this approach, the partial results have to be consolidated at the end, and MapReduce has that consolidation built right into it as well.
Classes
The MapReduce process figure above shows all the classes involved in the Hadoop implementation of MapReduce. Let's learn about them in detail:
i. InputFormat
First, the InputFormat splits the input data and returns a RecordReader instance, which defines the classes of the key and value objects. The RecordReader also makes it possible to iterate over each input record, with the help of its next() method.
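As a minimal sketch, here is how a job can be pointed at TableInputFormat, the HBase implementation of InputFormat, which splits a table by region and supplies a record reader over its rows. The table name "metrics" is a hypothetical placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.mapreduce.Job;

public class InputFormatSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // TableInputFormat reads its source table name from the configuration.
    conf.set(TableInputFormat.INPUT_TABLE, "metrics"); // hypothetical table name
    Job job = Job.getInstance(conf, "scan-metrics");
    // The input format splits the table by region; each split is read by a
    // record reader that iterates rows as (ImmutableBytesWritable, Result) pairs.
    job.setInputFormatClass(TableInputFormat.class);
  }
}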
ii. Mapper
In this step, each record read by the RecordReader is processed using the map() method of the Mapper class.
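For an HBase source, the mapper extends TableMapper, so each call to map() receives one row of the scanned table as a Result. The following sketch counts clicks per user; the click-counting use case and the info:user column layout are assumptions for illustration:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Emits (user, 1) for every scanned row that has an info:user cell.
public class ClickMapper extends TableMapper<Text, IntWritable> {
  private static final byte[] CF = Bytes.toBytes("info");  // hypothetical family
  private static final byte[] COL = Bytes.toBytes("user"); // hypothetical qualifier
  private static final IntWritable ONE = new IntWritable(1);

  @Override
  protected void map(ImmutableBytesWritable rowKey, Result row, Context ctx)
      throws IOException, InterruptedException {
    byte[] user = row.getValue(CF, COL);
    if (user != null) {
      ctx.write(new Text(Bytes.toString(user)), ONE);
    }
  }
}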
iii. Reducer
This stage works much like the Mapper stage, except that here we process the output of the Mapper class after the data has been shuffled and sorted.
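Continuing the same hypothetical click-counting example, a reducer that targets HBase extends TableReducer: it sums the shuffled counts for each user and emits a Put for the output table. The stats:clicks column is an assumption:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Sums the shuffled counts per user and writes one Put per user.
public class ClickReducer
    extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
  @Override
  protected void reduce(Text user, Iterable<IntWritable> counts, Context ctx)
      throws IOException, InterruptedException {
    int total = 0;
    for (IntWritable c : counts) {
      total += c.get();
    }
    Put put = new Put(Bytes.toBytes(user.toString()));
    // stats:clicks is a hypothetical output column for this sketch.
    put.addColumn(Bytes.toBytes("stats"), Bytes.toBytes("clicks"),
        Bytes.toBytes(total));
    ctx.write(new ImmutableBytesWritable(put.getRow()), put);
  }
}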
iv. OutputFormat
Finally, the OutputFormat class persists the data in various locations. There are specific implementations that allow output to files or, in the case of the TableOutputFormat class, to HBase tables. To write the data into the specific HBase output table, it uses a TableRecordWriter.
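As a brief sketch, TableOutputFormat can be configured as a job's output like this; the target table name "click_counts" is hypothetical. The TableRecordWriter it creates turns each emitted Put or Delete into a write against that table:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.mapreduce.Job;

public class OutputFormatSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // TableOutputFormat reads its target table name from the configuration.
    conf.set(TableOutputFormat.OUTPUT_TABLE, "click_counts"); // hypothetical table
    Job job = Job.getInstance(conf, "write-click-counts");
    job.setOutputFormatClass(TableOutputFormat.class);
  }
}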
Supporting Classes in MapReduce Integration
The MapReduce support comes with the TableMapReduceUtil class, which helps in setting up MapReduce jobs over HBase. It provides static methods that configure a job so we can run it with HBase as the source and/or the target.
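Putting the pieces together, here is a minimal driver sketch that uses TableMapReduceUtil's initTableMapperJob() and initTableReducerJob() to run the hypothetical click-count job with HBase as both source and target; the table names "clicks" and "click_counts" are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class ClickCountJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "click-count");
    job.setJarByClass(ClickCountJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);       // fetch more rows per RPC for scan-heavy jobs
    scan.setCacheBlocks(false); // block caching is usually disabled for MR scans

    // HBase as the source: wires up TableInputFormat, the scan, and the mapper.
    TableMapReduceUtil.initTableMapperJob(
        "clicks",               // hypothetical source table
        scan, ClickMapper.class, Text.class, IntWritable.class, job);

    // HBase as the target: wires up TableOutputFormat and the reducer.
    TableMapReduceUtil.initTableReducerJob(
        "click_counts",         // hypothetical target table
        ClickReducer.class, job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}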
Let’s revise HBase Use Cases and Real-time Applications