HBase MapReduce Integration

What is MapReduce?

MapReduce was designed to process data sets of a terabyte or more in a scalable way. Its main purpose is to provide a system whose performance increases linearly with the number of physical machines added.

MapReduce follows a divide-and-conquer approach: the data, which is located on a distributed file system, is split into chunks so that the available servers can each access and process those chunks as fast as they can. With this approach the partial results have to be consolidated at the end, and MapReduce has this step built in as well.
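The split, process, and consolidate steps can be illustrated with a minimal single-machine sketch in plain Java. It is not Hadoop code: the class name DivideAndConquerSketch and its methods are hypothetical, and a parallel stream stands in for the servers that would process the chunks on a real cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical single-machine illustration; not part of Hadoop or HBase.
final class DivideAndConquerSketch {

    // Divide: cut the input into chunks of at most chunkSize lines.
    static List<List<String>> split(List<String> lines, int chunkSize) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += chunkSize) {
            chunks.add(lines.subList(i, Math.min(i + chunkSize, lines.size())));
        }
        return chunks;
    }

    // Process one chunk independently: count the words it contains.
    static Map<String, Integer> countWords(List<String> chunk) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : chunk) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Process all chunks in parallel, then consolidate the partial counts.
    static Map<String, Integer> run(List<String> lines, int chunkSize) {
        List<Map<String, Integer>> partials = split(lines, chunkSize)
                .parallelStream()
                .map(DivideAndConquerSketch::countWords)
                .collect(Collectors.toList());

        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "hbase stores rows", "mapreduce scans rows", "hbase scales");
        System.out.println(run(lines, 2));
    }
}
```

Because each chunk is counted without looking at any other chunk, adding workers does not change the result. Only the final merge needs to see all of the partial outputs.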

Classes

The MapReduce process figure shows all the classes involved in the Hadoop implementation of MapReduce. Let's look at each of them in detail:

i. InputFormat

First, the InputFormat splits the input data and returns a RecordReader instance, which defines the classes of the key and value objects. The RecordReader also provides a next() method for iterating over each input record.

ii. Mapper

In this step, each record read by the RecordReader is processed with the map() method.

iii. Reducer

This stage is similar to the Mapper stage, except that it processes the output of the Mapper class after the data has been shuffled and sorted.

iv. OutputFormat

Finally, the OutputFormat class persists the data in various locations. Specific implementations allow output to files or, in the case of the TableOutputFormat class, to HBase tables. To write the data into the specific HBase output table, TableOutputFormat uses a TableRecordWriter.
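To make the hand-off between the four classes concrete, here is a small in-memory sketch in plain Java. It only mirrors the roles described above; the class MiniPipeline, its methods, and the "user,bytes" record layout are assumptions made for illustration and are not the Hadoop interfaces themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins for the roles of InputFormat, Mapper, Reducer, OutputFormat.
final class MiniPipeline {

    // A key/value pair emitted by the map stage.
    static final class Pair {
        final String key;
        final int value;

        Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // InputFormat/RecordReader role: turn raw input into individual records.
    static List<String> readRecords(String input) {
        List<String> records = new ArrayList<>();
        for (String line : input.split("\n")) {
            if (!line.trim().isEmpty()) {
                records.add(line.trim());
            }
        }
        return records;
    }

    // Mapper role: one "user,bytes" record becomes one (user, bytes) pair.
    static Pair map(String record) {
        String[] fields = record.split(",");
        return new Pair(fields[0].trim(), Integer.parseInt(fields[1].trim()));
    }

    // Shuffle and sort: group the mapped values by key, with keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Pair> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Pair p : pairs) {
            grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return grouped;
    }

    // Reducer role: collapse all values of one key into a single sum.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // OutputFormat role: emit one tab-separated line per key (here, into a list).
    static List<String> run(String input) {
        List<Pair> mapped = new ArrayList<>();
        for (String record : readRecords(input)) {
            mapped.add(map(record));
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            output.add(e.getKey() + "\t" + reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        run("alice,10\nbob,5\nalice,7").forEach(System.out::println);
    }
}
```

In a real job the framework performs the shuffle and sort between the map and reduce stages, so user code normally supplies only the Mapper and Reducer logic and chooses the input and output formats.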

Supporting Classes in MapReduce Integration

For setting up MapReduce jobs over HBase, the MapReduce support comes with the TableMapReduceUtil class. Its static methods help to configure a job so that it can run with HBase as the source and/or the target.
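As a rough sketch of how these static methods are typically used, the job below reads rows from one HBase table and writes per-prefix row counts to another. The table names "source_table" and "target_table", the column family "cf", the qualifier "count", and the class names are all assumptions made for illustration. The code needs the HBase and Hadoop client libraries on the classpath and a running cluster, and should be adapted to your own schema.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical job: counts rows per first character of the row key.
public class RowCountByPrefixJob {

    // Mapper: emits (first character of the row key, 1) for every scanned row.
    public static class PrefixMapper extends TableMapper<Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text prefix = new Text();

        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
                throws IOException, InterruptedException {
            String key = Bytes.toString(rowKey.get(), rowKey.getOffset(), rowKey.getLength());
            prefix.set(key.substring(0, 1));
            context.write(prefix, ONE);
        }
    }

    // Reducer: sums the counts for one prefix and stores the total as a Put.
    public static class SumReducer
            extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
        @Override
        protected void reduce(Text prefix, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            byte[] row = Bytes.toBytes(prefix.toString());
            Put put = new Put(row);
            // "cf" and "count" are assumed names; the family must exist in the target table.
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(sum));
            context.write(new ImmutableBytesWritable(row), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "row-count-by-prefix");
        job.setJarByClass(RowCountByPrefixJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // fetch more rows per RPC than the default
        scan.setCacheBlocks(false);  // avoid filling the block cache during a full scan

        // HBase as the source: wires up the input format and the mapper.
        TableMapReduceUtil.initTableMapperJob(
                "source_table", scan, PrefixMapper.class, Text.class, IntWritable.class, job);
        // HBase as the target: wires up TableOutputFormat and the reducer.
        TableMapReduceUtil.initTableReducerJob("target_table", SumReducer.class, job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Either call can be used on its own: a job may read from HBase and write to files, or read from files and write to HBase.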
