Using Aggregation Framework and creating Mapper and Reducer Program

ARTH - Task 34

Task Description --- Use the Aggregation Framework of MongoDB and create a Mapper and Reducer program.

MongoDB is often the first database that comes to mind when we have to work with unstructured data and reshape it quickly and efficiently. For this, MongoDB ships with a powerful tool, the Aggregation Framework, which manipulates data on the server itself.

What is the Aggregation Framework?

The Aggregation Framework is a way to query and transform documents in a MongoDB collection. It exists because, once you start working with data, you often need to combine collections, modify documents, pluck out fields, rename fields, concatenate them, group documents by a field, unwind an array field into separate documents, and so on.

The simple query methods in MongoDB only let you retrieve whole documents or parts of them. They do not let you reshape the documents on the server before returning them to your application. This is where the Aggregation Framework comes in. It is nothing external; aggregation comes built into MongoDB.

What is a Pipeline in the Aggregation Framework?

The Aggregation Framework relies on the pipeline concept. The pipeline consists of stages in which operators modify the documents in the collection using various techniques. Finally, the output is returned to the application that ran the query.

In MongoDB, the pipeline is an array of stages, each built from an operator that takes in a set of documents and emits modified documents according to the rules specified by the programmer. Each stage consumes the documents produced by the previous one, hence the name pipeline. Some examples of stage operators are $group, $match, $project and $sort.
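To make the "array of stages" idea concrete, here is a minimal plain JavaScript sketch that runs outside MongoDB. The sample documents and the names `sampleDocs`, `stages` and `runPipeline` are made up for illustration; each function only imitates what the real $match, $project and $sort stages do on the server.

```javascript
// Hypothetical sample documents, not the article's dataset.
const sampleDocs = [
  { name: "Asha", age: 31, gender: "female" },
  { name: "Ravi", age: 45, gender: "male" },
  { name: "Meera", age: 27, gender: "female" },
];

// Each stage is a function: array of documents in, array of documents out.
const stages = [
  // like $match: keep only the documents that satisfy a condition
  (input) => input.filter((doc) => doc.gender === "female"),
  // like $project: reshape each document, keeping selected fields
  (input) => input.map((doc) => ({ name: doc.name, age: doc.age })),
  // like $sort: order the documents by a field (ascending age)
  (input) => [...input].sort((a, b) => a.age - b.age),
];

// Run the stages in order, feeding each stage the previous stage's output.
function runPipeline(input, pipeline) {
  return pipeline.reduce((current, stage) => stage(current), input);
}

const pipelineResult = runPipeline(sampleDocs, stages);
console.log(pipelineResult); // Meera (27) first, then Asha (31)
```

Reordering or removing an entry in `stages` changes the result without touching the other stages, which is the main appeal of the pipeline design.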

By comparison, a query like find() works in most cases but is not handy when we want to modify and retrieve data at the same time.

Some Practical Demonstrations of Using Aggregation Pipelines

Here I have a collection of dummy data about the users of a company, with details such as name, age, gender and state. I will use this collection throughout the article.

Let's create a pipeline to count the males in each state.

> db.contacts.aggregate([
... {$match : {gender : "male"}},
... {$group : {_id : {state : "$location.state"}, total_males : {$sum : 1}}}
... ])

The $match stage selects all documents whose "gender" field is "male" and passes them to the next stage. That stage is $group, which groups the documents by state and keeps a count of the males in each state.
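If you want to see what these two stages compute without a running server, the same logic can be sketched in plain JavaScript. The documents below are hypothetical and only mimic the shape of the contacts collection (a "gender" field and a nested "location.state" field); the variable names are mine.

```javascript
// Hypothetical sample documents shaped like the contacts collection.
const contactDocs = [
  { gender: "male", location: { state: "Goa" } },
  { gender: "female", location: { state: "Goa" } },
  { gender: "male", location: { state: "Kerala" } },
  { gender: "male", location: { state: "Goa" } },
];

// $match step: keep only the documents whose gender is "male".
const maleDocs = contactDocs.filter((doc) => doc.gender === "male");

// $group step: one bucket per state, adding 1 per document ($sum : 1).
const totalsByState = new Map();
for (const doc of maleDocs) {
  const state = doc.location.state;
  totalsByState.set(state, (totalsByState.get(state) || 0) + 1);
}

// Shape the buckets like the pipeline's output documents.
const groupedDocs = [...totalsByState].map(([state, count]) => ({
  _id: { state: state },
  total_males: count,
}));
console.log(groupedDocs);
```

With this sample data, Goa ends up with two males and Kerala with one.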

What is Map-Reduce?

MapReduce is a programming paradigm for processing big data on distributed systems. It analyzes data and produces aggregated results. The map function emits key/value pairs, and the values are accumulated under their keys. The reduce function then turns the values accumulated for each key into the aggregated result. In MongoDB, the Mapper and Reducer programs are written in JavaScript.
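The flow of emit, group by key, then reduce can be imitated in a few lines of plain JavaScript. This is only a toy, single-process sketch and not how MongoDB implements it: `runMapReduce` is a made-up helper, and here the map function receives the document and `emit` as arguments, whereas in MongoDB the document is `this` and `emit` is provided by the server.

```javascript
// Toy single-process imitation of map-reduce; names are made up for this sketch.
function runMapReduce(documents, mapFn, reduceFn) {
  const valuesByKey = new Map();
  // emit() files each value under its key, as the map phase does.
  const emit = (key, value) => {
    if (!valuesByKey.has(key)) valuesByKey.set(key, []);
    valuesByKey.get(key).push(value);
  };
  for (const doc of documents) mapFn(doc, emit);

  // The reduce phase collapses each key's list of values into one result.
  const results = {};
  for (const [key, values] of valuesByKey) {
    results[key] = reduceFn(key, values);
  }
  return results;
}

// Hypothetical sample data: count the people of each gender.
const people = [
  { name: "Ravi", gender: "male" },
  { name: "Asha", gender: "female" },
  { name: "Kabir", gender: "male" },
];

const countByGender = runMapReduce(
  people,
  // map: one (key, value) pair per document
  (doc, emit) => emit(doc.gender, 1),
  // reduce: add up the values collected for a key
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(countByGender);
```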

Let's create a Map-Reduce program to calculate the average age of all males and females:

> var mapfun = function() {
... emit(this.gender, this.dob.age);
... };

The code above is the map function, which is applied to each input document. "mapfun" maps a person's gender to that person's age. Inside the function, the 'this' keyword refers to the document currently being processed.

> var reducefun = function(gender, age) {
... return Array.avg(age);
... };

This is the Reducer function, which processes the emitted data. "reducefun" takes two arguments: 'gender' is the key, and 'age' is the array of 'dob.age' values emitted by the Mapper function for that key. The function returns the average age for each gender in the collection.

The next step is to pass these functions to the mapReduce function.

> db.contacts.mapReduce(
... mapfun,
... reducefun,
... {out : "map_reduce11"}
... )

The function 'mapfun' takes the input documents from the 'contacts' collection and passes its emitted values to the Reducer function 'reducefun', which then calculates the average male and female ages. The 'out' option then takes the results from 'reducefun' and saves them to the "map_reduce11" collection. If no collection with that name exists, a new one is created and the data is stored in it. If the collection already exists, its old data is overwritten.

We can check whether the new collection was created with the (>show collections) command and then view its data with the (>db.map_reduce11.find()) command.
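One caveat about averaging inside the reducer: MongoDB may call the reduce function more than once for the same key and feed earlier results back in, so an average of averages can drift from the true average on larger collections. A common way around this is to emit a sum and a count from the map function, add them up in the reducer, and divide only once at the end in a function passed through the 'finalize' option of mapReduce. The plain JavaScript sketch below imitates that idea outside MongoDB; the function names and the three sample ages are assumptions for illustration.

```javascript
// Map step: emit a { sum, count } pair instead of a bare age.
const mapAge = (doc) => ({ key: doc.gender, value: { sum: doc.dob.age, count: 1 } });

// Reduce step: add the pairs up. The output has the same shape as the inputs,
// so it stays correct when earlier results are reduced again.
const reduceAge = (key, values) => {
  const total = { sum: 0, count: 0 };
  for (const v of values) {
    total.sum += v.sum;
    total.count += v.count;
  }
  return total;
};

// Finalize step: runs once per key, after all reducing is done.
const finalizeAge = (key, reduced) => reduced.sum / reduced.count;

// Hypothetical documents with ages 20, 30 and 70.
const emitted = [20, 30, 70].map(
  (age) => mapAge({ gender: "male", dob: { age: age } }).value
);

// Reduce everything at once...
const oneShot = reduceAge("male", emitted);
// ...or reduce a partial batch first and feed its result back in.
const partial = reduceAge("male", emitted.slice(0, 2));
const reReduced = reduceAge("male", [partial, emitted[2]]);

const avgOneShot = finalizeAge("male", oneShot);
const avgReReduced = finalizeAge("male", reReduced);

// For comparison: averaging a partial average with the remaining value.
const plainAvg = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
const avgOfAvgs = plainAvg([plainAvg([20, 30]), 70]);

console.log(avgOneShot, avgReReduced, avgOfAvgs); // 40 40 47.5
```

Both reduce orders give 40, while the average of averages gives 47.5, which is why the division is kept out of the reducer.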

Let's do the same task using the Aggregation Framework:

> db.contacts.aggregate([
... {$group : {_id : "$gender", value : {$avg : "$dob.age"}}},
... {$out : "aggregation11"}
... ])

In the code above, I first grouped all the documents by gender and computed the average age of each group. In the next stage I saved the results to the "aggregation11" collection.

We can view the data in the collection using the "db.aggregation11.find()" command.

Example 2: Here I am going to create a Map-Reduce program to count the males and females in each state.

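> var mapfun22 = function() {
// Emit a counter object so that map output and reduce output have the same shape
var isMale = this.gender == "male";
emit(this.location.state, {male : isMale ? 1 : 0, female : isMale ? 0 : 1});
};

This Mapper function maps the name of each document's state to a small counter object that records whether the person is male or female. Emitting an object rather than the bare gender string matters because MongoDB expects the reducer to return the same type of value that the mapper emits.

> var reducerfun22 = function(state, counts) {
// Add up the counters; this also works when MongoDB feeds earlier results back in
var result = {male : 0, female : 0};
for (var idx = 0; idx < counts.length; ++idx) {
result.male += counts[idx].male;
result.female += counts[idx].female;
}
return result;
};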

The Reducer function above adds up the males and females for each state and returns the totals.

> db.contacts.mapReduce( 
mapfun22, 
reducerfun22, 
{out : "map_reduce22"} 
)        

Here we have combined the Mapper and Reducer functions and saved the final output to the "map_reduce22" collection.

To view the results of the program, we use the command (>db.map_reduce22.find()):

Here _id corresponds to the "location.state" field of the original dataset.
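As with the first example, the same per-state count can also be expressed as an aggregation pipeline. The sketch below is one possible way to write it, using $group with $sum and $cond; the names `pipeline22` and "aggregation22" are made up for illustration. Note that it counts only documents whose gender is exactly "female" as female, which differs slightly from a reducer that treats every non-male value as female.

```javascript
// Hypothetical pipeline name; field names follow the contacts collection.
const pipeline22 = [
  {
    $group: {
      _id: "$location.state",
      // $cond yields 1 when the comparison holds and 0 otherwise; $sum adds them up
      male: { $sum: { $cond: [{ $eq: ["$gender", "male"] }, 1, 0] } },
      female: { $sum: { $cond: [{ $eq: ["$gender", "female"] }, 1, 0] } },
    },
  },
  // Save the result to a collection (the name is an assumption).
  { $out: "aggregation22" },
];

// Inside the mongo shell `db` is defined, so the pipeline runs there;
// anywhere else this guard simply skips the call.
if (typeof db !== "undefined") {
  db.contacts.aggregate(pipeline22).toArray();
  printjson(db.aggregation22.find().toArray());
}
```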


THANK YOU FOR READING.....
