Using the Aggregation Framework and Creating Mapper and Reducer Programs
Vivek Sharma
RedHat Certified in Containers and Kubernetes (EX180) || Arth || Aspiring DevOps Engineer || Python || Docker || Ansible || Jenkins || Kubernetes || OpenShift || Terraform || GitHub || AWS || AZURE
ARTH - Task 34
Task Description --- Use the Aggregation Framework of MongoDB and create Mapper and Reducer programs.
MongoDB is the first database that comes to mind when we have to work with unstructured data and manipulate the shape of that data quickly and efficiently. For this, MongoDB comes with a powerful framework called the Aggregation Framework, which lets us manipulate data effectively on the server itself.
What is the Aggregation Framework?
The Aggregation Framework is just a way to query documents in a MongoDB collection. This framework exists because when you start working with and manipulating data, you often need to crunch collections together, modify them, pluck out fields, rename fields, concatenate them, group documents by a field, explode arrays of fields into different documents, and so on.
The simple query methods in MongoDB only allow you to retrieve full or partial individual documents. They don't really allow you to manipulate the documents on the server and then return them to your application. This is where MongoDB's Aggregation Framework comes in. It's nothing external; aggregation comes baked into MongoDB.
What is a Pipeline in the Aggregation Framework?
The Aggregation Framework relies on the concept of a pipeline. The pipeline consists of stages in which certain operators modify the documents in the collection using various techniques. Finally, the output is returned to the application calling the query.
In MongoDB, the pipeline is an array of operators, each of which takes in a bunch of documents and spits out modified documents according to the rules specified by the programmer. The next operator takes in the documents spat out by the previous one, which is why it's called a pipeline. Some examples of operators are $group, $match, $project, $sort, etc.
Compare this with a query like find(), which works in most cases but is not handy when we want to modify and retrieve data at the same time.
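As a quick illustration (just a sketch, assuming the same contacts collection used in the demonstrations below), a pipeline that reshapes each document with $project and then orders the results with $sort might look like this:
> db.contacts.aggregate([
... {$project : {_id : 0, gender : 1, state : "$location.state", age : "$dob.age"}},
... {$sort : {age : -1}}
... ])
Each document coming out of $project carries only the reshaped fields, and $sort then orders those documents by age in descending order.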
Some Practical Demonstrations of Using Aggregation Pipelines
Here I have a collection of dummy data about the users of a company, with complete details like name, age, gender, state, etc. I will be using this collection throughout this article.
Let's create a pipeline to determine the number of males in each state.
> db.contacts.aggregate([
... {$match : {gender : "male"}},
... {$group : {_id: {state : "$location.state"} , total_males : {$sum : 1} }} ])
The $match operator collects all the documents whose "gender" field is "male" and passes them to the next operator. The next operator is $group, which groups the documents by their state and also keeps a count of the total males in each state.
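To take this one step further (again just a sketch, still assuming the same collection), we could append a $sort stage so that the states with the most males come first:
> db.contacts.aggregate([
... {$match : {gender : "male"}},
... {$group : {_id: {state : "$location.state"} , total_males : {$sum : 1} }},
... {$sort : {total_males : -1}} ])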
What is MapReduce?
MapReduce is a programming paradigm that works on big data over a distributed system. It analyses the data and produces aggregated results. Key/value pairs are emitted in the map function, and these values are used to accumulate data. Later, the reduce function takes the data accumulated by the map function and converts it into the aggregated results. In MongoDB, the Mapper and Reducer programs are written in JavaScript.
Let's create a Map-Reduce program to calculate the average age of all males and females -
> var mapfun = function() {
... emit(this.gender , this.dob.age);
... };
The above code is the map function applied to the input documents. The function "mapfun" maps the gender of each person to the age of that person. The 'this' keyword refers to the document currently being processed.
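Conceptually (the ages below are just made-up illustrative values), each call to emit() produces a key/value pair, and MongoDB groups all the values by key before handing them to the reducer:
// For each document the mapper emits a (key, value) pair, for example:
//   emit("male", 34)   emit("female", 29)   emit("male", 57)
// MongoDB then groups the emitted values by key before calling the reducer:
//   "male"   -> [34, 57, ...]
//   "female" -> [29, ...]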
> var reducefun = function(gender , age) {
... return Array.avg(age);
... };
This is the code of the Reducer function that processes the incoming data. The function "reducefun" has two arguments: the 'age' argument is an array of the 'dob.age' values emitted by the Mapper function, grouped by the 'gender' argument. The function returns the average age of all the males and of all the females in the collection.
The next step is to plug these functions into the mapReduce function.
> db.contacts.mapReduce(
... mapfun,
... reducefun,
... {out : "map_reduce11"}
... )
The function 'mapfun' takes input documents from the 'contacts' collection and passes them to the Reducer function 'reducefun', which then calculates the average male and female ages. Next, the 'out' option receives the data from 'reducefun' and saves it to the "map_reduce11" collection. If no collection with the given name exists, a new collection is created and the data is stored in it. If the collection already exists with some other data, the older data is overwritten.
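As a side note (a sketch; exact behaviour can vary by MongoDB version), if you only want to inspect the results without writing them to a collection, mapReduce also accepts an inline output option:
> db.contacts.mapReduce(
... mapfun,
... reducefun,
... {out : {inline : 1}}
... )
With inline output, the results are returned directly in the command's result document instead of being stored in a new collection.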
We can check whether the new collection has been created using the (>show collections ) command and then view the data in that collection using the (>db.map_reduce11.find() ) command.
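In the shell, that looks like this:
> show collections
> db.map_reduce11.find()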
Let's do the same task using the Aggregation framework -
> db.contacts.aggregate([
... {$group : {_id : "$gender" , value: {$avg : "$dob.age"} }},
... {$out : "aggregation11"}
... ])
In the above code, I have first grouped all the documents by gender and then found the average of their ages. In the next step, I have saved the results to the "aggregation11" collection.
We can view the data in the collection using the "db.aggregation11.find()" command.
Example 2: Here I am going to create a Map-Reduce program to calculate the number of males and females in each state.
> var mapfun22 = function() {
emit(this.location.state , this.gender);
};
This Mapper function emits the state of each document as the key and the document's gender as the value.
> var reducerfun22 = function(state ,gender){
var result = {male : 0 , female : 0};
for (var idx =0 ; idx < gender.length ; ++idx){
if(gender[idx] == "male") result.male++;
else result.female++;
}
return result;
};
The above Reducer function counts the number of males and females in each state and returns the result.
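One caveat worth mentioning: MongoDB may invoke the reduce function more than once for the same key, passing previously reduced results back in as values. The reducer above assumes every value is a raw gender string, so a re-reduce-safe variant (just a sketch; 'reducerfun22_safe' is a name used here only for illustration) could check the type of each value:
> var reducerfun22_safe = function(state , values) {
var result = {male : 0 , female : 0};
values.forEach(function(v) {
if (typeof v === "string") {
// value emitted directly by the mapper ("male" or "female")
if (v == "male") result.male++;
else result.female++;
} else {
// value is a partial count object from an earlier reduce pass
result.male += v.male;
result.female += v.female;
}
});
return result;
};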
> db.contacts.mapReduce(
mapfun22,
reducerfun22,
{out : "map_reduce22"}
)
Here we have clubbed the Mapper and Reducer functions together and saved the final output to the "map_reduce22" collection.
To view the results of the program, we use the (>db.map_reduce22.find() ) command --
Here _id is equivalent to the "location.state" field of the original dataset.
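For comparison (a sketch only; the output collection name "aggregation22" is just an example), the same per-state counts can also be produced with the Aggregation Framework using conditional sums:
> db.contacts.aggregate([
... {$group : {_id : "$location.state",
... male : {$sum : {$cond : [{$eq : ["$gender", "male"]}, 1, 0]}},
... female : {$sum : {$cond : [{$eq : ["$gender", "female"]}, 1, 0]}} }},
... {$out : "aggregation22"}
... ])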
THANK YOU FOR READING.....