Aggregation Framework and Map-Reduce in MongoDB

Task Description: Use the Aggregation Framework of MongoDB and create mapper and reducer programs.

What is NoSQL?


NoSQL databases (also known as "not only SQL") are non-tabular and store data differently than relational tables. They come in a variety of types based on their data model; the main types are document, key-value, wide-column, and graph. They provide flexible schemas and are designed to scale with large amounts of data and high user loads.

What is MongoDB?

  • MongoDB stores data in flexible, JSON-like documents, meaning fields can vary from document to document and the data structure can be changed over time
  • The document model maps to the objects in your application code, making data easy to work with

MongoDB is a source-available, cross-platform, document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas.

  • Ad hoc queries, indexing, and real-time aggregation provide powerful ways to access and analyze your data
  • MongoDB is a distributed database at its core, so high availability, horizontal scaling, and geographic distribution are built in and easy to use

What is the MongoDB Aggregation Framework?

Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.

What is the Aggregation Pipeline?

MongoDB’s aggregation framework is modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated result.
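To make the pipeline idea concrete, here is a minimal plain-JavaScript sketch that models each stage as a function from one array of documents to another. It does not use MongoDB at all; the names runPipeline, matchStage, and projectStage, as well as the sample documents, are hypothetical and only illustrate how the output of one stage becomes the input of the next.

```javascript
// Illustrative only: a pipeline modeled in plain JavaScript, not MongoDB's implementation.
// runPipeline, matchStage, projectStage and the sample data are hypothetical.

// Run the stages in order, feeding the output of each stage into the next one.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// A filtering stage (similar in spirit to $match).
const matchStage = (predicate) => (docs) => docs.filter(predicate);

// A reshaping stage (similar in spirit to $project).
const projectStage = (mapper) => (docs) => docs.map(mapper);

// Made-up sample documents using the field names from this article.
const sampleDocs = [
  { CountryName: "Spain", Language: "Spanish" },
  { CountryName: "France", Language: "French" },
  { CountryName: "Mexico", Language: "Spanish" },
];

const spanishSpeaking = runPipeline(sampleDocs, [
  matchStage((doc) => doc.Language === "Spanish"),
  projectStage((doc) => doc.CountryName),
]);

console.log(spanishSpeaking); // [ 'Spain', 'Mexico' ]
```

In MongoDB itself the stages are written as documents such as {$match: ...} or {$group: ...} inside the array passed to aggregate(), but the flow of data from stage to stage follows the same idea.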

What is the Map-Reduce Function?

MapReduce is a software framework and programming model used for processing huge amounts of data. A MapReduce program works in two phases, namely Map and Reduce. Map tasks deal with splitting and mapping the data, while Reduce tasks shuffle and reduce the data.

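Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results. Note that MongoDB has deprecated map-reduce since version 5.0 and recommends the aggregation pipeline instead.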
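The phases described above can be sketched in plain JavaScript, without MongoDB. This is only an in-memory illustration under simple assumptions: the runMapReduce helper, its callback signatures, and the sample documents are hypothetical and are not part of any MongoDB API.

```javascript
// Illustrative in-memory map-reduce; not MongoDB's engine. All names are hypothetical.
function runMapReduce(docs, mapFn, reduceFn) {
  // Map phase: each document may emit any number of (key, value) pairs.
  const emitted = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => emitted.push([key, value]));
  }

  // Shuffle phase: collect all values that share the same key.
  const groups = new Map();
  for (const [key, value] of emitted) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  // Reduce phase: condense each key's list of values into a single result.
  const results = {};
  for (const [key, values] of groups) {
    results[key] = reduceFn(key, values);
  }
  return results;
}

// Made-up sample documents using the field names from this article.
const sampleCountries = [
  { CountryName: "Spain", Language: "Spanish" },
  { CountryName: "Mexico", Language: "Spanish" },
  { CountryName: "France", Language: "French" },
];

const countsByLanguage = runMapReduce(
  sampleCountries,
  (doc, emit) => emit(doc.Language, 1),               // map: one count per country
  (key, values) => values.reduce((a, b) => a + b, 0)  // reduce: add the counts up
);

console.log(countsByLanguage); // { Spanish: 2, French: 1 }
```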

We will count the number of countries per language using two of MongoDB's aggregation approaches:

  1. Aggregation Pipeline
  2. Map-Reduce Function

Method 1: Aggregation Pipeline

db.countries.aggregate([
  {$group: {_id: {Language: "$Language"}, totalCountry: {$sum: 1}}},
  {$sort: {totalCountry: 1}}
])

// {$group: {_id: {Language: "$Language"}, ...}} --> group the documents by Language

// totalCountry: {$sum: 1} --> count the total countries associated with that language

// {$sort: {totalCountry: 1}} --> sort the groups by count in ascending order
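To see what shape of result this pipeline produces, the following plain-JavaScript sketch mimics the $group and $sort stages on a small in-memory array. The sample documents are made up for illustration, and groupByLanguage and sortByTotal are hypothetical helper names, not MongoDB functions.

```javascript
// Illustrative only: mimics the two pipeline stages in plain JavaScript.
// The data and helper names are assumptions made for this sketch.
const countryDocs = [
  { CountryName: "Spain", Language: "Spanish" },
  { CountryName: "Mexico", Language: "Spanish" },
  { CountryName: "Argentina", Language: "Spanish" },
  { CountryName: "France", Language: "French" },
  { CountryName: "Monaco", Language: "French" },
  { CountryName: "Germany", Language: "German" },
];

// Mimics {$group: {_id: {Language: "$Language"}, totalCountry: {$sum: 1}}}
function groupByLanguage(input) {
  const totals = new Map();
  for (const doc of input) {
    totals.set(doc.Language, (totals.get(doc.Language) || 0) + 1);
  }
  return Array.from(totals, ([language, total]) => ({
    _id: { Language: language },
    totalCountry: total,
  }));
}

// Mimics {$sort: {totalCountry: 1}} (ascending); copies the array before sorting.
function sortByTotal(groups) {
  return [...groups].sort((a, b) => a.totalCountry - b.totalCountry);
}

const pipelineResult = sortByTotal(groupByLanguage(countryDocs));
for (const row of pipelineResult) {
  console.log(JSON.stringify(row));
}
```

Each output row has an _id holding the grouping key and a totalCountry field holding the count, ordered from the least common language to the most common one.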
        

Method 2: Map Reduce Function

var mapFunction = function() { … };
var reduceFunction = function(key, values) { … };
db.runCommand(
 {
 mapReduce: <input-collection>,
 map: mapFunction,
 reduce: reduceFunction,
 out: { merge: <output-collection> },
 query: <query>
 }
 )

        

Declaring the map function:

var mapFunc1 = function() {
  // Emit the document's Language as the key and 1 as the value,
  // so each country contributes one count to its language
  emit(this.Language, 1);
};
// The map function runs once per document; "this" refers to the current country document

Declaring the reduce function:

var ReduceFunc1 = function(keyLang, counts) {
  // Add up the emitted 1s for this language
  return Array.sum(counts);
};
// After the mapper's output has been grouped by language, the reducer sums the counts to get the number of countries per language

Running the map-reduce operation:

db.countries.mapReduce(
   mapFunc1,
   ReduceFunc1,
   {out: "map_reduced"}
)
// Run map-reduce on the countries collection and save the results in the map_reduced collection
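One property worth checking in any reduce function: MongoDB's documentation notes that reduce may be invoked more than once for the same key, with an earlier reduce output appearing among the input values, and that it is not invoked at all for a key that has only one value. The plain-JavaScript sketch below (no MongoDB involved; sumReduce, lengthReduce, and the way the values are chunked are hypothetical) shows why a reducer that sums emitted 1s tolerates re-reduction, while one that returns values.length does not.

```javascript
// Illustrative only: simulates a re-reduce step in plain JavaScript.
const sumReduce = (key, values) => values.reduce((a, b) => a + b, 0);
const lengthReduce = (key, values) => values.length;

// Suppose five countries each emitted a 1 for the same language key.
const emittedOnes = [1, 1, 1, 1, 1];

// Reducing everything in a single call gives the expected count.
const oneShotSum = sumReduce("Spanish", emittedOnes);

// Now reduce the first three values, then re-reduce that partial result
// together with the remaining two values.
const partialSum = sumReduce("Spanish", emittedOnes.slice(0, 3));
const reReducedSum = sumReduce("Spanish", [partialSum, ...emittedOnes.slice(3)]);

const partialLength = lengthReduce("Spanish", emittedOnes.slice(0, 3));
const reReducedLength = lengthReduce("Spanish", [partialLength, ...emittedOnes.slice(3)]);

console.log(oneShotSum, reReducedSum); // 5 5  (summing is stable under re-reduce)
console.log(reReducedLength);          // 3    (counting array length loses two countries)
```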

Thanks for reading!
