Map Reduce

What is MapReduce?

MapReduce is a processing technique and a programming model for distributed computing. The Hadoop implementation is based on Java. The MapReduce algorithm contains two important tasks, Map and Reduce. The Map task takes a set of data and converts it into another set of data, in which individual elements are broken down into tuples (key/value pairs). The Reduce task takes the output of a map as its input and combines those tuples into a smaller set of tuples. As the order of the name MapReduce implies, the reduce task is always performed after the map task.
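The two tasks can be illustrated with a small word count. This is only an in-memory sketch in plain Python, not Hadoop code: the function names `map_words` and `reduce_counts` and the sample lines are assumptions made up for the illustration.

```python
from collections import defaultdict

def map_words(line):
    """Map: break one line of text into (word, 1) key/value pairs."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_counts(pairs):
    """Reduce: combine pairs that share a key into one total per key."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Hypothetical input, standing in for the lines of a file.
lines = ["the quick brown fox", "the lazy dog", "the fox"]

# Every line is mapped first; only then is the combined output reduced.
mapped = [pair for line in lines for pair in map_words(line)]
word_counts = reduce_counts(mapped)
print(word_counts)
```

Nine intermediate pairs go into the reduce step and six come out, one per distinct word, which is the "smaller set of tuples" described above.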

The major advantage of MapReduce is that it is easy to scale data processing over multiple computing nodes. Under the MapReduce model, the data processing primitives are called mappers and reducers. Decomposing a data processing application into mappers and reducers is sometimes nontrivial. But once we write an application in the MapReduce form, scaling the application to run over hundreds, thousands, or even tens of thousands of machines in a cluster is merely a configuration change. This simple scalability is what has attracted many programmers to the MapReduce model.
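The "configuration change" idea can be sketched on a single machine. In the example below, threads stand in for cluster nodes, and the names `mapper`, `reducer`, `run_job`, and `num_workers` are hypothetical. A real cluster distributes work across machines through the framework's own configuration, and Python threads do not provide true CPU parallelism, so this only shows that the mapper and reducer code stays the same while the worker count varies.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def mapper(record):
    """Map one 'sensor,reading' record to a (sensor, reading) pair."""
    sensor, reading = record.split(",")
    return sensor, int(reading)

def reducer(sensor, readings):
    """Reduce all readings for one sensor to its maximum."""
    return sensor, max(readings)

def run_job(records, num_workers):
    """Run the same mapper and reducer with a configurable worker count."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pairs = list(pool.map(mapper, records))
        # Group the mapped values by key before reducing.
        grouped = defaultdict(list)
        for sensor, reading in pairs:
            grouped[sensor].append(reading)
        results = pool.map(lambda item: reducer(*item), grouped.items())
        return dict(results)

# Hypothetical input records.
records = ["a,3", "b,7", "a,9", "c,1", "b,2"]

small = run_job(records, num_workers=1)  # one worker
large = run_job(records, num_workers=4)  # only the setting changes
print(small == large, large)
```

Both runs produce the same result; only the `num_workers` argument differs between them.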

The Algorithm

  • In general, the MapReduce paradigm is based on sending the computation to where the data resides.
  • A MapReduce program executes in three stages: the map stage, the shuffle stage, and the reduce stage.
  • Map stage: The mapper’s job is to process the input data. The input is generally a file or directory stored in the Hadoop Distributed File System (HDFS). The input file is passed to the mapper function line by line. The mapper processes the data and breaks it into several small chunks of data (key/value pairs).
  • Reduce stage: This stage is the combination of the shuffle stage and the reduce stage. The reducer’s job is to process the data that comes from the mappers. After processing, it produces a new set of output, which is stored in HDFS.
  • During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate servers in the cluster.
  • The framework manages all the details of data-passing such as issuing tasks, verifying task completion, and copying data around the cluster between the nodes.
  • Most of the computing takes place on nodes that hold the data on their local disks, which reduces network traffic.
  • After the given tasks complete, the cluster collects and reduces the data to form the final result and sends it back to the Hadoop server.
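The stages listed above can be traced in a small simulation. This is a sketch under stated assumptions, not Hadoop's API: the `node_data` layout, the stage function names, and the character-code partitioner are all made up for illustration. Each map call works only on the lines held by its own "node", which mirrors the idea of moving computation to the data.

```python
from collections import defaultdict

# Hypothetical cluster: each "node" holds part of the input on local disk.
node_data = {
    "node1": ["apple banana", "apple"],
    "node2": ["banana cherry", "apple cherry cherry"],
}
NUM_REDUCERS = 2

def map_stage(lines):
    """Runs where the lines are stored; emits (word, 1) pairs."""
    return [(word, 1) for line in lines for word in line.split()]

def shuffle_stage(map_outputs, num_reducers):
    """Route every pair to one reducer by key and group values per key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            # Deterministic partitioner chosen for this sketch only.
            index = sum(ord(ch) for ch in key) % num_reducers
            partitions[index][key].append(value)
    return partitions

def reduce_stage(partition):
    """Sum the grouped values for each key in one partition."""
    return {key: sum(values) for key, values in partition.items()}

# Map: one call per node, each reading only its local lines.
map_outputs = [map_stage(lines) for lines in node_data.values()]
# Shuffle: the only step that moves data between nodes.
partitions = shuffle_stage(map_outputs, NUM_REDUCERS)
# Reduce: each partition is reduced independently, then collected.
final_counts = {}
for partition in partitions:
    final_counts.update(reduce_stage(partition))
print(final_counts)
```

Because all values for a given key land in the same partition, each reducer can compute its totals without consulting the others. The sketch avoids Python's built-in `hash` for strings because it varies between runs.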
