Map Reduce

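MapReduce is a programming model for processing large data sets in parallel on a cluster. Hadoop MapReduce, its Java-based implementation, is the distributed execution framework within the Apache Hadoop ecosystem.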

A MapReduce program is composed of a map procedure, which performs filtering and sorting (such as sorting students by first name into queues, one queue for each name), and a reduce method, which performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). The "MapReduce System" (also called "infrastructure" or "framework") orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance.
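The student-name example above can be sketched in a few lines of plain Python. This is a single-process illustration of the data flow only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample student list are assumptions made for the example, and a real framework would run these steps across many machines.

```python
from collections import defaultdict

def map_phase(students):
    """Emit one (first_name, 1) pair per student record."""
    for full_name in students:
        first_name = full_name.split()[0]
        yield first_name, 1

def shuffle(pairs):
    """Group values by key: one 'queue' per first name.
    In a real system the framework does this between map and reduce."""
    queues = defaultdict(list)
    for key, value in pairs:
        queues[key].append(value)
    return queues

def reduce_phase(queues):
    """Summarize each queue by counting its entries."""
    return {name: sum(values) for name, values in queues.items()}

# Hypothetical input data, chosen only for illustration.
students = ["Ada Lovelace", "Alan Turing", "Ada Yonath", "Grace Hopper"]
name_frequencies = reduce_phase(shuffle(map_phase(students)))
print(name_frequencies)  # {'Ada': 2, 'Alan': 1, 'Grace': 1}
```

The map and reduce functions contain all the application logic here; the grouping step in the middle is the part a MapReduce system takes over and distributes.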

The model is a specialization of the split-apply-combine strategy for data analysis. It is inspired by the map and reduce functions commonly used in functional programming, although their purpose in the MapReduce framework is not the same as in their original forms. The key contributions of the MapReduce framework are not the actual map and reduce functions (which, for example, resemble the 1995 Message Passing Interface standard's reduce and scatter operations), but the scalability and fault tolerance achieved for a variety of applications through parallelization. As such, a single-threaded implementation of MapReduce is usually not faster than a traditional (non-MapReduce) implementation; any gains are usually seen only with multi-threaded implementations on multi-processor hardware. The model is beneficial only when the optimized distributed shuffle operation (which reduces network communication cost) and the fault-tolerance features of the MapReduce framework come into play. Optimizing the communication cost is essential to a good MapReduce algorithm.
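One common way to cut communication cost is to pre-aggregate each mapper's output locally before it is shuffled, a step usually called a combiner. The sketch below is a single-process simulation under stated assumptions: the two hard-coded partitions stand in for two mapper nodes, the number of shuffled records stands in for network traffic, and all function names are hypothetical rather than taken from any library.

```python
from collections import Counter

def map_partition(names):
    """Mapper: emit (name, 1) for every record in one input partition."""
    return [(name, 1) for name in names]

def combine(pairs):
    """Combiner: sum counts per key locally, before anything is shuffled."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def reduce_all(shuffled):
    """Reducer: sum the (possibly pre-aggregated) counts per key."""
    totals = Counter()
    for key, value in shuffled:
        totals[key] += value
    return dict(totals)

# Two hypothetical input partitions, one per simulated mapper.
partitions = [["Ada", "Ada", "Alan"], ["Ada", "Grace", "Grace"]]

# Records that would cross the network without and with a combiner.
naive = [p for part in partitions for p in map_partition(part)]
combined = [p for part in partitions for p in combine(map_partition(part))]

print(len(naive), len(combined))  # 6 4
print(reduce_all(naive) == reduce_all(combined))  # True
```

Both paths produce the same totals, but the combined path shuffles fewer records. This works here because counting is associative and commutative; an operation without those properties could not be pre-aggregated this way.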

MapReduce libraries have been written in many programming languages, with different levels of optimization. A popular open-source implementation that supports distributed shuffles is part of Apache Hadoop. The name MapReduce originally referred to the proprietary Google technology, but has since been genericized. By 2014, Google was no longer using MapReduce as its primary big data processing model, and development on Apache Mahout had moved on to more capable and less disk-oriented mechanisms that incorporated full map and reduce capabilities.
