Intuitive Explanation of "MapReduce"?
Hadoop MapReduce - simple explanation

How many unique words are there in the sentence you just read? You'll probably say 12 (note: the word 'are' appears twice), and that was quite easy: you simply counted the words in a single sentence. Taking this to the next level, if I ask you to count all the words and their frequencies in this post, you'll have to spend more time and effort doing the same thing, which might look something like this:

  • For each new word, record a frequency of 1 (the Map part)
  • For each word that has been seen before, add 1 to its existing value (the Map part)
  • At the end of the post, collect all the values and report the final result (the Reduce part)
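The manual tally above can be sketched in a few lines of Python. This is a simplistic sketch: real tokenization would handle punctuation, hyphenation, and so on, which we skip here.

```python
from collections import Counter

def count_words(text):
    """Tally word frequencies in a single pass, mirroring the manual steps above."""
    counts = Counter()
    for word in text.lower().split():
        counts[word] += 1  # a new word starts at 1; a repeat adds 1 to the existing value
    return counts

counts = count_words("how many unique words are there in this sentence which you are reading")
print(counts)
print("unique words:", len(counts))
```

Running this on the opening sentence confirms the answer: 'are' has a count of 2, and there are 12 unique words.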

While this sounds tedious, it's still achievable within a reasonable time frame.

Taking this to the next level, if I ask you to count all the words and their respective frequencies on the website www.analyticsbot.ml, and you give me the correct answer within 30 minutes, I will give you a gift. :) You'll be tempted at first, but there is no way you can do that manually, all by yourself, within the given time frame.

If you are one of the clever folks, you might follow the approach below and report the result well within the time limit:

  • Get the total number of pages on the website and their URLs. Let's say there are 'n' pages.
  • Ask 'n' of your friends or relatives to each take up one page, do the counting, and report the result to you.
  • One thing that speeds up the process: make sure all occurrences of the same word go to the same person.
  • Each of those 'n' friends now does the same work we discussed before and reports back to you. You, being the manager, combine their results and report the final count to me.
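The "same word goes to the same person" rule is exactly what hash partitioning does. Here is a toy sketch (the `n_friends` parameter and `partition` helper are my own illustration, not Hadoop's API): each word is routed to a bucket by its hash, so every occurrence of a given word always lands with the same "friend", who can then sum its count locally.

```python
def partition(words, n_friends):
    """Route every occurrence of the same word to the same 'friend' (hash partitioning)."""
    buckets = [[] for _ in range(n_friends)]
    for word in words:
        buckets[hash(word) % n_friends].append(word)  # same word -> same bucket, always
    return buckets

pages = "deer bear river car car river deer car bear".split()
for i, bucket in enumerate(partition(pages, 3)):
    print(f"friend {i}: {bucket}")
```

Because the routing is deterministic, no two friends ever hold counts for the same word, so their partial results can simply be concatenated at the end.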

Sound interesting?

You must have noticed that the process followed in both cases was the same. Now if I ask you to count words and their frequencies on multiple websites, you can easily follow the same approach. Well, almost: the only change you might have to make is to increase or decrease the number of friends/relatives depending on the scale of data you are dealing with. The real problem is that humans don't want to do this boring, mundane work; instead, we rely on machines to do it for us. Enter Apache Hadoop MapReduce, the savior.

Let's walk through how the mappers and reducers handle the word-count example we discussed above. Suppose we want to count the frequency of words in "Deer Bear River Car Car River Deer Car Bear". The first step is to split the data across different nodes, on which mappers emit <word, 1> for every word. The next step is to shuffle all pairs with the same word to the same reducer, which sums up the counts and sends back the final result.
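The map/shuffle/reduce pipeline above can be simulated on a single machine. This is only a sketch of the idea, not real Hadoop code: in Hadoop Streaming the mapper and reducer would be separate scripts reading stdin across nodes, and the framework itself would do the sort-and-shuffle between them.

```python
from itertools import groupby

def mapper(line):
    # Map: emit a <word, 1> pair for every word in the split
    for word in line.split():
        yield (word, 1)

def reducer(pairs):
    # Shuffle + Reduce: sorting groups identical keys together,
    # then each word's counts are summed
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

line = "Deer Bear River Car Car River Deer Car Bear"
print(dict(reducer(mapper(line))))
```

The sort before `groupby` plays the role of the shuffle phase: it guarantees that all `("Car", 1)` pairs sit next to each other before the reducer sums them.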

Note: This is part of an earlier post of mine on LinkedIn (some readers appreciated the explanation). Also published on my blog, analyticsbot.ml.

