When is MapReduce not suitable for processing?
A brief recap of MapReduce before answering this question:
MapReduce is a powerful programming model in the Hadoop framework, used to process and generate big data sets (large-volume data) with a parallel, distributed algorithm on a Hadoop cluster. It consists of two phases - Map and Reduce.
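To make the two phases concrete, here is a minimal sketch of the MapReduce model in plain Python (this is illustrative only, not the Hadoop API): a word count where the Map phase emits (word, 1) pairs, the framework's shuffle groups them by key, and the Reduce phase sums the counts.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between Map and Reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: sum the counts for each word.
    return key, sum(values)

lines = ["big data is big", "data is data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 3, 'is': 2}
```

In a real Hadoop job each phase runs in parallel across many machines, but the data flow is exactly this: map, group by key, reduce.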
Image Credit - Edupristine.com
Now coming to the question: MapReduce works great when used as intended, but there are a few scenarios where it is not suitable or recommended -

- Real-time or near-real-time processing, since batch-oriented MapReduce jobs have high startup and processing latency
- Iterative algorithms (e.g. machine learning, graph processing), because each iteration becomes a separate job with its own disk I/O
- Interactive, ad-hoc queries where responses are expected within seconds
- Small datasets, where the overhead of job setup outweighs any benefit of parallelism
There might be other cases as well; much depends on how efficiently MapReduce is being used.
To go a bit deeper, the main reason behind MapReduce's long response/processing times is that all intermediate results are written to disk, and then read back from disk before the next stage can process them. These repeated read/write I/O operations during a MapReduce job consume a lot of time.
Spark addresses this problem by processing and keeping all intermediate results in memory. This one architectural change alone made Spark many times faster than MapReduce. There are of course other features that make Spark fast, but this is the fundamental one. I may cover Spark in a separate article.
#hadoop #mapreduce #bigdata #spark #learning