Incremental Computation on Hadoop and MapReduce at Scale
The MapReduce framework is not designed for incremental computation. Incremental computation arises when large-scale datasets evolve over time: new entries are continually added, while existing and historic entries are deleted or modified. Google's Percolator is one system built specifically for incremental computation, and Incoop is an early, generic extension of the MapReduce framework that supports it. Large-scale analytics tasks run by web search engines, such as crawling the web to build an index or running the PageRank algorithm, typically see only modest changes between successive runs; by processing just the delta between the old and new data, an incremental framework can speed up such tasks by a factor of 10 to 1000.
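The delta idea above can be illustrated with a minimal word-count sketch (the function names and data are illustrative, not from Percolator or Incoop): instead of reprocessing every record, an incremental update applies only the added and removed records to the previous result.

```python
from collections import Counter

def full_count(records):
    """Baseline: process every record from scratch."""
    counts = Counter()
    for rec in records:
        counts.update(rec.split())
    return counts

def incremental_count(prev_counts, added, removed):
    """Update a previous result using only the delta of records."""
    counts = Counter(prev_counts)
    for rec in added:
        counts.update(rec.split())
    for rec in removed:
        counts.subtract(rec.split())
    return +counts  # unary + drops zero and negative entries

old = ["the quick fox", "the lazy dog"]
new = ["the quick fox", "jumping dog"]  # one record removed, one added

base = full_count(old)
inc = incremental_count(base, added=["jumping dog"], removed=["the lazy dog"])
assert inc == full_count(new)  # same answer, but only 2 records touched
```

When the delta is small relative to the dataset, the incremental path processes proportionally less data, which is the source of the large speedups cited above.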
The incremental MapReduce framework can be applied in many fields, including web crawling, PageRank, life-science computing, graph processing, text processing, machine learning, data mining, and relational data processing. The IncMR framework lets developers build parallel algorithms against the original MapReduce APIs, so they need not redesign the APIs or write new application algorithms to gain incremental behavior. These algorithms support incremental data processing by detecting modifications in the inputs, reusing the intermediate states of the data, and augmenting the map and reduce functions. They also quickly detect new inputs and automatically trigger jobs on the master node.
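The detect-and-reuse behavior described above can be sketched as follows (a simplified illustration, not the real IncMR API): each input split's map output is cached under a content hash, so on a re-run only the splits whose content changed are mapped again, while unchanged splits reuse their stored intermediate state.

```python
import hashlib
from collections import Counter

# Cache of intermediate map outputs, keyed by a hash of the split's
# content; a matching hash means the split is unchanged and reusable.
map_cache = {}

def map_split(split):
    """Word-count mapper for one input split, with memoization."""
    key = hashlib.sha256(split.encode()).hexdigest()
    if key not in map_cache:               # only changed/new splits run
        map_cache[key] = Counter(split.split())
    return map_cache[key]

def reduce_all(intermediate):
    """Reduce phase: merge the per-split intermediate counts."""
    total = Counter()
    for counts in intermediate:
        total.update(counts)
    return total

run1 = reduce_all(map_split(s) for s in ["a b a", "c d"])
# Re-run after only the second split changed: the first split's map
# output is served from the cache instead of being recomputed.
run2 = reduce_all(map_split(s) for s in ["a b a", "c c"])
```

Here the reduce phase still merges all intermediate results; a fuller treatment would also reuse partial reduce state, which is closer to what the incremental frameworks above do internally.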