The Pillar Behind Hadoop and Data Analysis

Santhosh Parampottupadam

Research Scientist | We're Hiring | German Cancer Research Center | Generative AI | PPML | Federated Learning

Published: September 18, 2015

Douglass Read "Doug" Cutting is an advocate and creator of open-source search technology. He originated Lucene and, with Mike Cafarella, Nutch. Both are open-source search projects now managed by the Apache Software Foundation. He is also the creator of Hadoop.

Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures (of individual machines, or racks of machines) are commonplace and thus should be automatically handled in software by the framework.
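To make the fault-tolerance assumption concrete, here is a toy Python model (not HDFS code) of one way software can absorb machine failures: every block is kept on several distinct nodes, and when a node dies, the surviving copies are re-replicated elsewhere. HDFS does rely on block replication, with a default replication factor of 3. The function names, the random placement, and the immediate re-replication are simplifying assumptions for illustration only; real HDFS uses rack-aware placement and detects dead DataNodes through missed heartbeats.

```python
# Toy model (not HDFS code): each block lives on several distinct nodes,
# so losing one machine does not lose data, and lost copies are recreated.
import random

REPLICATION = 3  # mirrors the HDFS default for dfs.replication


def place_blocks(block_ids, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes (random placement is an assumption)."""
    rng = random.Random(seed)
    return {b: set(rng.sample(nodes, replication)) for b in block_ids}


def handle_node_failure(placement, nodes, failed, replication=REPLICATION):
    """Drop the failed node, then copy under-replicated blocks to healthy nodes."""
    healthy = [n for n in nodes if n != failed]
    for block, holders in placement.items():
        holders.discard(failed)
        if not holders:
            raise RuntimeError(f"block {block} lost: no surviving replica")
        # Re-replicate onto healthy nodes that do not already hold this block.
        candidates = [n for n in healthy if n not in holders]
        while len(holders) < replication and candidates:
            holders.add(candidates.pop(0))
    return placement, healthy


if __name__ == "__main__":
    cluster = [f"node{i}" for i in range(5)]
    layout = place_blocks(["blk_0", "blk_1", "blk_2"], cluster)
    layout, cluster = handle_node_failure(layout, cluster, "node2")
    for block, holders in sorted(layout.items()):
        print(block, sorted(holders))
```

After `node2` fails, every block is still readable from its remaining replicas and is brought back up to three copies on the four surviving nodes, without any manual intervention.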

The core of Apache Hadoop consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part, MapReduce. Hadoop splits files into large blocks and distributes them across the nodes in the cluster. To process the data, Hadoop MapReduce ships packaged code to the nodes, which then process in parallel the data they hold. This approach takes advantage of data locality (nodes manipulating the data they already have on hand), allowing the data to be processed faster and more efficiently than in a more conventional supercomputer architecture that relies on a parallel file system, where computation and data are connected via high-speed networking.
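The block splitting and data-locality idea can be sketched as follows. This is a simplified, hypothetical Python model, not Hadoop's scheduler: `split_into_blocks` and `assign_tasks` are illustrative names, the 128 MiB block size reflects the Hadoop 2.x default for `dfs.blocksize`, and the greedy "local node first, any free node otherwise" rule is an assumption chosen for clarity.

```python
# Toy model: split a file into fixed-size blocks, then prefer to run each
# block's task on a node that already stores that block (data locality).
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB, the Hadoop 2.x default dfs.blocksize


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs; the last block may be shorter."""
    count = math.ceil(file_size / block_size)
    return [(i * block_size, min(block_size, file_size - i * block_size))
            for i in range(count)]


def assign_tasks(block_locations, free_slots):
    """Greedily map each block index to a (node, is_local) pair.

    block_locations: {block_index: set of nodes storing a replica}
    free_slots: {node: number of tasks it can still accept}
    """
    slots = dict(free_slots)
    assignment = {}
    for block, holders in sorted(block_locations.items()):
        local = [n for n in sorted(holders) if slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # No replica holder has capacity: use any free node,
            # which means the block must be read over the network.
            node = next((n for n in sorted(slots) if slots[n] > 0), None)
            if node is None:
                raise RuntimeError("no free task slots in the cluster")
            is_local = False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment
```

For example, `split_into_blocks(300 * 1024 * 1024)` yields three blocks of 128 MiB, 128 MiB and 44 MiB. Because `assign_tasks` is greedy, an early block can occupy a node that a later block needed, so the result is not always the assignment with the most local tasks; production schedulers use more elaborate policies.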

The base Apache Hadoop framework is composed of the following modules:

Hadoop Common – contains libraries and utilities needed by other Hadoop modules;
Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster;
Hadoop YARN – a resource-management platform responsible for managing computing resources in clusters and using them for scheduling of users' applications; and
Hadoop MapReduce – an implementation of the MapReduce programming model for large-scale data processing.
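The MapReduce programming model listed above can be illustrated with the classic word-count job. The sketch below runs all three phases in a single Python process purely to show the data flow (map, then shuffle by key, then reduce). The function names are hypothetical, and actual Hadoop jobs are typically written against the Java `Mapper` and `Reducer` classes and executed as distributed tasks.

```python
# Single-process imitation of the MapReduce model: map -> shuffle -> reduce.
# Real Hadoop runs these phases as distributed tasks; this only shows the data flow.
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key (sorted), as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

Running the script prints `{'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}`. Because each reducer call sees only the values for one key, the reduce step can be spread across many machines, which is what lets the model scale.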
