Apache Hadoop And Its Journey
Apache Hadoop is an open-source, scalable, and fault-tolerant framework for the distributed storage and processing of large data sets, designed to run cost-effectively on commodity hardware. Because it is open source, it benefits from community-driven flexibility and innovation, and upgrades and new versions are managed in a well-governed way.
Features of Hadoop:
- Scalable - grows from a single machine to thousands of machines
- Fault tolerant - data blocks are replicated across the cluster, so the loss of a node does not mean loss of data
- Open source - a community effort governed by the Apache Software Foundation and released under the Apache License
- Distributed storage and processing - large datasets are automatically split into blocks and distributed across the machines in the cluster (see the sketch after this list)
- Commodity hardware - runs on inexpensive, off-the-shelf servers rather than specialized hardware
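To make the distributed-storage and replication points above concrete, here is a minimal Java sketch using the HDFS `FileSystem` API. The NameNode address (`hdfs://localhost:9000`), the file path, and the replication factor are illustrative assumptions for this example, not details from the article; adjust them to your own cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: write a file to HDFS and request 3 replicas.
// Assumes a reachable NameNode at hdfs://localhost:9000 and the
// hadoop-client dependency on the classpath.
public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cluster address is an assumption for this example.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/tmp/hadoop-demo.txt");

            // HDFS splits the file into blocks (128 MB by default) and
            // replicates each block across DataNodes; here we ask for
            // 3 replicas explicitly.
            try (FSDataOutputStream out = fs.create(file, (short) 3)) {
                out.writeUTF("Hello from a commodity-hardware cluster!");
            }

            // The replication factor can also be changed after the fact.
            fs.setReplication(file, (short) 2);
        }
    }
}
```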
History of Hadoop:
Hadoop's journey started in the early 2000s, when Doug Cutting created a search engine project called Lucene and then, together with Mike Cafarella, built a scalable search engine called Nutch. In 2004, the Nutch Distributed File System (NDFS) was built, followed by an implementation of the MapReduce framework.
On January 28, 2006, the first Nutch (as it was then known) cluster went live at Yahoo!. That same month, Yahoo! hired Doug Cutting to help the team make the transition. In February 2006, Cutting pulled NDFS and MapReduce out of the Nutch code base and created a new incubating project under the Lucene umbrella, which he named Hadoop.
Hadoop remained a sub-project of Lucene until the beginning of 2008. In January 2008, it became a top-level project at the Apache Software Foundation, with its own dedicated team of committers.
On January 28, 2016, Hadoop celebrated ten years of improbable growth.