Apache Hadoop And Its Journey
Apache Hadoop is an open-source, scalable, and fault-tolerant framework for the distributed storage and processing of large data sets, designed to run cost-effectively on commodity hardware. Because it is open source, it benefits from community-driven flexibility and innovation, and upgrades and new versions are managed in a well-governed way.
Features of Hadoop:
- Scalable - grows from a single machine to thousands of machines
- Fault tolerant - data blocks are replicated across the cluster, so the loss of a node does not mean loss of data
- Open source - a community effort governed by the Apache Software Foundation and released under the Apache License
- Distributed storage and processing - large datasets are automatically split into blocks and distributed across the machines in the cluster (see the sketch after this list)
- Commodity hardware - runs on inexpensive, off-the-shelf servers rather than specialized hardware
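To make the distributed-storage and replication points above concrete, here is a minimal Java sketch using the HDFS `FileSystem` API. The NameNode address (`hdfs://localhost:9000`), the file path, and the replication factor are illustrative assumptions for this example, not details from the article; adjust them to your own cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: write a file to HDFS and request 3 replicas.
// Assumes a reachable NameNode at hdfs://localhost:9000 and the
// hadoop-client dependency on the classpath.
public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cluster address is an assumption for this example.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/tmp/hadoop-demo.txt");

            // HDFS splits the file into blocks (128 MB by default) and
            // replicates each block across DataNodes; here we ask for
            // 3 replicas explicitly.
            try (FSDataOutputStream out = fs.create(file, (short) 3)) {
                out.writeUTF("Hello from a commodity-hardware cluster!");
            }

            // The replication factor can also be changed after the fact.
            fs.setReplication(file, (short) 2);
        }
    }
}
```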
History of Hadoop:
Hadoop's journey started in the early 2000s, when Doug Cutting created a search engine project called Lucene and then, together with Mike Cafarella, built a scalable search engine called Nutch. In 2004, the Nutch Distributed File System (NDFS) was built, followed by an implementation of the MapReduce framework.
On January 28, 2006, the first Nutch (as it was then known) cluster went live at Yahoo!. That same month, Yahoo! hired Doug Cutting to help the team make the transition. In February 2006, Cutting pulled NDFS and MapReduce out of the Nutch code base and created a new incubating project under the Lucene umbrella, which he named Hadoop.
Hadoop remained a sub-project of Lucene until the beginning of 2008. In January 2008, it became a top-level project at the Apache Software Foundation, with its own dedicated team of committers.
On January 28, 2016, Hadoop celebrated ten years of improbable growth.