Hadoop Tutorial – A Comprehensive Guide for Beginners
Malini Shukla
1. Objective
This Hadoop tutorial provides a thorough introduction to Hadoop. It covers what Hadoop is, why it is needed, why it became so popular, the Hadoop architecture, data flow, Hadoop daemons, the different flavours of Hadoop, and an introduction to Hadoop components such as HDFS, MapReduce, and YARN.
2. Hadoop Introduction
Hadoop is an open-source tool from the ASF (Apache Software Foundation). Open source means it is freely available and its source code can be modified as per your requirements: if certain functionality does not fulfil your needs, you can change it. Much of Hadoop's code has been contributed by companies such as Yahoo, IBM, Facebook, and Cloudera.
It provides an efficient framework for running jobs on multiple nodes of a cluster, a cluster being a group of systems connected over a LAN. Hadoop processes data in parallel, as it works on multiple machines simultaneously.
It was inspired by Google, which published papers on the technologies it was using: the MapReduce programming model and the Google File System (GFS). Hadoop was originally written for the Nutch search engine project by Doug Cutting and his team, but it soon became a top-level Apache project due to its huge popularity.
Hadoop is an open-source framework written in Java, but this does not mean you can code only in Java. You can also write jobs in C, C++, Perl, Python, Ruby, and other languages (for example via Hadoop Streaming). Java is still recommended, as it gives you lower-level control over the code.
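As a rough illustration of what Java MapReduce code looks like, here is a minimal sketch of a word-count mapper. The class and field names are illustrative, not part of any particular build:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (word, 1) for every word in each input line; the framework
// then groups these pairs by word and hands them to a reducer.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}
```

A matching reducer would simply sum the 1s for each word; we will look at MapReduce in detail later in the tutorial.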
It efficiently processes large volumes of data on a cluster of commodity hardware. Hadoop is designed for processing huge volumes of data. Commodity hardware means low-end, inexpensive machines, which is what makes Hadoop so economical.
Hadoop can be set up on a single machine (pseudo-distributed mode), but its real power comes with a cluster of machines. It can be scaled to thousands of nodes on the fly, i.e., without any downtime: you do not need to take any system down to add more machines to the cluster.
Hadoop consists of three key parts: the Hadoop Distributed File System (HDFS), MapReduce, and YARN. HDFS is the storage layer, MapReduce is the processing layer, and YARN is the resource-management layer.
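To make the storage layer concrete, the sketch below reads a file through the HDFS Java client API. It assumes a running Hadoop installation whose configuration (core-site.xml) is on the classpath; the file path passed as an argument is illustrative:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Reads a file from HDFS and prints it to stdout.
public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // picks up core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);      // handle to the configured file system (HDFS here)
        Path path = new Path(args[0]);             // e.g. /user/hadoop/input.txt (illustrative)

        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

The same code works against the local file system in pseudo-distributed mode or against a full cluster; only the configuration changes, not the program.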
3. Why Hadoop?
Let us now understand why Hadoop is so popular and why it has captured such a large share of the big data market.
Hadoop is not only a storage system but a platform for both data storage and processing. It is scalable (more nodes can be added on the fly), fault-tolerant (even if a node goes down, its data can be processed by another node), and open source (the source code can be modified if required).
The following characteristics make Hadoop a unique platform:
- Flexibility to store and mine any type of data, whether structured, semi-structured, or unstructured; it is not bound by a single schema.
- Excellence at processing complex data: its scale-out architecture divides workloads across many nodes, and its flexible file system eliminates ETL bottlenecks.
- Economical scaling: as discussed, it can be deployed on commodity hardware, and its open-source nature guards against vendor lock-in.