Best Guide to Features and Design Principles of Hadoop
1. Objective
In this Hadoop tutorial we will discuss the features, characteristics, and design principles of Hadoop. To learn what Hadoop is and get an introduction to it, follow this guide.
To install and configure Hadoop, follow this installation guide.
2. Hadoop Features and Characteristics
Apache Hadoop is one of the most popular and powerful big data tools. It provides a highly reliable storage layer – HDFS, a batch processing engine – MapReduce, and a resource management layer – YARN. Below are the important features of Hadoop:
- Open source – Apache Hadoop is an open-source project, which means its source code can be modified to suit business requirements.
- Distributed Processing – Because data is stored in a distributed manner in HDFS across the cluster, it is processed in parallel on a cluster of nodes.
- Fault Tolerance – By default, Hadoop stores three replicas of each block across the cluster, and this replication factor can be changed as needed (see the configuration sketch after this list). If a node goes down, the data it held can easily be recovered from the replicas on other nodes, and the framework recovers failed nodes and tasks automatically. This is how Hadoop achieves fault tolerance.
- Reliability – Because data is replicated across the cluster, it is stored reliably on the cluster of machines despite machine failures. Even if your machine goes down, your data remains safely stored.
- High Availability – Data is highly available and accessible despite hardware failures because multiple copies of it exist. If a machine or a piece of hardware crashes, the data can still be reached through another path.
- Scalability – Hadoop is highly scalable: new hardware can easily be added to existing nodes. It also scales horizontally, meaning new nodes can be added on the fly without any downtime.
- Economic – Hadoop is not very expensive, as it runs on a cluster of commodity hardware; no specialized machines are needed. It also provides large cost savings because it is easy to add nodes on the fly, so as requirements grow you can add nodes without downtime and without much pre-planning.
- Easy to use – The client does not need to deal with distributed computing; the framework takes care of all of that, so Hadoop is easy to use.
- Data Locality – Hadoop works on the data locality principle, which says to move the computation to the data instead of the data to the computation. When a client submits a job, the code is moved to the data in the cluster rather than bringing the data to the location where the job was submitted and processing it there (a minimal job driver illustrating this is sketched below).
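To make the replication point concrete, here is a minimal sketch using the Hadoop Java API. It is an illustration, not part of the original tutorial: it assumes a reachable HDFS cluster whose configuration files are on the classpath, and the path /user/data/sample.txt is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
  public static void main(String[] args) throws Exception {
    // Loads core-site.xml / hdfs-site.xml found on the classpath.
    Configuration conf = new Configuration();

    // Cluster-wide default replication factor (3 unless overridden in hdfs-site.xml).
    System.out.println("Default replication: " + conf.get("dfs.replication", "3"));

    FileSystem fs = FileSystem.get(conf);

    // Change the replication factor of one (hypothetical) file to 2.
    // HDFS then adds or removes block replicas in the background to match.
    Path file = new Path("/user/data/sample.txt");
    boolean accepted = fs.setReplication(file, (short) 2);
    System.out.println("Replication change accepted: " + accepted);

    fs.close();
  }
}
```

The cluster-wide default is normally set through the dfs.replication property in hdfs-site.xml; setReplication only changes the factor for the given file.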
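The data locality principle is easiest to see in a MapReduce job driver: the client packages and submits only the code, and YARN then schedules map tasks on or near the DataNodes that hold the input blocks. Below is a standard word-count sketch (input and output paths come from the command line), included here only as an illustration of how the computation is shipped to the data.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map task: runs on (or near) the DataNode that stores its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce task: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");

    // Only this jar (the computation) is shipped to the cluster;
    // the input blocks stay where HDFS stored them.
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```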
3. Hadoop Assumptions
Hadoop is written with large clusters of computers in mind and is built around the following assumptions:
- Hardware may fail (since commodity hardware can be used).
- Processing will be run in batches, so the emphasis is on high throughput rather than low latency.
- Applications that run on HDFS have large data sets; a typical file in HDFS is gigabytes to terabytes in size.
- Applications need a write-once-read-many access model (see the sketch after this list).
- Moving computation is cheaper than moving data.
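As a small illustration of the write-once-read-many model, the sketch below writes a file to HDFS once and then reads it back; existing bytes cannot be modified in place (appending to the end is the only supported mutation). This is an assumption-laden example: the path /user/data/notes.txt is hypothetical and a reachable cluster configuration on the classpath is assumed.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class WriteOnceReadMany {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/user/data/notes.txt"); // hypothetical path

    // Write the file once. HDFS does not allow in-place updates:
    // to "modify" a file you rewrite it (or append to the end).
    try (FSDataOutputStream out = fs.create(path, /* overwrite = */ true)) {
      out.write("hello hadoop\n".getBytes(StandardCharsets.UTF_8));
    }

    // Read it back as many times as needed.
    try (FSDataInputStream in = fs.open(path)) {
      IOUtils.copyBytes(in, System.out, 4096, false);
    }

    fs.close();
  }
}
```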