How MNCs Use Big Data to Manage and Manipulate Thousands of Terabytes of Data at High Speed

# What is Big Data?

According to research, about 4.13 billion people were active internet users last year. Studies estimate that roughly 2.5 quintillion bytes of data are generated every day, and in the coming years this is projected to reach around 463 exabytes per day. Data volumes are growing daily, and it has become difficult to manage such large data sets with traditional methods. This is where Big Data technology plays a major role in managing huge, complex data.

Big Data is a term that refers to the collection and management of large, complex data sets that are difficult to process using traditional methods.

# The Four V's of Big Data

1) Volume

Volume refers to the size of data. In every sector, storing data has become hard; organizations need a way to store all this data and stay ready for the data still to come. With a Hadoop cluster, we can use low-cost commodity hardware for storage.

2) Velocity

Velocity refers to the speed at which data must be stored, retrieved, and processed. Performing I/O on a small data set is easy, but it becomes hard when huge volumes arrive in short time intervals. For this we use distributed storage and processing, which spreads the work across many machines for faster I/O. Hadoop plays a major role in such distributed systems.
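The gain from spreading I/O across machines can be sketched in plain Python. This is a toy simulation, not Hadoop itself: `fetch_block` and the 100 ms delay are made-up stand-ins for reading one data block from one storage node.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_block(block_id):
    """Stand-in for reading one data block from one storage node."""
    time.sleep(0.1)  # pretend each read takes 100 ms
    return f"block-{block_id}"

start = time.perf_counter()
# Fetch four blocks concurrently, as if from four different nodes
with ThreadPoolExecutor(max_workers=4) as pool:
    blocks = list(pool.map(fetch_block, range(4)))
elapsed = time.perf_counter() - start

print(blocks)   # all four blocks retrieved
print(elapsed)  # roughly 0.1 s in parallel, vs about 0.4 s one after another
```

The four reads overlap, so the total wall-clock time is close to the time of a single read rather than the sum of all four.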

3) Variety

Variety refers to the distinct forms data can take, for example data that arrives in an organised or an unorganised manner. The three main types are a) structured, b) semi-structured, and c) unstructured.
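The three types can be illustrated with small Python samples (the field names and text here are invented for illustration):

```python
import csv
import io
import json

# Structured: rows with a fixed schema, e.g. a CSV table
structured = io.StringIO("user_id,country\n1,IN\n2,US\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing but flexible, e.g. JSON
semi_structured = json.loads('{"user_id": 1, "tags": ["hadoop", "bigdata"]}')

# Unstructured: free text with no schema at all
unstructured = "Just shared my first post about Hadoop clusters!"

print(rows[0]["country"])          # field access via the fixed schema
print(semi_structured["tags"][0])  # field access via keys, shape may vary
print(len(unstructured.split()))   # only generic text processing applies
```

Structured data fits neatly into tables, semi-structured data carries its own labels but not a rigid schema, and unstructured data has to be interpreted by more general techniques.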

4) Veracity

Veracity concerns trust in the data: we have lots of data arriving at high speed, but we cannot always tell whether it comes from an authorised source or whether it is correct.

# What is Hadoop?


Hadoop is an open-source framework developed by Apache. It works with both structured and unstructured data, and it helps store and process data in a distributed fashion by clustering multiple computers together.

Hadoop organizes machines into a cluster. A cluster contains at least one master node and at least one slave node. The master node receives data, breaks it into blocks, and distributes the blocks across multiple slave nodes, which then process their blocks in parallel. This makes processing fast and data management easy.

# How Big Data and Hadoop play a major role in MNCs like Facebook


“Facebook runs the world’s largest Hadoop cluster,” says Jay Parikh, Vice President of Infrastructure Engineering at Facebook.

Facebook has two main clusters. The first is an 1100-node cluster with 8800 CPU cores and 12 petabytes of storage. The second is a 300-node cluster with 2400 CPU cores and 3 petabytes of storage.

All these nodes process data at high speed and generate results quickly. Let's understand this with an example: when someone shares a post on Facebook, the data flows through these nodes and is stored there. Then multiple processing steps run, such as counting how many likes the post receives and classifying what kind of post it is (technology-related, politics-related, and so on). Machine-learning models also run to infer what topics the user is interested in based on their posts. All of this happens in a short time, perhaps a few seconds or a minute.
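The per-post processing described above can be sketched as a toy Python example. Everything here is an illustrative assumption, not Facebook's actual pipeline: the post fields, the keyword lists, and the naive keyword-matching "classifier" are all made up, standing in for the real machine-learning models.

```python
# Hypothetical posts; the field names are invented for illustration
posts = [
    {"id": 1, "text": "New GPU benchmarks are out", "likes": [101, 102, 103]},
    {"id": 2, "text": "Election results announced today", "likes": [104]},
]

# Naive keyword lists standing in for a real topic classifier
TOPIC_KEYWORDS = {
    "technology": {"gpu", "benchmarks", "hadoop", "cluster"},
    "politics": {"election", "results", "vote"},
}

def process(post):
    """One node's work: count likes and tag the post with a rough topic."""
    words = {w.lower().strip(".,!") for w in post["text"].split()}
    topic = next(
        (name for name, kws in TOPIC_KEYWORDS.items() if words & kws),
        "other",
    )
    return {"id": post["id"], "like_count": len(post["likes"]), "topic": topic}

# Each post can be processed independently, so this loop could be
# spread across many nodes exactly like the word-count example
results = [process(p) for p in posts]
print(results[0])  # {'id': 1, 'like_count': 3, 'topic': 'technology'}
```

Because each post is processed independently, the same function can run on thousands of nodes at once, which is what lets a cluster of this size produce results within seconds.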


