Big Data
Ankit Kumar
DevOps | Terraform | Linux | Azure | Azure DevOps | AWS | RH294 (Ansible) | Python | Docker | Kubernetes | Grafana | ELK | Prometheus
We live in the world of data, and most big companies like Facebook, Google, Microsoft, Netflix, and Amazon deal with enormous amounts of data on a daily basis.
What is Big Data?
It is a term that describes large volumes of data, both structured and unstructured. It is not a technology; it is a problem that we have to deal with.
Take Facebook as an example. According to figures revealed by Facebook:
- It collects 500+ terabytes of data every day.
- Users generate 2.7 billion likes and upload 300 million photos per day.
- It scans 105 terabytes of data every 30 minutes.
So the data these companies collect every day is enormous, and to deal with it they require equally enormous storage.
Why is Big Data important?
When you combine big data with high-powered analytics, you can accomplish business-related tasks such as:
- Determining root causes of failures, issues, and defects in near-real-time.
- Generating coupons at the point of sale based on the customer’s buying habits.
- Recalculating entire risk portfolios in minutes.
- Detecting fraudulent behavior before it affects your organization.
What are the challenges that come with Big Data?
- Volume
- Velocity
- Variety
These are the 3 V's of big data. Let us talk about each of them.
Volume
Volume is essentially a storage problem. Companies receive terabytes or even petabytes of data every day, and storing all of it is a serious issue. Storage vendors can build high-capacity hard disks, but such hardware is very expensive, so companies have to think carefully before investing that kind of money.
Let us try to picture how much data these companies receive every day.
We upload photographs to Facebook. That statement does not boggle the mind until you realize that Facebook has more users than the population of China, and each of those users uploads photos. Facebook is storing roughly 250 billion images. Just think about 250 billion images. In 2016, Facebook had 2.5 trillion posts.
Netflix, in turn, processes about 400 billion events daily, peaking at 17 GB per second.
The next big company is Google. Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide, and it currently processes over 20 petabytes of data per day.
All of this data needs to be stored on hard disks, and the numbers are almost too big to imagine. This is the volume vector.
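As a back-of-the-envelope check on these numbers, here is a small Python sketch. The 500 TB/day figure comes from the Facebook statistics above; the 10 TB drive capacity and 3x replication factor are assumptions for illustration:

```python
# Back-of-the-envelope storage estimate.
# DAILY_INTAKE_TB comes from the Facebook figure quoted above;
# the drive capacity and replication factor are assumptions.

DAILY_INTAKE_TB = 500        # terabytes collected per day
DRIVE_CAPACITY_TB = 10       # assumed capacity of one commodity disk
REPLICATION_FACTOR = 3       # assumed HDFS-style 3x replication

drives_per_day = DAILY_INTAKE_TB * REPLICATION_FACTOR / DRIVE_CAPACITY_TB
yearly_intake_tb = DAILY_INTAKE_TB * 365

print(f"Drives filled per day (with replication): {drives_per_day:.0f}")
print(f"Raw intake per year: {yearly_intake_tb:,} TB (~{yearly_intake_tb / 1000:.1f} PB)")
```

Even with these rough assumptions, that is on the order of 150 drives filled every day and over 180 PB of raw intake per year, for a single company.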
Velocity
Although storage vendors are capable of building petabyte-scale storage, a new problem appears: I/O (input/output).
The I/O problem: companies are flooded with terabytes of data every few minutes or hours.
Suppose a company somehow manages to get all this data onto hard disks; the problem then becomes the time it takes to write and read it.
If we look at the write speed of a normal SATA hard disk, storing 1 GB of data takes roughly ten seconds to a minute, and companies receive data in terabytes. Imagine how long it takes just to write that data to disk; and when we process the data, loading it back into RAM takes a long time again. This is the input/output problem of big data. If a company spent a day or two just saving its data, then imagine searching something on Google and getting your results two days later.
So, this is the velocity vector of big data.
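To see the arithmetic, here is a quick sketch assuming a sequential write speed of about 100 MB/s for a commodity SATA disk (an assumed figure; real speeds vary):

```python
# Time to write 1 TB sequentially to one SATA disk vs. in parallel
# across many disks. The 100 MB/s write speed is an assumed figure.

WRITE_SPEED_MB_S = 100                  # assumed sequential write speed
DATA_MB = 1 * 1_000_000                 # 1 TB expressed in MB (decimal)

for disks in (1, 10, 100):
    seconds = DATA_MB / (WRITE_SPEED_MB_S * disks)
    print(f"{disks:>3} disk(s): {seconds / 3600:.2f} hours to write 1 TB")
```

A single disk needs hours for one terabyte, but spreading the write across 100 disks brings it down to minutes. This is exactly the observation that the distributed storage discussed below exploits.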
Variety
As we have already seen with Facebook, Netflix, and Google, we upload photos, like and comment on them, log in, log out, and perform many other activities. Each of these generates a different type of data, and that is what variety means.
The same is true of the network packets we generate while surfing the internet. Or consider email: a legal discovery process might require sifting through thousands or even millions of messages in a collection, and no two of them are exactly alike. Along with the message itself, location and timing metadata are attached. So companies receive a wide variety of data from everyday events, and all of it has to be processed before it can be used.
This is the variety vector of big data.
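To make variety concrete, here is a tiny Python illustration of the three broad shapes data arrives in. All of the record contents are made up for the example:

```python
import json

# Structured: a fixed-schema record, like a row in a relational table.
structured_row = ("user_42", "2016-05-01 10:32:07", "login")

# Semi-structured: a JSON event whose fields can vary between records.
semi_structured = json.loads(
    '{"user": "user_42", "action": "like", "photo_id": 1234,'
    ' "geo": {"lat": 28.61, "lon": 77.21}}'
)

# Unstructured: raw binary content, e.g. the first bytes of a JPEG photo.
unstructured = b"\xff\xd8\xff\xe0"

for label, value in [("structured", structured_row),
                     ("semi-structured", semi_structured),
                     ("unstructured", unstructured)]:
    print(f"{label}: {type(value).__name__} -> {value!r}")
```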
How to manage Big Data
To tackle big data, companies use distributed storage techniques.
The core concept of distributed storage is to break the data into small pieces and store them across many servers.
So what actually happens when we split the data and store it across servers?
The data arrives at a main server, which is connected to many small servers via the network. When data comes in, the main server splits it into small chunks and sends each chunk to one of the small servers.
From the volume perspective, storing the data as many small pieces across many machines solves the storage problem.
Writing many small pieces in parallel also takes much less time, so it solves our velocity issue as well. All of these servers are connected to the master over the network.
This is how distributed storage helps solve the problems of big data. This design is called the master-slave model: the master is the main server, and the small servers are called slaves. To create such a cluster we need software, and the name of that software is Hadoop. A minimal sketch of the idea appears below.
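To make the chunk-and-distribute idea concrete, here is a minimal Python sketch. The chunk size, node names, and round-robin placement are all illustrative assumptions; real HDFS uses 128 MB blocks, replicates each block, and lets the master (NameNode) decide placement:

```python
import itertools

CHUNK_SIZE = 64                               # bytes; HDFS uses 128 MB blocks
SLAVES = ["slave-1", "slave-2", "slave-3"]    # hypothetical node names

def split_into_chunks(data: bytes, size: int) -> list[bytes]:
    """Break incoming data into fixed-size chunks, as the master does."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def distribute(chunks: list[bytes], slaves: list[str]) -> dict:
    """Assign chunks to slave nodes round-robin over the 'network'."""
    nodes = itertools.cycle(slaves)
    return {idx: (next(nodes), chunk) for idx, chunk in enumerate(chunks)}

incoming = b"x" * 300                         # stand-in for a large file
placement = distribute(split_into_chunks(incoming, CHUNK_SIZE), SLAVES)
for idx, (node, chunk) in placement.items():
    print(f"chunk {idx} ({len(chunk)} bytes) -> {node}")
```

In a real Hadoop cluster you would not write this yourself: a command such as `hdfs dfs -put bigfile /data/` asks the master (NameNode) to split the file into blocks and place replicas on the slaves (DataNodes).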
Prism Project
Together with Yahoo, Facebook spearheaded the creation of Hadoop, a sweeping software platform for processing and analyzing the epic amounts of data streaming across the modern web. Facebook is staring down an even larger avalanche of data, and there are new limitations that need fixing.
This project aims to solve one of the biggest problems Facebook has faced operating at its uniquely massive scale: how to create server clusters that can operate as a unit even when they are geographically distributed. In short, it is a means of managing Hadoop across data centers.
I hope you liked the article. Have a good day.
Thank you!