Big Data

Today's world runs on data, and most of the big companies, such as Facebook, Google, Microsoft, Netflix, and Amazon, deal with huge amounts of data on a daily basis.

What is big data?

It is a term that describes large volumes of data, both structured and unstructured. Big data is not a technology; it is a problem we have to deal with.

Take Facebook as an example. According to figures Facebook has revealed:

  • It collects 500+ terabytes of data every day.
  • Users generate 2.7 billion likes and upload 300 million photos per day.
  • It scans 105 terabytes of data every 30 minutes.

So we can see that the data these companies collect daily is enormous, and dealing with it requires enormous storage.

Why is Big Data important?

When you combine big data with high-powered analytics, you can accomplish business-related tasks such as:

  • Determining root causes of failures, issues, and defects in near-real-time.
  • Generating coupons at the point of sale based on the customer’s buying habits.
  • Recalculating entire risk portfolios in minutes.
  • Detecting fraudulent behavior before it affects your organization.

What are the challenges that come with Big Data?

  1. Volume
  2. Velocity
  3. Variety

These are the '3 Vs' of big data. Let us look at each of them.



Volume

Here, volume basically means storage. Companies receive huge amounts of data daily, in terabytes or petabytes, and storing that much data is a big issue. Storage-appliance vendors can build hard disks at this scale, but such disks are very expensive, so companies have to think carefully before investing that kind of money.

Let us try to imagine the data these companies receive every day.

Facebook


We upload images and photographs to Facebook. That statement does not boggle the mind until you realize that Facebook has more users than China has people, and each of them uploads photos. Facebook stores roughly 250 billion images; just try to picture 250 billion images. In 2016, Facebook had 2.5 trillion posts.

As for Netflix, it handles around 400 billion events daily, reaching 17 GB per second at peak.


The next big company is Google.

Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide. On top of that, Google currently processes over 20 petabytes of data per day.
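As a quick sanity check, the per-day and per-year figures follow directly from the per-second rate; here is the arithmetic in Python (a rough estimate, assuming a uniform query rate around the clock):

```python
# Rough sanity check on the search-volume figures above,
# assuming a constant rate of 40,000 queries per second.
queries_per_second = 40_000
per_day = queries_per_second * 60 * 60 * 24   # seconds in a day
per_year = per_day * 365
print(f"~{per_day / 1e9:.2f} billion searches per day")      # ~3.46
print(f"~{per_year / 1e12:.2f} trillion searches per year")  # ~1.26
```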

All of this data has to be stored on hard disks, and the numbers are so big that we can barely imagine them. This is the volume vector.

Velocity

Although appliance vendors are capable of building petabyte-scale storage, a new problem then starts: I/O (input/output).

The I/O problem: terabytes of data flood into companies every minute or hour.

Suppose a company somehow manages to store the data on hard disks; the problem arises from how long it takes to actually save that much data to disk.

If we analyze the write speed of a normal SATA hard disk, it takes roughly one minute to store 1 GB of data, and companies receive data in terabytes. Imagine how long it takes to save all of that to disk, and then, when the data is processed, how long it takes to load it back into RAM. This is the input/output problem of big data. If companies spent one or two days just saving data, then a Google search would return its results two days after you typed the query.
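To put that into numbers, here is a back-of-the-envelope estimate, assuming the rough figure above of 1 GB written per minute on a single SATA disk (the 2 TB intake is a hypothetical example):

```python
# Back-of-the-envelope write time on a single SATA disk,
# assuming roughly 1 GB stored per minute (figure from above).
GB_PER_MINUTE = 1
intake_tb = 2                                # hypothetical daily intake
minutes = intake_tb * 1024 / GB_PER_MINUTE   # 1 TB = 1024 GB
print(f"{intake_tb} TB -> ~{minutes / 60:.0f} hours "
      f"(~{minutes / (60 * 24):.1f} days) just to write to disk")
```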

So, this is the velocity vector of big data.

Variety 

As we saw above with Facebook, Netflix, and Google: we upload photos, like and comment on them, log in, log out, and perform many other activities. Each activity creates a different type of data, and that is what variety means.

The same goes for the network packets we generate while surfing the internet.

Or consider email: a legal discovery process might require sifting through thousands or millions of email messages in a collection, and no two of those messages are exactly alike. Each message also carries location and timing metadata. So companies receive a wide variety of data from everyday events, and all of it has to be processed before it can be used; a toy illustration follows below.
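Here is a minimal sketch of the variety problem in Python: events of different types arrive with different shapes, so they are projected onto a common schema before analysis (the event types and field names are hypothetical):

```python
# Events of different types carry different fields; to analyze them
# together, project every event onto one shared schema, leaving
# missing fields as None.
events = [
    {"type": "photo_upload", "user": "alice", "size_mb": 4.2},
    {"type": "like",         "user": "bob",   "target": "photo_123"},
    {"type": "login",        "user": "carol", "ip": "203.0.113.7"},
]

fields = sorted({key for event in events for key in event})
rows = [{field: event.get(field) for field in fields} for event in events]
for row in rows:
    print(row)
```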

This is the variety vector of big data.


How to manage Big Data?

To tackle big data, companies use distributed storage techniques.

The core concept of distributed storage is to break a large chunk of data into small pieces and store those pieces across many servers.

So what actually happens when we split the data and store it across servers?

Data arrives at the main server, which is connected to many smaller servers over the network. When data comes in, the main server splits it into small chunks and sends each chunk to one of the smaller servers, as sketched below.
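A minimal sketch of that idea, assuming the master splits an incoming payload into fixed-size blocks and assigns each block to a slave round-robin (the 64 MB block size mirrors classic HDFS; the node names are hypothetical):

```python
# Master-side placement: split a payload into fixed-size blocks and
# assign each block to a slave node in round-robin order.
BLOCK_SIZE = 64 * 1024 * 1024                # 64 MB per block
SLAVES = ["slave-1", "slave-2", "slave-3", "slave-4"]

def place_blocks(total_bytes: int) -> dict:
    """Map each block id of a payload to the slave that stores it."""
    n_blocks = -(-total_bytes // BLOCK_SIZE)  # ceiling division
    return {b: SLAVES[b % len(SLAVES)] for b in range(n_blocks)}

# A 200 MB payload becomes four 64 MB blocks spread over the slaves.
print(place_blocks(200 * 1024 * 1024))
```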

From the volume perspective, storing the data as many small volumes across servers solves the storage problem.

Because each slave only writes a small share of the data, and all slaves write in parallel, storing the data takes far less time; this solves the velocity issue as well (see the sketch below). All of these servers are connected to the master over the network.
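A rough illustration of that speedup, reusing the assumed 1 GB-per-minute disk rate from the velocity section (an idealized estimate that ignores network overhead):

```python
# Idealized parallel-write time: the same hypothetical 2 TB intake,
# divided across N slaves that each write about 1 GB per minute.
GB_PER_MINUTE = 1
intake_gb = 2 * 1024                         # 2 TB
for nodes in (1, 10, 100):
    minutes = intake_gb / (GB_PER_MINUTE * nodes)
    print(f"{nodes:>3} node(s): ~{minutes / 60:.1f} hours to store")
```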

This is how distributed storage helps solve the problems of big data. This design is called the master-slave model: the master is the main server, and the smaller servers are called slaves. To create such a cluster, we need software, and that software is Hadoop.

Prism Project

Together with Yahoo, Facebook spearheaded the creation of Hadoop, a sweeping software platform for processing and analyzing the epic amounts of data streaming across the modern web. But Facebook is now staring down an even larger avalanche of data, and new limitations need fixing.

This project aims to solve one of the biggest problems Facebook has faced while operating at its uniquely massive scale: how to create server clusters that can operate as a unit even when they are geographically distributed. In other words, it is a means of managing Hadoop across locations.


Hope you liked the article. Have a good day.

Thank you.
