Big-Data:- Problems of Umbrella

As we know, in today's world the most important thing is data. The one who can store and collect more data is more powerful. Have you ever wondered how MNCs like Google answer our search queries in a few seconds, or how Facebook can show a photo posted a few years back the moment we search for it? In this article, we will see how MNCs manage their big data and what challenges they face in storing such a huge amount of data.

# What is Big Data?

Big data is a term used to describe a collection of data that is huge in size and yet growing exponentially with time. Examples of big data generation include stock exchanges, social media sites, jet engines, etc.

Whenever we surf the internet or visit any site, there is an exchange of data between that site and our browser. And if we upload a file, such as an image or a document, to that site, then that file gets stored in a database or a data center.

Billions of people surf the internet and visit social media sites every day, and tonnes of data get uploaded to the internet daily. Giant companies like Facebook, Google, and Amazon receive tonnes of data every day, and this data is not in GBs; it is in hundreds or thousands of terabytes (TB), or even petabytes (PB).

# How much data does Facebook manage?


In today's world, it is hard to find a person who does not use social media, because the world is growing digitally at an exponential rate in every corner. According to a report, between 2017 and 2019 the total number of social media users increased from 2.46 to 2.77 billion.

People use Facebook, Instagram, WhatsApp, and other social/messaging platforms as part of their daily routines. As a result, the average time an individual spends on social media has increased to 2 hours 22 minutes per day.

This drastic growth of social media directly impacts data generation. Yes, whatever we do on social media, including every like, share, retweet, and comment, is stored as a record, and each record generates data.

Facts

Google processes over 3.5 billion search queries every day. Every day, 306.4 billion emails are sent and 5 million tweets are made. 65 billion messages are sent on WhatsApp. Organizations need to store this data and retrieve it whenever a user asks for it. This data is huge, and they need to process it. Facebook cannot throw out or delete the data of each profile; it has to store it somewhere forever. But is there any single server or hard disk that can store petabytes of data every day and still work super fast?



Facebook once revealed some big stats on big data to reporters at its HQ: its system processes 2.5 billion pieces of content and 500+ terabytes of data each day. It pulls in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data every half hour.


# Why does Facebook store big data?

The main business strategy of Facebook is to understand who its users are. By understanding its users' behaviors, interests, and geographic locations, Facebook shows customized ads on their timelines.

Billions of pieces of unstructured data are generated every day, containing images, text, video, and everything else. With the help of deep learning (AI) methodologies, Facebook brings structure to this unstructured data.

A deep learning analysis tool can learn to recognize images that contain pizza without actually being told what a pizza looks like. It does this by analyzing the context of a large set of images that contain pizza; by recognizing similar images, the deep learning tool segregates the images that contain pizza. This is how Facebook brings structure to unstructured data.
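The idea of grouping "similar" images can be sketched in a few lines. In a real system, a trained network maps each image to a feature vector; here the vectors, the `pizza_prototype`, and the file names are all hypothetical stand-ins, and similarity is measured with a plain cosine score:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors that a trained network might produce.
pizza_prototype = [0.9, 0.8, 0.1]
images = {
    "img_001.jpg": [0.85, 0.75, 0.15],  # close to the pizza prototype
    "img_002.jpg": [0.05, 0.10, 0.95],  # far from it
}

# Keep only the images whose features point the same way as the prototype.
pizza_images = [name for name, vec in images.items()
                if cosine_similarity(vec, pizza_prototype) > 0.9]
print(pizza_images)  # ['img_001.jpg']
```

Real pipelines learn the feature vectors themselves; the grouping step at the end, however, is essentially this simple.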

There are several use cases of deep learning at Facebook:

Textual Analysis

Facebook uses DeepText to analyze text data and extract the exact meaning from contextual analysis. This is a form of semi-unsupervised learning: the tool does not need a dictionary and does not need to be told the meaning of every word. Instead, it focuses on how words are used.
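DeepText itself is proprietary, but the "meaning from usage" idea can be illustrated with a toy co-occurrence count: words that keep appearing in the same contexts end up with similar profiles, even though we never define them. The sentences below are made up for the sketch:

```python
from collections import Counter, defaultdict

# Toy corpus: words used in similar contexts get similar co-occurrence profiles.
sentences = [
    "i love eating pizza",
    "i love eating pasta",
    "i love driving cars",
]

# Count, for each word, which other words appear in the same sentence.
cooc = defaultdict(Counter)
for sentence in sentences:
    words = sentence.split()
    for w in words:
        for other in words:
            if other != w:
                cooc[w][other] += 1

# "pizza" and "pasta" were never defined, but their contexts overlap heavily.
shared = set(cooc["pizza"]) & set(cooc["pasta"])
print(shared)  # {'i', 'love', 'eating'}
```

Real systems replace these counts with learned word embeddings, but the principle is the same: usage, not a dictionary, determines meaning.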

Facial Recognition

The tool used for this is a deep learning application called DeepFace, which learns by itself to recognize people's faces in photos. That's why we get the names of our friends suggested while tagging them in a post. This is an advanced image recognition tool because it can tell whether a person appearing in two different photos is the same person or not.

Target Advertisements

Facebook uses deep neural networks to decide how to target the audience for its ads. This artificial intelligence learns by itself to find out as much as it can about the audience and clusters users so it can serve them ads in the most insightful way. Because of this highly targeted advertising, Facebook has become the toughest competitor to the well-known search engine Google.
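The "cluster them" step can be sketched with the assignment step of k-means. Everything here is a made-up miniature: the users, their two-dimensional interest scores, and the two seed centroids are all hypothetical, and a real system would iterate and recompute the centroids:

```python
# Toy clustering of users by interest scores (sports, cooking) -- hypothetical data.
users = {
    "alice": (0.9, 0.1),
    "bob":   (0.8, 0.2),
    "carol": (0.1, 0.9),
    "dave":  (0.2, 0.8),
}

centroids = [(1.0, 0.0), (0.0, 1.0)]  # seeds: "sports fans" vs "cooking fans"

def nearest(point, centroids):
    """Index of the closest centroid by squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))

# Assign each user to the nearest centroid; ads are then picked per cluster.
clusters = {name: nearest(vec, centroids) for name, vec in users.items()}
print(clusters)  # {'alice': 0, 'bob': 0, 'carol': 1, 'dave': 1}
```

Full k-means would alternate this assignment step with recomputing each centroid as the mean of its cluster until nothing moves.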

# Difficulties in managing big data

To store more data, we require more volume. So if MNCs buy more hard disks to store data, they will be charged a huge amount and will also face I/O issues. These difficulties are described below:

1. Volume

The first thing to consider when you receive data is to analyze how much of it there is. When it comes to big data, high volumes of low-density, unstructured data need to be processed and analyzed. As it is unstructured, this data can be of unknown value, for example, Twitter data feeds, clickstreams on a web page, and others. For certain organizations, depending on the niche they operate in, storing such data can amount to tens of terabytes or even hundreds of petabytes.

2. Velocity

Velocity indicates the rate at which data is received on average. The main advantage of a RAM disk is speed: compared to a hard disk, a RAM disk is typically up to 50 times faster for sequential reads and writes, and up to 200 times faster for small 4 KB transfers. Generally, the highest-velocity data streams directly into memory instead of being written to disk. On the other hand, when it comes to smart products enabled by the internet, data is processed in real time or near real time, and hence this type of data requires real-time evaluation and action. That's why we face I/O issues when storing big data.
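You can feel the memory-versus-disk gap with a small timing sketch. The exact ratio depends entirely on your hardware, so treat the numbers as illustrative only; the disk path forces an `fsync` per write so the OS cache does not hide the disk:

```python
import io
import os
import tempfile
import time

payload = b"x" * (4 * 1024)  # 4 KB chunks, the small-transfer case mentioned above

def time_writes(write_chunk, n=500):
    """Time n repeated writes of the payload."""
    start = time.perf_counter()
    for _ in range(n):
        write_chunk(payload)
    return time.perf_counter() - start

# In-memory target, standing in for a RAM disk.
buf = io.BytesIO()
mem_time = time_writes(buf.write)

# On-disk target, flushed and synced on every write.
with tempfile.NamedTemporaryFile(delete=False) as f:
    def disk_write(chunk):
        f.write(chunk)
        f.flush()
        os.fsync(f.fileno())
    disk_time = time_writes(disk_write)
os.unlink(f.name)

print(f"memory: {mem_time:.4f}s, disk: {disk_time:.4f}s")
```

On typical machines the synced disk writes are orders of magnitude slower, which is exactly the I/O bottleneck the paragraph describes.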

3. Variety

Variety refers to the different types of data. The most common types are structured and unstructured. Structured data is more traditional and preferred, as it fits neatly into a relational database. However, as the applications of big data keep increasing, more and more data is becoming unstructured. Unstructured data, such as text or audio, can require additional preprocessing to decode its meaning and support metadata.
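"Additional preprocessing" often just means parsing free text into rows a database can hold. A minimal sketch, using made-up activity lines and a regular expression as the parser:

```python
import re

# Unstructured log-style text (hypothetical) to be turned into structured rows.
raw = """alice posted a photo at 10:03
bob liked a photo at 10:05
carol commented on a post at 10:09"""

pattern = re.compile(r"(\w+) (posted|liked|commented on) a (\w+) at (\d{2}:\d{2})")

# Each match becomes a structured record that would fit a relational table.
records = [
    {"user": m[0], "action": m[1], "object": m[2], "time": m[3]}
    for m in pattern.findall(raw)
]
print(records[0])  # {'user': 'alice', 'action': 'posted', 'object': 'photo', 'time': '10:03'}
```

Once the text is in this shape, it can be loaded into a relational database like any structured data.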

# How Facebook manages big data

MNCs do not use big data only for storage; they process and analyze it so they can make predictions about users and earn more profit. To store big data, they use the concept of distributed storage.

Distributed-Storage

A distributed storage system is infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.


There are many ways to create distributed storage. Here we see how the Hadoop software creates a distributed storage cluster. The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications. It employs a NameNode and DataNode architecture to implement a distributed file system that provides high-performance access to data across highly scalable Hadoop clusters.

Simply put, the storage of every DataNode is shared with the NameNode, so the NameNode ends up managing a huge pool of storage while the data itself is split across the DataNodes. This also increases the read and write speed of the data, because blocks can be accessed in parallel. Facebook uses the same concept to store big data.
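The split-and-replicate idea above can be sketched as a toy block placer. The block size, replication factor, and DataNode names below are illustrative only (real HDFS defaults to 128 MB blocks and 3 replicas), and the NameNode's job reduces to keeping the block map:

```python
BLOCK_SIZE = 4        # bytes, for illustration; real HDFS uses 128 MB blocks
REPLICATION = 2       # real HDFS defaults to 3 replicas
DATANODES = ["dn1", "dn2", "dn3"]

def put_file(data: bytes):
    """Split data into blocks and place replicas round-robin on datanodes.

    Returns a NameNode-style block map: block id -> (block bytes, locations).
    """
    block_map = {}
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id = offset // BLOCK_SIZE
        locations = [DATANODES[(block_id + r) % len(DATANODES)]
                     for r in range(REPLICATION)]
        block_map[block_id] = (data[offset:offset + BLOCK_SIZE], locations)
    return block_map

blocks = put_file(b"hello big data")
for block_id, (chunk, locations) in blocks.items():
    print(block_id, chunk, locations)
```

Because every block lives on more than one DataNode, reads can run in parallel and the loss of one node does not lose data, which is the whole point of the architecture.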


Hadoop is the key tool Facebook uses, not simply for analysis, but as an engine to power many features of the Facebook site, including messaging. That multitude of monster workloads drove the company to launch its Prism project, which supports geographically distributed Hadoop data stores.

Facebook queries Hadoop through Hive, which uses a subset of SQL. To make it even easier for business people, the company created HiPal, a graphical tool that talks to Hive and enables data discovery, query authoring, charting, and dashboard creation.
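To give a feel for what "a subset of SQL" buys analysts, here is the same style of aggregation query run against sqlite3 instead of Hive; the `likes` table and its rows are invented for the sketch, but a HiveQL `GROUP BY` over Hadoop data looks essentially the same:

```python
import sqlite3

# In-memory table standing in for a (much larger) Hive table on Hadoop.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (user TEXT, page TEXT)")
conn.executemany("INSERT INTO likes VALUES (?, ?)", [
    ("alice", "pizza"), ("bob", "pizza"), ("carol", "cars"),
])

# A Hive-style aggregation: which pages get the most likes?
rows = conn.execute(
    "SELECT page, COUNT(*) AS n FROM likes GROUP BY page ORDER BY n DESC"
).fetchall()
print(rows)  # [('pizza', 2), ('cars', 1)]
```

The difference is scale, not syntax: Hive compiles such statements into jobs that run across the whole cluster.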

In terms of raw Hadoop capacity, Facebook has reached the upper limit. The company recently declared itself the owner of what is likely the world's largest Hadoop cluster, weighing in at 100 petabytes.

Hope now you can understand how difficult it is to manage big data.

Thanks for reading ...
