Big Data: Problem and Cure

What is Data?

Data are characteristics or information that are collected through observation. In a more technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects.

What is Big Data?

Big Data is still data, just at an enormous scale. The term describes a collection of data that is huge in volume and keeps growing exponentially over time. In short, such data is so large and complex that traditional data management tools cannot store or process it efficiently.

Data Growth over the years -

[Figure: Data growth over the years]

3 Vs of Big Data -

(i) Volume – The name Big Data itself refers to enormous size. The size of the data plays a crucial role in determining how much value can be extracted from it. Whether a particular dataset can be considered Big Data at all also depends on its volume.

(ii) Velocity – The term 'velocity' refers to the speed at which data is generated. How fast the data is generated and processed to meet demand determines its real potential.

(iii) Variety – Variety in Big Data refers to all the structured and unstructured data that can be generated either by humans or by machines. Structured data follows a fixed format, such as the rows and columns of a database table. Much of the most commonly added data, such as texts, tweets, pictures and videos, is unstructured, and so are emails, voicemails, hand-written text, ECG readings and audio recordings. All of these are important elements under Variety.

Facebook Stats -

Facebook has revealed some statistics about its big data. Most of the figures below are for a single day -

  • 2.5 billion - Content items shared
  • 2.7 billion - Likes
  • 300 million - Photos uploaded
  • 100+ petabytes - Disk space in a single HDFS cluster
  • 105 terabytes - Data scanned via Hive in 30 minutes
  • 70,000 - Queries executed
  • 500+ terabytes - New data ingested
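To make the velocity of this data more tangible, here is a back-of-the-envelope conversion of a few of the figures above into average rates. It assumes decimal units (1 TB = 10^12 bytes) and load spread evenly over time, so actual peak rates will be at least this high.

```python
# Back-of-the-envelope rates derived from the Facebook figures above.
# Assumptions: decimal units (1 TB = 10**12 bytes) and load spread evenly
# over time, so these are averages, not peak rates.

TB = 10**12
GB = 10**9
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_gb_per_s(total_bytes: float, seconds: float) -> float:
    """Average throughput in GB/s for `total_bytes` moved in `seconds`."""
    return total_bytes / seconds / GB


# 500+ TB of new data ingested per day
ingest_rate = average_rate_gb_per_s(500 * TB, SECONDS_PER_DAY)

# 105 TB scanned via Hive in 30 minutes
scan_rate = average_rate_gb_per_s(105 * TB, 30 * 60)

# 70,000 queries executed per day
queries_per_second = 70_000 / SECONDS_PER_DAY

print(f"Ingest: ~{ingest_rate:.1f} GB/s")                 # Ingest: ~5.8 GB/s
print(f"Hive scan: ~{scan_rate:.1f} GB/s")                # Hive scan: ~58.3 GB/s
print(f"Queries: ~{queries_per_second:.2f} per second")   # Queries: ~0.81 per second
```

Even as a smoothed average, several gigabytes of new data arrive every second, which is what "velocity" means in practice.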

Why do we need Big Data?

Data are generated incessantly and contain nuggets of valuable insight that are critical for business success, be it customer feedback, market trends, demand for a product or competitor activity. The challenge is how to analyse and process these data to extract those nuggets and use them to strengthen business strategy, efficiency and performance.

Big Data solutions help companies make sense out of random information, become proactive and start setting the pace instead of continuously putting out fires and following the competition.

How is Big Data a Problem?

  • Big MNCs like Facebook, Google and Amazon receive huge amounts of data every day, on the order of terabytes or even petabytes, and they have to store all of it. Think about how large a storage device that would require: no single storage device available today offers such a capacity.
  • Even if we could build a single volume with such a large capacity, another problem comes up: I/O. A single device's read/write throughput does not grow along with its capacity, so the larger the volume, the longer it takes to read or write its data. Reading or writing the data becomes very time-consuming, and it can no longer be processed at the velocity the business needs.
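To put rough numbers on the I/O problem, the sketch below compares reading 100 PB (the size of the HDFS cluster in the Facebook figures) from one disk versus spreading it over 1,000 disks read in parallel. The 200 MB/s per-disk throughput is a hypothetical round figure, and perfect parallelism is an idealising assumption.

```python
# Illustrative only: why one huge volume is slow to read in full.
# Assumptions (hypothetical): each disk sustains ~200 MB/s sequential reads,
# and reads on separate disks proceed fully in parallel.

PB = 10**15
MB = 10**6
DISK_THROUGHPUT = 200 * MB  # bytes per second, hypothetical figure


def full_read_seconds(total_bytes: int, disks: int) -> float:
    """Time to read `total_bytes` when it is split evenly over `disks` disks."""
    per_disk = total_bytes / disks
    return per_disk / DISK_THROUGHPUT


data = 100 * PB  # e.g. the 100+ PB HDFS cluster from the Facebook figures

one_disk = full_read_seconds(data, disks=1)
thousand_disks = full_read_seconds(data, disks=1000)

print(f"1 disk:     {one_disk / (365 * 24 * 3600):.1f} years")   # 15.9 years
print(f"1000 disks: {thousand_disks / (24 * 3600):.1f} days")    # 5.8 days
```

The total amount of data is the same in both cases; what changes is how many devices share the reading work, which is exactly the idea behind distributed storage.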

Solution to Big Data -

A Distributed Storage System is an infrastructure that splits data across multiple physical servers, often across more than one data centre. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between the cluster nodes.

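Such a system commonly follows a master-slave architecture: the slave nodes contribute their hard disks to the cluster, and the master node combines them into one large pool of storage, keeping track of which data is stored on which slave. This is how many ordinary machines together solve the big data storage problem.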
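To illustrate the master-slave idea, here is a toy in-memory model loosely inspired by HDFS, where a master (like the NameNode) keeps only metadata and slaves (like DataNodes) store the actual blocks. The class names, the tiny 4-byte block size and the round-robin placement are hypothetical simplifications; real HDFS uses much larger blocks (128 MB by default), 3 replicas by default and a rack-aware placement policy.

```python
# Toy model of a master-slave distributed storage system, loosely inspired by
# HDFS (master ~ NameNode, slaves ~ DataNodes). All names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class SlaveNode:
    """A slave contributes its disk to the cluster: it only stores blocks by id."""
    name: str
    blocks: dict[str, bytes] = field(default_factory=dict)


@dataclass
class MasterNode:
    """The master holds no file data, only metadata about where each block lives."""
    slaves: list[SlaveNode]
    block_size: int = 4   # bytes per block; tiny so the example stays readable
    replication: int = 2  # how many slaves receive a copy of each block
    metadata: dict[str, list[tuple[str, list[str]]]] = field(default_factory=dict)

    def write(self, filename: str, data: bytes) -> None:
        """Split `data` into blocks and spread them (with replicas) over the slaves."""
        copies = min(self.replication, len(self.slaves))
        placements = []
        for index, offset in enumerate(range(0, len(data), self.block_size)):
            block_id = f"{filename}#{index}"
            chunk = data[offset:offset + self.block_size]
            # Round-robin: block i starts on slave i mod n, replicas on the next slaves.
            targets = [self.slaves[(index + r) % len(self.slaves)] for r in range(copies)]
            for slave in targets:
                slave.blocks[block_id] = chunk
            placements.append((block_id, [s.name for s in targets]))
        self.metadata[filename] = placements

    def read(self, filename: str) -> bytes:
        """Reassemble a file by fetching each block from the first slave that holds it."""
        by_name = {s.name: s for s in self.slaves}
        return b"".join(
            by_name[holders[0]].blocks[block_id]
            for block_id, holders in self.metadata[filename]
        )


# Usage: three slaves pool their storage behind one master.
slaves = [SlaveNode(f"slave{n}") for n in range(3)]
master = MasterNode(slaves)
master.write("log.txt", b"hello big data!")
print(master.metadata["log.txt"])  # block id -> slaves holding a copy
print(master.read("log.txt"))      # b'hello big data!'
```

Keeping only metadata on the master is what lets many disks behave like one large volume, and replication means a block survives if one slave fails. In real HDFS, clients ask the NameNode for block locations and then read from the DataNodes directly; this toy folds both steps into `read()`.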

Hadoop -

There are many big data tools available in the market, like Apache Hadoop, Apache Spark, Apache Flink, Apache Storm, Apache Cassandra, MongoDB, Apache Kafka and many more. Hadoop is one of the most famous of them.

Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
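The MapReduce model splits a job into a map step, a shuffle that groups intermediate results by key, and a reduce step. Below is a single-process sketch of the classic word-count example in plain Python. It is not Hadoop's API (Hadoop's native MapReduce API is Java) and it runs nothing in parallel; it only shows the data flow that Hadoop would distribute across the machines of a cluster.

```python
# Single-process simulation of the MapReduce model: map, shuffle, reduce.
# Not Hadoop's API; it only illustrates the data flow Hadoop distributes.
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map over every line, shuffle the pairs, then reduce each group."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


docs = ["Big data is big", "Hadoop processes big data"]
print(word_count(docs))
# {'big': 3, 'data': 2, 'is': 1, 'hadoop': 1, 'processes': 1}
```

Because each mapper call looks at one line and each reducer call looks at one key, both phases can be spread over many machines; only the shuffle needs to move data between them.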

Thanks, hope you liked it!


要查看或添加评论,请登录

Aman Jhagrolia的更多文章

  • Openshift: Overview and Case Studies

    Openshift: Overview and Case Studies

    Red Hat OpenShift Container Platform (in short RHOCP) is an open-source container application platform based on the…

    2 条评论
  • Jenkins: Overview and Case Studies

    Jenkins: Overview and Case Studies

    Jenkins is an open-source automation server that enables developers around the world to reliably build, test, and…

    2 条评论
  • Neural Networks: Overview and Case Study

    Neural Networks: Overview and Case Study

    An artificial neural network (ANN) is the piece of a computing system designed to simulate the way the human brain…

  • Amazon SQS: Overview and Case Study

    Amazon SQS: Overview and Case Study

    Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple and scale…

  • Azure Kubernetes Service: Case Studies

    Azure Kubernetes Service: Case Studies

    Microsoft Azure commonly referred to as Azure, is a cloud computing service created by Microsoft for building, testing,…

  • Kubernetes: Expert Session

    Kubernetes: Expert Session

    On 8 March 2021, I have attended the session of Industry Use Case with Demonstration on Kubernetes. The Session was…

  • Openshift and Kubernetes: Expert Session

    Openshift and Kubernetes: Expert Session

    On 2 March 2021, I have attended the session of Industry Use Case on Openshift and Kubernetes. The Session was…

  • Automation using Ansible Tower

    Automation using Ansible Tower

    On 28 December 2020, I have attended the session of Industry Use Case on Automation Using Ansible (Practical…

  • Ansible: Overview and Case-Study

    Ansible: Overview and Case-Study

    Ansible is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks…

  • Kubernetes: Overview

    Kubernetes: Overview

    Kubernetes in short also known as K8s, is a portable, extensible and open-source platform for managing containerized…

社区洞察

其他会员也浏览了