How much data is used in companies, and how to resolve the storage issue?

What is Data?


In computing, data is information that has been translated into a form that is efficient for movement or processing. Relative to today's computers and transmission media, that means information converted into binary digital form. It is acceptable to treat "data" as either a singular or a plural subject. "Raw data" describes data in its most basic digital format.
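To make "information converted into binary digital form" concrete, here is a minimal Python sketch that encodes a short string and shows the underlying bits:

```python
# Text becomes binary data once it is encoded: each character maps to
# one or more bytes, and each byte is eight bits.
raw = "data"
encoded = raw.encode("utf-8")                    # 4 bytes
bits = " ".join(f"{b:08b}" for b in encoded)     # the raw binary form

print(encoded)   # b'data'
print(bits)      # 01100100 01100001 01110100 01100001
```

Every file, photo, or database row a company stores ultimately reduces to bit patterns like these.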

What is Big Data?


"Big Data is high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making."

This definition clearly answers the “What is Big Data?” question – Big Data refers to complex and large data sets that have to be processed and analyzed to uncover valuable information that can benefit businesses and organizations.

Types of Big Data

Structured

Structured data is the first type of big data. By structured data, we mean data that can be processed, stored, and retrieved in a fixed format: highly organized information that can be readily stored in and accessed from a database by simple search-engine algorithms. For instance, the employee table in a company database is structured; the employee details, job positions, salaries, and so on are all present in an organized manner.
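A quick sketch of the employee-table example using Python's built-in `sqlite3` module (the names and salaries below are invented for illustration):

```python
import sqlite3

# A fixed schema is what makes this data "structured": every row has the
# same named, typed fields, so a simple query can retrieve it directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, position TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Engineer", 75000.0), ("Ravi", "Analyst", 62000.0)],
)

rows = list(conn.execute("SELECT name, salary FROM employees WHERE salary > 65000"))
print(rows)  # [('Asha', 75000.0)]
```

Because the format is fixed, retrieval is a one-line query; that is exactly the property unstructured data lacks.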

Unstructured

Unstructured data refers to data that lacks any specific form or structure, which makes it difficult and time-consuming to process and analyze. Email is a classic example of unstructured data. Structured and unstructured are two important types of big data.

Semi-structured

Semi-structured data is the third type of big data. It contains elements of both formats mentioned above, structured and unstructured. To be precise, it refers to data that has not been classified into a particular repository (database), yet carries vital tags that segregate individual elements within the data. That covers the types of big data; next, its characteristics.
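JSON is a typical semi-structured format: there is no fixed database schema, but tags (the keys) still segregate individual elements. The record below is an invented example of email-like metadata:

```python
import json

# No table schema, but the keys act as the "tags" that let a program
# pull out individual elements from an otherwise free-form record.
record = '{"id": 42, "subject": "Quarterly report", "tags": ["finance", "q3"]}'
doc = json.loads(record)

print(doc["subject"])  # Quarterly report
print(doc["tags"])     # ['finance', 'q3']
```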

Characteristics of Big Data

Back in 2001, Gartner analyst Doug Laney listed the three 'V's of Big Data: Variety, Velocity, and Volume. Let's discuss each in turn.

1) Variety

Variety refers to the structured, unstructured, and semi-structured data gathered from multiple sources. While in the past data could only be collected from spreadsheets and databases, today it arrives in an array of forms: emails, PDFs, photos, videos, audio, social media posts, and much more. Variety is one of the important characteristics of big data.

2) Velocity

Velocity essentially refers to the speed at which data is being created in real time. In a broader sense, it comprises the rate of change, the linking of incoming data sets arriving at varying speeds, and bursts of activity.

3) Volume

Volume is one of the characteristics of big data. We already know that Big Data implies huge volumes of data generated daily from sources such as social media platforms, business processes, machines, networks, human interactions, and more. Such large amounts of data are stored in data warehouses.

How much data is used in companies

Google: 40,000 Google Web Searches Per Second


More than 3.7 billion humans have regular access to and use the internet. That results in about 40,000 web searches per second on Google alone.

Furthermore, over half of all those web searches take place on mobile devices. It is likely the web search totals will continue to grow as more and more people get their hands on mobile devices across the world.

Facebook: 500 Terabytes Per Day


In 2012, Facebook's systems were generating 2.5 billion pieces of content and more than 500 terabytes of data per day, along with vast numbers of "likes," photos, and data scans. It was massive then, and it has certainly grown over time.

Today, there are two billion active users on Facebook and counting, making it the largest social media platform in existence. About 1.5 billion people are active on the network per day, all generating data and content. Five new profiles join Facebook every second, and more than 300 million photos are uploaded, too.

Twitter: 12 Terabytes Per Day


One wouldn't think that 140-character messages comprise large stores of data, but it turns out that the Twitter community generates more than 12 terabytes of data per day.

That equals 84 terabytes per week and 4,368 terabytes, or roughly 4.4 petabytes, per year. That is a lot of data for short, character-limited messages like those shared on the network.
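The arithmetic behind those figures, scaling the daily number over a 7-day week and a 52-week year:

```python
# Scale 12 TB/day up to weekly and yearly totals.
tb_per_day = 12
tb_per_week = tb_per_day * 7      # 84 TB
tb_per_year = tb_per_week * 52    # 4368 TB
pb_per_year = tb_per_year / 1000  # ~4.37 PB in decimal units

print(tb_per_week, tb_per_year, pb_per_year)
```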

Amazon: $258,751.90 in Sales Per Minute


Amazon generates data two-fold. The major retailer is collecting and processing data about its regular retail business, including customer preferences and shopping habits. But it is also important to remember that Amazon offers cloud storage opportunities for the enterprise world.

Amazon S3— on top of everything else the company handles — offers a comprehensive cloud storage solution that naturally facilitates the transfer and storage of massive data troves. Because of this, it’s difficult to truly pinpoint just how much data Amazon is generating in total.

Instead, it's better to look at the revenue flowing in for the company, which is directly tied to data handling and storage. The company generates more than $258,751.90 in sales and service fees per minute.

General Stats: Per Minute Ratings

  • Snapchat: Over 527,760 photos shared by users
  • LinkedIn: Over 120 professionals join the network
  • YouTube: 4,146,600 videos watched
  • Twitter: 456,000 tweets sent or created
  • Instagram: 46,740 photos uploaded
  • Netflix: 69,444 hours of video watched
  • Giphy: 694,444 GIFs served
  • Tumblr: 74,220 posts published
  • Skype: 154,200 calls made by users

How to resolve the big data problem

One solution to all of these problems: Distributed Computing


A distributed system consists of a collection of autonomous computers, connected through a network and distribution middleware, which enables computers to coordinate their activities and to share the resources of the system, so that users perceive the system as a single, integrated computing facility.

Take the Google web server as an example: from the user's perspective, when they submit a search query, they perceive Google as a single system. Behind the curtain, however, Google runs a great many servers, distributed geographically and computationally, to return results within a few seconds.
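A toy illustration of that idea (not Google's actual architecture): the query fans out to several hypothetical index shards in parallel, and the merged result comes back as if from one machine. In a real deployment each shard would live on a separate server rather than in a local thread.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index shards, invented for illustration.
SHARDS = [
    ["hadoop basics", "hdfs design"],
    ["mapreduce tutorial", "hadoop streaming"],
    ["yarn scheduler", "hadoop security"],
]

def search_shard(shard, query):
    # Each worker scans only its own slice of the index.
    return [doc for doc in shard if query in doc]

def search(query):
    # Fan the query out to every shard in parallel, then merge the hits,
    # so the caller sees one integrated answer from many workers.
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda s: search_shard(s, query), SHARDS)
        return [doc for part in parts for doc in part]

print(search("hadoop"))  # ['hadoop basics', 'hadoop streaming', 'hadoop security']
```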

Advantages of Distributed Computing?

  • Highly efficient
  • Scalability
  • Fault tolerance
  • High availability

Hadoop is a tool for implementing distributed computing.

USE HADOOP TO HANDLE BIG DATA

Hadoop is an open-source software framework for storing large amounts of data and performing computation on it. The framework is based on Java, with some native code in C and shell scripts.

The Hadoop ecosystem is a platform, or suite, that provides various services to solve big data problems. It includes Apache projects as well as various commercial tools and solutions. There are four major elements of Hadoop: HDFS, MapReduce, YARN, and Hadoop Common. Most of the other tools and solutions supplement or support these major elements, and together they provide services such as ingestion, analysis, storage, and maintenance of data.
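MapReduce, one of those four elements, is easiest to see in the classic word-count example. The sketch below expresses the two phases in plain Python on two invented input lines; it is an illustration of the model, not the Hadoop API, which runs the same phases across many machines.

```python
from collections import defaultdict
from itertools import chain

# Two invented lines standing in for a large file split across nodes.
lines = ["big data needs hadoop", "hadoop stores big data"]

# Map phase: every line independently emits (word, 1) pairs.
mapped = chain.from_iterable(((w, 1) for w in line.split()) for line in lines)

# Shuffle + reduce phase: group the pairs by key and sum the counts.
counts = defaultdict(int)
for word, n in mapped:
    counts[word] += n

print(dict(counts))  # {'big': 2, 'data': 2, 'needs': 1, 'hadoop': 2, 'stores': 1}
```

Because each line is mapped independently, the map phase parallelizes trivially; only the reduce phase needs data grouped by key.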

Hadoop Distributed File System (HDFS)

  • HDFS is the primary component of the Hadoop ecosystem. It is responsible for storing large data sets of structured or unstructured data across various nodes, and it maintains the metadata in the form of log files.
  • HDFS consists of two core components:
  1. Name Node
  2. Data Node
  • The Name Node is the prime node; it holds the metadata (data about data) and requires comparatively fewer resources than the Data Nodes, which store the actual data. The Data Nodes are commodity hardware in the distributed environment, which is what makes Hadoop cost-effective.
  • HDFS maintains all the coordination between the clusters and hardware, working at the heart of the system.
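The division of labor between Name Node and Data Nodes can be sketched as a toy model (not the real HDFS API): a file is split into fixed-size blocks, each block is replicated on several data nodes, and the name node keeps only the metadata map. The node names, block size, and placement rule below are all simplifications.

```python
# Toy HDFS-style storage: blocks here are 4 bytes for illustration,
# whereas real HDFS defaults to 128 MB blocks and 3 replicas.
BLOCK_SIZE = 4
REPLICATION = 2
DATA_NODES = ["dn1", "dn2", "dn3"]

def store(filename, data):
    # Split the file into fixed-size blocks.
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    # Name-node view: block id -> the data nodes holding a replica.
    metadata = {}
    for idx in range(len(blocks)):
        replicas = [DATA_NODES[(idx + r) % len(DATA_NODES)]
                    for r in range(REPLICATION)]
        metadata[f"{filename}#blk{idx}"] = replicas
    return metadata

meta = store("log.txt", b"hello big data")
print(meta)  # 4 blocks, each mapped to 2 of the 3 data nodes
```

Because every block lives on more than one data node, losing a single commodity machine loses no data, which is the property that lets HDFS run cheaply at scale.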


Thank you so much Vimal sir & Preeti ma'am for giving me this opportunity. ARTH2020
