Big Data
Astha Goel
Salesforce Developer @ Girikon Solutions | 2x Salesforce Certified | Salesforce | Sales Cloud | Apex | AI Enthusiast | KIET'23
Anything that is used as a medium of communication is data. But in technical terms, to this day, data is whatever we can put inside a computer.
For example:
Images, text, videos, files, audio, etc. But things like moods and emotions cannot be put into a computer, so for the computer they are not data.
In today's world, information is wealth!
Data comes from many sources and in many forms:
- Bar Code
- QR Code
- Survey Forms
- Google Forms
- Articles
- Blogs
- Posts
Everything depends on this small unit: data.
Have you ever thought about how much data you can keep on your phone?
I think yes, because our phones have a storage limit that tells us how much data we can hold. When the data exceeds that limit, the scenario is: "Oh no! Storage is full. How will I store my data now?" Then we think, "Not to worry, I will transfer some of my data to another storage device so that I can keep storing on my phone."
And that is how it should be, because data is a basic necessity of today's world. As the world grows technologically at a rapid pace, data will always keep increasing, and we can see the number of Internet users growing exponentially day by day!
See how many users were added in just one year!
[Chart: growth in Internet users, 2016 vs 2017]
According to Internet statistics for 2020,
there are 7.77 billion people in the world (Worldometer), and 4.54 billion of them are active Internet users (Statista); i.e., 58.7% of people around the world have access to the Internet.
Today's Internet-usage picture looks like this:
In the US, people of all age groups use the Internet; however, 100% of people in the 18-29 age group use it.
Now you might think that only the users connected to the Internet have data, not others, and wonder what happens as the number of users on the Internet keeps increasing day by day.
The main point is not simply that Internet users are increasing, nor that only people on the Internet have data. As defined earlier, those without Internet access also have data, in the form of photos, videos, emails, etc. The real concern is this: if the number of users with Internet access is increasing exponentially, then data is also increasing exponentially, because connected users produce more data that must be stored in the databases of online platforms. For example, when we are on social media, our account details and everything else we submit are stored in that platform's databases, and our activity itself is data for the platforms we use. In this way, those who are connected to the Internet naturally produce more data that needs to be stored.
All this data is ultimately represented in binary, because a computer understands binary language (the symbols 0 and 1) as easily as we understand our mother tongue.
The smallest symbol in binary language is the bit.
A bit can be 0 or 1.
1 Byte = 8 bits
1 KB = 1024 Bytes = 1024*8 bits = 8,192 bits
1 MB = 1024 KB = 1024*1024*8 bits = 8,388,608 bits
1 GB = 1024 MB = 1024*1024*1024*8 bits = 8,589,934,592 bits
1 TB = 1024 GB = 1024*1024*1024*1024*8 bits = 8,796,093,022,208 bits
1 PB = 1024 TB = 1024*1024*1024*1024*1024*8 bits = 9,007,199,254,740,992 bits
1 EB = 1024 PB = 1024*1024*1024*1024*1024*1024*8 bits = 9,223,372,036,854,775,808 bits, and so on...
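To double-check these conversions, here is a small Python sketch (Python is used purely for illustration) that prints the same table programmatically:

```python
# Each unit is 1024x the previous one, so a unit N steps above a byte
# holds 1024**N bytes, and every byte is 8 bits.

UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

for power, unit in enumerate(UNITS):
    num_bytes = 1024 ** power      # bytes in one unit
    num_bits = num_bytes * 8       # 8 bits per byte
    print(f"1 {unit:>2} = {num_bytes:>25,} bytes = {num_bits:>28,} bits")
```

Running it reproduces the figures above, e.g. 1 TB = 8,796,093,022,208 bits.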
Some statistics related to data are below:
- 1.7 MB of data was produced every second by every person during 2020.
- In the last two years alone, an astonishing 90% of the world's data has been created.
- 2.5 quintillion bytes of data are produced by humans every day.
- 463 exabytes of data will be generated each day by humans as of 2025.
- 95 million photos and videos are shared every day on Instagram.
- By the end of 2020, 44 zettabytes will make up the entire digital universe.
- Every day, 306.4 billion emails are sent, and 5 million Tweets are made.
- Google processes over 3.5 billion search queries every day.
All of this proves how data is increasing exponentially, day by day!
Thus, a huge amount of data is produced every day and moves across the globe, and so it is called Big Data (meaning data in huge amounts).
Big Data also creates problems, for example in:
Volume:
Big Data means very large volume, and more volume leads to a shortage of storage, just like on our mobile phones when we cross the storage limit. To store data on the scale of terabytes or petabytes, we would need a storage device of that size. Is that even possible? And even if it is, it creates a new problem: performing input/output at high speed.
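As a rough back-of-the-envelope sketch of the volume problem (assuming plain 1 TB drives and the 4 PB/day Facebook figure quoted later in this article; both numbers are illustrative):

```python
# How many 1 TB drives would it take just to hold 4 PB of new data
# per day? (Binary units: 1 PB = 1024 TB.)

DRIVE_TB = 1          # capacity of a single drive, in TB (assumption)
DAILY_PB = 4          # new data per day, in PB (figure from the article)
TB_PER_PB = 1024

drives_per_day = DAILY_PB * TB_PER_PB / DRIVE_TB
print(f"{drives_per_day:.0f} one-terabyte drives needed per day")
# -> 4096 drives per day, i.e. roughly 1.5 million drives per year
```

No single storage device of that size exists, which is why the data has to be split across many machines.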
Velocity:
Whatever we keep in storage has to be loaded (output) into RAM to perform any read, write, or calculation. With big data, loading it into RAM takes a long time, and when we want to save such huge data, the speed of saving (input) also drops.
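To get a feel for the velocity problem, here is a minimal sketch, assuming a single disk with 100 MB/s sequential throughput (a typical figure for a spinning hard drive, used here only as an assumption):

```python
# Rough estimate of how long it takes just to READ 1 TB from one disk.

DISK_MB_PER_S = 100                 # assumed disk throughput
SIZE_TB = 1

size_mb = SIZE_TB * 1024 * 1024     # 1 TB in MB (binary units)
seconds = size_mb / DISK_MB_PER_S
print(f"Reading {SIZE_TB} TB takes ~{seconds / 3600:.1f} hours on one disk")
# -> ~2.9 hours; spread the same data across 100 disks read in
#    parallel and it drops to under 2 minutes.
```

That parallel-read observation is exactly what distributed storage exploits, as we will see below.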
We use certain platforms daily, or at least often. Have you ever thought about how much data they store per day?
Let's see how much data the famous and successful platforms manage daily.
Facebook:
Facebook generates 4 petabytes of data per day; that's about 4 million gigabytes.
Time spent in group calls (three or more participants) was up by more than 1,000% during the last month.
And why not? Facebook, too, benefits in many ways from its clients' data.
But how does Facebook maintain all this data without running into the Big Data problem?
WhatsApp:
- 500 million people use WhatsApp Status daily.
- There are 1 billion active daily WhatsApp users.
- WhatsApp has seen a 40% increase in usage, growing from an initial 27% bump in the early days of the pandemic to 41% in the mid-phase. In countries already in the later phase of the pandemic, WhatsApp usage has jumped by 51%.
Google:
Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide.
Google uses its own data centers and also collaborates with other data centers to store its data. Each of its data centers covers an area of about 20 football fields combined.
And why not? For every small thing, we pick up our phones and laptops and put our questions there.
But how does Google manage all this data?
Twitter:
One wouldn't think that 140-character messages comprise large stores of data, but it turns out that the Twitter community generates more than 12 terabytes of data per day. That equals 84 terabytes per week and 4,368 terabytes, or about 4.3 petabytes, per year. (That figure is from a 2018 report.)
Instagram:
One study found that users posted 6.1 Instagram Stories per day on average, an increase of 15% week-over-week. Stories' impressions (views) also increased by 21% during that time (March 15 to March 21, 2020).
You can see how many people are connected to these platforms. Even if each person stored just 1 KB of data per day, imagine how much data would be stored daily across the globe; the quick calculation below makes the point.
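As a sanity check of that thought experiment (using the 7.77 billion population figure quoted earlier and the assumed 1 KB per person):

```python
# If each of the world's 7.77 billion people stored just 1 KB per day,
# how much new data would that be globally?

PEOPLE = 7.77e9                     # world population (Worldometer)
KB_PER_PERSON = 1                   # assumed daily data per person

total_kb = PEOPLE * KB_PER_PERSON
total_tb = total_kb / 1024 ** 3     # KB -> MB -> GB -> TB
print(f"~{total_tb:.2f} TB of new data per day")
# -> ~7.24 TB per day, from just 1 KB per person
```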
That is how much Big Data we have today...
"If you aren't taking advantage of the data you are collecting and being kept in your business, then you just have a pile of a lot of data," Parikh said.
It is rightly said that data is the most precious thing in today's world, so managing that data, directly or indirectly, is clearly important for a company.
Keeping information safe is thus a main concern of today's companies.
But the question still remains: how do we store this large amount of data?
The solution is a Distributed Storage Cluster.
Hadoop is specialised software built exactly for distributed storage clusters!
Suppose the place where we are storing data does not have sufficient storage. What can we do?
Using Hadoop, we build a cluster in which slave nodes (DataNodes) contribute their storage over the network to a main server (the master, or NameNode). In this way, the storage of the main server grows in an easy way.
This is the basic structure of a distributed storage cluster.
We can share computation using the same concept as well; the sketch below illustrates the idea.
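Here is a toy Python simulation of that master/slave structure. The class names mirror Hadoop's terminology (NameNode, DataNode), but this is only a conceptual sketch; real HDFS also handles replication, heartbeats, failover, and much more:

```python
class DataNode:
    # A slave node contributing its local storage to the cluster.
    def __init__(self, name):
        self.name = name
        self.blocks = {}                    # block_id -> block contents

    def store(self, block_id, data):
        self.blocks[block_id] = data


class NameNode:
    # The master: it stores no file data itself, only metadata about
    # which DataNode holds which block of which file.
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.block_map = {}                 # filename -> [(block_id, node)]

    def write(self, filename, data, block_size=4):
        # Split the file into fixed-size blocks and spread them
        # round-robin across the DataNodes.
        locations = []
        for i in range(0, len(data), block_size):
            block_id = f"{filename}-blk{i // block_size}"
            node = self.datanodes[(i // block_size) % len(self.datanodes)]
            node.store(block_id, data[i:i + block_size])
            locations.append((block_id, node))
        self.block_map[filename] = locations

    def read(self, filename):
        # Reassemble the file by fetching each block from its node.
        return "".join(node.blocks[bid] for bid, node in self.block_map[filename])


cluster = NameNode([DataNode("dn1"), DataNode("dn2"), DataNode("dn3")])
cluster.write("hello.txt", "Big Data is stored in blocks!")
print(cluster.read("hello.txt"))            # prints the original text back
```

The key design point is the same as in Hadoop: the master holds only metadata, so adding more DataNodes grows the cluster's total storage without the master itself needing a bigger disk.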
Some real examples of this kind of cluster in use:
The place where Google stores and handles all its data is a data center. Google doesn't hold the biggest data centers, but it still handles a huge amount of data.
A data center normally holds petabytes to exabytes of data.
Google currently processes over 20 petabytes of data per day through an average of 100,000 MapReduce jobs (MapReduce is a core component of the Apache Hadoop software framework) spread across its massive computing clusters. The average MapReduce job ran across approximately 400 machines in September 2007, crunching approximately 11,000 machine-years in a single month.
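Since MapReduce came up, here is a minimal word-count sketch of the programming model in plain Python. (Real Hadoop MapReduce jobs are typically written in Java against the Hadoop API; this standalone sketch only illustrates the map, shuffle, and reduce phases.)

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data is big", "data is everywhere"]
pairs = [pair for line in lines for pair in map_phase(line)]
print(reduce_phase(shuffle(pairs)))
# -> {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each map call and each reduce call is independent, the framework can run thousands of them in parallel across a cluster, which is how those 100,000 daily jobs scale.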
"Facebook runs the world's largest Hadoop cluster," says Jay Parikh, Vice President of Infrastructure Engineering at Facebook.
Basically, Facebook runs the biggest Hadoop cluster, spanning more than 4,000 machines and storing hundreds of millions of gigabytes.