Difference between Big-Data and Data Analytics
Himanshu U.
International Business Leader - P&C, A&H | IPMI Product, Strategy and Risk Management - Foyer Group | International Business Development (Asia, India and Europe) | Expert in Turnarounds & Transformational Growth
I often come across this question: isn't big data today the same as the analytics of the past? One has always used data to forecast and predict, so what is new about big data?
To be fair, yes, the two are related: analytics also obtains intelligence from data and converts it into logical content with a pattern, allowing executives to make intelligent business decisions.
However, there are three key differences between analytics and big data.
1) Volume of data: In the last two years alone, 90% of the world's data has been created. It is expected that by 2020 the entire digital universe will amount to 44 zettabytes, with machine-generated data accounting for over 40% of internet data. Humanity currently produces around 2.5 quintillion bytes of data per day; to give a perspective, a quintillion has 18 zeroes. More data crosses the internet every second than was stored in the entire internet just 20 years ago. Walmart alone is estimated to collect around 2.5 petabytes of data from its customer transactions every hour, and a petabyte is the equivalent of about 20 million filing cabinets' worth of text. Further, it is projected that 463 exabytes of data will be generated each day by 2025.
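To get a feel for these magnitudes, the figures above can be sanity-checked with a little back-of-the-envelope arithmetic; the constants below simply restate the numbers quoted in this article:

```python
# Back-of-the-envelope arithmetic for the volume figures quoted above.
QUINTILLION = 10 ** 18   # a quintillion has 18 zeroes
PETABYTE = 10 ** 15      # decimal (SI) petabyte, in bytes
EXABYTE = 10 ** 18       # decimal (SI) exabyte, in bytes

daily_bytes_2020 = 2.5 * QUINTILLION   # ~2.5 quintillion bytes generated per day
walmart_hourly = 2.5 * PETABYTE        # Walmart: ~2.5 PB of transaction data per hour
daily_bytes_2025 = 463 * EXABYTE       # projected daily volume by 2025

# The world's daily output in 2020 equals about 1,000 Walmart-hours of data,
# and the 2025 projection is roughly 185x the 2020 figure.
print(daily_bytes_2020 / walmart_hourly)            # 1000.0
print(round(daily_bytes_2025 / daily_bytes_2020))   # 185
```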
So where is all this data coming from? The answer is simple: social media, video sharing, communications, entertainment websites and news outlets, to name a few. In 2019, the number of active IoT devices was 26.66 billion; in 2020, 31 billion IoT devices are expected to exist, and by 2023 there are expected to be around 1.3 billion IoT subscriptions.
Internet usage growth statistics tell us that people around the world are increasingly gaining access to the internet, especially in countries with large populations. Logically, the number of internet users and search queries is also increasing and will continue to grow. As of 2020 there are 4.57 billion active internet users around the world, meaning 58.7% of people have internet access. Asia accounts for the largest share of the world's users at 50.3%, while the Middle East (3.9%) and Oceania/Australia (0.6%) account for the smallest.
There are around 2 billion websites on the internet, of which fewer than 1.5 billion are active. Google processes over 3.5 billion search queries every day, with about 56.5 billion web pages indexed. Each day, 350 million photos are uploaded to Facebook, 306.4 billion emails are sent, and 500 million tweets are posted. Facebook alone generates 4 petabytes of data every day.
This astronomical and continuing growth in data and data sources gives companies an opportunity to work with a large pool of data in a single data set. Access to such a large, fast-growing pool of data allows companies to build algorithms that let managers measure accurately, and thus understand their business and their customers better. This is going to bring radical changes in the value of experience, the nature of expertise and the practice of management. Smart leaders across industries will see big data for what it is: a management revolution, as Andrew McAfee and Erik Brynjolfsson of MIT's Center for Digital Business have pointed out.
2) Velocity of data: For most applications today, the speed of data creation matters more than the volume itself. Real-time data is critical for:
Monitoring consumer behavior: Real-time data has flipped the script when it comes to truly understanding buyers. 73% of consumers report that they prefer to do business with brands that use their information to customize the shopping experience. With conversational marketing tools like chatbots and digital assistants, we can collect actionable insights in real time like never before. We can understand how each individual customer travels through the purchase journey; we know their wants, their needs and their pain points.
Uncovering product insights faster: Real-time data is essential for making strategic business decisions. Product trends may need to be measured over hours or days instead of weeks or months. Insights delivered in real time will unveil unidentified gaps in your product selection, so you can offer your customers only the best.
Catching performance issues: Detecting issues and bugs within your website can be critical to the customer journey. If something goes wrong on your website, there is only so much time before customers start leaving. Real-time data is our defense against retention drops, quick abandonments and crashed sessions. By integrating a real-time data analytics tool, such complications can be caught early and avoided.
Real-time information therefore allows a company to be more agile than its competitors and make critical business decisions at a faster pace; rapid and timely insights provide a competitive advantage.
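As a minimal, illustrative sketch of what "catching performance issues in real time" can mean in practice (not a description of any specific product), here is how a site could watch a stream of response times and alert as soon as a rolling average degrades; the window size and threshold are invented for the example:

```python
from collections import deque

def make_monitor(window=5, threshold_ms=500):
    """Return an observer that alerts when the rolling average latency
    over the last `window` samples exceeds `threshold_ms`.
    Both parameters are illustrative, not recommendations."""
    samples = deque(maxlen=window)

    def observe(latency_ms):
        samples.append(latency_ms)
        avg = sum(samples) / len(samples)
        return avg > threshold_ms  # True means "raise an alert"

    return observe

# Simulated stream of page response times (ms): a sudden slowdown appears.
observe = make_monitor()
stream = [120, 130, 110, 900, 950, 980, 1000, 990]
alerts = [observe(ms) for ms in stream]
print(alerts)  # alerts fire once the rolling average crosses the threshold
```

Because the check runs per sample, the degradation is flagged within a few observations instead of surfacing in next week's report, which is the whole point of velocity.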
3) Variety of data: Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. In earlier days, spreadsheets and databases were the only data sources considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis applications. This variety of unstructured data poses challenges for storing, mining and analyzing data.
Many of the most important sources of big data are relatively new: Facebook was launched in 2004, Twitter in 2006. Smartphones and mobile devices have stormed every household largely within the last decade and provide enormous streams of data tied to people, activities and locations. The first iPhone was launched in June 2007 and the first iPad in April 2010. These technologies are so ubiquitous now that one easily forgets how recent their origins are.
The structured databases used by corporations are not well suited to storing today's astronomical volumes of big data. At the same time, the cost of every element of computing has been declining, storage, memory, processing and bandwidth among them, making data-intensive approaches more economical. As more and more businesses go digital, the new sources of information and the falling costs of data and equipment together push us towards a more data-driven era.
Each of us is now a walking data-generating machine. But not all of this data is structured; a large part of it is unstructured. Beyond its sheer size, unstructured data poses multiple challenges when it comes to processing it and deriving value from it. A typical example of unstructured data is a heterogeneous source containing a combination of plain text files, images, videos and so on. Organizations these days have a wealth of data available to them but, unfortunately, often don't know how to derive value from it because the data is in raw or unstructured form. That is where analytics comes into the picture: it filters the noise out of the unstructured data, surfacing the relevant information needed for informed decision-making.
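A toy illustration of why variety matters: a structured record can be queried directly by field name, while the same fact buried in free text must first be mined out. The sample order, email text and regular expression below are all invented for the example:

```python
import re

# Structured: the schema tells us exactly where each value lives.
structured_order = {"customer": "A. Sharma", "amount": 129.99, "currency": "USD"}
print(structured_order["amount"])  # direct field access: 129.99

# Unstructured: the same fact is buried in free text and must be extracted,
# here with a regex that looks for an amount following "USD".
email_body = "Hi team, customer A. Sharma just paid USD 129.99 for the premium plan."
match = re.search(r"USD\s+(\d+\.\d{2})", email_body)
amount = float(match.group(1)) if match else None
print(amount)
```

Real unstructured sources (images, audio, scanned PDFs) are far harder than this regex suggests, which is exactly the storage, mining and analysis challenge the paragraph above describes.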
So do data-driven companies perform better than those that are not?
According to research conducted by the MIT Center for Digital Business and McKinsey's business technology office, the more a company characterized itself as data-driven, the better it performed on objective measures of financial and operational performance.
Big data is at once simpler and more powerful. In the words of Peter Norvig, the American computer scientist and Director of Research at Google: "We don't have better algorithms. We just have more data."
How Big Is Each Unit of Data?
1 kilobyte equals 1,000 bytes.
1 megabyte equals 1,000 kilobytes (1,000,000 bytes).
1 gigabyte equals 1,000 megabytes.
1 terabyte equals 1,000 gigabytes.
1 petabyte equals 1,000 terabytes.
1 exabyte equals 1,000 petabytes.
1 zettabyte equals 1,000 exabytes (around one trillion gigabytes).
1 yottabyte equals 1,000 zettabytes.
(These are the decimal, SI, units; binary units step by 1,024 instead, e.g. 1 tebibyte equals 1,024 gibibytes.)
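The unit ladder can also be expressed programmatically. A small sketch using decimal (SI) steps of 1,000 per rung (binary units would use 1,024 instead):

```python
# Decimal (SI) units: each step up the ladder multiplies by 1,000.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]

def to_bytes(value, unit):
    """Convert a value in the given unit to bytes."""
    return value * 1000 ** UNITS.index(unit)

print(to_bytes(1, "petabyte"))   # 1000000000000000 (10^15 bytes)
print(to_bytes(44, "zettabyte")) # the 44 ZB "digital universe" estimate, in bytes
```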
(Source: IORG, Raconteur, Statista, IWS, Omni Core Agency, Internet Live Stats, Kinsta, Handbook of Research on Cloud Infrastructures for Big Data Analytics, Safe at Last, HBR, Marketoonist.com)