How Big Is Big Data?
Defining big data
Before delving into the question, let’s discuss why big data is so difficult to define.
There is no official definition, of course. What one person considers big data may just be a traditional dataset in another person’s eyes.
That doesn’t stop people from offering definitions, however. Some, for example, define big data as any data that is distributed across multiple systems.
In some respects, that’s a good definition. Distributed systems tend to produce much more information than localized ones because they involve more machines, services, and applications, all of which generate logs.
On the other hand, a distributed system doesn’t necessarily involve much data. For instance, if you mount your laptop’s 500-gigabyte hard disk over the network so that other computers in your house can share it, you are technically creating a distributed data environment. But most people wouldn’t consider this an example of big data.
Another approach is to contrast big data with “little data.” Under this definition, big data is any information processed with advanced analytics tools, while little data is interpreted in less sophisticated ways. The actual size of the data doesn’t matter.
This is also a valid way of thinking about the term. The problem, however, is that there’s no clear line separating advanced analytics tools from basic scripts. If you define big data only as information analyzed with Hadoop, Spark, or another complex analytics platform, you risk excluding datasets that are processed with, for instance, R instead.
So there’s no universal definition, only several ways of thinking about big data. That’s worth recognizing, because it shows that big data can’t be defined in quantifiable terms alone.
3- SOURCES OF BIG DATA-
a)- Social Networks and Online Services- They provide human-sourced information from:
- FACEBOOK- In figures reported in 2012, its system processed 2.5 billion pieces of content and more than 500 terabytes of data each day, took in 2.7 billion Like actions and 300 million photos per day, and scanned roughly 105 terabytes of data every half hour.
- GOOGLE- It processes over 40,000 search queries every second on average, which works out to roughly 3.5 billion searches per day and about 1.2 trillion searches per year worldwide.
- YOUTUBE- Approximately 76 PB of video data is stored on YouTube every year.
b)- Internet of Things (IoT)- Machine-generated data from sensors and connected devices. About 2.5 quintillion bytes of data are created each day across all sources at the current pace, and that pace keeps accelerating as the IoT grows.
c)- Traditional Business Systems- Systems that offer customers products or services generate records such as:
- Commercial transactions.
- Banking and stock records.
- E-commerce.
- Credit card records.
- Medical records.
etc.
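The volumes quoted above mix per-second, per-half-hour, per-day, and per-year rates, so a little unit arithmetic helps put them on a common footing. The Python sketch below is only an illustration: it takes the figures exactly as quoted, assumes decimal (SI) units (1 TB = 10^12 bytes, 1 PB = 10^15, 1 EB = 10^18) and a 365-day year, and its function names are hypothetical helpers, not part of any library.

```python
from __future__ import annotations

# Illustrative unit arithmetic for the figures quoted in the text.
# Assumptions: decimal (SI) units and a 365-day year.
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400
HALF_HOURS_PER_DAY = 24 * 2       # 48
DAYS_PER_YEAR = 365

TB = 10**12           # bytes in a terabyte
PB = 10**15           # bytes in a petabyte
EB = 10**18           # bytes in an exabyte
QUINTILLION = 10**18  # short-scale quintillion


def google_searches(per_second: int) -> tuple[int, int]:
    """Scale an average per-second query rate to daily and yearly totals."""
    per_day = per_second * SECONDS_PER_DAY
    per_year = per_day * DAYS_PER_YEAR
    return per_day, per_year


def facebook_scanned_tb_per_day(tb_per_half_hour: int) -> int:
    """Scale a per-half-hour scan volume (TB) to a daily volume (TB)."""
    return tb_per_half_hour * HALF_HOURS_PER_DAY


def youtube_tb_per_day(pb_per_year: float) -> float:
    """Average daily volume (TB) for a yearly volume given in PB."""
    return pb_per_year * PB / DAYS_PER_YEAR / TB


if __name__ == "__main__":
    per_day, per_year = google_searches(40_000)
    print(f"Google: {per_day:,} searches/day, {per_year:,} searches/year")
    print(f"Facebook: {facebook_scanned_tb_per_day(105):,} TB scanned/day")
    print(f"YouTube: about {youtube_tb_per_day(76):.0f} TB stored/day")
    print(f"Daily data: {2.5 * QUINTILLION / EB} EB")
```

At exactly 40,000 queries per second, the daily total is about 3.46 billion searches and the yearly total about 1.26 trillion, consistent with the rounded figures quoted. Facebook's roughly 5,040 TB scanned per day far exceeds the roughly 500 TB it took in each day; presumably scanning counts the same stored data being read repeatedly by queries, so the two figures measure different things.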