Defining Big Data
Big data is a term that describes an immense volume of diverse data typically analyzed to identify patterns, trends, and associations. However, the term “big data” didn’t start out that way. In 1997, NASA researchers Michael Cox and David Ellsworth described a “big data problem” they were struggling with.
Their supercomputers were performing simulations of airflow around aircraft and generating massive volumes of data that couldn’t be processed or visualized effectively. The data were pushing the limits of their computer storage and processing, which was a problem, and a big one. In this context, the term “big data problem” was used more to describe a big problem than big data; NASA was facing a big problem with data, not so much a big-data problem.
More than a decade later, in 2011, a McKinsey Global Institute report entitled “Big data: The next frontier for innovation, competition, and productivity” reinforced the use of the term “big data” in the context of a problem that “leaders in every sector will have to grapple with.” The authors refer to big data as data that exceeds the capability of commonly used hardware and software.
Over time, the definition of big data has taken on a life and meaning of its own, beyond the context of a problem, to include the potential value of that data as well. Now, big data poses both big problems and big opportunities.
What Is Big Data?
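Many organizations that start big-data projects don’t actually have big data. They may have a lot of data, but volume is just one criterion. These organizations may also mistakenly think that they have a big-data problem because of the challenges they face in capturing, storing, and processing their data. However, data doesn’t constitute big data unless it meets the following criteria (also known as the four V’s):
Volume: the amount of data is massive.
Velocity: the data arrives at high speed, often in real time.
Variety: the data comes in diverse types and formats.
Veracity: the reliability of the data varies and has to be evaluated.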
A Real Big Data Problem
An interesting example of a big data problem is the challenge surrounding self-driving cars. To enable a self-driving car to safely navigate from point A to point B without running over pedestrians or crashing into objects, you would need to collect, process, and analyze a heavy stream of diverse data, including audio, video, traffic reports, GPS location data, and more, all flowing into the system in real time and at a high velocity. You would also need to evaluate which data is most reliable. For example, should the car trust the historical data showing that the left lane is open to traffic or the live video of a sign telling drivers to merge right? Is that person standing at the corner going to dart out in front of the car or wait for the Walk signal? Whether the driver is human or the car is navigated by big data, a split-second decision is often required to prevent a serious accident. A driverless car would have to instantly process the video, audio, and traffic coordinates, and then “decide” what to do. That’s a big data problem.
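The reliability question above can be made a little more concrete with a small sketch. This Python example is purely illustrative: the `Observation` class, the source names, the reliability scores, and the age-based discount are all hypothetical assumptions, not a description of how any real self-driving system works. It scores each conflicting observation by an assumed trust level that decays with age, then picks the highest-scoring one.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    source: str         # hypothetical source name
    claim: str          # what the source says about the road
    reliability: float  # assumed prior trust in the source, from 0 to 1
    age_seconds: float  # how old the observation is


def score(obs: Observation, half_life_seconds: float = 60.0) -> float:
    """Discount the assumed reliability by age: trust halves every half-life."""
    return obs.reliability * 0.5 ** (obs.age_seconds / half_life_seconds)


def most_trusted(observations: list[Observation]) -> Observation:
    """Return the observation with the highest age-discounted score."""
    return max(observations, key=score)


observations = [
    # Map data recorded an hour ago (illustrative values).
    Observation("historical_map", "left lane open", reliability=0.9, age_seconds=3_600),
    # Camera frame captured half a second ago (illustrative values).
    Observation("live_camera", "merge right", reliability=0.8, age_seconds=0.5),
]

best = most_trusted(observations)
print(best.source, "->", best.claim)
```

With these made-up numbers the fresh camera reading wins even though the map has a higher base reliability, which is the kind of trade-off the car would have to resolve in a fraction of a second.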
Solving Big Data Problems
Technology is evolving to solve most big data problems, and the cloud is playing a key role in this process. The cloud offers virtually unlimited storage and compute, so organizations no longer need to bump up against limitations in their on-premises data warehouses. In addition, business intelligence (BI) software is becoming increasingly sophisticated, enabling organizations to extract value from data without requiring a high level of technical expertise from users.
Still, many organizations struggle with data problems, both big and small. Some continue to struggle with storage and compute limitations simply because they are reluctant to move their on-premises data warehouses to the cloud. However, most organizations that struggle with data simply don’t know what to do with the data they have and the vast amounts of diverse data that are now readily available. Their problem with data is that they haven’t developed the culture of curiosity and innovation required to put all the data available to good use. In many ways, this shortcoming in organizations poses the real big data problem.
Frequently Asked Questions
What is Big Data?
Big Data refers to extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions. The amount of data involved is so vast that traditional data processing software is inadequate to deal with it.
How does Big Data work?
Big Data works by collecting a large amount of data from various data sources, and then using advanced analytics, data management, and machine learning technologies to process, analyze, and extract valuable insights from the data. This process often relies on data lakes and other big data tools to store and analyze the data efficiently.
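As a rough illustration of the collect-then-analyze flow described above, the sketch below merges records from several sources and aggregates them one record at a time, so the full data set never has to sit in memory. The source functions and event fields are hypothetical stand-ins. Real pipelines typically rely on distributed storage and processing tools rather than a single Python process.

```python
from collections import Counter
from itertools import chain
from typing import Iterator


def web_events() -> Iterator[dict]:
    """Hypothetical source: clickstream events from a website."""
    yield {"user": "u1", "action": "view"}
    yield {"user": "u2", "action": "purchase"}


def app_events() -> Iterator[dict]:
    """Hypothetical source: events from a mobile app."""
    yield {"user": "u1", "action": "purchase"}
    yield {"user": "u3", "action": "view"}


def count_actions(events: Iterator[dict]) -> Counter:
    """Aggregate incrementally, holding only the running counts in memory."""
    counts: Counter = Counter()
    for event in events:
        counts[event["action"]] += 1
    return counts


# chain() reads each source lazily, one event at a time.
action_counts = count_actions(chain(web_events(), app_events()))
print(action_counts)
```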
What are the different types of Big Data?
The different types of big data include structured data (organized and easily searchable data such as databases), unstructured data (data that doesn't have a predefined data model such as text, images, and videos), and semi-structured data (data that doesn't conform to a fixed schema but has some organizational properties, like JSON or XML data).
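To make the three categories concrete, here is a minimal Python sketch that uses only the standard library. The sample records are invented for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, and every row has the same fields.
structured_text = "customer_id,country,total\n101,US,25.50\n102,DE,13.00\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
total_sales = sum(float(row["total"]) for row in rows)

# Semi-structured: self-describing keys, but fields can vary per record.
semi_structured_text = '[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "note": "no tags"}]'
records = json.loads(semi_structured_text)
tag_counts = [len(record.get("tags", [])) for record in records]

# Unstructured: free text with no data model, so structure must be inferred.
unstructured_text = "The delivery was late, but support was helpful."
word_count = len(unstructured_text.split())

print(total_sales)
print(tag_counts)
print(word_count)
```

The structured rows can be summed directly by column name, the JSON records need a fallback for missing fields, and the free text offers nothing to query until some analysis (here, a simple word count) imposes structure on it.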
What are some challenges associated with Big Data?
Some challenges associated with big data include data quality and management, data privacy and security, the need for specialized hardware and software, the complexity of big data analysis, and the shortage of skilled professionals like data scientists and data analysts.
What is Big Data Analytics?
Big Data Analytics is the process of examining large data sets to uncover hidden patterns, correlations, market trends, customer preferences, and other useful business information. It involves using advanced analytics techniques and tools to analyze big data and extract actionable insights.
What is a data strategy in the context of Big Data?
A data strategy is a plan for collecting, managing, and analyzing big data. It defines how an organization will use data to achieve its business goals. It includes data management practices, data collection methods, and the utilization of big data technologies to analyze data effectively and derive valuable insights.
How can businesses use Big Data?
Businesses can use big data to improve decision-making, optimize operations, enhance customer experiences, and drive innovation. By analyzing big data, businesses can gain insights into market trends, customer behavior, product performance, and other critical factors that can influence their business strategy and outcomes.
What is the history of Big Data?
The history of Big Data is marked by the development of new technologies and tools designed to handle large data sets, from traditional databases to modern big data solutions and analytics platforms that make big data analysis possible.
This is my weekly newsletter that I call The Deep End because I want to go deeper than results you’ll see from searches or AI, incorporating insights from the history of data and data science. Each week I’ll go deep to explain a topic that’s relevant to people who work with technology. I’ll be posting about artificial intelligence, data science, and data ethics.
This newsletter is 100% human written (* aside from a quick run through grammar and spell check).