Big Data is dead. Data is “Just Data,” regardless of quantity, structure, or speed.

Innovation cycles aren’t the only things getting shorter and shorter. The hype around individual terms also comes and goes ever more quickly, especially in the environment of new technologies and digital business models. The IT consulting and market research firm Gartner removed the term “Big Data” from its widely followed Hype Cycle back in 2015. A look at Google Trends shows how precisely that call matched the end of the hype around Big Data. Here, for instance, is a direct comparison of the search terms “Machine Learning” and “Big Data”:

Just as the catchphrase “Big Data” has finally made it into the consciousness of many decision makers and boardrooms, it must now be clearly stated: Big Data is “dead.” Like Gartner, we at Alexander Thamm GmbH have found that data science projects hinge on something else entirely: for us, Big Data, Small Data, Little Data, Fast Data, and Smart Data are all “Just Data.” The critical success factors for the use of data do not depend on its quantity, structure, or speed; what matters is using data to create true added value!

Successful data science projects without any Big Data

In our everyday practice we see that data science projects can succeed without any Big Data. When a premium automotive manufacturer came to us with the task of increasing the repurchase rate in its leasing business, we faced the challenge of predicting the time of repurchase. The retailer’s problem up to that point was that customers were often approached at the wrong time.

To increase the accuracy of the forecast, we did not simply pile on more data. On the contrary, during the analysis we noticed that the data pool itself was responsible for the inaccurate predictions. Our model, based on diagnostic and vehicle data, not only led the manufacturer to correct 25 percent of implausible entries and to approach customers at the right time; it also made it possible to identify unreliable retailers and to sustainably improve their processes using the best practices of the top retailers. This case shows that forecast quality does not depend on the volume of data. Just Data means, above all, that the right data has to go into the analysis.
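
To make the idea concrete, here is a minimal sketch of such a plausibility check in pandas. The file name, column names, and thresholds are hypothetical stand-ins for illustration, not the manufacturer’s actual data model.

```python
import pandas as pd

# Hypothetical lease/vehicle data; real diagnostic fields differ per manufacturer.
leases = pd.read_csv(
    "lease_contracts.csv",
    parse_dates=["contract_start", "contract_end"],
)

# Flag entries that cannot be right instead of simply collecting more data.
implausible = (
    (leases["mileage_km"] < 0)
    | (leases["mileage_km"] > 500_000)
    | (leases["contract_end"] <= leases["contract_start"])
)

print(f"{implausible.mean():.1%} of entries look implausible")
clean = leases.loc[~implausible]  # basis for the repurchase-timing model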

Just Data mindset facilitates focus on the relevant data

Another case involved improving the accuracy of an existing forecast model for a customer from the energy sector. Energy producers need to know very precisely how high the current load is in order to match the power fed into the grid to demand as exactly as possible. Feeding in too little or too much power can result in fines for the supplier, so these penalties need to be kept as low as possible.

Our solution used a deep learning algorithm to improve the forecast model. The previous model incorporated only temperature from the weather data; we extended it with additional parameters such as humidity, air pressure, and solar intensity. In this way we achieved significant improvements in forecast quality and a high degree of automation. Had we instead expanded the load data itself and fed in thirty years of minute-by-minute readings for more precise forecasts, the model would have taken far too long to compute and the forecast quality would have improved only marginally.
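
As an illustration of what such a feature extension can look like, here is a minimal sketch of a small feed-forward network in Keras. The file name, column names, and network size are assumptions made for the example; the actual production model was more involved.

```python
import pandas as pd
from tensorflow import keras

# Hypothetical load history; the original model used temperature only.
df = pd.read_csv("load_history.csv", parse_dates=["timestamp"])

# Extended weather feature set, as described above.
features = ["temperature", "humidity", "air_pressure", "sun_intensity"]
X = df[features].to_numpy()
y = df["load_mw"].to_numpy()

# Small feed-forward network as a stand-in for the deep learning forecast model.
model = keras.Sequential([
    keras.layers.Input(shape=(len(features),)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=256, validation_split=0.2)
```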

As an example, the following graphic shows how, beyond a certain point, the accuracy of a model increases only minimally as more data is added. At the same time, disproportionately high costs for the corresponding computing capacity accrue in order to process these larger data volumes. In many cases it is therefore not worthwhile to chase higher model accuracy by expanding the existing data sets.

[Graphic: model accuracy plotted against data volume; the accuracy curve flattens out beyond a certain point.]
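
The shape of such a curve can be reproduced in principle with a learning-curve experiment. The sketch below uses synthetic data and a generic regressor purely for illustration; it is not the customer’s model or data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import learning_curve

# Synthetic data stands in for the real load history.
X, y = make_regression(n_samples=10_000, n_features=10, noise=10.0, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    GradientBoostingRegressor(random_state=0),
    X, y,
    train_sizes=np.linspace(0.05, 1.0, 8),
    cv=3,
    scoring="r2",
)

# Validation accuracy typically flattens out while training cost keeps growing.
for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:>7} samples -> mean validation R^2 = {score:.3f}")
```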

On the origin and the meaningfulness of the term “Big Data”

The term “Big Data” emerged at a time when it was becoming more and more difficult to process the exponentially growing volume of data with the hardware then available. From the very beginning, the Big Data phenomenon comprised more than just the volume of data; rather, it designated an entire ecosystem. This is why talk of the “Vs” of Big Data became established. Over time, the concept was refined further and further. Initially, the Big Data ecosystem was described with three Vs: Volume, Variety, and Velocity. This concept was quickly expanded, so that there were soon four Vs, then five, seven, nine, and finally ten.

At this point the question has to be asked whether the term “Big Data” still makes sense at all, or whether the concept has long since become completely watered down and indistinct. The variants Small Data, Little Data, and Smart Data are merely rescue attempts for a concept that is really no longer needed today. Now is the time to fundamentally reconsider the term “Big Data” and its variants and, because their definitions have become inconsistent, unclear, and unnecessary, to throw them overboard. This raises the crucial question of what the essential core of Big Data is, or was, and which part of it is really relevant.

What actually is Big Data, at its core?

As already mentioned, Big Data was never really about the largest possible quantities of data. It was more about selecting the relevant data for the respective use case, cleaning it, and evaluating it with suitable methods. Admittedly, the data volumes involved may regularly turn out to be large, but that is not automatically the deciding factor for successful data science projects. In many cases, companies have access to such large data volumes primarily because they gather data at any cost. Their hope is to gain strategic advantages from seemingly unrelated masses of data, much like the top players Google, Amazon, Facebook & Co., or even the NSA. The result is gigantic data lakes in which companies collect every possible kind of structured and unstructured data.

However, concentrating on the quantity of data often obscures the essence of Big Data projects: the analytical handling of data, namely “Just Data.” Those who dedicate themselves to this task, broken down to its essentials, will very quickly realize that the factors critical to the success of such projects are not exclusively technological. In order to transform data into valuable information, companies also require a corresponding mindset, one that concerns the entire corporate culture.

Just Data: independent of quantity, structure, and speed

Regardless of its quantity, structure, and speed, data is simply “Just Data.” Far more important than the idiosyncrasies of the data itself are properly defining the business case, embedding analysis projects in the environment of the organization, and selecting the right analytical method. This is why we have developed the Data Compass for carrying out data science projects.

The success of data projects frequently depends on factors that are not technical in nature. Companies need a certain learning culture in order to better understand relevant contexts through open and inclusive (learning) processes. It may sound paradoxical: Big Data may be dead, but precisely that fact represents a major opportunity for data science projects. Once we shift our attention away from the buzz phrase “Big Data,” we arrive at the really crucial question: How can companies and organizations create added value from data?

In our data science workshops, we help companies to answer this question and develop new, data-driven business models. Learn more: https://www.alexanderthamm.com/workshops/


Lawrence Fernandes

Enterprise Sales Engineer at Actian

3y

Excellent article Alexander! The focus should always be on obtaining usable, actionable information from data in order to support business decisions. In fact, I even find the names of our profession misleading: too much emphasis on data, too little on information.

Giuliano Guerrini

Consultant at Edimp

5y

Nice point of view. Probably we should consider the possibility of real-time communication among the data produced every instant. To do what?! ... There is so much space to imagine.

John Shramek

Geospatial Project Leader, Waterfall + Agile

6y

Love it: from Big Data, to big data, to “it’s all just data.” So true.

Bob Hazelton

Semi-retired and “Livin’ the dream”

6y

Thank you for putting into a well-written article what I have been saying for several years. When no two people can provide a similar definition of "Big," there is a lot of hype propping up that concept.

Mathieu Landry

Free Strategic Thinker | Mineral Exploration Targeting Specialist | Founder of Explospectiv

6y

A good overview of the state of the hype surrounding #bigdata and a good core message, Alexander! The purist in me agrees that data is just data, no matter the quantity, diversity & speed at which it is generated and analyzed. However, the taxonomist in me recognizes that big data is a real categorization of a phenomenon – we have seen recent headlines exclaiming that up to 90% of all data have been generated in the last two years – something has changed, with technology and with our behavior towards ‘measuring’. So, for me, the original three V’s (Volume, Velocity & Variety) still stand as a good definition of Big Data. It is the other added V’s, IMO, that are more fashionistas trying to overdo the previous iteration. Most of these V-newcomers are really fundamental attributes of data at any speed & size, like veracity and variability… what big data allows is the smoothing of errors or disparity so analysis techniques can actually find value through the ‘noise’. The graph is good and interesting, and it is to be expected that an upper limit exists. However, looking only at the 0-100k window, the gain from volume is evident. In fact, the biggest hidden value often comes from linking diverse data to reconstruct a richer context of reality. So, I must disagree with your following opinion: “As already mentioned, Big Data was never really about the largest possible quantities of data. It was more about selecting the relevant data for the respective use case, cleaning it, and evaluating it with suitable methods.” Selecting the relevant data, cleaning it up and evaluating it is good practice with any dataset, no matter its “morphology” (big or small). Actually, this is the essence of the scientific method. Thus, in my view, the real misnomer is ‘data science’, which is really a pleonasm. What is new is that other fields such as marketing, operations, etc. are now discovering the power of data and good analysis. They’re discovering the scientific method applied to their business. So, there’s data, and there’s science to make sense of it. Of course, I agree with your premise that Big Data is not always necessary to find value, especially relative to available resources. But I feel one should start small and think big. The more data we have about something, the more we can discretize its makeup to understand its existence and context… as long as the adequate analysis capabilities are in place. Perhaps Big Data is dead. If so, make way for ‘Bigger Data’… what we need is smarter analysis, and more scientists!

