Facts About a Fantasy: Data, Its Relevance and Variability
In this post I share my observations on the rising ‘passion for facts’, the need for data, and how both are being fitted into the larger scheme of data-driven decision-making philosophies.
Almost every piece of information we classify as a fact is variable by nature. A population count, for example, yields a definite number. Yet the day after the counting is done, that number is no longer TRUE, because it does not account for the babies born since the count. Even the strongest data tend to be estimates, yet they are popularly perceived as FACTS depicting the precise state of a system. The asterisk and the footnote travel along with the data, but their font colour stays in ‘no-colour’ mode. So we need to remind ourselves, every time we see DATA presented as FACTS: not all data are facts, and not all facts are precise.
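To make that concrete, here is a minimal sketch in Python (hypothetical numbers and names throughout) of what it looks like when a figure is carried as an estimate instead of a bare FACT: the number travels with its as-of date and an assumed drift, so the asterisk can never fade to ‘no-colour’:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Estimate:
    """A figure that refuses to pose as a bare FACT."""
    value: float          # the headline number
    as_of: date           # the day the number was actually true
    drift_per_day: float  # assumed change per day (births, churn, ...)

    def plausible_range(self, today: date) -> tuple[float, float]:
        """The band the figure may have wandered into since as_of."""
        drift = self.drift_per_day * (today - self.as_of).days
        return (self.value - drift, self.value + drift)

# A census-style count: exact on counting day, an estimate ever after.
population = Estimate(value=1_000_000, as_of=date(2024, 1, 1),
                      drift_per_day=150)  # an assumption, not a fact
print(population.plausible_range(date(2024, 1, 31)))
# -> (995500.0, 1004500.0): a band, not a point
```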
When it comes to data being used for decision making, it is a whole different ball game. Depending on what the decision is, a particular type of data, amount of data and a meaningful representation of the same might be required [among a host of other things that are beyond the scope of this post]. Some data provide qualitative information; some provide quantitative information. A carefully designed representation might even deliver a clever insight or a fresh perspective. What needs to be remembered is that some data give only a direction, some give a pattern from the past, and some also give out a bit of the background story [depending on how the data is cleaned and represented].
I see a rise in the popular opinion that decisions have to be based on DATA and that DATA need to be absolute FACTS. What concerns me is the disturbingly widespread ignorance of the RELEVANCE of DATA to FACTS, and subsequently to the DECISION to be made. The two are very different things, and any similarity is merely a coincidence. There are many ways of explaining this, but I choose the following because I see it most often: anything ahead of us is almost always assumed to be a product of the past, and it gets trickier still when the PAST means only the immediate past. This might work to a certain extent within the engineering domain, where investigations deal with material properties and how they change over time under a given set of environmental conditions. However, it cannot be assumed as a standard. The situation gets even trickier when the substantiation of such an approach amounts to “that’s what everyone does. Even the XXXXXX do it this way…”.
Here’s an example:
Look at these charts/data sets [a partial display of an output from the public domain, for the sake of this discussion]:
The red line is growing steadily, one small hiccup and that’s about it. Looks very promising. The blue line is all over the place as it begins, somehow manages to get close to the red line, but drops towards the end and is last seen in what might be a dying phase. The timeline covered here spans 2-3 years, which makes sense from an ‘immediate past’ perspective.
Overall, the red line seems strong and reliable, while the blue line seems erratic and unreliable.
Now look at the same graph with the other parts included:
The red line dropped and the blue line spiked.
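I don’t have the underlying series from these charts, but the trap is easy to reproduce with made-up numbers. In the sketch below (hypothetical data, not the actual charts), the first eight points stand in for the partial display and the last four for the full reveal; a straight-line fit over the ‘immediate past’ points one way, the full history the other:

```python
import numpy as np

# Hypothetical figures (not the actual chart data): the first 8 points
# are the "partial display", the last 4 are what the full graph reveals.
red  = np.array([50, 55, 60, 58, 65, 70, 75, 80,  60, 45, 35, 30])
blue = np.array([40, 70, 30, 65, 72, 60, 45, 35,  50, 80, 110, 140])

def slope(series: np.ndarray) -> float:
    """Least-squares slope: the 'direction' a naive reading infers."""
    return np.polyfit(np.arange(len(series)), series, 1)[0]

for name, series in (("red", red), ("blue", blue)):
    partial, full = slope(series[:8]), slope(series)
    print(f"{name}: immediate-past slope {partial:+.2f}, "
          f"full-history slope {full:+.2f}")
# red looks like the safe bet on the partial window;
# the full series says exactly the opposite.
```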
Had the decision makers stuck to PURE FACTS FROM THE IMMEDIATE PAST, they might have decided and, in essence, designed for themselves surprises that they would have to eat and endure in the coming years. Imagine what the business would have gone through if such fact-based decisions had been executed. Yes, I asked you to imagine. It is not a crime. [I wonder what I should present as a FACT to substantiate this statement.]
Now see this data plot, again partial for the sake of the discussion:
The blue line shows a decline and the grey line shows growth. However, the blue line represents a big player, while the grey line represents a smaller player, less than half the size of the bigger one. If we plot the same data as bars of absolute values, the blue line’s decline will look far more positive: the grey line will appear stuck inside an envelope, owing to limited capacity or capability, while the blue line still shows the inherent capacity to raise the big bucks, much more reliably than the grey line. Even here, the timeline under consideration is 5 years, a strong contender for the ‘immediate past’ theory.
Now look at the full data-plot:
The grey line is pretty much an industry on its own, and the blue line is something not many would remember interacting with. Had the decision-making process used the historical ability to raise revenues and assumed that to be an ‘Indicator of Capacity and Expertise’, imagine what the consequences could have been. Again, operating on the edge of business logic, please imagine what could have happened.
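Again with invented figures (not the actual plot), the sketch below shows how ranking by trailing revenue, the supposed ‘Indicator of Capacity and Expertise’, picks the declining big player, while a growth measure over the very same window was already pointing at the smaller one:

```python
import numpy as np

# Hypothetical annual revenues in $M (not the actual plot data).
# First 5 years = the partial display; the rest = the full reveal.
blue = np.array([120, 110, 100, 95, 90,  70, 50, 30, 15, 5])      # big, declining
grey = np.array([30, 35, 42, 50, 60,  90, 140, 220, 350, 560])    # small, compounding

def cagr(series: np.ndarray) -> float:
    """Compound annual growth rate over the series."""
    return (series[-1] / series[0]) ** (1 / (len(series) - 1)) - 1

for name, series in (("blue", blue), ("grey", grey)):
    print(f"{name}: 5yr revenue ${series[:5].sum()}M, "
          f"5yr CAGR {cagr(series[:5]):+.1%} -> final year ${series[-1]}M")
# Trailing revenue crowns blue ($515M vs $217M); the growth signal
# (-6.9% vs +18.9%) was pointing at grey all along.
```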
These are just two hypothetical circumstances that I, operating within my limited cognitive abilities, could simulate for you in an unranked and unpromoted LinkedIn long-form post. There are many more, and far more critical, circumstances out there where such a passion for ‘exclusively-past’ and ‘exclusively-factual’ DATA is driving business decisions. If things fall into place by themselves, there will be no surprises. If not, we shouldn’t be bothered about the seat-belt; we are anyway heading at 150 mph towards a tree with a board that reads “Come to Big Daddy!!!”.
The question is: at what point do we admit it?
There are funnier circumstances still, where a single data point from a specific instance in the past is alone considered valid. Even when factual, a single-point sample can hardly establish existence, never mind a direction. For decision making, especially when it covers finance, we need a combination of past facts, a current projection and future estimates, and, most importantly, the story behind all three. The past, as the name suggests, indicates the journey so far; the current projection indicates where the system is heading right now; and the future estimates describe the journey ahead. The story behind them supplies the missing pieces, covering the mechanics of that journey. Together they cover both the probabilities of occurrence and the potential impacts. As we change the story behind, the data for the projection and estimates will change accordingly. If we change the scope, even the past data points will change significantly. It all depends on what decisions we are looking to make and on ascertaining the level of data that is required.
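As a rough illustration with invented numbers, the sketch below assembles a decision input from all three ingredients, with the ‘story behind’ as an explicit, arguable assumption rather than something baked invisibly into the history:

```python
import numpy as np

# Hypothetical quarterly sales (past FACTS, themselves estimates).
history = np.array([100, 104, 103, 110, 115, 118])

# Current projection: annualise the latest run-rate (a choice, not a fact).
run_rate = history[-1] * 4

# Past trend: least-squares slope per quarter over the full history.
trend = np.polyfit(np.arange(len(history)), history, 1)[0]

# The "story behind": explicit scenario multipliers we can argue about.
# These are assumptions, e.g. a new competitor or a market expansion.
scenarios = {"base": 1.00, "new competitor": 0.85, "market expands": 1.15}

for story, factor in scenarios.items():
    # Future estimate: extend the trend 4 quarters, then apply the story.
    future = (history[-1] + trend * np.arange(1, 5)) * factor
    print(f"{story:>15}: next-year estimate {future.sum():7.1f} "
          f"(run-rate says {run_rate * factor:7.1f})")
# Change the story and the estimate changes with it; change the scope
# (e.g. drop the first two quarters) and even the 'past' trend changes.
```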
The art of data-driven decision making is taking the world by storm. There is so much happening in the name of data science, fact-based insights and data mining, and artificial intelligence is taking these trends to whole new levels. The dark side of the story, from my perspective, is that we are saying yes to past data alone, allowing only a little room for a handful of statistical methods classified as ‘well-established’. The concept of data is undergoing a whole new metamorphosis, where anything that gets a mention at a URL is considered a fact. If the URL is discredited as ‘fake news’, then everything it carries becomes fake and useless. What I find appalling is that we have taught ourselves to ignore the fact that for $5 anything can be made live and catchy and flashy at a URL. A more respectable format of the same might take $960 or so.
By embracing systems we assume to be of ‘higher intelligence’ and ‘self-learning’, we humans are making the dangerous mistake of saying no to rational thought, critical thinking and, most importantly, creativity. What if something drastic happens and a certain data source goes from high value to nothing as a consequence? Is the system designed to accommodate obsolescence? If yes, how will it change the way it decides? Dropping a variable from an equation effectively means we are letting the self-learning system commit an electronic suicide, the consequences of which have to be faced by everyone, connected and not. Is it rational to defend such a risk with the statement ‘Every new innovation comes with that risk’? If yes, then it follows that even for the highest class of self-learning systems, simulating human intelligence is nothing but the good old TRIAL-AND-ERROR method! Since it is electronic/digital these days, the brute-force version takes relatively little time, and that is probably being grossly misunderstood as the modern miracle everyone is running to embrace.
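For what designing for obsolescence might look like, here is a deliberately simple sketch (all source names hypothetical): the pipeline declares which inputs it can survive without, degrades transparently when an optional source goes dark, and refuses to decide when a core one does:

```python
# Core inputs: decisions are meaningless without these.
REQUIRED = {"orders"}
# Nice-to-have signals that may go dark without warning.
OPTIONAL = {"social_buzz", "web_traffic"}

def decide(available_sources: set[str]) -> str:
    missing_required = REQUIRED - available_sources
    if missing_required:
        # Refuse to decide rather than decide on a hollowed-out equation.
        return f"HOLD: cannot decide without {sorted(missing_required)}"
    dropped = OPTIONAL - available_sources
    if dropped:
        # Degrade transparently: flag the loss instead of hiding it.
        return f"DECIDE with reduced confidence (lost {sorted(dropped)})"
    return "DECIDE with full confidence"

print(decide({"orders", "web_traffic"}))       # an optional source went dark
print(decide({"social_buzz", "web_traffic"}))  # the core source died
```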
Hundreds of data points from the past can help paint the past rather clearly, but to decide for the future we need to project the current standing and estimate the future with a forecast. That is the right way of predicting the future. When we build a strong methodology, refine its weaker elements and validate its workings and output, we end up with a strong data-driven decision-making tool. The art of decision making still remains a human-interpretation exercise that no data or data-driven self-learning system can ever replicate. The bottom line is: if we need to decide for the future, we must be looking into the future, cognizant of the fact that the data we are looking at carries an element of history, an element of estimation and, most importantly, a variable nature. Just my thought.
What do you think? Please share your views in the comment section should you have any.
Thanks for taking time to read my post.
Best regards,
Arun