Why you should never make a decision without Data

A friend of mine shared a video with me last week from JSConf, on interactive visualisations of massive data sets using Palantir:

Watching Tim Slatcher present reminded me of some thoughts I had a few years ago about the problem of interacting with very large data sets. Tim talks about how the UX for exploring data needs to be intuitive as well as responsive, scaling with data volumes. I think Tim and I share a lot of common ground on these points, but we approach them from different angles:

You see, the problem with computers can be traced back to when they were mechanical devices. Some inputs were given, a handle was turned, some complex calculation performed, and the result was output.

Data --> Compute --> Result

and depending on the volume of data and the complexity of the calculation:

Data --> … [crickets]… --> Result

In some cases the crickets can sound for minutes or hours (days at times) before the result is presented. But what happens during that wait?

Well, the human mind thrives on inadequate information. Aside from the finely tuned rational aspects of our brain, we have a set of other psychological functions which kick in when there is limited data and help us short-cut to the answer. What I might call ‘gut’ instinct, you might say is ‘prejudice’, and vice versa. Minority groups throughout history will have experienced this problem. A crime is perpetrated and, in the absence of evidence, the finger gets pointed at one of their number. It then takes an overwhelming barrage of facts to convince people against this initial judgement, if indeed (as is often the case) it has been made in error.

While these evolutionary functions are very useful for surviving in the wild, they sometimes get in the way, as in the example above of administering justice, or indeed in simple office-based decisions.

I remember a story from a client who had invested a significant amount of money in a decision support system. The system aggregated their sales data from stores around the world, and management were able to slice and dice this data by dimensions such as product line, country and time of day. One day, they had a new product to launch, but they did not have the resources for a simultaneous launch in every country they operated in. They ran a complex query on their data to work out the optimal territory to launch in first. Interestingly, and despite their investment in the technology, they ignored what the data was telling them to do; the product tanked, and they pulled the launch. A competitor launched a competing product six months later and went on to corner the market. What went wrong? A data failure? Missing dimensions from the query? No. Just a bit too much pride and emotion.

You see, while the query was running, the exec in charge of this particular initiative didn’t bother waiting for the results. In fact, the moment he or she knew that they had to make a decision about which country to launch in, their brain was already busy computing an answer. By the time they asked their business analysts to do the research, they already had a hunch. When the data came back three days later, it was too late.

“The data is wrong”
“The system never gets it right”
“There are just some variables the computer hasn’t taken into account”

Sound familiar?

The problem with a gut instinct is that once it’s formed, we start to build an emotional attachment to it. This is very useful for surviving in the wild. No one escaped a woolly mammoth by being indecisive. The odd few who made poor decisions ran very decisively directly into the path of the beast, or indeed headlong into other sources of danger; but enough made a decent snap decision, stuck to their guns, and tens of thousands of generations later – here we are.

For humans to finally accept computers into the decision support process, computers need to be able to guide us towards the answer faster than the speed of thought. Unfortunately, if you explain this to a computer engineer, you often end up chasing down the wrong path.

“We need to increase the server RAM to 128 GB”
“We need to re-write the database into a columnar datastore, but even then we will be limited to a couple of hundred billion rows if you need that response time”
“Sorry, we just need to reduce the amount of data. Let’s aggregate the values somewhere – the users will never notice”

Oh dear [sigh].

*          *          *

What Tim Slatcher has stumbled across is, I think, a universal truth of computing:

“You are only ever able to maximise two of these three factors:
1. Volume of Data
2. Speed of Response
3. Level of Accuracy
All data systems must compromise one factor in order to maximise the remaining two”.

The problem with data processing design is that we all take #3 – Level of Accuracy – for granted. It’s 100% in all cases. Therefore, as #1 – Volume of Data rises, #2 – Speed of Response falls. Our solution? Jack up the computing power; double the number of CPUs in the array.
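
To make that concrete, here is a minimal Python sketch of the status quo (the sales figures are invented for illustration): with accuracy pinned at 100%, every row has to be touched, so the response time simply tracks the row count.

```python
import time
import numpy as np

# With accuracy fixed at 100%, the only honest option is a full scan,
# so response time grows roughly linearly with the volume of data.
rng = np.random.default_rng(0)
for rows in (1_000_000, 10_000_000, 100_000_000):
    sales = rng.lognormal(mean=3.0, sigma=1.0, size=rows)  # made-up fact table
    start = time.perf_counter()
    exact_mean = sales.mean()          # exact answer: every row is touched
    elapsed = time.perf_counter() - start
    print(f"{rows:>11,} rows -> exact mean {exact_mean:6.2f} "
          f"in {elapsed * 1000:7.1f} ms")
```

Ten times the data, roughly ten times the wait: the only levers the engineer has left are bigger hardware or smaller data.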

Tim is one of the very few computer scientists advocating a new way: sacrifice the level of accuracy for speed of response. Imagine a system where response time is a constant. For any given question, it must give a result in under half a second:

“Where shall we launch our new product?”, the user asks; “Not sure, just yet. I’ll probably need a few hours to give you a fully accurate answer. I'll keep you updated on my thoughts as I go, though:

What I can say already from skimming the data, is that it’s very unlikely to be a successful launch in Europe; Australia and Canada seem like the two stand-out choices”.

Suddenly, our system begins to sound more human. The big difference being, its answers are based on data, not on gut.
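
By contrast, here is a toy sketch, again in Python with invented data, of what an “answer now, refine as you go” system might look like at its smallest scale. The progressive_mean generator, its chunk size and its half-second budget are my own assumptions, not anything taken from Tim’s talk, and a production engine would be far more sophisticated. The point is the shape of the interaction: a usable, data-backed estimate inside the half-second window, tightening while the exact scan would still be running.

```python
import time
import numpy as np

def progressive_mean(data, budget_s=0.5, chunk=50_000, rng=None):
    """Yield (estimate, 95% half-width) roughly every `budget_s` seconds,
    refining the answer as more randomly sampled rows are folded in."""
    rng = rng or np.random.default_rng()
    seen = np.empty(0)
    deadline = time.monotonic() + budget_s
    while seen.size < data.size:
        seen = np.concatenate([seen, rng.choice(data, size=chunk)])
        if time.monotonic() >= deadline:
            half_width = 1.96 * seen.std(ddof=1) / np.sqrt(seen.size)
            yield seen.mean(), half_width
            deadline = time.monotonic() + budget_s

# The caller gets a usable answer within half a second and watches the
# confidence interval tighten, instead of listening to crickets.
sales = np.random.default_rng(1).lognormal(3.0, 1.0, 10_000_000)
for estimate, half_width in progressive_mean(sales):
    print(f"current estimate: {estimate:.2f} ± {half_width:.2f}")
    if half_width / estimate < 0.005:   # within 0.5% – good enough to act on
        break
```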

John Morton CEng FBCS CITP FIoD

CTO | CIO | CDO | Transformation | NED | Board Advisor

Charles, some great points here for those embarking on a new journey. Experience has shown that those who are already using analytics have already, somewhat unwittingly, compromised the level of accuracy. Consider what happens when you have developed your killer algorithm and then try to operationalise it. Companies in the 80s, 90s and 00s hit the problem of it taking too long, so they cut out variables, created summary data and "fitted" the workload to what the business and computing capability could do. Tim's point. Often the business didn't know they were getting skimmed results – a smoother curve. When you assess a client's existing analytics architecture, often the low-hanging fruit that reaps new customers and drives efficiency is moving the data and algorithms to newer infrastructure and unleashing the analytics on the full data. An Economist Intelligence Unit report found that using data effectively in your own organisation can drive revenue by between 7 and 15% (depending on your industry).

Ved Sen

Head of Innovation, TCS UK & Ireland

Great post. Very thought provoking, thanks! While agreeing with your 2-out-of-3 trade-off, I guess what's happening now is that collective improvements in database tech, computing power and access to data are shifting the entire curve forward. So the trade-off is true at a point in time, but we've come to expect that six months later the curve will have shifted and some of the trade-off will have gone away. And between the futurists, the technologists and the decision makers, I wonder if there's enough appreciation of where the trade-off points are today. Not knowing that may itself create unrealistic expectations. I'm going to go back and read all your recent posts! Let's catch up sometime soon.
