The Risk of Losing the Art of Data Science


Everyone is talking about the artificial intelligence (AI) revolution.

For things like complex imagery analysis, making sense of vast amounts of data in well-defined spaces, or pattern recognition, it does, in fact, have the potential to revolutionize decision making. But that does not mean that AI or machine learning solutions are what’s best for absolutely every analytical need.

In recent conferences, conversations with clients, articles, and emails, more than one senior-level decision maker has made it clear that, in their view, “We need AI to do X, Y and Z! We need machine learning to do A, B, and C! We will lose our competitive edge unless EVERYTHING moves to an AI solution!”

That last sentence was an actual quote. 

As absurd as it sounds, some discussions I have had imply a belief that data analysis, personal insight, and experience will become unnecessary, and that machines will know with perfect accuracy and Oracle of Delphi-esque prescience what questions to ask, how to answer them, and what the answers mean.

But here are a few problems, even in cases where that might be possible: (1) that type of thinking is analytically backwards and, in some cases, more expensive and time-consuming than other types of analysis; (2) a machine is only as good as its programming, and requires context, critical thinking, and bias-free algorithms; and (3) sometimes the rules and assumptions change.

Problem One: Analytical Backwardness and Cost-Effectiveness

When you define an objective as “use AI to accomplish this goal,” you have just boxed yourself into an analytical corner: now you have to use it no matter what. In some cases, especially where a vast amount of rapidly changing and evolving data is hitting you faster than you can comprehend (e.g., processing complex signals and imagery in a jet flying faster than the speed of sound), AI will likely be a great option to consider and may be a big part of the solution. But in the end, it’s just a methodology, and in some cases other methods might work just as well while requiring less time, money, and effort to set up. Or perhaps other tools can augment the results of the AI algorithms, both on the front end, in driving ingest and queries, and on the back end, in helping the analyst interpret results.

Maybe you don’t need AI at all to answer a particular question. Over the past 30 years, my experience has been that roughly 85% of most analytical needs can be met quickly and inexpensively by just running a table or setting up a dashboard. Another 10% or so may require some multivariate analysis. What’s left, the questions too big, too complex, and too rapidly evolving for traditional means, could be a great candidate for machine learning or AI. Of course, these ratios will shift rather dramatically across industries and applications, but when “good enough” is good enough and the question is not overly complex, you should be able to run and interpret a table in minutes, without having to spend days, weeks, or even months building out a new capability. Doing a crosstab or producing a dashboard, and actually taking a minute to interpret the table, may not be sexy, but it is quick, cheap, and effective in answering most of the questions a business intelligence analyst will likely face, especially if the question is about something new, something not yet programmed, or something involving just a few variables. In other words, maybe you can swat that particular analytical fly with just a flyswatter.
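To make that concrete, here is a minimal sketch of the kind of quick crosstab I have in mind, written in Python with pandas against a small made-up customer extract. The column names and values are purely illustrative, not drawn from any real dataset.

```python
import pandas as pd

# Hypothetical extract: region, product line, and whether the customer renewed.
# The columns and values are invented for illustration only.
df = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "South", "South", "East", "West"],
    "product": ["A", "B", "A", "A", "B", "B", "A", "B"],
    "renewed": [1, 0, 1, 1, 0, 1, 1, 0],
})

# A simple crosstab of renewal rate by region and product line.
# Minutes to run and interpret; no model training required.
renewal_rate = pd.crosstab(
    index=df["region"],
    columns=df["product"],
    values=df["renewed"],
    aggfunc="mean",
)
print(renewal_rate)
```

A table like this answers the “which segment is slipping?” kind of question in one pass; only when the interactions become too numerous or too fast-moving does it make sense to reach for heavier machinery.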

Even when AI could result in a better decision, you still need to ask if the right data exist, if you have enough time and money to build it, and if the return on all that investment is worthwhile for that particular situation.

A better way to frame the problem is in reverse. You need to accomplish a goal using the fastest, most effective and most cost-effective means possible, and one of the ways you might do that could very well include AI or machine learning. Just semantics? Perhaps. But it opens the door to what’s best.

Problem Two: The Need for Context, Bias-Free Programming, and Interpretation

There are over 200 types of cognitive bias. We are all subject to these traps, often without even realizing it. There is even a type of bias for thinking we can be completely free of bias (bias blind spot). Building an algorithm based on your own decision rules and assumptions may, in some cases, inadvertently capture some of those biases. The more that you need to use inputs that are qualitative and/or subject to interpretation (especially those relating to motive, grievance, and intent), the more room there is for bias to enter the decision calculus. The interpretation of data is often context-dependent. Without knowing social practices, values, expectations, and reference points, you may risk making some very faulty assumptions.

Even if you build a completely bias-free algorithm of human behavior, contextual factors still play a role. Let’s say you are building a sentiment scoring tool. The same expression may mean something completely different from one culture or industry to another. In social media, saying someone “tanked” in a video game is a good thing, signifying a player who absorbed all of the adversity to clear a path forward for the team. In investment banking, however, saying something tanked is a very bad thing, possibly signaling financial ruin. Calling someone “dog” or “dawg” on social media may be a compliment in some parts of the USA, but it is a horrible, dehumanizing insult in other cultures, in some cases used to justify nefarious intent. Pick any culture, and you can find cases of completely different interpretations of the same social data. In theory, you could program every possible cultural nuance into your AI, but that would (a) require a massive investment, (b) be difficult to do well given all of the subject matter expertise you would need to bring on board, and (c) add more “degrees of freedom,” or sources of potential error, to your data. If you are targeting a particular population, however, cultural customization might be exactly what you need. But a machine would not know that instinctively ahead of time. Sometimes you just need good old-fashioned behavioral science to drive the programming of those codebooks.
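As a toy illustration of the point, here is a minimal sketch in Python of a domain-aware sentiment lookup. The terms, domains, and scores are invented for illustration, not taken from any real codebook or library.

```python
# A toy, domain-aware sentiment lexicon. Terms, domains, and scores are
# invented for illustration; a real codebook would be built with behavioral
# science and subject matter expertise for the target population.
LEXICON = {
    ("tanked", "gaming"):  1.0,          # absorbing the adversity for the team: positive
    ("tanked", "finance"): -1.0,         # a collapsing asset or deal: negative
    ("dawg", "us_social_media"): 0.5,    # friendly compliment
    ("dawg", "other"): -1.0,             # dehumanizing insult elsewhere
}

def score_term(term: str, domain: str) -> float:
    """Return a sentiment score for a term given its cultural or industry domain.
    Unknown term/domain pairs default to neutral (0.0)."""
    return LEXICON.get((term.lower(), domain), 0.0)

# The same word flips polarity depending on context.
print(score_term("tanked", "gaming"))   # 1.0
print(score_term("tanked", "finance"))  # -1.0
```

Even this trivial example shows the scaling problem: every new domain multiplies the entries a human has to get right, which is exactly where bias and error creep in.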

On the front end, every algorithm is originally built by a programmer sitting at a computer. All of us are subject to human limitations and unintentional error, since we can’t anticipate every permutation (as was certainly the case with the Boeing 737 MAX). In addition, the person who queries the system needs to know how to write the query. They need to know how to avoid confirmation bias (seeking only what you already think is true), availability bias (going with what’s easy or top of mind), and anchoring bias (placing too much emphasis on the first result or finding). If the initial assumptions are wrong or incomplete, drastically different sets of variables and tests may be needed. AI is not necessarily flexible enough to handle an unforeseen scenario unless someone had the foresight to program for it in the first place.

On the back end, there is always a need for analytical talent and investigative skill. Know how to test assumptions. Know how to interpret data. Know what information you need to meet the mission objective. Without these things, you risk garbage in, garbage out, or stretching to use a tool that may have been designed for a different purpose. Assuming that the algorithm will replace the need for judgment and critical thinking in interpreting results and creating recommendations is a dangerous and foolish path to follow.

I have evaluated literally hundreds of resumes for data science candidates over the last five years. Many of these candidates come with a CV bristling with every software proficiency you can imagine, and case study after case study of machine learning applications. But put a set of tables in front of them and ask what the data imply an organization should or should not do, and they are lost. They may have all the knowledge needed to run the technical side of data analysis, but if they have never been trained or given the chance, they may lack the experience to fully understand what it all means. It is similar to teaching someone how to make all sorts of fancy cuts with a table saw, but stopping short of teaching them how to build anything.

Problem Three: Sometimes, the Rules Change

Everyone was blown away by a computer beating the Chinese national champion at the game of Go. People were equally shocked at the defeat of renowned chess champion Garry Kasparov by the IBM supercomputer Deep Blue in 1997. And yes, these were seminal, ground-breaking moments that showed the potential of these systems in well-defined settings with set rules and assumptions that don’t change. The rules of these games are fixed and permanent. The programming for these games required no cultural adaptation or shifting situational factors, no new breakthrough game-changing events, no ability to break the rules, and no need to avoid interpreting someone else’s situation through our own biased lens. The real world is not so clean and consistent. It is messy, subject to shocks, and in a constant state of change. Just ask the people who worked at Blockbuster Video, US Steel, Lehman Brothers, or Sears.

Another way chess and Go differ from strategic decision planning is that rules can be broken, and assumptions can become outdated. Rule breaking, or doing something completely unexpected, is often a very effective tactic. The French believed they had built an impenetrable barrier in the Maginot Line, a string of artillery emplacements and fortifications along the border with Germany, based on their best assumptions about how a war would likely progress. It was an infallible solution… until Germany simply went around it rather than through it, moving more rapidly through the rough surrounding terrain than had been anticipated. Shortly thereafter, Allied forces found themselves stranded at Dunkirk in the face of an advancing enemy.

When you make decisions solely on the basis of pre-defined rules and algorithms, you become predictable. When you become predictable, over time you become vulnerable. Relying on rules-based algorithms to replace too many kinds of decision-making raises the risk of creating future Dunkirks, of different types and in different settings.

The basic lesson of history is that unless we learn from it, it will repeat itself. In some ways, that’s already happening.

Several years ago, “big data” was all the rage, and there was an implied belief that somehow this big pile of data was going to solve all of our problems overnight. While having all those data in an easily accessible form did, in fact, solve many problems (especially with the explosion of social media), it was not a panacea, and no one really talks about big data much anymore. It was a little like shouting, “We need nails! Lots of nails!” when you are trying to build a house. Sure, nails are necessary. But they are just pieces of metal until used properly in concert with other tools, wielded by skilled and experienced hands. The same is true for data. Data are not valuable until they become information, and that transformation requires analysis, the proper analytical tools, and the experience to know what actions to recommend. AI and machine learning can absolutely help make the right data more accessible and, in many cases, more meaningful, but these techniques are not in and of themselves a cure for everything under the sun.

The moral of the story is, give unto AI what AI is due, but give the rest unto brain power and intellectually inquisitive, informed, rigorous analysis. AI and machine learning are great and increasingly necessary tools to have in your toolbox, but if you treat them like the Great and Powerful Oz for absolutely everything, in some cases, you may be disappointed when you look behind the curtain at what they actually can do.

