Understanding Intelligence: Humans vs Machines - Part 3
This is the third in a short series of articles about different types of human decision-making processes and their computer equivalents, inspired by my reading of 'Thinking, Fast and Slow' by Daniel Kahneman. Part 1 discussed the two human decision-making processes: System 1 is always-on, instinctive, and based on experience and memory, while System 2 requires concentration and calculation. Part 2 compared System 2 to traditional procedural programming, and System 1 to machine learning. In this article I will discuss some of the problems faced by System 1, and the similar problems faced by Machine Learning.
A Machine for Jumping to Conclusions
In 'Thinking, Fast and Slow', Daniel Kahneman describes System 1 as a 'machine for jumping to conclusions'. System 1 is seemingly purpose-designed by evolution to make decisions very rapidly, comparing only a partial set of evidence to prior experience. In evolutionary terms it is understandable that this would be so: it is better to be safe than sorry, and to over-react in circumstances where a false negative has much worse outcomes than a false positive, such as when identifying potential threats or the presence of predators.
The problem is that System 1 doesn't stop to consider whether the available evidence is reliable, or whether other evidence might exist that would alter the decision. System 1 jumps to a conclusion immediately, right or wrong.
The models created by machine learning techniques have a similar problem. They have (generally) been created by feeding historical 'training' data to an algorithm designed to find patterns in that data. The patterns found are encoded by the algorithm into a 'model' of the world, in which familiar patterns observed in new data drive the model to familiar conclusions.
While a data scientist can check the outputs, source more data or tweak algorithm parameters to influence the model, the final model itself is 'dumb' and cannot learn or seek alternative data. This means that it might well ignore significant information and 'jump' to the conclusion that its training has pre-determined is appropriate. Notions like self-doubt are generally absent.
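As a rough sketch of this train-then-freeze pattern (the data here is synthetic and scikit-learn is used purely for illustration, not to represent any particular production system):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Historical 'training' data, with a pattern hidden in the first column.
X_train = rng.normal(size=(500, 3))
y_train = (X_train[:, 0] > 0).astype(int)

# The algorithm searches for patterns and encodes them into a model.
model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

# Once fitted, the model is frozen: it applies the patterns it found and
# nothing more. It cannot seek new evidence, and it always 'jumps' to an
# answer for any input it is shown; self-doubt is not an option.
X_new = rng.normal(size=(5, 3))
print(model.predict(X_new))
```

However odd the new inputs may be, `predict` will return an answer rather than a confession of ignorance.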
Characteristics of System 1 and AI
Kahneman lists a number of prominent characteristics of System 1, some of which are listed below:
- operates automatically and quickly, with little or no effort and no sense of voluntary control
- executes skilled responses and generates skilled intuitions, after adequate training
- creates a coherent pattern of associated ideas in associative memory
- links a sense of cognitive ease to illusions of truth, pleasant feelings and reduced vigilance
- is biased to believe and confirm
- focuses on existing evidence and ignores absent evidence ('what you see is all there is')
- sometimes substitutes an easier question for a difficult one (heuristics)
- overweights low probabilities
- frames decision problems narrowly, in isolation from each other
These characteristics of the human mind are things that many data scientists will immediately recognise as equally characteristic of AI. A model that produces good outputs after adequate training, based on finding patterns and associations, is, after all, largely what Machine Learning is about.
Worrying Aspects
Some of these aspects are a concern. The idea that AI might be prone to providing 'illusions of truth' should worry anyone planning to rely on AI to solve serious problems. Similarly, the likelihood of an in-built bias to 'believe and confirm' should warn us to be doubly sceptical of results and to seek additional confirmatory evidence.
There are famous examples in AI of a model answering a different question to the one being asked. Marvin Minsky tells the story of a model intended to identify composers when shown sheet music. However, the pages used to train the model carried additional hand-written annotations, and the model learned to identify the composers by these rather than by the musical score itself. Other (possibly apocryphal) examples include an AI that learned to identify tanks based on the time of day the aerial photo was taken, or one that distinguished dogs from wolves by the surrounding landscape (snow/sun, woodland/urban).
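This failure mode is easy to reproduce with synthetic data. In the sketch below (hypothetical features, chosen only to illustrate the point), a 'shortcut' feature standing in for the snowy background tracks the label almost perfectly in the training data, but not at deployment:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Feature 0: the genuine but noisy signal (the animal itself).
# Feature 1: a 'shortcut' (the snowy background) that happens to track
# the label almost perfectly, but only in the training data.
y_train = rng.integers(0, 2, n)
genuine = y_train + rng.normal(0, 2.0, n)
shortcut = y_train + rng.normal(0, 0.1, n)
X_train = np.column_stack([genuine, shortcut])

model = LogisticRegression().fit(X_train, y_train)

# At deployment the confound breaks: wolves turn up on grass too.
y_test = rng.integers(0, 2, n)
genuine_test = y_test + rng.normal(0, 2.0, n)
shortcut_test = rng.normal(0.5, 0.5, n)  # no longer tracks the label
X_test = np.column_stack([genuine_test, shortcut_test])

print("train accuracy:", model.score(X_train, y_train))  # near-perfect
print("test accuracy:", model.score(X_test, y_test))     # falls toward chance
```

The model was never 'wrong' about its training data; it simply answered an easier question than the one we thought we were asking.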
A more worrying aspect is that in the real world it is very difficult to know when a model has given an incorrect answer. A model that is 99.9% accurate sounds impressive, but when applied to a population of millions it will still produce thousands of wrong answers. How significant this is depends on the types of decisions being made on the back of the model. Identifying likely purchasers of a new brand of aftershave is one thing; diagnosing a terminal illness is quite another.
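The arithmetic is sobering even with made-up numbers:

```python
# Illustrative figures only: a 99.9%-accurate model applied at scale.
accuracy = 0.999
population = 5_000_000

wrong_answers = population * (1 - accuracy)
print(f"{wrong_answers:,.0f} wrong answers")  # 5,000 errors at 99.9% accuracy
```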
In illness diagnosis, assuming that early intervention would save lives, a false negative is potentially fatal. Equally, assuming that treatment is invasive and has severe implications, a false positive is no picnic either. In circumstances like these it might be preferable to make the assessment manually and to use an AI model in parallel to highlight results that need additional checking. That is to say, when the medic and the model agree, the diagnosis is probably good, but where the two disagree, perhaps a second (third?) opinion is advisable.
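As a sketch of that parallel-review idea (the routing rules and names here are hypothetical, not drawn from any real clinical system):

```python
# Hypothetical triage: compare a clinician's call with the model's call.
def triage(clinician_positive: bool, model_positive: bool) -> str:
    """Accept cases where human and model agree; escalate disagreements."""
    if clinician_positive == model_positive:
        return "accept diagnosis"           # agreement: probably sound
    return "escalate for further opinion"   # disagreement: check again

for medic, model in [(True, True), (True, False), (False, True), (False, False)]:
    print(f"medic={medic!s:5} model={model!s:5} -> {triage(medic, model)}")
```

The point is not that the model replaces judgement; it is that disagreement between the two is itself a useful signal.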
Conclusion
Many data scientists work hard to compensate for and minimise the inherent weaknesses in their models, with varying degrees of success. However their masters in business, having drunk the Kool-Aid of analytics, may not be sceptical enough of the accuracy of AI's conclusions, and can pressure modellers to go live with an immature product. This is not to say that we should not trust the outputs of Machine Learning at all, just that we should not trust them with the same confidence as we treat the deterministic outputs of traditional procedural programming. Where there is no procedural equivalent, Machine Learning, while not perfect, may well be the best alternative we have.
John Thompson is a Managing Partner with Client Solutions Data Insights Division. His primary focus for the past 15 years has been the effective design, management and optimal utilisation of large analytic data systems.