Machine Learning Classification Confidence – How Confident Should You Be?
If your kid got a 99% on an exam, he or she should feel pretty good about that result. But should you feel good if your machine-learning algorithm reports 99% confidence for a specific determination or conclusion? Well, yes and no. Note that 99% is an arbitrary number used throughout this post for illustration, not a statistic measured from real machine learning algorithms.
What is machine learning at a high level?
There has been a lot of hype around machine learning, so let’s go over how it works at a high level. In my current domain I am exposed mostly to computer vision applications, so much of my machine learning experience is with convolutional neural networks (CNNs), and I will talk about CNNs in this post. However, my comments generalize to other machine learning algorithms.
A good way to think of a machine learning algorithm is that it is very good at automatically finding signatures or patterns across many examples of source data (where “many” means the scale of big data), so that it can make a determination or conclusion when given a new example. For instance, a CNN takes an image as input and makes a determination, such as “is this an image of a cat?” It does this by evaluating many images of cats and automatically finding the small signatures a cat should have, and where. When the CNN is given a new image, a 99% confidence roughly means the new image exhibits, in the right places, 99% of the signatures the CNN determined should be there “to be a cat”.
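To make “confidence” a little more concrete, here is a minimal sketch of how a classifier turns an image into a confidence score. It uses PyTorch with a deliberately tiny, untrained stand-in network and made-up class labels, so the model, sizes, and names are illustrative assumptions rather than a real cat detector.

```python
import torch
import torch.nn as nn

# A deliberately tiny, untrained stand-in for a real image classifier (a trained
# CNN such as a ResNet would be far deeper and trained on real labeled photos).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # responds to small local "signatures"
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                    # pool the signature responses
    nn.Flatten(),
    nn.Linear(8, 2),                            # raw scores for "cat" vs "not cat"
)
model.eval()

classes = ["cat", "not cat"]                    # hypothetical labels
image = torch.rand(1, 3, 64, 64)                # stand-in for one preprocessed photo

with torch.no_grad():
    logits = model(image)                       # raw scores
    probs = torch.softmax(logits, dim=1)        # turn scores into a probability distribution

conf, idx = probs.max(dim=1)
# An untrained model will print a near-50% guess; a trained one might report 99%.
print(f"{classes[idx.item()]} with {conf.item():.0%} confidence")
```

The key point is the last step: the network produces raw scores, and the softmax turns them into the probability-like number that gets reported as “confidence”.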
How does it find these signatures? The CNN starts by assigning an arbitrary probability (weight) to each small section of an image. With each example image it processes, it adjusts those probabilities, gradually converging on whether each section is a signature. So if you have heard machine learning described as a probabilistic approach, this is exactly why. And if you hear hype about machine learning nearing the intelligence of Skynet (from the movie Terminator), this is also why you should not believe the hype.
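As a rough illustration of that adjustment process, the sketch below (again PyTorch, with a tiny stand-in model and random tensors standing in for labeled cat / not-cat photos) shows the loop that nudges the initially arbitrary weights a little after every batch of examples.

```python
import torch
import torch.nn as nn

# Minimal sketch of the learning loop described above. A real training run would
# sweep a large labeled dataset many times rather than loop over random noise.
model = nn.Sequential(                       # weights start out essentially arbitrary
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),                         # scores for "cat" vs "not cat"
)
criterion = nn.CrossEntropyLoss()            # how wrong was the current guess?
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(100):                      # each step processes a small batch of examples
    images = torch.rand(8, 3, 64, 64)        # stand-in for 8 labeled photos
    labels = torch.randint(0, 2, (8,))       # stand-in for "cat" / "not cat" labels

    logits = model(images)
    loss = criterion(logits, labels)         # compare the guess with the known answer

    optimizer.zero_grad()
    loss.backward()                          # work out which weights were most to blame
    optimizer.step()                         # nudge those weights toward a better answer
```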
Why you should feel good about 99%
What makes a CNN significantly better than a deterministic approach is that a deterministic approach requires a human to teach the machine what and where all those small signatures must be, and often we humans don’t know ourselves. If a human cannot consciously recognize all the small signatures, we cannot possibly teach the deterministic approach to be very good. Even if a human could recognize them all, the deterministic approach would require far too much complexity in the code. The CNN finds those signatures automatically by learning from the many examples, and in practice these algorithms have proven much more reliable and efficient than their deterministic predecessors.
After working through many examples, the probabilities will have converged, yielding many signatures. With many distinct signatures found, a 99% match against them feels pretty good.
Why you should not be overconfident in 99%
But who says the probabilities have to converge? Some may not converge, or may converge suboptimally, resulting in few signatures. With far fewer signatures, a 99% confidence does not sound so good. In that case, though, an honest software developer would have told you that the machine learning approach has not worked well for that particular application.
While the above situation is easy to detect or verify, there are two scenarios that can be more difficult to deal with. How often each scenario occurs varies widely across applications and CNN configurations.
- False positives: the CNN concludes the image is a cat (99% confidence), but it is not. Even if the CNN requires many signatures to declare an image a cat, that does not mean nothing else in this world exhibits the same signatures. The Google Photos incident of classifying a person as a gorilla is an example of a false positive. Unfortunately, a CNN is a global algorithm, meaning it cannot easily handle exception cases: you cannot tell the CNN to take a slightly different, localized path to fix a specific problem such as one false positive.
- False negatives: the CNN concludes the image is not a cat (low confidence), but it is. This usually happens when noise in the image distorts a signature enough that it is no longer recognized as one. This scenario can be bad when security applications are fooled into missing the signatures. If you are a Star Wars fan, a non-machine-learning example of a false negative is the missed opportunity to capture the droids (see image).
A more appropriate number to know is the classification accuracy. Classification accuracy is basically the percentage of cases we know for sure the CNN got right when given a large number of images to process, established through lots of time-consuming human verification. Getting this accuracy to 99% would be much more comforting, but still not fool-proof.
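Here is a minimal sketch of how false positives, false negatives, and classification accuracy are tallied during that human verification; the labels and predictions below are made up purely for illustration.

```python
# y_true is what human reviewers confirmed, y_pred is what a hypothetical CNN
# concluded; 1 = "cat", 0 = "not cat". The ten values are invented examples.
y_true = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # correctly called "cat"
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # correctly called "not cat"
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(y_true)   # fraction the CNN got right under human verification
print(f"FP={fp}, FN={fn}, accuracy={accuracy:.0%}")   # here: FP=1, FN=1, accuracy=80%
```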
So where do we stand with machine learning?
In many applications, machine learning algorithms have significantly outperformed their deterministic predecessors, as the classification accuracy shows. However, they are not “human intelligent” – they are just converging on probabilities. And they cannot be assumed to be perfect (even with 99% classification accuracy), though it is noteworthy that many of the errors (false positives and false negatives) would not fool a human. A good application therefore understands these limitations and applies human intervention to avoid catastrophic outcomes from the errors. For example, if the application tracks specific felons, any hit should first be confirmed by a human to be the felon being sought, rather than triggering an automatic arrest command, so that a false positive does not lead to the catastrophic outcome of arresting the wrong person.
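As a sketch of that kind of human-in-the-loop guard (the threshold, function name, and review queue here are hypothetical, not taken from any real system):

```python
# Never act automatically on a match; route confident hits to a human reviewer.
REVIEW_THRESHOLD = 0.99   # illustrative cut-off, not a recommended value

def handle_detection(confidence: float, match_id: str, review_queue: list) -> None:
    if confidence >= REVIEW_THRESHOLD:
        # Even a 99% hit only queues the case for confirmation; a person decides
        # whether it is really the right individual before any action is taken.
        review_queue.append((match_id, confidence))
    else:
        # Low-confidence hits are logged but not escalated.
        print(f"Ignoring low-confidence match {match_id} ({confidence:.0%})")

queue: list = []
handle_detection(0.995, "case-001", queue)   # goes to a human reviewer
handle_detection(0.42, "case-002", queue)    # ignored
```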
So does this mean it’s the end of the deterministic approach in AI? No – if only a small number of identifiable and unique signatures are needed, the deterministic approach is likely more reliable and easier to manage.