Machine Learning Classification Confidence – How Confident Should You Be?

If your kid got a 99% on an exam, he or she should feel pretty good about that result. However, should you feel good if your machine-learning algorithm reports 99% confidence for a specific determination or conclusion? Well, yes and no. Note that 99% is an arbitrary number used throughout this post for illustration, not a statistic measured from any machine-learning algorithm.

What is machine learning at a high level?

There has been a lot of hype around machine learning, so let’s go over how machine learning works. In my current domain, I am exposed to computer vision applications most often, so much of my machine-learning experience is with convolutional neural networks (CNNs), and I will talk about CNNs in this post. However, my comments generalize to other machine-learning algorithms.

A good way to think of a machine-learning algorithm is that it is very good at automatically finding signatures or patterns across many examples of source data (where “many” indicates the scale of big data), so that it can make a determination or conclusion when given a new example. For instance, a CNN takes an image as input and makes a determination, such as “is it an image of a cat?” It does this by evaluating many images of cats and automatically finding the small signatures a cat should have, and where. When the CNN is given a new image, a 99% confidence means the new image exhibits the small signatures in the right places for 99% of the signatures the CNN determined should be there “to be a cat”.
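As a concrete (if simplified) illustration, the confidence a CNN reports typically comes from a softmax over the scores its final layer assigns to each class. The sketch below uses made-up class names and scores purely for illustration; it is not tied to any particular model.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    exp = np.exp(scores - np.max(scores))   # subtract the max for numerical stability
    return exp / exp.sum()

# Hypothetical raw scores from the final layer of a CNN for three classes.
class_names = ["cat", "dog", "rabbit"]
raw_scores = np.array([8.2, 3.1, 1.4])

probs = softmax(raw_scores)
best = int(np.argmax(probs))
print(f"Prediction: {class_names[best]} with {probs[best]:.1%} confidence")
```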

How does it find these signatures? The CNN starts by assigning an arbitrary probability to each small section of an image. With each example image it processes, it adjusts those probabilities and starts converging on whether a section might be a signature. So if you have heard machine learning described as a probabilistic approach, this is exactly why. And if you hear hype about machine learning nearing the intelligence of Skynet (from the movie Terminator), this is also why you should not believe the hype.
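To make the “adjusting probabilities with each example” idea concrete, here is a minimal sketch of the same principle on a one-dimensional toy problem: a tiny logistic model trained by gradient descent. A real CNN does the equivalent over millions of parameters; the feature values and learning rate below are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "signature strength" feature: positive examples (cats) tend to have
# higher values than negative examples (non-cats). Purely illustrative data.
x = np.concatenate([rng.normal(2.0, 1.0, 100), rng.normal(-2.0, 1.0, 100)])
y = np.concatenate([np.ones(100), np.zeros(100)])

w, b = 0.0, 0.0   # start with arbitrary parameters
lr = 0.1          # learning rate

for epoch in range(200):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))   # predicted probability "is a cat"
    # Nudge the parameters to reduce the error on these examples (gradient descent).
    w -= lr * np.mean((p - y) * x)
    b -= lr * np.mean(p - y)

print(f"learned weight={w:.2f}, bias={b:.2f}")   # the values settle as training proceeds
```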

Why you should feel good about 99%

What makes a CNN significantly better than a deterministic approach is that a deterministic approach requires a human to teach the machine what all those small signatures must be and where, and often we humans don’t know ourselves. If a human can’t consciously recognize all the small signatures, we cannot possibly teach the deterministic approach to be very good. Even if a human can recognize all the small signatures, the deterministic approach would require far too much complexity in the code. The CNN can find those signatures automatically by learning from many examples, and in practice these algorithms have proven more reliable and efficient than their deterministic predecessors.

Having gone through many examples, the probabilities will have converged, resulting in many signatures. With many different, unique signatures found, a 99% match against those signatures feels pretty good.

Why you should not be over-confident with 99%

Who says the probabilities have to converge? Some may not converge, or may converge sub-optimally, resulting in few signatures. With far fewer signatures, a 99% confidence does not sound so good. However, an honest software developer would tell you that, in that case, the machine-learning approach has not worked well for the particular application.

While the above situation is easy to detect or verify, there are two scenarios that can be more difficult to deal with. How often each scenario happens varies considerably across applications and CNN configurations.

  • False positives: the CNN concludes the image is a cat (99% confidence), but it is not. Even if the CNN requires many signatures to conclude that the image is a cat, that does not mean nothing else in this known world exhibits the same signatures. The Google Photos incident of classifying a person as a gorilla is an example of a false positive. Unfortunately, a CNN is a global algorithm, meaning it cannot easily handle exception cases: you cannot tell the CNN to take a slightly different, localized path to fix a specific problem such as a false positive.
  • False negatives: the CNN concludes the image is not a cat (low confidence), but it is. This usually happens when noise in the image distorts the signatures enough that they are no longer recognized as signatures. This scenario can be bad when a security application is fooled into missing the signatures. If you are a Star Wars fan, a non-machine-learning example of a false negative would be the missed opportunity to capture the droids. Both error types are counted in the short sketch after this list.
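To make the two error types concrete, the sketch below counts false positives and false negatives by comparing hypothetical CNN decisions (confidences thresholded at 0.5, an assumed cut-off) against human-verified labels.

```python
import numpy as np

# Hypothetical CNN confidences for "is a cat" and the human-verified truth.
confidences = np.array([0.99, 0.20, 0.95, 0.10, 0.85, 0.40])
is_cat      = np.array([True, False, False, True, True, False])

predictions = confidences >= 0.5   # decision threshold (an assumed value)

false_positives = np.sum(predictions & ~is_cat)   # said "cat", but it is not
false_negatives = np.sum(~predictions & is_cat)   # said "not a cat", but it is

print(f"false positives: {false_positives}, false negatives: {false_negatives}")
```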

A more appropriate number to know is the classification accuracy: the percentage of cases we know for sure the CNN got right when given a large number of images to process, established through lots of time-consuming human verification. Getting this accuracy to 99% would be much more comforting, but still not fool-proof.
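Once the human-verified labels exist, the computation itself is trivial; the small self-contained sketch below uses made-up labels and decisions purely for illustration.

```python
import numpy as np

# Human-verified labels and the CNN's decisions for a small hypothetical batch.
verified = np.array([True, False, False, True, True, False])
decided  = np.array([True, False, True, False, True, False])

accuracy = np.mean(decided == verified)   # fraction of verified cases the CNN got right
print(f"classification accuracy: {accuracy:.0%}")   # 4 of 6 correct -> 67%
```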

So where do we stand with machine learning?

In many applications, machine-learning algorithms have significantly outperformed their deterministic predecessors, as their classification accuracy shows. However, they are not “human intelligent”; they are just converging on probabilities. And they cannot be assumed to be perfect (even with 99% classification accuracy), though it is noteworthy that many of their errors (false positives and false negatives) would not fool a human. As such, a good application understands the limitations and applies human intervention to avoid catastrophic outcomes from those errors. For example, if the application is to track and arrest specific felons, any hit should first be confirmed by a human (rather than triggering an automatic arrest command without human confirmation) to verify it is indeed the felon being sought; otherwise a false positive could lead to the catastrophic outcome of arresting the wrong person.
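One simple way to encode that limitation in an application is to route every high-confidence hit to a human reviewer before any action is taken. The sketch below is a hypothetical workflow with made-up function names and thresholds, not an implementation of any real system.

```python
def handle_match(confidence: float, person_id: str,
                 review_threshold: float = 0.9) -> str:
    """Decide what to do with a CNN match; never act without a human check.

    The threshold, person_id format, and return strings are illustrative assumptions.
    """
    if confidence < review_threshold:
        return f"discard: confidence {confidence:.0%} too low for {person_id}"
    # Even above the threshold, a human must confirm before any action is taken.
    return f"queue for human review: {person_id} at {confidence:.0%} confidence"

print(handle_match(0.99, "suspect-042"))
print(handle_match(0.55, "suspect-017"))
```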

And does this mean it is the end of the deterministic approach in AI? No: if only a small number of identifiable, unique signatures are needed, the deterministic approach is likely more reliable and easier to manage.
