Testing Data and Drawing the Threshold (3/5)

In my previous articles, I introduced Machine Learning (ML), training data, and the sources of accuracy and bias, and I made assertions about building “better” algorithms. Now, let’s unpack “better” and how to measure algorithmic performance.

Remember, in this series, “chihuahua” can stand in for anything you seek to discover. You created a large sample of properly labeled data, the training data, and fed that to an algorithm, creating a chihuahua algorithm.

The output of any ML algorithm is a distribution. Along the x- or horizontal axis, you have a measure of chihuahua-ness, sometimes referred to as the algorithm’s confidence in “predicting” the entity to be chihuahua. Along the y- or vertical axis, you have the count of entities.

Figure: a simple distribution graph of chihuahuas and muffins, ranked by chihuahua-ness score.
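For readers who like to see the mechanics, here is a minimal sketch in Python of what that output looks like. The scores are simulated so the example is self-contained; in a real system they would come from your trained classifier, not from a random number generator.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical scores from a trained "chihuahua" classifier, one per image.
# We simulate them here so the sketch runs on its own; in practice they
# would come from your model's scoring step.
rng = np.random.default_rng(42)
muffin_scores = rng.normal(loc=3, scale=1.5, size=400).clip(0, 10)     # mostly low chihuahua-ness
chihuahua_scores = rng.normal(loc=7, scale=1.5, size=100).clip(0, 10)  # mostly high chihuahua-ness

# The algorithm's output is just this distribution of scores.
plt.hist([muffin_scores, chihuahua_scores], bins=20, stacked=True,
         label=["muffins", "chihuahuas"])
plt.xlabel("chihuahua-ness score")  # x-axis: the algorithm's confidence
plt.ylabel("count of entities")     # y-axis: how many entities got that score
plt.legend()
plt.show()
```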

Once the algorithm creates the distribution, the human must perform the single most important task: draw the threshold. In my decade-plus of working with ML systems, this is perhaps the most misrepresented aspect of the art of deploying AI/ML technologies into high-consequence operational environments.

Machines have no conscious awareness of right and wrong; humans must do this. How many images must be treated as “alerts” and sent for human review? A data scientist might say that the algorithm “predicted” what entity is of interest to the operator. But the prediction requires a threshold. And a threshold depends upon particular risk profiles and risk preferences. The machine only creates the distribution using training data provided by humans. The human makes the next move of drawing a threshold.
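In code, the human’s move is literally a single number. A minimal sketch, with a hypothetical `alerts_for_review` helper; the value of the threshold is the risk decision, and nothing in the code can choose it for you:

```python
# The human decision: everything scoring at or above this value becomes an alert.
THRESHOLD = 8.0

def alerts_for_review(scores, threshold=THRESHOLD):
    """Return the indices of entities a human must review.

    A lower threshold sends more images for review (more work, fewer misses);
    a higher one sends fewer (less work, more misses). That trade-off is a
    human judgment about risk, not a property of the algorithm.
    """
    return [i for i, score in enumerate(scores) if score >= threshold]

# Example: three entities scored 2.1, 8.4, and 9.0 -> review entities 1 and 2.
print(alerts_for_review([2.1, 8.4, 9.0]))  # [1, 2]
```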

In a small population, a person can easily tell the chihuahuas from the not-chihuahuas. But to find the sought-after pattern across a large data set, the ML system goes back to work, using the labeled data to create test data.
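In practice, holding out labeled test data is often a one-line operation. A minimal sketch using scikit-learn; the entities and labels here are placeholders standing in for your own labeled images:

```python
from sklearn.model_selection import train_test_split

# Placeholder labeled data: 500 entities, 100 chihuahuas (1) among 400 muffins (0).
entities = list(range(500))
labels = [1] * 100 + [0] * 400

# Hold out 20% of the labeled data as test data the algorithm never trains on,
# keeping the chihuahua/muffin ratio the same in both halves (stratify).
train_x, test_x, train_y, test_y = train_test_split(
    entities, labels, test_size=0.2, random_state=0, stratify=labels)
```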

Figure: a post-algorithm distribution of 500 entities, chihuahuas among muffins, scored from zero to ten, with a threshold line drawn between seven and eight; the flagged chihuahuas fall mostly to the right of the line.

In the image here of 500 entities in a post-algorithm distribution, the test data, only the labeled chihuahua images appear in color for the purpose of this article; the computer can “see” the label. The human-drawn threshold tells the system to treat scores of eight and above as-if chihuahua, and seven and below as-if not-chihuahua. Now, we can measure performance.

First, we count True Positives, False Positives, True Negatives, and False Negatives.

Above (right of) the threshold = Predicted Positive

Chihuahuas above the threshold = True Positive

Not-chihuahuas above the threshold = False Positive


Below (left of) the threshold = Predicted Negative

Chihuahuas below the threshold = False Negative

Not-Chihuahuas below the threshold = True Negative


Looking at the image, every chihuahua above the threshold is a true positive – the algorithm-human team got it right. Everything above the threshold that isn’t a chihuahua is a false positive.

Similarly, “not-chihuahuas” below the threshold are true negatives – a win for team algorithm-human. All chihuahuas below the threshold are false negatives – human traffickers and money launderers that evaded us, again. Counting and some simple math gets us to the measurement of accuracy: effectiveness and efficiency. In the next article, I will dive into accuracy in more detail.
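The counting itself is simple enough to fit in a dozen lines. A minimal sketch, using this article’s zero-to-ten scale and the human-drawn threshold of eight; `confusion_counts` is a hypothetical helper, not a library function:

```python
def confusion_counts(scores, labels, threshold=8.0):
    """Count TP/FP/TN/FN given scores, true labels (True = chihuahua), and a threshold."""
    tp = fp = tn = fn = 0
    for score, is_chihuahua in zip(scores, labels):
        predicted_positive = score >= threshold  # at or above the threshold = predicted positive
        if predicted_positive and is_chihuahua:
            tp += 1                              # caught a real chihuahua
        elif predicted_positive and not is_chihuahua:
            fp += 1                              # flagged a muffin by mistake
        elif not predicted_positive and is_chihuahua:
            fn += 1                              # a chihuahua slipped past us
        else:
            tn += 1                              # correctly ignored a muffin
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Example: three entities scored 9.1, 7.5, and 8.3; the first two are chihuahuas.
print(confusion_counts([9.1, 7.5, 8.3], [True, True, False]))
# -> {'TP': 1, 'FP': 1, 'TN': 0, 'FN': 1}
```

These four counts are the raw material for every accuracy measure that follows.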


This is the third article of a 5-part series. See my video short on this topic or read Article 2 here.
