Deep learning: human-level performance
In the last few years, there has been a lot of talk about comparing machine learning systems to human-level performance, for two reasons. First, because of advances in deep learning, machine learning algorithms are suddenly working much better, so in a lot of application areas it has become feasible for them to be competitive with human-level performance. Second, it turns out that the workflow of designing and building a machine learning system is much more efficient when you're trying to do something that humans can also do. In those settings it becomes natural to talk about comparing to, or trying to mimic, human-level performance.
When you're working on a problem, progress tends to be relatively rapid as you approach human-level performance. But after the algorithm surpasses human-level performance, progress in accuracy slows down: it may keep getting better, but the slope of how rapidly accuracy improves often flattens out. Over time, as you keep training your algorithm, maybe with bigger and bigger models and more and more data, performance approaches but never surpasses some theoretical limit, which is called the Bayes optimal error.
Why does progress slow down after surpassing human-level performance? One reason is that for many tasks human-level performance is not that far from Bayes optimal error, because people are very good at looking at images and telling whether there's a cat, or listening to audio and transcribing it. So by the time you surpass human-level performance there may not be much headroom left to improve. The second reason is that so long as your performance is worse than human-level performance, there are certain tools you can use to improve it, and those tools are harder to use once your performance surpasses human-level performance.
For tasks that humans are quite good at, so long as your machine learning algorithm is still worse than humans, you can get labeled data from humans, so you have more data to fit your learning algorithm. You can also use manual error analysis: ask people to look at the examples your algorithm gets wrong and try to gain insight into why a person gets them right but the algorithm gets them wrong. And you can get a better analysis of bias and variance.
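As a rough illustration of the manual error analysis idea, here is a minimal sketch; the error categories and the `mislabeled_dev_examples` records are made up for illustration, not part of the course material.

```python
from collections import Counter

# Hypothetical records of dev-set examples the model got wrong.
# A human reviewer has tagged each one with the apparent cause of the error.
mislabeled_dev_examples = [
    {"id": 17, "categories": ["blurry image"]},
    {"id": 42, "categories": ["dog mistaken for cat"]},
    {"id": 58, "categories": ["blurry image", "dog mistaken for cat"]},
    {"id": 90, "categories": ["mislabeled ground truth"]},
]

# Count how often each category appears, to see where the algorithm
# most often fails while a human would get the example right.
counts = Counter(cat for ex in mislabeled_dev_examples for cat in ex["categories"])
total = len(mislabeled_dev_examples)

for category, count in counts.most_common():
    print(f"{category}: {count}/{total} = {100 * count / total:.0f}% of errors")
```

Tallies like this suggest which category of errors is worth attacking first, which is exactly the kind of guidance that becomes harder to get once the algorithm is better than the humans doing the reviewing.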
We said that you want your learning algorithm to do well on your training set, but sometimes you don't want it to do too well; knowing human-level performance can tell you exactly how well, but not too well, you want your algorithm to do on the training set.
Here we still use cat classification as the example. Given a picture, let's say humans have near-perfect accuracy, so suppose human-level error is 1%. In that case, if your learning algorithm achieves 8% training error and 10% dev error, then maybe you want to do better on the training set: the huge gap between how well your algorithm does on the training data and how well humans do shows that your algorithm isn't fitting the training set well. In terms of the tools of bias and variance, in this case you would focus on reducing bias, so you would do things like train a bigger neural network. But now imagine human-level error is not 1% but 7.5%, maybe because the images in your dataset are so blurry that even humans can't tell whether there is a cat. In that case you are actually doing just fine on the training set, since it is only a little bit worse than human-level performance, and maybe you want to focus instead on reducing the variance of your learning algorithm, so you might try regularization to bring your dev error closer to your training error.
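To make the arithmetic concrete, here is a small sketch of the two scenarios above; the `diagnose` helper is my own naming, not something from the course, and the errors are in percentage points.

```python
def diagnose(human_error, train_error, dev_error):
    """Compare the bias gap (train vs. human) with the variance gap (dev vs. train)."""
    avoidable_bias = train_error - human_error
    variance = dev_error - train_error
    if avoidable_bias > variance:
        focus = "reduce bias (e.g. train a bigger network)"
    else:
        focus = "reduce variance (e.g. regularization, more data)"
    return avoidable_bias, variance, focus

# Scenario 1: humans are near-perfect (1% error).
print(diagnose(human_error=1.0, train_error=8.0, dev_error=10.0))
# bias gap 7.0 vs. variance gap 2.0 -> focus on reducing bias.

# Scenario 2: blurry images, human-level error is 7.5%.
print(diagnose(human_error=7.5, train_error=8.0, dev_error=10.0))
# bias gap 0.5 vs. variance gap 2.0 -> focus on reducing variance.
```

The same training and dev errors lead to opposite conclusions depending on the human-level number, which is the whole point of using it as a reference.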
In earlier courses' discussions of bias and variance, we were mainly assuming tasks where Bayes error is nearly zero. To explain what happens here for our cat classification example, think of human-level error as a proxy, or an estimate, for Bayes optimal error. For computer vision tasks this is a pretty reasonable proxy, because humans are actually very good at computer vision, so whatever humans can do may not be too far from Bayes error. By definition, human-level error cannot be better than Bayes error, since nothing can be better than Bayes error, but human-level error may not be too far from it.
Let's see how to define 'human-level' a bit more precisely, and in particular use the definition that is most useful for helping you drive progress in your machine learning project. Say you want to look at a radiology image and make a diagnosis classification decision. Suppose a typical, untrained human achieves 3% error on this task, a typical doctor, maybe a radiologist, achieves 1% error, an experienced doctor does even better at 0.7%, and a team of experienced doctors, reaching a consensus opinion, achieves 0.5% error. If you want a proxy or an estimate of Bayes error, then given that a team of experienced doctors achieves 0.5%, we know Bayes error can be no larger than 0.5%. We don't know how much better it is; maybe a larger team of even more experienced doctors could do better, so it might be a little below 0.5%, but the optimal error cannot be higher than 0.5%. So what I would do in this setting is use 0.5% as the estimate for Bayes error, and define human-level performance as 0.5%.
The gap between human-level error and training error is taken as the avoidable bias, and the gap between training error and dev error is taken as the variance. As in the previous slide, when the avoidable bias is bigger than the variance we focus on reducing bias, for example by training a bigger neural network, whereas if the variance is much bigger we focus on variance-reduction techniques such as regularization or getting a bigger training set. Where it really matters is when your training error approaches 0.7% and your dev error is 0.8%: unless you are very careful about estimating Bayes error, you might not know how far away from it you are, and therefore how much you should be trying to reduce the avoidable bias. In fact, if all you knew was that a single experienced doctor achieves 0.7%, rather than that a team of experienced doctors achieves the even better 0.5%, it might be very difficult to know whether you should try to fit your training set even better. This problem arises only when you are already doing very well on your problem; 0.7% and 0.5% are really close to human-level performance.
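Continuing with the radiology numbers, a small sketch below shows how the choice of proxy changes the picture when training error is 0.7% and dev error is 0.8%; the dictionary and loop are only illustrative scaffolding around the section's figures.

```python
# Human-level error rates reported for the radiology task (in %).
human_levels = {
    "typical untrained human": 3.0,
    "typical doctor": 1.0,
    "experienced doctor": 0.7,
    "team of experienced doctors": 0.5,
}

# The best observed human performance is an upper bound on Bayes error,
# so it is the natural proxy.
bayes_proxy = min(human_levels.values())  # 0.5

train_error, dev_error = 0.7, 0.8  # in %

for proxy_name, proxy in [("team of doctors (0.5%)", 0.5), ("single doctor (0.7%)", 0.7)]:
    avoidable_bias = train_error - proxy
    variance = dev_error - train_error
    print(f"proxy = {proxy_name}: avoidable bias {avoidable_bias:.1f}%, variance {variance:.1f}%")
# With the 0.5% proxy there is still 0.2% of avoidable bias worth chasing;
# with the 0.7% proxy it looks like there is none, which could mislead you.
```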
Lots of teams find it exciting to surpass human-level performance on a specific recognition or classification task. Consider the following example: a team of humans discussing and debating achieves 0.5% error, and a single human achieves 1% error. If your training error is 0.6%, it is easy to say what the avoidable bias is, because you take 0.5% as the estimate of Bayes error (you would not use 1% as the reference), giving an avoidable bias of 0.1%; and in this case there may be more to gain from reducing your variance than your avoidable bias. Moreover, if your training error is already better, say 0.3%, than even a team of humans looking at the examples and discussing and debating them, then it is also harder to rely on human intuition to tell you in what ways your algorithm could still improve. So in this example, once you surpass the 0.5% threshold, your options for making progress on the machine learning problem are just less clear. It doesn't mean you cannot make progress; you might still make significant progress, but some of the tools you have for pointing you in a clear direction just don't work as well.
There are many problems where machine learning significantly surpasses human-level performance. Notice that these problems involve learning from structured data and are not natural perception problems; they are not computer vision, speech recognition, or natural language processing tasks, because humans tend to be very good at perception tasks. Finally, for all of these problems, there are teams with access to huge amounts of data: the best systems for these applications have probably looked at far more data of that application than any human could possibly look at, so they can find statistical patterns better than even a human might.
Chen Yang