Course: TensorFlow: Neural Networks and Working with Tables
Neural network intuition
- [Tutor] So let's get a general intuition of our neural network problem and what we're trying to achieve. In later videos, we'll get into the details of how everything works under the hood. We start off with the original image, which is a 28 by 28 grayscale image, and as you can see, this is an image of an ankle boot. Grayscale images have pixel values between 0 and 255, and the grid of numbers is a numeric representation of the 28 by 28 image. If you squint your eyes, you should be able to make out the image of the ankle boot in the grid of numbers: the zeros correspond to black, and the closer a value gets to 255, the whiter the image is in that section.

Now, machine learning algorithms don't perform well when the inputs span a wide range of numbers. One common mathematical technique is called min-max scaling: the values are shifted and rescaled so that they end up ranging between zero and one. In our case, we take all of the values and divide them by 255, so that every value lies between zero and one.

Next, we take the 28 by 28 image, row by row, and flatten it. We take the first row, concatenate the second row to it, then concatenate the third row, and so on. At the end of this flattening process, we have one long row of 784 numbers.

So let's get an overview of what our neural network looks like. We start off with 784 input nodes, each corresponding to one pixel of the image, and we have two hidden layers with 128 nodes and 64 nodes, with ReLU in brackets. The final output layer has 10 nodes, because Zalando's Fashion-MNIST dataset has 10 classes. If you want to know which class the neural network predicts, you simply find the output node with the highest number: the higher the probability, the more likely the neural network thinks the image belongs to that class of objects. The input layer needs 784 nodes because that's the number of pixels in the input image; similarly, the output layer has to have 10 nodes because there are 10 classes in the Fashion-MNIST dataset.

So why use 128 nodes and 64 nodes in the middle? Well, these are arbitrary numbers that I've selected. What if instead we had 784 nodes in hidden layer one and in hidden layer two? In a fully connected neural network, each node connects to every node in the next layer, and when we train the network, we adjust the weights, or parameters, connecting each node in one layer to the next. This means a network with 784 nodes in both hidden layers has almost 1.24 million parameters that have to be tracked. Let's look at the calculation for just the first section. The input layer has 784 nodes going into a hidden layer of 784 nodes, so the total number of weights is 784 times 784, which is 614,656. Each layer on the right also has one bias per node, and since there are 784 nodes in the hidden layer, that's an additional 784 bias parameters. That's how we get a total of 615,440 parameters for this first layer.

Now, our neural network is actually pretty small, so increasing the number of nodes per layer won't make TensorFlow significantly slower at these calculations. You'll also find that there isn't a significant difference in the accuracy overall.
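To make this concrete, here is a minimal sketch in TensorFlow/Keras of the pipeline just described: min-max scaling by dividing by 255, flattening the 28 by 28 image into 784 values, and the 784-128-64-10 architecture. The variable names and exact layer setup are illustrative, not the instructor's code from the course.

```python
import tensorflow as tf

# Load Fashion-MNIST: 28x28 grayscale images in 10 clothing classes.
(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.fashion_mnist.load_data()

# Min-max scaling: pixel values 0-255 are rescaled to the range 0-1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# 784 input nodes -> 128-node hidden layer (ReLU)
# -> 64-node hidden layer (ReLU) -> 10 output nodes (one per class).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),                        # 28x28 -> 784 values
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # class probabilities
])

model.summary()  # prints the parameter count for each layer
```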
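Reading off the predicted class is then just a matter of finding the output node with the highest probability. A short sketch, assuming the hypothetical `model` defined above (after training, this would be the network's best guess):

```python
import numpy as np

# One row of 10 probabilities for the first test image.
probabilities = model.predict(test_images[:1])

# The predicted class is the output node with the highest probability.
predicted_class = np.argmax(probabilities[0])
print(predicted_class, probabilities[0][predicted_class])
```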
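The parameter count for the hypothetical 784-784 hidden layers can be verified with a couple of lines of arithmetic (a worked check added here, not from the course):

```python
# Weights between two fully connected layers: nodes_in * nodes_out.
weights_per_layer = 784 * 784     # 614,656 weights
biases_per_layer = 784            # one bias per node in the receiving layer
per_layer = weights_per_layer + biases_per_layer
print(per_layer)                  # 615,440

# Two such hidden layers, plus the 10-node output layer (784*10 weights + 10 biases):
total = 2 * per_layer + (784 * 10 + 10)
print(total)                      # 1,238,730 -- almost 1.24 million
```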
Another hyperparameter you can try adjusting is the number of hidden layers. So why have only two? What if we had five, or ten, or twenty? Well, firstly, there's a trade-off calculation that you need to make: are you getting much better accuracy from a neural network that has millions more parameters and that perhaps takes longer to train?

Now let's look at the term we've had in brackets. ReLU stands for rectified linear unit, and it's a type of activation function. The purpose of these activation functions is to introduce some non-linearity into our model. What do we mean by that? This is a plot of the Zalando classes in 3D, so we can visualize the problem. Let's plot all of the ankle boots, the sneakers, and the sandals. If this were a linearly separable problem, we could draw a couple of planes that separate each of the different classes; imagine a piece of paper cutting between them. The problem is that some of the ankle boots look quite similar to the sneakers, and some of the sneakers look quite similar to the sandals. You'll find that most problems are more complex than linear ones, and this is why non-linear activation functions allow the nodes to learn more complex structures in an image. Visually, this is what ReLU looks like; mathematically, it's defined as taking whichever value is greater: zero or the value x. This is what introduces non-linearity into our neural network model.
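As a quick illustration, ReLU is simply max(0, x), and TensorFlow provides it as tf.nn.relu:

```python
import tensorflow as tf

# ReLU keeps positive values and clamps everything else to zero:
# relu(x) = max(0, x).
x = tf.constant([-2.0, -0.5, 0.0, 1.5, 3.0])
print(tf.nn.relu(x).numpy())  # [0.  0.  0.  1.5 3. ]
```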