Deep Learning – what is it? Why does it matter?
Have you ever wondered what the phrase Deep Learning refers to and why it matters? If so, then this post is for you!
In my last post, I demystified a variety of buzzwords, and explained that Deep Learning is a subset of Machine Learning. This post explores the world of deep learning for non-mathematicians.
In doing so it:
- Explains a little about what an Artificial Neural Network is;
- Touches on convolutional neural networks as one type of Deep Learning Neural network;
- Explains the impact Deep Learning is having on Cognitive Computing;
- Outlines a few examples of Cognitive Computing (Deep Learning) in action.
Starting with Artificial Neural Networks
To understand Deep Learning, you must first understand a little about Artificial Neural Networks. Don’t worry: I am not going to describe the mathematics behind it all. That means no talk beyond this sentence of weightings, backpropagation, activation functions and more.
In fact the main thing you need to understand is: What is an Artificial Neural Network generally being used to do?
In answering, I will keep it simple, and say that normally an Artificial Neural Network is used to classify an input.
Classification means assigning a label to something (in computer terms, an input). For example, as humans we classify things every day, such as fruits, vegetables and vehicles. Beyond that we classify them in more detail as apples, oranges, cars and trucks, normally by understanding what they look like. In the case of a vehicle, we recognize it as a vehicle from a distance using general characteristics, and as it nears and we get more detail, we further classify it as a bus, a truck or a car. At a high level, this is exactly the basic principle behind an Artificial Neural Network.
Once we can classify something we can then use that understanding to take an action.
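To make that classify-then-act idea concrete, here is a tiny Python sketch. The labels, probabilities and actions are all made up purely for illustration: a classifier hands back probabilities, and the application acts on the most likely label.

```python
# Hypothetical example: act on the most probable label from a classifier.
def choose_action(probabilities):
    """Pick the most probable label and map it to an action."""
    label = max(probabilities, key=probabilities.get)
    actions = {"bus": "wait at the stop",
               "truck": "keep your distance",
               "car": "carry on"}
    return label, actions[label]

# Imaginary classifier output for one input image.
probs = {"bus": 0.1, "truck": 0.7, "car": 0.2}
label, action = choose_action(probs)
print(label, "->", action)  # truck -> keep your distance
```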
Taking a small step back
To demonstrate why Neural Networks help, let’s step back and look at this great image from Luis Serrano. It illustrates the concept of isolating regions so that your classification gets more and more detailed and errors are reduced.
In this image Luis shows inputs consisting of two values (x and y). The historical training data tells us whether a given pair of x and y values resulted in blue or red (the outcome itself is arbitrary; red or blue is simply the label). To show that visually, you can plot those values against an x and y axis, as shown above.
Now at this stage we could simply try to come up with a single model that gives an output. We could try logistic regression, quadratic regression and more to arrive at a fixed model. Given the limited number of data points, some form of quadratic regression would probably work for us. But what if we had billions of data points? What if things were super complicated in terms of where blue was and where red was? What if there was a sudden change in the data, with reds appearing at the top right of the plot?
In the first cases, modelling would get harder and harder; and where new data was introduced, we would have to spend time modifying our model and perhaps totally changing it. That is where a Neural Network comes in. Not only can it adapt and learn based on the data it receives over time (retraining), but it can also handle massive amounts of data. Please take this discussion, and the image, as a simple example to highlight the concepts.
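For the curious, here is what the "single fixed model" approach might look like in Python: a logistic regression fitted by gradient descent to a handful of (x, y) points labelled blue (1) or red (0). The data points and learning rate are invented purely for illustration.

```python
import math

# Made-up training data: (x, y, label) with 1 = blue, 0 = red.
points = [(1.0, 1.0, 1), (2.0, 1.5, 1), (0.5, 2.0, 1),   # blues
          (3.0, 3.5, 0), (4.0, 3.0, 0), (3.5, 4.5, 0)]   # reds

def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Fit w1*x + w2*y + b by gradient descent on the log loss.
w1 = w2 = b = 0.0
lr = 0.5
for _ in range(2000):
    for x, y, label in points:
        p = sigmoid(w1 * x + w2 * y + b)   # predicted probability of blue
        err = p - label                    # gradient of the log loss
        w1 -= lr * err * x
        w2 -= lr * err * y
        b  -= lr * err

p_new = sigmoid(w1 * 1.5 + w2 * 1.0 + b)   # a brand-new point
print(p_new)                               # high probability of blue
```

This works fine for a small, simple data set; the point of the post is that as the data grows and the boundary gets complicated, hand-picking a fixed model like this becomes the bottleneck.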
Detecting Blue
The aim of this Artificial Neural Network is to help detect blue outcomes. To do that, the computer (by trial and error across various approaches) comes up with two different regression models. Each produces a probability that a given point is blue or red (the plots with straight lines above). Normally this is never perfect, which is why in the image both models leave some errors: reds in the blue zone and blues in the red zone. The computer optimizes to reduce those errors as much as possible, but a straight line is rarely the most accurate boundary, and real data is not normally as clean as shown here.
The clever part comes next. By combining the probabilities of something being blue from the two regression models (with some other magic), the Neural Network comes up with the region shown on the far right. Here blue is in the right place, as are the reds, and we have a curved (almost quadratic) boundary helping us classify the data. That third region separates the data more precisely and delivers a more accurate probability of something being blue or not. Of course, this example shows 100% perfection, which is very hard to achieve in reality.
The regression parts of this network make up what is known as a Hidden Layer: here, one hidden layer with two nodes, one per regression model. A hidden layer can be thought of simply as the steps the Neural Network uses to classify something, which is normally "black box".
Back to what is an Artificial Neural Network
Artificial Neural Networks work to separate and classify data so that when new data is presented, it can be classified with a high degree of accuracy.
In the example above, this means that given a new x and y value, never seen before, you can push it through the network. The network will deliver the probability of it being a blue outcome with a high degree of confidence. By definition, if the choice is binary, you also know that if the probability of blue is very low, it is red. This is a pretty simple example, but it illustrates the concept.
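For those who like to see the mechanics, here is a minimal Python sketch of such a network: two hidden nodes (each a small linear model squashed into a probability) combined by an output node into a probability of blue. The weights below are invented by hand purely for illustration; a real network learns them during training.

```python
import math

def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def p_blue(x, y):
    # Hidden layer: two "regression" nodes (hand-picked weights).
    h1 = sigmoid( 1.0 * x - 1.0 * y + 0.5)
    h2 = sigmoid(-1.0 * x + 2.0 * y - 0.5)
    # Output node: combine the two probabilities into one p(blue).
    return sigmoid(3.0 * h1 + 3.0 * h2 - 4.0)

# Push a brand-new point through the network.
p = p_blue(1.2, 0.8)
print("p(blue) =", p, " p(red) =", 1 - p)  # binary, so p(red) = 1 - p(blue)
```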
So what is a Deep Learning Neural Network?
Deep Learning normally revolves around the use of Artificial Neural Networks with two or more hidden layers. The theory is that the more hidden layers you have, the more you can isolate specific regions of the data to classify things.
A caveat to that point about many layers: those deep in this space can have long discussions about how many layers make sense. It is possible to have hundreds of hidden layers. Many argue that doing so does not always make a massive difference to accuracy, and may in fact reduce it. The truth is that the answer lies in what you are doing. As with all modelling, trying a few different approaches and then selecting the best after comparison is the way forward.
What is clear is that:
- The more hidden layers and nodes you have, the more computational power you need, both to train the model and for each subsequent execution when you use it to classify something.
- The more data you want to use to train the model, the more computational power is required. On the flip side, the more data you have, the more accurate the model will generally be. Models will also improve over time as they receive more and more data.
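A quick back-of-the-envelope sketch shows why: in a fully connected network, every node connects to every node in the next layer, so the number of weights (and the multiplications per classification) grows with the product of adjacent layer sizes. The layer sizes below are arbitrary examples.

```python
def weight_count(layer_sizes):
    """Weights (excluding biases) in a fully connected network."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

shallow = [4, 8, 3]            # input, one hidden layer, output
deep    = [4, 8, 8, 8, 8, 3]   # same widths, four hidden layers

print(weight_count(shallow))   # 4*8 + 8*3 = 56
print(weight_count(deep))      # 4*8 + 8*8 + 8*8 + 8*8 + 8*3 = 248
```

Adding hidden layers multiplies the work, and training repeats that work over every example for many passes, which is where the appetite for parallel hardware comes from.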
These two things have led to the rise of GPU (Graphics Processing Unit) based processing in the world of Deep Learning. GPU-based processing allows for parallel execution on large numbers of relatively cheap processors, especially when training an artificial neural network with many hidden layers and a lot of input data. The follow-on to this is the FPGA (Field Programmable Gate Array), which then allows you to execute custom models very quickly.
Pictorial Representation of an Artificial Neural Network
Below is another pictorial representation of an Artificial Neural Network. It broadly comprises three things: an input layer, one or more hidden layers and an output layer. You could see those same three layers in the example from Luis I used earlier.
- On the left-hand side, you see the Input Layer. In this case, 4 things are input to the network, and each input node connects to every node in the first hidden layer. In the example from Luis used earlier, we had 2 inputs in the input layer: the x and y values.
- Then there are generally one or two hidden layers with a number of nodes, each connected to every input and every output node. Normally we tell the network how many nodes we want in each hidden layer, and how many layers we want, but the computer determines which models it will use based on the data supplied during training. This is the self-learning part, and it is how a network can change over time as more data is fed in (so that a better ultimate classification happens). Each node models something based on the inputs before it (the probability of being blue, using regression models, in the example from Luis). Essentially the network at this point is “splitting” and “classifying” the data, normally passing a probability on to the next node.
- On the right-hand side, you see the output layer with several output nodes. Each of those nodes delivers an output which helps classify the input. If the network’s outputs were different breeds of dog, and we were looking for 3 specific breeds, then the model above would give the probability of the input being each specific breed. Those probabilities are then fed to an application that can either present them all or simply use the highest probability to drive a decision. In the example from Luis it was simply the probability of the input being blue. Incidentally, if you then provided a breed of dog never seen before, the network would do its best, but it will classify it, with some probability, as one of the breeds you trained it to find. It does not know what it does not know!
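As a small illustration of that last step, a network's raw output scores are commonly turned into probabilities with a softmax, and the application then simply takes the highest. The breed names and scores below are invented for illustration.

```python
import math

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

breeds = ["beagle", "husky", "poodle"]
scores = [2.0, 0.5, 0.1]               # imaginary raw network outputs
probs = softmax(scores)
best = breeds[probs.index(max(probs))]
print(best)  # beagle
```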
In this artificial neural network, all hidden nodes are connected to their predecessors and successors with edges. This sort of Neural Network is known as a Fully Connected Network. There are ways to do something called pruning so that not everything is connected; for this blog I am going to skip over that. Any pruning would normally be done only after you have trained the model, to improve accuracy and performance.
Now that you understand what an Artificial Neural Network looks like, and what it is trying to do, I hope you can see that a Deep Neural Network is just a special case. The main difference is that it normally contains two or more hidden layers (some people think you need more than two hidden layers for a model to count as Deep Learning). Deep Learning opens up new possibilities to classify things with very high levels of accuracy. This is perhaps most evident in the rapidly developing world of speech, text and visual classification, although there are many other uses for Deep Learning.
Some real world Neural Networks
There is a great blog post that looks at AlphaGo in detail. That is the Narrow AI designed by Google to win at the game of Go. One part of it shows the Deep Learning Neural Network behind how it mastered the game after playing millions of games. As you can see, the network contains very few layers. It ultimately comes up with just 6 outputs, which were used to decide what to do based on the game situation.
Autonomous cars also get a lot of attention nowadays, and it is a very similar concept. Check out this blog from David Singleton, which shows how he built a self-driving model car. Below is the image of his Neural Network. Again, you can see it does not have many layers, but each layer has a lot of nodes. Ultimately, it provides outputs giving the probability of what the car should do next for a given input.
Eagle-eyed readers might have spotted the word convolution in the AlphaGo network image. That is because these two examples use something called Convolutional Neural Networks. This is a type of neural network created back in the 1980s and proven to work very well for speech, text and images. Both examples I have shown above use visual data! While not a new concept, we now have the processing power, and the data, to fully realize it!
What is a Convolutional Neural Network?
In short, Convolutional Neural Networks break images down into smaller parts (convolution) and then try to identify specific features using filters that look for specific patterns in the image. This builds something called a convolution layer, which can be shrunk using something called pooling. What is found is then fed into a fully connected network to determine what is in the image. If you really want to understand it, take a look at this video from Brandon Rohrer, which I think explains it really well; it would take me a long time and a lot of words to explain in this post. Convolutional Neural Networks are behind a lot of the breakthroughs we are seeing in Narrow AI today!
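If you want a feel for those two building blocks, here is a toy Python sketch of convolution and max pooling on a made-up 4x4 "image", using a hand-crafted filter that responds to a vertical edge. Real networks learn their filters during training; this one is invented purely for illustration.

```python
# A 4x4 "image": dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

# A 2x2 filter that responds strongly to a dark-to-bright vertical edge.
filt = [[-1, 1],
        [-1, 1]]

def convolve(img, f):
    """Slide the filter over the image, producing a feature map."""
    out = []
    for i in range(len(img) - len(f) + 1):
        row = []
        for j in range(len(img[0]) - len(f[0]) + 1):
            row.append(sum(f[a][b] * img[i + a][j + b]
                           for a in range(len(f)) for b in range(len(f[0]))))
        out.append(row)
    return out

def max_pool(fmap, size=2):
    """Shrink the feature map by keeping the strongest response per region."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

fmap = convolve(image, filt)   # peaks down the middle, where the edge is
print(max_pool(fmap))          # [[2]]
```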
Pause
If you have made it this far, you are through the hard part and on your way to understanding a lot of the smarts behind Narrow AI today. We have covered a lot already. Ultimately, you just need to remember that we are using these networks to classify things via a variety of methods.
Impact on Cognitive Computing
I plan to write a whole post on the importance of cognitive computing but for now you should be clear that Deep Learning is behind many of the advancements we are seeing with Cognitive Computing today.
Cognitive Computing deals with enabling computers to interact with us in a humanlike manner. That means having them able to understand images, understand speech, understand text etc. and reciprocate accordingly.
Convolutional Neural Networks have been key to that progress, along with the vast quantities of data and the massive compute we now have available. To succeed in Cognitive Computing you need THREE things (assuming you have the people and tools to build the model).
- You need a great deal of labelled input data to train a good network. For example, to build a great network that can recognize objects, you need thousands and thousands of images that contain the object, plus the same number that do not. There is a great TED talk by Fei-Fei Li which explains why.
- You need a lot of compute power to make that training happen in a sensible time period and to continue to power the evolution of the model over time.
- You need compute power to use the trained model in your application to classify new inputs.
It is clear to me that, for most, cognitive computing will make the most sense delivered as a service you can embed into your applications. I think that will be the frontier for application developers infusing AI into their applications.
Developing a lot of this in-house will be difficult given the training data and compute requirements to make it accurate. The nice thing is companies like Microsoft let you ride on what we have already done and extend those models. Take a look here. Now, even if you have custom parts you want to recognize, you can build on what others have done to bring unique AI capabilities to your business with a few photos and a few clicks!
Usage today
Today we see companies using things like Microsoft Cognitive Services in their applications to add “narrow” artificial intelligence. There are basically two types of offering.
- Pre-trained models, continuously updated, that people can exploit as a service without any knowledge of what we have covered in this blog. Examples might be visual recognition of common objects/celebrities or speech and text recognition.
- Black box services you can use to train models to detect specific things which may be proprietary to you. Examples might be visual recognition of specific objects, facial recognition of specific people, or identifying someone from their own speech.
These two capabilities offer maximum flexibility. The first set is hard to replicate yourself, and the second provides a quick means to develop the Narrow AI services you need without an army of skilled data scientists.
Below are two examples of Cognitive Services in action which have introduced narrow AI to specific applications/business processes.
- Uber has introduced Real-Time ID Check, an additional security feature that periodically prompts drivers to share a selfie with Uber before they go online to start accepting ride requests. Real-Time ID Check uses Microsoft Cognitive Services intelligence to instantly compare the selfie to the driver’s photo on file. If the two photos don’t match, the driver’s account can be temporarily deactivated while Uber looks into the situation. This feature prevents fraud and protects drivers’ accounts from being compromised. It also protects riders by building another layer of accountability into the Uber app, letting passengers know that the right person is behind the wheel.
- McDonald’s is using Microsoft Cognitive Services to help understand orders at drive-through restaurants. To do that, they use a service which translates speech into text.
Beyond these two examples we are starting to see the infusion of narrow AI services into all sorts of applications. Often people have no idea they are getting the benefits of AI. For many of us that is the way it will always be. AI helping to empower each and every one of us and each and every organization to achieve and understand more. AI will be omnipresent and often invisible.
Wrapping up
I hope this post has helped demystify the buzz around Deep Learning and set you on a path to learn more. It has shown that Deep Learning is nothing more than an Artificial Neural Network with two or more hidden layers. It has explained that Convolutional Neural Networks are fuelling the Cognitive Computing boom we see today. Finally, the post shared a few cases where narrow AI has been infused into applications and business processes.
I am sure there are many experts out there, and I would love to hear if you would change anything in this post. For everyone else, please share whether this post helped, or how I can try to simplify further to clear up any confusion.