Machine Learning Fundamentals for Self-Driving Cars
Sebastian Thrun, who, in addition to being my boss, has as good a claim as anyone to being the father of the self-driving car, likes to say that perception is 80% of the challenge of building self-driving cars.
Fortunately, perception accuracy and speed have increased dramatically in recent years, largely thanks to deep learning. Deep neural networks (the core tool of "deep learning") have transformed our ability to work with camera data, and they have the potential to transform other parts of the self-driving car stack as well.
Over the coming weeks, I'll be covering different aspects of deep learning. Today I'll start with the fundamentals of machine learning. After that, I'll write about deep neural networks, convolutional neural networks, deep learning frameworks, transfer learning, reinforcement learning, and possibly a few more topics.
As with my previous Back\Line posts, these will be high-level and conceptual. If you are interested in learning how to actually build deep neural networks, I might modestly suggest Udacity's Self-Driving Car Engineer Nanodegree Program, or Udacity's School of Artificial Intelligence.
And please subscribe to Back\Line to keep up with the posts there!
Taxonomy
Deep learning is a type of machine learning, which is, in turn, a type of artificial intelligence.
Artificial Intelligence
Starting at the top, artificial intelligence describes "agents" that follow the perception-action cycle. These agents are often computer algorithms at their core. They perceive the environment around them, often plan how to best reach their goals, and then act to achieve those goals.
For example, imagine an agent whose goal is to determine whether an image is of a stop sign, or not. The agent might follow a simple algorithm: if the image contains a red background with white text, classify that image as a stop sign. That's not the most sophisticated algorithm, and it won't be right 100% of the time. But it's an agent that is following the perception-action cycle, and thus it demonstrates artificial intelligence.
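To make the hand-written rule concrete, here is a minimal sketch of such an agent. It assumes an image is a list of rows of (red, green, blue) pixels, and the color cutoffs and the minimum red and white fractions are hypothetical values picked for illustration, not tuned on real data.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255
Image = List[List[Pixel]]

def is_red(p: Pixel) -> bool:
    # Hypothetical cutoffs: strong red channel, weak green and blue
    r, g, b = p
    return r > 150 and g < 100 and b < 100

def is_white(p: Pixel) -> bool:
    # Hypothetical cutoff: all three channels bright
    return all(c > 200 for c in p)

def looks_like_stop_sign(image: Image, min_red: float = 0.5, min_white: float = 0.05) -> bool:
    """Hand-written rule: mostly red background plus some white (the text)."""
    pixels = [p for row in image for p in row]
    if not pixels:
        return False
    red_fraction = sum(is_red(p) for p in pixels) / len(pixels)
    white_fraction = sum(is_white(p) for p in pixels) / len(pixels)
    return red_fraction >= min_red and white_fraction >= min_white
```

Every decision here was made by a programmer, which is exactly why the rule breaks on a red car or a faded sign.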
Machine Learning
Within the broad umbrella of artificial intelligence lies machine learning. Machine learning is a class of algorithms that learn from data to achieve artificial intelligence.
Let's revisit our stop sign classification agent. Imagine that, instead of looking for a red background with white text, it learns from a giant collection of images. Some of those images are of stop signs, and some aren't, and over time the agent learns what a stop sign looks like.
This is a little like how a human learns to perceive the environment: we see lots of things and build up an intuition over time. Notice that we haven't specified how the agent learns to distinguish stop signs from other images. There are many different algorithms it could use for that, all of which count as "machine learning".
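As one tiny illustration of "learning from data", here is a sketch of a decision stump, about the simplest learning algorithm there is. It assumes each image has already been reduced to a single number (for example, the fraction of red pixels) and labeled as stop sign or not; instead of a programmer choosing the cutoff, the agent picks whichever cutoff makes the fewest mistakes on the labeled examples. The function names and the feature are hypothetical.

```python
from typing import List, Tuple

def learn_threshold(examples: List[Tuple[float, bool]]) -> float:
    """Learn a cutoff from (feature_value, is_stop_sign) pairs.

    Tries each observed feature value as a cutoff and keeps the one
    that misclassifies the fewest training examples.
    """
    candidates = sorted({x for x, _ in examples})
    best_t, best_errors = candidates[0], len(examples) + 1
    for t in candidates:
        errors = sum((x >= t) != label for x, label in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

def predict(feature: float, threshold: float) -> bool:
    # Classify as a stop sign when the feature reaches the learned cutoff
    return feature >= threshold
```

Given more (or different) labeled images, the same code learns a different cutoff without anyone rewriting the rule.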
Deep Learning
Deep learning is a type of machine learning that uses a specific tool, called a neural network, to learn from data.
Neural networks contain layers of "artificial neurons", each of which is connected to other artificial neurons in other layers of the network. Each neuron takes input from part of the network, performs its own calculations, and passes those results on to other parts of the network.
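Here is a minimal sketch of that idea in plain Python. It assumes a fully connected network where each artificial neuron computes a weighted sum of its inputs plus a bias and passes it through a sigmoid activation; real networks use many other layer types and activations, and the weights below would normally be learned rather than written by hand.

```python
import math
from typing import List, Tuple

def neuron(inputs: List[float], weights: List[float], bias: float) -> float:
    # Weighted sum of the inputs plus a bias, squashed into (0, 1) by a sigmoid
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs: List[float], weight_rows: List[List[float]], biases: List[float]) -> List[float]:
    # Every neuron in a layer sees the same inputs but has its own weights
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def network(inputs: List[float], layers: List[Tuple[List[List[float]], List[float]]]) -> List[float]:
    # The outputs of each layer become the inputs of the next layer
    for weight_rows, biases in layers:
        inputs = layer(inputs, weight_rows, biases)
    return inputs
```

For example, `network([1.0, 0.5], [([[0.5, -0.25], [1.0, 1.0]], [0.0, -1.0]), ([[1.0, -1.0]], [0.0])])` runs two inputs through a hidden layer of two neurons and an output layer of one neuron.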
Neural networks (sometimes called deep neural networks) are just one of many approaches to machine learning, but they've become critically important in the last six years. They work very well on modern parallel computing chips, especially graphics processing units (GPUs). GPUs were originally designed to render images for computer monitors.
Think about the pixels on a screen: they're all doing roughly the same thing at the same time, just with slightly different values. Similarly, the artificial neurons within a layer of a neural network are all doing roughly the same thing at the same time, just with slightly different values. One of the happy coincidences of engineering :-)
Goals
Machine learning agents have different types of outputs. There are four types of outputs that are particularly important for self-driving cars: regression, classification, localization, and segmentation.
Regression
The canonical output for a machine learning agent is a number. This number might be how far away a pedestrian is, or how hard to press the accelerator on a car, or it might be the coefficient for a third-order polynomial that describes a lane line on the road.
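As a concrete regression example, here is a sketch that recovers the four coefficients of a third-order lane-line polynomial from points along the lane. It assumes we already have (x, y) points, with x as distance ahead and y as lateral offset, and it uses a classical least-squares fit via the normal equations rather than a neural network; the function name `fit_cubic` is hypothetical.

```python
from typing import List

def fit_cubic(xs: List[float], ys: List[float]) -> List[float]:
    """Least-squares fit of y = c0 + c1*x + c2*x^2 + c3*x^3; returns [c0, c1, c2, c3]."""
    n = 4
    # Normal equations (A^T A) c = A^T y, where row k of A is [1, x_k, x_k^2, x_k^3]
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    m = [row + [rhs] for row, rhs in zip(ata, aty)]  # augmented matrix
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - s) / m[i][i]
    return coeffs
```

The output is just four continuous numbers, which is what makes this a regression problem.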
Classification
Some machine learning agents output discrete classes, instead of continuous numbers. For example, an agent might classify whether a traffic sign is a stop sign, yield sign, speed limit sign, or any one of tens or hundreds of other types of street signs.
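A common way (though not the only one) for a classifier to turn its internal scores into a discrete class is to convert one raw score per class into probabilities with a softmax and pick the most likely class. Here is a minimal sketch, assuming the model has already produced those scores; the three-class label list is hypothetical.

```python
import math
from typing import List, Tuple

CLASSES = ["stop", "yield", "speed_limit"]  # hypothetical label set

def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score first for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores: List[float]) -> Tuple[str, float]:
    # Return the most probable class and its probability
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]
```

For instance, `classify([2.0, 0.5, 0.1])` picks `"stop"`, since that class has the highest score.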
Localization
A different type of network might localize objects within an image. For example, before a network can classify what type of traffic sign appears in an image, it must first identify where in the image that traffic sign is, whether a traffic sign appears at all, and whether there are multiple traffic signs in an image.
Localization agents typically output the coordinates of objects within an image, so in some ways they resemble regression networks. But their purpose is distinct enough that it's worth thinking of them as their own class of agents.
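Because a localization output is just a handful of coordinates, judging it looks more like measuring regression error than counting classification mistakes. One widely used yardstick, not mentioned above, is intersection over union (IoU): the overlap area of the predicted and true boxes divided by the area they cover together. Here is a minimal sketch, assuming axis-aligned boxes given by their top-left and bottom-right pixel corners; the `Box` type is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Box:
    # Pixel coordinates of the top-left and bottom-right corners
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp at zero so an empty (non-overlapping) box has no area
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)

def iou(a: Box, b: Box) -> float:
    # The overlap region is bounded by the innermost edges of the two boxes
    overlap = Box(max(a.x_min, b.x_min), max(a.y_min, b.y_min),
                  min(a.x_max, b.x_max), min(a.y_max, b.y_max))
    inter = overlap.area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0
```

An IoU of 1.0 means the predicted box matches the true one exactly, and 0.0 means they don't overlap at all.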
Segmentation
Segmentation agents classify individual pixels within an image. Some pixels represent the road, others represent vehicles, others pedestrians, others free space, others the sky, and so forth. Classifying all of the pixels within an image helps us understand where the free space is in the environment for us to drive.
Just like localization agents are a specialized type of regression agent, segmentation agents are a specialized type of classification agent. Instead of outputting a single class for the whole image, each pixel in the image gets its own class.
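Here is a minimal sketch of that per-pixel view, assuming some model (not shown) has already produced a score for every class at every pixel. Each pixel simply gets its highest-scoring label, and from the resulting label map we can estimate how much of the image is drivable. The label set and function names are hypothetical.

```python
from typing import List, Sequence

LABELS = ["road", "vehicle", "pedestrian", "sky"]  # hypothetical label set

def segment(score_map: List[List[List[float]]]) -> List[List[str]]:
    """score_map[row][col] holds one score per label for that pixel."""
    # Per-pixel classification: pick the highest-scoring label at each pixel
    return [[LABELS[max(range(len(s)), key=s.__getitem__)] for s in row]
            for row in score_map]

def drivable_fraction(label_map: List[List[str]], drivable: Sequence[str] = ("road",)) -> float:
    # Share of pixels whose label counts as drivable space
    pixels = [label for row in label_map for label in row]
    if not pixels:
        return 0.0
    return sum(label in drivable for label in pixels) / len(pixels)
```

The only difference from whole-image classification is that the "pick the best class" step runs once per pixel instead of once per image.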
Coming Up
I've only just scratched the surface of machine learning here. Next week I'll return to machine learning fundamentals and describe the process of training, validating, and testing a machine learning agent. After that, we'll move on to deep neural networks, and look at how they've revolutionized self-driving cars.
Subscribe to Self-Driving Cars on Back\Line to follow along, and check back in next week!