Computer Vision Speed Challenge

Introduction

After listening to a podcast featuring the CEO of the self-driving car company comma.ai, and loving his vision and attitude, I decided to look at their website and learn more. On their careers page I found an old programming challenge that they created for potential hires. The goal was to create a model that would take dash-cam video footage from a car and predict the speed of the car in each frame.

With little formal machine learning experience, I decided to try it out for myself...

My Process

  • I started by messing around with OpenCV (a computer vision library) in Python. I had some experience building small neural networks from video files before, so I was able to get started with loading the frames pretty quickly.
  • My intuition was that since speed is a relative measure, the model would need data from at least two successive frames in order to learn and make predictions. I didn't know exactly what the input should look like at this point, but my first thought was to feed the model the differences between successive frames.
  • OpenCV is well documented, and I found three options for tracking differences between two successive frames: i) Background Subtraction, ii) Sparse Optical Flow, and iii) Dense Optical Flow.
Figure 1. Original Frame, Background-Subtracted Image, and Dense Optical Flow Field
  • I decided to try out Background Subtraction and Dense Optical Flow and skip Sparse Optical Flow, since I was worried that tracking a handful of specific points on the road might not generalize as well.
  • Background Subtraction takes two frames and highlights the main differences between them, essentially leaving a foreground mask with the background removed (a minimal sketch of this step follows Figure 2).

Figure 2. Background Subtraction Flow Chart from OpenCV Documentation
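
To make the background-subtraction step concrete, here is a minimal sketch of how it could look with OpenCV. This is not my exact pipeline: the video path is a placeholder and the MOG2 parameters are just illustrative values.

    import cv2

    # Open the dash-cam video (placeholder path).
    cap = cv2.VideoCapture("train.mp4")

    # MOG2 keeps a running model of the background and returns a foreground
    # mask for each new frame; a short history keeps it focused on recent frames.
    subtractor = cv2.createBackgroundSubtractorMOG2(history=2, detectShadows=False)

    while True:
        ret, frame = cap.read()
        if not ret:
            break
        fg_mask = subtractor.apply(frame)  # pixels that moved since the previous frame(s)
        # fg_mask, paired with its speed label, would become one training sample here.

    cap.release()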

  • Optical Flow computes the apparent movement of image objects between successive frames, producing a 2D vector field with the magnitude and direction of motion at each point. For Dense Optical Flow, this motion is calculated for every pixel in the image (a sketch of this step follows Figure 3).

Figure 3. Optical Flow Demonstration for Image Object From OpenCV Documentation
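
And here is a minimal sketch of the Dense Optical Flow step, following the standard Farneback example from the OpenCV tutorial. The video path and the flow parameters are placeholders rather than the exact values I used; the HSV conversion at the end is one common way to turn the vector field into an image a CNN can take.

    import cv2
    import numpy as np

    cap = cv2.VideoCapture("train.mp4")  # placeholder path

    ret, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    while True:
        ret, frame = cap.read()
        if not ret:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Farneback dense optical flow: one (dx, dy) vector per pixel.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)

        # Convert the vector field to magnitude/angle and encode it as an image:
        # hue = direction of motion, value = amount of motion.
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        hsv = np.zeros_like(frame)
        hsv[..., 0] = ang * 180 / np.pi / 2
        hsv[..., 1] = 255
        hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
        flow_img = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)  # one training sample

        prev_gray = gray

    cap.release()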

  • At this point my frames were loaded in as successive pairs and they were processed either through Background Subtraction or Dense Optical Flow. The next step was to create my Convolutional Neural Network (CNN) structure.
  • The only knowledge I had about CNNs came from taking Harvard's CS50 Introduction to AI course online, where we built a simple CNN to classify road signs from labeled images. I knew the speed challenge was not quite a classification problem, so I tried to reuse that simple CNN, replacing its final fully-connected classification layer with a single output node that predicts the mean speed across the two frames (a rough sketch of this tweak follows the image below).
Simple CNN from Harvard CS50
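
For reference, this is roughly the kind of modification I mean. It is a hedged sketch, not the actual CS50 network: the layer sizes and input shape are placeholders, and the key point is the single linear output node in place of a softmax classification layer.

    from tensorflow.keras import layers, models

    # Placeholder input shape: one preprocessed frame (e.g. a background-subtracted mask).
    INPUT_SHAPE = (240, 320, 1)

    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=INPUT_SHAPE),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(1),  # regression head: one linear unit instead of class scores
    ])

    # Mean squared error, since we are predicting a continuous speed value.
    model.compile(optimizer="adam", loss="mse")
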
  • The results were pretty bad... The model took 8 hours to train, and the mean squared error (MSE) loss on the validation data was a dismal 19.37.

Figure 4. History of Simple CNN Training with Background Subtraction

  • At this point I was a little discouraged and felt overwhelmed by the task, but I decided to stick with it because I really enjoyed the process. I had been spending 2-3 hours on it every night for a couple of days, and I didn't want to give up now.

Back To the Drawing Board

  • I decided to do some more research to see if there were any CNN models out there optimized for this type of problem. I found an article by developers at NVIDIA with a detailed description of a CNN architecture they built for end-to-end self-driving, so I decided to create a model using their architecture. I went through it layer by layer to understand each piece and figure out how to implement it.

Figure 5. CNN Architecture from the NVIDIA Developers (Bojarski et al., 2016)

  • The model consists of a normalization layer, convolution layers, and fully-connected hidden layers. It turns out that the normalization layer is really important: it scales the input values into a small, consistent range, which avoids massive swings during training and greatly speeds up the model (this one took about 10 minutes to train). A minimal Keras sketch of the new model appears after this list.
  • I had to re-familiarize myself with each of these layers and their functions. Convolution layers apply small filters (kernels) that slide over local regions of the image; these filters learn to bring out features of the image, such as edges and shapes.
  • The fully-connected layers turn out to be crucial (I tried a 50% dropout on the 100-node layer to avoid overfitting, and my model never got under a loss of 190). These hidden layers hold the nodes and weights that connect the convolutional features to the output. During training, the error at the output layer is back-propagated to the previous layers, and their weights are adjusted accordingly.
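
Here is a minimal Keras sketch of the NVIDIA-style architecture as I read it from the article (Bojarski et al., 2016), adapted to output a single speed value. The input shape follows the article; the ELU activations and the simple scaling in the normalization layer are assumptions on my part, not details copied from my exact model.

    from tensorflow.keras import layers, models

    # Input shape from the NVIDIA article; the input here would be the
    # preprocessed frame pair (e.g. a dense optical flow image).
    INPUT_SHAPE = (66, 200, 3)

    model = models.Sequential([
        # Normalization layer: scale raw pixel values into [-1, 1].
        layers.Lambda(lambda x: x / 127.5 - 1.0, input_shape=INPUT_SHAPE),
        # Five convolution layers.
        layers.Conv2D(24, (5, 5), strides=(2, 2), activation="elu"),
        layers.Conv2D(36, (5, 5), strides=(2, 2), activation="elu"),
        layers.Conv2D(48, (5, 5), strides=(2, 2), activation="elu"),
        layers.Conv2D(64, (3, 3), activation="elu"),
        layers.Conv2D(64, (3, 3), activation="elu"),
        layers.Flatten(),
        # Fully-connected layers narrowing down to a single speed prediction.
        layers.Dense(100, activation="elu"),
        layers.Dense(50, activation="elu"),
        layers.Dense(10, activation="elu"),
        layers.Dense(1),
    ])

    model.compile(optimizer="adam", loss="mse")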

Results

Figure 6. Model Loss (MSE) vs. Epoch for the CNN on Background-Subtracted and Dense Optical Flow Inputs

  • The results using the NVIDIA CNN structure were amazing. The loss for the optical flow-fed model was under 2.0 after 4 epochs (it only took about 10 min of training).
  • What this highlighted for me is that the model architecture is the most important part of this problem: when I fed the model the background-subtracted frames instead, I also got a similarly good result, with an MSE loss under 2.

Takeaways

  1. CNN Architecture is important. There are a bunch of different dials and knobs to be turned on any neural network, but once you get the right combination you can start to make accurate predictions. One thing I can do to test this hypothesis further is to feed the model unprocessed frame-pairs and see how well it performs. That would indicate how important the architecture is vs. the actual input.
  2. Normalization layers in CNNs are great. Adding this one layer helped cut my training time from roughly 8 hours to about 10 minutes.
  3. Python Programming: Generators are awesome! I had never used them before, but they are incredibly handy when you need to minimize memory use and speed things up. It took me about half an hour to really understand how to use them, but then setting them up to load my data was easy. Essentially, they are functions that return lazy iterators: objects you can loop over like lists, except they don't store their contents in memory. So you can iterate over a generator without holding the entire contents in memory at once (a small sketch follows this list).
  4. Life lesson: It's cheesy, but if you love working on a problem, don't quit. Even if you get overwhelmed and you are full of self-doubt, just keep going. Break the problem down into little steps and test different methods. Do some research and see how others tackled it. Try to understand their methods and apply them to your problem.
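
As a small illustration of the generator pattern, here is a stripped-down sketch of how frame pairs and their labels could be yielded lazily. It is not my actual loading code: the simple frame difference stands in for the real preprocessing, and the file path and label list are placeholders.

    import cv2

    def frame_pair_generator(video_path, speeds):
        """Lazily yield one (preprocessed frame pair, mean speed) sample at a time,
        so the whole video never has to sit in memory."""
        cap = cv2.VideoCapture(video_path)
        ret, prev = cap.read()
        idx = 1
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            # Simple frame difference here; in the real pipeline this would be
            # the background-subtracted mask or the dense optical flow image.
            x = cv2.absdiff(prev, frame)
            y = (speeds[idx - 1] + speeds[idx]) / 2.0  # mean speed of the two frames
            yield x, y
            prev = frame
            idx += 1
        cap.release()

    # Usage: iterate directly, or wrap it for training.
    # for x, y in frame_pair_generator("train.mp4", speed_labels):
    #     ...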

My Code

References

  1. Bojarski, M., Firner, B., Flepp, B., Jackel, L., Muller, U., Zieba, K., & Testa, D. (2016, August 17). End-to-end deep learning for self-driving cars. Retrieved February 04, 2021, from https://developer.nvidia.com/blog/deep-learning-self-driving-cars/
  2. OpenCV. (n.d.). How to use background subtraction methods. Retrieved February 03, 2021, from https://docs.opencv.org/4.5.1/d1/dc5/tutorial_background_subtraction.html
  3. OpenCV. (n.d.). Optical flow. Retrieved February 03, 2021, from https://docs.opencv.org/4.5.1/d4/dee/tutorial_optical_flow.html
  4. Stratis, K. (2021, January 08). How to use generators and yield in Python. Retrieved February 04, 2021, from https://realpython.com/introduction-to-python-generators/

