Learning Computer Vision with OpenCV
Object Detection using YOLOv3 and OpenCV. Picture taken at a burger joint in Anderson, TX (2017).

I like to learn new things. It's good to have hobbies and creative outlets. One of my most recent interests over the past year, and especially during the past several months, has been Computer Vision. What exactly is Computer Vision (CV)? It's a rather broad field, but simply put, it focuses on replicating human vision and pattern recognition using computers and digital imagery. Recognizing what we see is an almost trivial task for humans. From the moment we are born, we begin learning what various objects and patterns are, and we can apply that knowledge to things we have never seen before and still comprehend what we are seeing. I don't need to see every chair in the world to know a chair when I see one, regardless of the angle or lighting. Where we see a picture of a chair, however, a computer "sees" a matrix of values in which each element represents a pixel's color. Computer Vision is used in a multitude of ways, including self-driving cars, assembly-line quality control (QC), surveillance systems, medical imaging, and sports analytics. A lot of computer vision is really just image processing, which is not dissimilar from seismic processing and attribute generation, a big part of my day job.

Pictured to the right is an example of what we as humans see versus what a computer sees and how it is displayed on a screen. Where we see an aerial photograph in grayscale, a computer "sees" a matrix of pixel values between 0 and 255. (Strictly speaking, all the computer really knows are bytes and machine-language instructions, so this is not quite accurate, but it will suffice.)
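
To make the "matrix of values" idea concrete, here is a minimal C++ sketch that uses only the standard library rather than OpenCV. It stores a tiny grayscale image as a row-major array of 8-bit values. The `GrayImage` type and the `meanIntensity` and `makeExampleImage` helpers are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration type: a grayscale image is just a rows x cols
// grid of 8-bit intensities (0 = black, 255 = white), stored row by row.
struct GrayImage {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::uint8_t> pixels;  // size is expected to be rows * cols

    // Read the pixel at (row, col) using row-major indexing.
    std::uint8_t at(std::size_t row, std::size_t col) const {
        assert(row < rows && col < cols);
        return pixels[row * cols + col];
    }
};

// Average brightness over the whole image (0.0 for an empty image).
double meanIntensity(const GrayImage& img) {
    if (img.pixels.empty()) return 0.0;
    unsigned long long sum = 0;
    for (std::uint8_t p : img.pixels) sum += p;
    return static_cast<double>(sum) / static_cast<double>(img.pixels.size());
}

// A 2 x 3 example "image": dark on the left, bright on the right.
GrayImage makeExampleImage() {
    return GrayImage{2, 3, {0, 128, 255,
                            0, 128, 255}};
}
```

In OpenCV's C++ API the same idea is represented by `cv::Mat`; for an 8-bit single-channel image, `img.at<uchar>(row, col)` reads one pixel.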

We live in a time when Computer Vision research is truly blossoming. There are a number of frameworks, tools, and libraries for building Computer Vision applications, but perhaps one of the best-established and easiest-to-use is OpenCV. OpenCV is an open-source library initially developed by Intel in 1999. The library is written in C++ (originally in C) but has wrappers for Python and Java, as well as MATLAB, JavaScript, C#, and a number of other popular languages. While the syntax is specific to each language, the overall API is the same, so if you know how to use the library in one language, using it in another is mostly a matter of learning the other language.

Unlike Data Science, where Python Jupyter Notebooks rule the day, OpenCV applications are typically written in the traditional manner, as stand-alone executable programs. That's not to say you can't write an OpenCV application in a Jupyter Notebook, because you certainly can, but in my opinion Notebooks are better suited to rapid prototyping and exploring ideas. While most of my recent coding endeavors have been in Python, I've taken learning OpenCV as an opportunity to dive back into C++. While pointers still frustrate me, I think I am slowly getting the hang of them.

I like to code along with reference books and tutorials, working through examples and exercises as I learn the basics. Many times, I end up modifying these programs to do more than what is asked for, simply because I'm curious and want to add functionality. However, I think the best way to learn a new skill, especially when it involves programming, is to have specific problems you want to solve. That focuses your attention on the skills you truly need to learn and gives you a target. I find it also makes it easier to identify the steps needed to solve the problem. Without such a target, it is really easy to get trapped in an endless loop of following tutorials and never making any real progress.

Now for a little about one of my project goals. My wife is one of the directors and the girls' head coach of a local non-profit soccer club. While I grew up playing and loving baseball, soccer has been her lifelong passion and one that I am learning to share. One of my goals for learning the OpenCV library is to build tools that help the soccer club analyze its game video footage and compile useful statistics that players can cite to make themselves more competitive with college recruiters. The current method for analyzing game footage is manual review and logging. This is a very time-intensive task, and for a club made up of volunteers, there isn't much time to devote to that type of analysis. Theagarajan et al. (2018) put forth a very compelling workflow: they train a convolutional neural network (CNN) to detect players with and without the ball, which leads to automated generation of player statistics.

My first test was to write a program that takes a video and passes each frame through a CNN for object detection. Eventually, I will want to use custom training data, but initially I used a very popular pre-trained Object Detection model called YOLOv3. Here is a still frame from a video of my wife juggling a soccer ball in our backyard:

And here is the same still frame after passing the video through YOLOv3 using an OpenCV program I wrote for object detection:

The program reads in an image, a video, or a webcam device and then processes each frame through the YOLOv3 Object Detection CNN. It outputs each resulting frame with bounding boxes around the items that were detected along with the object label and confidence in the object label classification (on a 0.0 to 1.0 scale).
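
As a rough sketch of the post-processing step described above, the function below turns one row of network output into a labeled box. It assumes the layout commonly documented for YOLOv3 when run through OpenCV's DNN module: each row holds a box center x, center y, width, and height (all normalized to the 0 to 1 range), an objectness score, and then one score per class. The `Detection` struct and the `decodeRow` function are hypothetical names chosen for illustration, and this is a simplified sketch rather than the program's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical result type: a box in pixel coordinates plus its best class.
struct Detection {
    int left, top, width, height;
    std::size_t classId;  // index into the list of class names
    float confidence;     // 0.0 to 1.0
};

// Decode one output row laid out as [cx, cy, w, h, objectness, class scores...].
// Box values are assumed to be normalized to the 0..1 range.
// Returns false (leaving `out` untouched) when there are no class scores or
// the best class score is below confThreshold.
bool decodeRow(const std::vector<float>& row, int frameWidth, int frameHeight,
               float confThreshold, Detection& out) {
    assert(frameWidth > 0 && frameHeight > 0);
    if (row.size() <= 5) return false;  // no class scores present

    // The class scores start at index 5; pick the highest one.
    auto firstScore = row.begin() + 5;
    auto best = std::max_element(firstScore, row.end());
    if (*best < confThreshold) return false;

    // Convert the normalized center/size into a pixel-space top-left box.
    int w = static_cast<int>(row[2] * frameWidth);
    int h = static_cast<int>(row[3] * frameHeight);
    int cx = static_cast<int>(row[0] * frameWidth);
    int cy = static_cast<int>(row[1] * frameHeight);

    out = Detection{cx - w / 2, cy - h / 2, w, h,
                    static_cast<std::size_t>(std::distance(firstScore, best)),
                    *best};
    return true;
}
```

A caller would run this over every output row, keep the rows that return true, and then merge overlapping boxes for the same object with non-maximum suppression; OpenCV provides `cv::dnn::NMSBoxes` for that last step.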

Out of the box, YOLOv3 can only detect objects from the COCO (Common Objects in COntext) data set on which it was pre-trained. COCO has 80 different classes, two of which are detected in the image above ("Person" and "Sports Ball"). YOLOv3 is loaded directly into my program using OpenCV's DNN module via the cv::dnn::readNetFromDarknet() function. This function takes two inputs: 1) the model configuration, and 2) the model weights. Darknet, the neural network framework used to train YOLOv3, is written in C. To learn more, see the link below. (If you had trained a neural network using TensorFlow, you could instead call the cv::dnn::readNetFromTensorflow() function. Other frameworks supported by OpenCV include Caffe, Torch, DLDT, and ONNX.)

There's a lot more to do and a long way to go, but I'm excited! If you want to see some code examples, check out my GitHub "learning-opencv" repository (a link to my GitHub can be found on my profile page).

Find out more here:

OpenCV Library:

YOLOv3 Object Detection:

COCO data set:

Computer Vision:

References:

https://ai.stanford.edu/~syyeung/cvweb/tutorial1.html

R. Theagarajan, F. Pala, X. Zhang, and B. Bhanu, "Soccer: Who has the ball? Generating Visual Analytics and Player Statistics," in Proc. IEEE/CVF Conference on Computer Vision & Pattern Recognition Workshops, pages 1862-1870, Jun. 2018.

Carter Timbel

Senior Geologist

4 years ago

Very cool, Ben!

G. Peter Kuijper

GPK Project Services

4 years ago

Good luck with the development of your Computer Vision Ben. Potential future applications as I understand are endless but analyzing soccer videos will definitely be of interest to colleges and the big teams. Would your CVision benefit improved layouts of facilities, plants to platforms in conjunction with current 3D software, as well as highway and pipeline alignment designs?
