Robotic vision: Recognizing shapes, objects and people

Chris Fotache

Published: August 12, 2015

In my first article, I spoke about the basic elements of making a robot autonomous: self-localization, path planning and motion. But in order to determine where it is, and which way it needs to go, the robot needs to see what's around it, and that is called Computer Vision (CV). It's a very large field, and the algorithms are extremely complex, but fortunately there is a free library containing most of the functions we need - OpenCV.

Once you install the library, most image recognition operations are reduced to just a few lines of code. And by the way, all the examples below are extracted from Python programs I wrote. OpenCV is not very hard to learn, but some knowledge about image processing is useful. For example, most OpenCV functions work better and faster on grayscale images, so converting to grayscale is usually one of the first steps:

grayimg = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

The Canny edge-detection algorithm is very helpful, especially when detecting lines and other straight-edged shapes, and all its stages are wrapped into a single OpenCV function:

edgeimg = cv2.Canny(grayimg, 50, 150)

And now for the most basic image recognition, let's see how to detect straight lines. The algorithm is based on Hough transforms, and is as simple as this:

lines = cv2.HoughLines(edgeimg, 1, np.pi/180, 200)  # np is NumPy (import numpy as np)

We can also apply a Hough transform to detect a more complicated shape - circles:

circles = cv2.HoughCircles(grayimg, cv2.HOUGH_GRADIENT, 1.2, 50)  # OpenCV 2.x: cv2.cv.CV_HOUGH_GRADIENT

Now let's see how we can detect more complex shapes like, for example, human bodies. The basic algorithm is simple: we slide a window across the image and use HOG (histogram of oriented gradients) to describe the shape inside each window (which can be a person, a tree, a car, etc.) by the directions of its edges. Then we take these HOG descriptors and feed them to a classifier based on supervised machine learning, like an SVM (support vector machine). The SVM, trained beforehand on labeled example images, classifies each descriptor and returns all the matches. Easy, isn't it? And OpenCV makes it even easier by reducing all of this to 3 lines of code:

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
found, weights = hog.detectMultiScale(img, winStride=(8,8), padding=(32,32), scale=1.05)

Haar Cascades are another machine learning technique that allows detecting objects, and OpenCV comes with pre-built classifiers for face detection (it also provides functionality for training them on any other objects). So detecting a face in an image is as simple as this:

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
faces = face_cascade.detectMultiScale(grayimg, 1.3, 5)

If you want to go further and detect the eyes on that face, you must first create a region of interest limited to the face area, and then use the other pre-built cascade file:

eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')
eyes = eye_cascade.detectMultiScale(roi_img)

Note that the cascades only detect the presence of a human face; they don't do any facial recognition. If you want to go further, look up the Eigenfaces algorithm. It is trained on images of various people (at least 20 pictures of each), and then, when supplied with a new face image, it picks the person with the best match.
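To give an idea of what Eigenfaces does under the hood, here is a minimal NumPy-only sketch (not OpenCV's own recognizer): it finds the principal components of the training faces, projects every face onto them, and matches a new face to its nearest neighbor. The function names and the random "faces" are assumptions made up for illustration; real use needs aligned grayscale face crops of equal size:

```python
import numpy as np

def train_eigenfaces(faces, labels, n_components=10):
    """Learn the mean face, the top eigenfaces and each training face's weights."""
    X = np.asarray(faces, dtype=np.float64).reshape(len(faces), -1)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions ("eigenfaces") of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    eigenfaces = vt[:n_components]
    weights = (X - mean) @ eigenfaces.T
    return mean, eigenfaces, weights, np.asarray(labels)

def predict(model, face):
    """Return the label of the training face closest in eigenface space."""
    mean, eigenfaces, weights, labels = model
    w = (np.asarray(face, dtype=np.float64).ravel() - mean) @ eigenfaces.T
    return labels[np.argmin(np.linalg.norm(weights - w, axis=1))]

# Fake data: two "people", 20 noisy 32x32 pictures of each.
rng = np.random.default_rng(0)
base = rng.random((2, 32, 32))
faces = [base[p] + 0.05 * rng.standard_normal((32, 32))
         for p in (0, 1) for _ in range(20)]
labels = [p for p in (0, 1) for _ in range(20)]

model = train_eigenfaces(faces, labels)
probe = base[1] + 0.05 * rng.standard_normal((32, 32))  # new picture of person 1
predicted = int(predict(model, probe))
print(predicted)
```

Because the matching is a plain nearest-neighbor search, the method always returns someone; rejecting unknown faces requires an extra distance threshold.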

Note that for all these methods I mentioned here, you can use the video feed from a camera or, for faster development, start with local images saved on your drive. Once it all works with saved images, you can switch to the actual camera, and then try to reduce the resolution to the lowest one that would still let your program work. Computer Vision is very compute-intensive and the less data you have, the faster it will work.

Shakti Dhar Sharma

Onsite and Offshore Project Management, Robotics, AI, Mobile

8 years ago

I am searching all over and missed the spark from my existing network. And I am reaching this from a google search. Thank you for this excellent starting point.

Mourad Bousserouel

Fullstack Developer & CTO @ Siine

8 years ago

I am currently working on a project that requires a lot of object recognition (the object being domino tiles). I'm not used to OpenCV/Python, so I've been struggling... But I do admit that the CV2 is quite the amazing thing, you know! I think this article may provide a lot of insight in that matter, so thank you for sharing!
