Robotic vision: Recognizing shapes, objects and people
In my first article, I spoke about the basic elements of making a robot autonomous: self-localization, path planning and motion. But in order to determine where it is and which way it needs to go, the robot needs to see what's around it, which is the job of Computer Vision (CV). It's a very large field, and the algorithms are extremely complex, but fortunately there is a free library containing most of the functions we need: OpenCV.
Once you install the library, most image recognition operations come down to just a few lines of code. By the way, all the examples below are taken from Python programs I wrote. OpenCV is not very hard to learn, but some knowledge of image processing helps. For example, most OpenCV functions work better and faster on grayscale images, so converting to grayscale is usually one of the first steps (here img is a color image loaded with cv2.imread(), which stores its channels in BGR order):
import cv2
grayimg = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
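For intuition, here is what that conversion computes for each pixel. This is a minimal pure-Python sketch, assuming the image is a list of rows of (B, G, R) tuples (a hypothetical layout chosen for readability; OpenCV itself works on NumPy arrays). The weights 0.299, 0.587 and 0.114 for red, green and blue are the ones OpenCV documents for its RGB/BGR to gray conversion; OpenCV's own rounding may differ slightly.

```python
def bgr_to_gray(image):
    """Convert rows of (B, G, R) tuples into rows of 0-255 gray levels.

    Y = 0.299 * R + 0.587 * G + 0.114 * B
    """
    gray = []
    for row in image:
        gray_row = []
        for b, g, r in row:
            # Channel order is reversed: OpenCV stores blue first.
            y = 0.299 * r + 0.587 * g + 0.114 * b
            gray_row.append(int(round(y)))
        gray.append(gray_row)
    return gray


if __name__ == "__main__":
    pixels = [[(255, 255, 255), (0, 0, 255)],   # white, pure red
              [(255, 0, 0), (0, 0, 0)]]         # pure blue, black
    print(bgr_to_gray(pixels))
```

Green carries the largest weight because the eye is most sensitive to it, which is why a pure red pixel ends up much darker in gray than a white one.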
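The Canny edge-detection algorithm is very helpful, especially when detecting lines and other straight-edged shapes, and all its stages are wrapped into a single OpenCV function. The two numbers are the lower and upper thresholds of its final hysteresis step: pixels with a gradient above 150 are accepted as edges, pixels below 50 are rejected, and pixels in between are kept only if they are connected to an accepted edge: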
edgeimg = cv2.Canny(grayimg, 50, 150)
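The sketch below shows only that hysteresis step, in pure Python, on a small grid of precomputed gradient magnitudes. In the real algorithm those magnitudes come from gradient filtering followed by non-maximum suppression, both omitted here; the function name and the list-of-rows grid are illustrative assumptions, not OpenCV's API.

```python
from collections import deque


def hysteresis(magnitude, low, high):
    """Return a 0/1 edge map from a 2-D list of gradient magnitudes.

    Pixels >= high are strong edges. Pixels in [low, high) are weak and
    survive only if they are 8-connected (directly or through other weak
    pixels) to a strong edge. Everything below low is discarded.
    """
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[0] * cols for _ in range(rows)]
    queue = deque()

    # Seed the search with every strong pixel.
    for y in range(rows):
        for x in range(cols):
            if magnitude[y][x] >= high:
                edges[y][x] = 1
                queue.append((y, x))

    # Grow outward from strong pixels into connected weak pixels.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < rows and 0 <= nx < cols
                        and not edges[ny][nx]
                        and magnitude[ny][nx] >= low):
                    edges[ny][nx] = 1
                    queue.append((ny, nx))
    return edges


if __name__ == "__main__":
    grid = [[200, 80, 0, 0, 90],
            [0, 0, 0, 0, 0]]
    # The weak 80 touches the strong 200 and survives; the isolated 90 does not.
    print(hysteresis(grid, 50, 150))
```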
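And now for the most basic image recognition task: detecting straight lines. The algorithm is based on the Hough transform and is as simple as this (the arguments after the edge image are the distance resolution of 1 pixel, the angle resolution of 1 degree, and the number of votes, 200, that a line must exceed to be reported):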
import numpy as np
lines = cv2.HoughLines(edgeimg, 1, np.pi / 180, 200)
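To see what the Hough transform is doing, here is a minimal pure-Python sketch of the voting scheme, assuming the edge pixels are given as a list of (x, y) tuples. Each edge pixel votes for every line rho = x*cos(theta) + y*sin(theta) that could pass through it, and lines with more than threshold votes are returned as (rho, theta) pairs, the same form HoughLines returns. The 1-pixel and 1-degree resolutions mirror the arguments above; the function name is hypothetical.

```python
import math
from collections import Counter


def hough_lines(points, threshold, theta_steps=180):
    """Return (rho, theta) pairs that receive more than `threshold` votes.

    rho is rounded to whole pixels (1-pixel resolution) and theta is
    sampled in steps of pi / theta_steps (1 degree for 180 steps).
    """
    thetas = [i * math.pi / theta_steps for i in range(theta_steps)]
    votes = Counter()
    for x, y in points:
        for i, theta in enumerate(thetas):
            # Every line through (x, y) satisfies rho = x cos(theta) + y sin(theta).
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, i)] += 1
    # most_common() lists accumulator cells from most to fewest votes.
    return [(rho, thetas[i]) for (rho, i), count in votes.most_common()
            if count > threshold]


if __name__ == "__main__":
    # 20 edge pixels on the horizontal line y = 5
    pts = [(x, 5) for x in range(20)]
    for rho, theta in hough_lines(pts, threshold=19):
        print(rho, round(math.degrees(theta)))
```

On such a short segment a few neighboring angles collect the same number of votes; with a real edge image the true line usually stands out much more clearly.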
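We can also apply a Hough transform to detect a more complex shape, circles. Here 1.2 is the inverse ratio of the accumulator resolution to the image resolution, and 50 is the minimum distance in pixels between the centers of detected circles: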
# OpenCV 3+ name; OpenCV 2.x used cv2.cv.CV_HOUGH_GRADIENT
circles = cv2.HoughCircles(grayimg, cv2.HOUGH_GRADIENT, 1.2, 50)
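The circle version has more to search for, since a circle needs a center and a radius. The sketch below is the classic fixed-radius circle Hough transform in pure Python, a simplification rather than OpenCV's HOUGH_GRADIENT method, which additionally uses the direction of the edge gradient to narrow down where a center can be. Each edge point votes for every center lying at distance radius from it, so the real center collects a vote from every point on the circle. The function name and the 1-degree angular step are assumptions.

```python
import math
from collections import Counter


def hough_circle_centers(points, radius, angle_step=1):
    """Vote for circle centers of a known radius.

    Every edge point (x, y) could lie on a circle whose center is at
    distance `radius` from it, so it votes once for each such center.
    Returns a Counter mapping (cx, cy) -> number of votes.
    """
    votes = Counter()
    for x, y in points:
        candidates = set()  # at most one vote per center from each point
        for deg in range(0, 360, angle_step):
            t = math.radians(deg)
            cx = round(x - radius * math.cos(t))
            cy = round(y - radius * math.sin(t))
            candidates.add((cx, cy))
        votes.update(candidates)
    return votes


if __name__ == "__main__":
    # 36 points on a circle of radius 5 centered at (10, 10)
    pts = [(10 + 5 * math.cos(math.radians(d)),
            10 + 5 * math.sin(math.radians(d))) for d in range(0, 360, 10)]
    center, count = hough_circle_centers(pts, radius=5).most_common(1)[0]
    print(center, count)
```

With an unknown radius you would repeat the vote for each candidate radius, which is exactly the cost the gradient-based method tries to avoid.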
Now let's see how we can detect more complex shapes, like human bodies. The basic algorithm is simple: we use HOG (histogram of oriented gradients) to describe the shapes inside the image by how edge directions are distributed across small cells, so that a person, a tree or a car each produces a distinctive pattern. Then we feed these HOG descriptors to a classifier trained with supervised machine learning, like an SVM (support vector machine). The SVM has learned from a database of labeled training images, so it can decide for each window of the image whether it contains the object, and the detector returns all the matches. Easy, isn't it? And OpenCV makes it even easier by reducing all this to 3 lines of code, using an SVM that comes already trained to detect people:
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
# detectMultiScale returns the bounding boxes and their SVM confidence weights
found, weights = hog.detectMultiScale(img, winStride=(8,8), padding=(32,32), scale=1.05)
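Under the hood, the descriptor is built from per-cell orientation histograms (by default OpenCV's HOGDescriptor uses 8x8-pixel cells, 9 orientation bins and a 64x128-pixel detection window). The sketch below computes one such histogram in pure Python, assuming a grayscale patch given as a list of rows. It follows the usual HOG recipe (centered [-1, 0, 1] gradients, unsigned orientations between 0 and 180 degrees, bins weighted by gradient magnitude) but simplifies it: OpenCV also interpolates votes between neighboring bins and cells and normalizes histograms over blocks of 2x2 cells, none of which is shown. The function name is hypothetical.

```python
import math


def cell_histogram(patch, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `patch` is a 2-D list of gray levels. Gradients use centered
    differences, so only interior pixels contribute.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins  # 20 degrees per bin for 9 bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Unsigned orientation: dark-to-light and light-to-dark
            # edges with the same direction share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


if __name__ == "__main__":
    # 8x8 patch with a vertical edge: dark left half, bright right half
    patch = [[0] * 4 + [100] * 4 for _ in range(8)]
    print(cell_histogram(patch))
```

A vertical edge puts all its weight in the 0-degree bin, while a horizontal edge would land in the 90-degree bin; the SVM learns which combinations of such histograms look like a person.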
Haar cascades are another machine learning technique for detecting objects, and OpenCV comes with pre-trained classifiers for face detection (it also provides tools for training cascades on other kinds of objects). So detecting faces in an image is as simple as this:
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
faces = face_cascade.detectMultiScale(grayimg, 1.3, 5)
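Haar cascades get their speed from Haar-like features, which compare the summed brightness of adjacent rectangles, and from the integral image (summed-area table), which lets any rectangle sum be computed with four lookups. A minimal pure-Python sketch of both ideas follows; the function names and the single two-rectangle feature are illustrative, while a trained cascade evaluates many such features in successive stages.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of img[0..y-1][0..x-1].
    """
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        row_sum = 0
        for x in range(cols):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Right half minus left half of a w x h window (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x + half, y, half, h) - rect_sum(ii, x, y, half, h)


if __name__ == "__main__":
    # Dark left half, bright right half: a strong vertical-edge response
    img = [[0, 0, 10, 10] for _ in range(4)]
    ii = integral_image(img)
    print(two_rect_feature(ii, 0, 0, 4, 4))
```

Once the table is built, the cost of a feature no longer depends on the size of its rectangles, which is what makes scanning every window at several scales affordable.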
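Here 1.3 is the scale factor between the successive image sizes that are searched, and 5 is the minimum number of overlapping detections needed to keep a face; the call returns one (x, y, w, h) rectangle per face. If you want to go further and detect the eyes on that face, you must first crop a region of interest limited to the face area, and then run the other pre-built cascade on it: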
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')
for (x, y, w, h) in faces:
    roi_img = grayimg[y:y+h, x:x+w]  # crop the face region
    eyes = eye_cascade.detectMultiScale(roi_img)
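Because the eye detector runs on the cropped face, the rectangles it returns are relative to the face's top-left corner, not to the full image. A small helper (the name is hypothetical) shifts them back, for example before drawing them on the original frame:

```python
def to_image_coords(face, eyes):
    """Shift (ex, ey, ew, eh) boxes found inside `face` = (x, y, w, h)
    back into the coordinate system of the full image."""
    x, y, _, _ = face
    return [(x + ex, y + ey, ew, eh) for ex, ey, ew, eh in eyes]


if __name__ == "__main__":
    face = (100, 50, 80, 80)
    eyes_in_face = [(10, 20, 15, 15), (50, 20, 15, 15)]
    print(to_image_coords(face, eyes_in_face))
```

Only the position changes; width and height stay the same because cropping does not rescale the image.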
Note that the cascades only detect the presence of a human face; they don't do any facial recognition, i.e., they can't tell whose face it is. If you want to go further, look up the Eigenfaces algorithm. It uses principal component analysis to reduce each face image to a few numbers, is trained on images of various people (at least 20 pictures of each), and then, when given a new face image, picks the person with the best match.
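To make the idea concrete, here is a toy pure-Python sketch of the Eigenfaces approach: subtract the mean face, find the top principal components of the training images (here by power iteration with deflation, chosen only to avoid dependencies), project every face onto them, and return the label of the nearest training face. Faces are assumed to be flattened lists of pixel values of equal length, and all names are hypothetical; for real images you would rather use NumPy or the ready-made Eigenfaces recognizer in OpenCV's contrib face module.

```python
import random


def _dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def train_eigenfaces(faces, labels, k=2, iterations=200):
    """Return (mean_face, eigenfaces, [(projection, label), ...])."""
    n, d = len(faces), len(faces[0])
    mean = [sum(f[j] for f in faces) / n for j in range(d)]
    centered = [[f[j] - mean[j] for j in range(d)] for f in faces]
    # d x d covariance matrix: only practical for tiny toy images.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]

    rng = random.Random(0)  # fixed seed for reproducible results
    eigenfaces = []
    for _ in range(k):
        # Power iteration converges to the dominant eigenvector of cov.
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = 0.0
        for _ in range(iterations):
            w = [_dot(row, v) for row in cov]
            norm = _dot(w, w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        if norm == 0:
            break  # no variance left to explain
        eigenvalue = _dot(v, [_dot(row, v) for row in cov])
        eigenfaces.append(v)
        # Deflation: remove this component so the next pass finds the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]

    projections = [[_dot(c, e) for e in eigenfaces] for c in centered]
    return mean, eigenfaces, list(zip(projections, labels))


def recognize(model, face):
    """Return the label of the nearest training face in eigenface space."""
    mean, eigenfaces, training = model
    centered = [x - m for x, m in zip(face, mean)]
    coords = [_dot(centered, e) for e in eigenfaces]

    def distance(item):
        projection, _ = item
        return sum((a - b) ** 2 for a, b in zip(projection, coords))

    return min(training, key=distance)[1]


if __name__ == "__main__":
    # Toy 4-pixel "faces" for two people
    alice = [[10, 10, 0, 0], [9, 11, 0, 1], [11, 9, 1, 0]]
    bob = [[0, 0, 10, 10], [1, 0, 9, 11], [0, 1, 11, 9]]
    model = train_eigenfaces(alice + bob, ["alice"] * 3 + ["bob"] * 3)
    print(recognize(model, [9, 10, 1, 0]))
```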
Note that for all the methods mentioned here, you can use the video feed from a camera or, for faster development, start with images saved on your drive. Once everything works with saved images, switch to the actual camera, and then reduce the resolution to the lowest one at which your program still works. Computer vision is very compute-intensive, and the less data you have to process, the faster it runs.
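As a rough illustration of why resolution matters: halving the width and height leaves a quarter of the pixels (640x480 is 307,200 pixels, 320x240 is 76,800), so per-pixel steps such as grayscale conversion and edge detection do about a quarter of the work. The sketch below downsamples a grayscale image by averaging 2x2 blocks, which is roughly what cv2.resize does with the cv2.INTER_AREA flag for an exact factor of two; in practice you would call cv2.resize or ask the camera for a lower resolution directly. The function name is hypothetical.

```python
def downscale_by_two(gray):
    """Halve a grayscale image (2-D list) by averaging 2x2 pixel blocks.

    An odd trailing row or column is dropped.
    """
    h = len(gray) // 2 * 2
    w = len(gray[0]) // 2 * 2
    return [[(gray[y][x] + gray[y][x + 1]
              + gray[y + 1][x] + gray[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


if __name__ == "__main__":
    img = [[0, 4, 8, 8],
           [4, 8, 8, 8]]
    print(downscale_by_two(img))
```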