First-principles approach to feature/blob detection using SIFT
Krishna Yogi Kolluru
Data Scientist | ML Architect | GenAI | Sagemaker | Speaker | ex-Microsoft | IIT - NUS Alumni | AWS Certified ML / Data Engineer
Image detection, aka feature detection, is hard for computers. Very, very hard; one could even say impossibly hard, because a computer can only reach a probabilistic outcome, while humans somehow have an intuitive understanding of pictures (even though, if you dig deeper, a lot of that intuition involves heavy neural computation too).
To be fair, it's humans who label images and it's the computer that has to learn from those labels, so computers are always playing catch-up!
With this disclaimer in place, let's talk about SIFT (Scale-Invariant Feature Transform), a magical technique that can extract features from images, compare them, align them correctly if needed, and even stitch images together.
The reason it feels magical is that computers only speak binary (0 or 1) and simple maths (addition, subtraction, multiplication, division), and images, when stored on a computer, are simply numbers too. So computers somehow need to make sense of the numbers in these images and pick out features that are relevant to us humans (because those features are irrelevant to the computer itself :) ).
This is an area of research that took some of the best research scientists in the world more than two decades.
The goal is to compare similar features between two similar pictures that are not identical.
There could be a variation in scale, there could be rotation, there could be a difference in lighting, and so on.
When two images are fed to the SIFT algorithm, it first identifies what are called blobs, or areas of interest, at different scales (working across scales is what takes care of scale invariance). The second step is to assign each blob an orientation, i.e. a sense of direction/alignment, along with a scale factor. Only then can we compare these blobs/areas of interest across the two pictures and see which ones match; then, and only then, are we done with the feature matching / image comparison.
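To make that pipeline concrete before we dig into the math, here is a minimal sketch using OpenCV's built-in SIFT (cv2.SIFT_create, available in opencv-python 4.4+). The image file names are placeholders, and this is just one way to wire the steps together, not the from-first-principles construction discussed below.

```python
# Minimal end-to-end sketch: detect SIFT keypoints in two images and match them.
# Assumes opencv-python >= 4.4 (SIFT lives in the main cv2 module) and
# placeholder file names img1.jpg / img2.jpg.
import cv2

img1 = cv2.imread("img1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("img2.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()

# Steps 1 and 2: find blobs/keypoints (with scale and orientation) plus descriptors.
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Step 3: compare descriptors; Lowe's ratio test keeps only distinctive matches.
matcher = cv2.BFMatcher()
good = []
for m, n in matcher.knnMatch(des1, des2, k=2):
    if m.distance < 0.75 * n.distance:
        good.append(m)

print(f"{len(good)} good matches out of {len(kp1)} / {len(kp2)} keypoints")
```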
Now let's review the math behind each of these steps.
First step: identifying blobs at various scales.
For this step, we smooth the entire image with a 2-D Gaussian filter and extract normalized Laplacian-of-Gaussian (LoG) responses across different sigmas (sigma being the standard deviation of the Gaussian). It so happens that these responses are very useful for identifying blobs at different scales: some blobs show up at 1 sigma, others at n sigma, and so on.
Gaussian filters have an interesting property: for the same blob appearing in two images at different scales, the sigma at which the normalized response peaks is proportional to the scale, which can be used to identify the same blob in both pictures. In this respect, Gaussian filters feel like a godsend :)
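To see that peak-at-the-right-sigma behaviour in action, here is a small sketch using SciPy's gaussian_laplace on a synthetic disc. The image size, sigma grid, and the radius-10 disc are arbitrary choices for illustration, not part of the original algorithm description.

```python
# Sketch: scale-normalized Laplacian-of-Gaussian (LoG) responses at a pixel,
# computed over a range of sigmas. The sigma where the response peaks tracks
# the blob's size, which is what makes scale identification possible.
import numpy as np
from scipy import ndimage

# Synthetic image: a bright disc of radius 10 on a dark background.
yy, xx = np.mgrid[0:128, 0:128]
img = ((xx - 64) ** 2 + (yy - 64) ** 2 < 10 ** 2).astype(float)

sigmas = np.arange(1, 20, 0.5)
responses = []
for sigma in sigmas:
    # Multiplying by sigma**2 "normalizes" the LoG so responses at different
    # scales are comparable; without it, larger sigmas always respond weaker.
    log = sigma ** 2 * ndimage.gaussian_laplace(img, sigma=sigma)
    responses.append(abs(log[64, 64]))  # response at the blob centre

best = sigmas[int(np.argmax(responses))]
print(f"Peak at sigma ≈ {best:.1f} (theory: radius / sqrt(2) ≈ {10 / np.sqrt(2):.1f})")
```

If you resize the image by a factor s, the peak moves to roughly s times the original sigma, which is exactly the proportionality mentioned above.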
The next step is identifying orientation:
To identify orientation (one picture could be tilted relative to the other), we employ edge detection techniques like Canny edge detection at each pixel of the picture. After identifying the edges and their gradient directions at the pixel level, we plot a histogram of directions (roughly 8 direction bins vs. pixel count). The direction with the largest count is taken as the orientation of the blob.
Note: even though this technique might not truly identify the orientation of a blob, it is good enough for comparing the orientations of two different pictures, assuming they depict the same content.
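Here is a rough sketch of that orientation vote, using plain NumPy gradients rather than a full Canny pass (the gradient step is the part we actually need here). The test patch, the 8-bin count, and the magnitude weighting are assumptions made for illustration.

```python
# Sketch of the orientation step: per-pixel gradient directions, binned into
# 8 coarse directions, with the dominant bin taken as the patch's orientation.
import numpy as np

def dominant_orientation(patch, n_bins=8):
    # Per-pixel gradients via simple finite differences.
    gy, gx = np.gradient(patch.astype(float))
    angles = np.arctan2(gy, gx)        # in (-pi, pi]
    magnitudes = np.hypot(gx, gy)

    # Histogram of directions, weighted by gradient strength so that flat
    # areas don't dominate the vote.
    hist, edges = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi),
                               weights=magnitudes)
    peak = np.argmax(hist)
    return 0.5 * (edges[peak] + edges[peak + 1]), hist  # bin centre + histogram

# Example: a linear ramp has a single gradient direction everywhere.
# True gradient direction is arctan(1/3) ≈ 18°, landing in the bin centred at 22.5°.
yy, xx = np.mgrid[0:32, 0:32]
theta, hist = dominant_orientation(3 * xx + yy)
print(f"Dominant direction bin ≈ {np.degrees(theta):.1f} degrees")
```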
The final step is 'similar blob identification' (after correcting for scale and orientation).
For correct blob identification (in an automatic fashion, without human intervention of course), we need to extract some sort of unique signature for each blob. It turns out we can reuse the histogram of directions (that we created previously) as that signature.
This time, however, we plot the direction histogram of the full blob (all 4 quadrants of it) in a single chart and treat the entire histogram as the blob's signature. Do note that this signature can only be compared after the two pictures have been aligned correctly (which is what the previous step gives us).
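A sketch of that signature-and-compare idea, under the same assumptions as the previous snippet: per-quadrant direction histograms concatenated into one vector and compared with a simple normalized distance. The patches, bin count, and L2 normalization are illustrative choices, not the exact SIFT descriptor.

```python
# Sketch: describe an aligned blob by the concatenated direction histograms of
# its 4 quadrants, then compare two blobs by the distance between signatures.
import numpy as np

def direction_histogram(patch, n_bins=8):
    gy, gx = np.gradient(patch.astype(float))
    angles = np.arctan2(gy, gx)
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi),
                           weights=np.hypot(gx, gy))
    return hist

def blob_signature(patch, n_bins=8):
    h, w = patch.shape
    quadrants = [patch[:h // 2, :w // 2], patch[:h // 2, w // 2:],
                 patch[h // 2:, :w // 2], patch[h // 2:, w // 2:]]
    # One histogram per quadrant, concatenated into a single vector, then
    # L2-normalized so overall contrast/lighting matters less.
    sig = np.concatenate([direction_histogram(q, n_bins) for q in quadrants])
    return sig / (np.linalg.norm(sig) + 1e-12)

def signature_distance(blob_a, blob_b):
    return np.linalg.norm(blob_signature(blob_a) - blob_signature(blob_b))

# Example: same structure at different brightness matches; a rotated-gradient
# patch does not.
yy, xx = np.mgrid[0:32, 0:32]
horizontal_ramp = xx.astype(float)   # gradients point right
vertical_ramp = yy.astype(float)     # gradients point down
print(signature_distance(horizontal_ramp, 0.7 * horizontal_ramp + 5))  # ~0.0
print(signature_distance(horizontal_ramp, vertical_ramp))              # ~1.4
```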
So the SIFT algorithm has delivered what we intended: it helped us identify interesting features, then determine their orientation, and finally match the corresponding blobs across different pictures.
Thanks for reading!