#6 Coding an Object Detection Model with FasterRCNN + InceptionResNetV2
Revisiting the fundamentals of an image
To dive into the basics of image processing, we start by treating an image as a function f, where each pixel holds a value representing its intensity. In grayscale images, this value ranges from 0 to 255, where 0 signifies full black and 255 denotes full white. Hence, f(x,y) gives us the intensity at a pixel location (x,y).
When we talk about coloured images, each pixel has three channels, one each for Red, Green and Blue. So, a pixel is no longer a single value between 0 and 255, but a triple of RGB intensities, each in that range.
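The idea above can be sketched with a tiny synthetic image in NumPy (the array values here are made up purely for illustration):

```python
import numpy as np

# A tiny synthetic grayscale image standing in for a real photo:
# f(x, y) is the intensity at pixel (x, y), from 0 (black) to 255 (white).
gray = np.array([
    [0,   64, 128],
    [192, 255, 32],
], dtype=np.uint8)

def f(x, y):
    """Intensity of the grayscale image at column x, row y."""
    return int(gray[y, x])

print(f(0, 0))  # 0   -> full black
print(f(1, 1))  # 255 -> full white

# A coloured image adds a third axis: each pixel is an (R, G, B) triple.
rgb = np.zeros((2, 3, 3), dtype=np.uint8)
rgb[0, 0] = (255, 0, 0)  # a pure red pixel at row 0, column 0
```

Note that NumPy indexes row-first, so `gray[y, x]` corresponds to the function notation f(x, y).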
Common Image Edits
In image manipulation, we typically alter either the range of the image — changing pixel values, and hence its colours — or its domain, moving pixels to new positions without changing their values.
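Both kinds of edit can be shown side by side on a toy array (the specific operations, brightening and flipping, are just illustrative choices):

```python
import numpy as np

img = np.array([[10, 20],
                [30, 40]], dtype=np.uint8)

# Range edit: pixel VALUES change (brighten by 50), positions stay fixed.
# Widen the dtype before adding so values above 255 can be clipped safely.
brightened = np.clip(img.astype(np.int16) + 50, 0, 255).astype(np.uint8)

# Domain edit: pixel POSITIONS change (horizontal flip), values stay fixed.
flipped = img[:, ::-1]
```

After the flip the image contains exactly the same set of values as before, just rearranged, whereas brightening keeps the layout and shifts every value.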
Filters
Filters are a fundamental part of image processing. They alter the pixel values within an image, thereby changing its appearance. Common filters include blurring, sharpening, and edge detection.
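As one concrete example of a filter, here is a minimal mean (box) blur written from scratch; the function name and padding strategy are my own choices, not from the original article:

```python
import numpy as np

def box_blur(img, k=3):
    """Apply a k x k mean (box) filter with edge padding -- a simple blur.

    Each output pixel is the average of the k x k neighbourhood around
    the corresponding input pixel, which smooths out sharp transitions.
    """
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    # Sum the k*k shifted copies of the image, then divide once.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

# A single bright pixel gets spread across its neighbourhood.
spike = np.array([[0, 0, 0],
                  [0, 255, 0],
                  [0, 0, 0]], dtype=float)
blurred = box_blur(spike)
```

The centre value drops from 255 to the neighbourhood mean 255/9, which is exactly the "softening" effect a blur filter produces.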
Project:
Object Detection Model using FasterRCNN+InceptionResNetV2
The InceptionResNetV2 feature extractor was trained on ImageNet and fine-tuned with a FasterRCNN head on the OpenImages V4 dataset, which contains 600 object classes.
The module performs non-maximum suppression internally and outputs at most 100 detections, drawn from the 600 boxable categories.
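To make the non-maximum suppression step concrete, here is a greedy NMS sketch in plain NumPy. This is an illustration of the technique, not the module's actual internal code; the box layout `[ymin, xmin, ymax, xmax]` matches the usual TensorFlow convention:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5, max_out=100):
    """Greedy non-maximum suppression (a sketch, not the module's code).

    boxes:  (N, 4) array as [ymin, xmin, ymax, xmax]
    scores: (N,) confidence per box
    Returns indices of the boxes kept, best-first.
    """
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size and len(keep) < max_out:
        i = order[0]
        keep.append(i)
        # IoU of the top box against all remaining boxes.
        yx1 = np.maximum(boxes[i, :2], boxes[order[1:], :2])
        yx2 = np.minimum(boxes[i, 2:], boxes[order[1:], 2:])
        inter = np.prod(np.clip(yx2 - yx1, 0, None), axis=1)
        area_i = np.prod(boxes[i, 2:] - boxes[i, :2])
        areas = np.prod(boxes[order[1:], 2:] - boxes[order[1:], :2], axis=1)
        iou = inter / (area_i + areas - inter)
        # Drop boxes that overlap the kept box too much.
        order = order[1:][iou <= iou_thresh]
    return keep
```

Two heavily overlapping boxes collapse to the higher-scoring one, while a distant box survives — which is why the detector never reports the same object twice.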
Input and Output
The model takes a three-channel image of variable size as input. The output includes bounding-box coordinates, detection class names, class indices, and detection scores.
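A typical way to consume that output is to keep only confident detections. The snippet below works on a mock of the detector's result dict; the key names follow the TF Hub OpenImages V4 FasterRCNN module's documented outputs, but treat them as assumptions here, and the boxes and labels are invented for illustration:

```python
import numpy as np

# Mock of the detector's output dict (values invented for illustration).
result = {
    # Boxes are [ymin, xmin, ymax, xmax], normalised to [0, 1].
    "detection_boxes": np.array([[0.1, 0.1, 0.5, 0.5],
                                 [0.2, 0.3, 0.9, 0.8]]),
    "detection_class_entities": np.array([b"Dog", b"Person"]),
    "detection_scores": np.array([0.92, 0.18]),
}

def confident_detections(result, min_score=0.5):
    """Return (label, score, box) triples whose score clears the threshold."""
    mask = result["detection_scores"] >= min_score
    return [(entity.decode("utf-8"), float(score), box)
            for entity, score, box in zip(result["detection_class_entities"][mask],
                                          result["detection_scores"][mask],
                                          result["detection_boxes"][mask])]

detections = confident_detections(result)
```

The class entities arrive as byte strings from the module, hence the `decode("utf-8")` before display.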
Implementation
GitHub Repo:
Sources: