Region-Based Object Detection (R-CNN Object Detection)

What is R-CNN?

R-CNN (Region-based Convolutional Neural Network), or Regions with CNN Features, is an object detection model that applies high-capacity CNNs to bottom-up region proposals in order to localize and segment objects. It uses selective search to identify a number of bounding-box object region candidates ("regions of interest") and then independently extracts features from each region for classification.

How does the R-CNN model work?

Before passing an image through the network, we first need to extract region proposals, or regions of interest, using an algorithm such as selective search. Then we resize all of the extracted crops and pass them through the network.

Finally, the network assigns each crop one of C + 1 categories, including the 'background' label. Additionally, it predicts bounding-box regression offsets (deltas) that refine the position and shape of the crop.
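Below is a high-level sketch, in Python, of the inference pipeline just described. The helper names (propose_regions, warp, extract_features, classify, refine_box) are hypothetical placeholders for the steps detailed in the following sections, not functions from any particular library.

# High-level sketch of R-CNN inference; helper functions are hypothetical placeholders.
def rcnn_detect(image):
    detections = []
    for box in propose_regions(image):            # ~2000 selective-search proposals
        crop = warp(image, box, size=(227, 227))  # warp each proposal to a fixed size
        feats = extract_features(crop)            # 4096-d AlexNet feature vector
        scores = classify(feats)                  # C + 1 scores, including 'background'
        deltas = refine_box(feats)                # bounding-box regression offsets
        detections.append((box, scores, deltas))
    return detections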

Region Proposals:

The region proposal step selects about 2000 areas (bounding boxes) in the image where objects are likely to be. The authors chose the Selective Search method because it makes it easy to compare performance with previous research that used the same method.

R-CNN is agnostic to region proposal methods because the region proposal step and the subsequent CNN feature extraction step are independent. It first selects regions and then applies CNN feature extraction to each region.

Since the selected rectangular areas have various sizes, all areas are warped (resized) to a fixed size of 227×227 pixels. This ensures that the bounding-box dimensions are constant and that the features extracted by the CNN layers all have the same dimensions.
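As an illustration of this step, here is a minimal sketch using OpenCV's Selective Search implementation (it requires the opencv-contrib-python package, and the input file name is just a placeholder):

import cv2

image = cv2.imread("example.jpg")  # placeholder input image

# Selective Search proposes candidate boxes as (x, y, w, h); keep ~2000 of them.
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()
rects = ss.process()[:2000]

# Warp every proposal to the fixed 227x227 input size expected by the CNN.
crops = []
for (x, y, w, h) in rects:
    roi = image[y:y + h, x:x + w]
    crops.append(cv2.resize(roi, (227, 227)))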

CNN Feature Extraction:

AlexNet is a convolutional neural network that is 8 layers deep. You can load a pre-trained version of the network trained on more than a million images from the ImageNet database. The pre-trained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals.

R-CNN uses the five convolutional layers and two fully connected layers of AlexNet to extract features from each warped region (227×227 pixels) into a 4096-dimensional vector. It then uses these features to classify the image in each region. This is transfer learning: the final classification step is performed by SVMs. SVMs were a widely used method at the time, and the authors followed the same approach for the last classification step.
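A minimal sketch of this feature-extraction step with a pre-trained AlexNet from torchvision (not the original Caffe model used in the paper, so the exact weights differ; it assumes a recent torchvision and a crop already warped to 227×227 in RGB order) could look like this:

import torch
import torchvision.models as models
import torchvision.transforms as T

alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()
# Keep the fully connected layers up to fc7 and drop the final 1000-way classifier,
# so the output is a 4096-dimensional feature vector.
fc7 = torch.nn.Sequential(*list(alexnet.classifier.children())[:-1])

preprocess = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(crop_227):
    # crop_227: 227x227x3 uint8 RGB image (a warped region proposal)
    x = preprocess(crop_227).unsqueeze(0)
    with torch.no_grad():
        x = alexnet.features(x)     # five convolutional layers
        x = alexnet.avgpool(x)
        x = torch.flatten(x, 1)
        return fc7(x).squeeze(0)    # 4096-d feature vector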

SVM Classification:

Now we need to classify those feature vectors, i.e. detect which class of object each feature vector represents. For this, R-CNN uses SVM classification: there is one SVM for each object class, and all of them are applied. This means that for one feature vector we get n outputs, where n is the number of different object classes we want to detect. The output of each SVM is a confidence score.

Now, how are those different SVMs trained?

Well, we train them on the feature vectors produced by AlexNet. That means we have to wait until the CNN is fully trained before we can train the SVMs, so the two training stages are not parallelizable.
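As a rough sketch of this stage, one binary linear SVM per class can be trained with scikit-learn on the 4096-dimensional AlexNet vectors; the names features, labels, and class_names are hypothetical placeholders for data prepared beforehand:

from sklearn.svm import LinearSVC

class_names = ["person", "car", "dog"]   # placeholder class list

svms = {}
for c in class_names:
    clf = LinearSVC(C=1.0)
    clf.fit(features, labels[c])         # features: (N, 4096), labels[c]: binary {0, 1}
    svms[c] = clf

def score_region(feature_vector):
    # One confidence score per class for a single 4096-d feature vector.
    return {c: svms[c].decision_function(feature_vector.reshape(1, -1))[0]
            for c in class_names}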

Non-Maximum Suppression:

NMS is a greedy algorithm that loops over all the classes and, for each class, checks the overlaps (IoU, Intersection over Union) between all the bounding boxes. If the IoU between two boxes of the same class is above a certain threshold (usually 0.7), the algorithm concludes that they refer to the same object and discards the box with the lower confidence score (which is a product of the objectness score and the conditional class probability).
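A minimal per-class NMS implementation (a plain NumPy sketch, with boxes in (x1, y1, x2, y2) format and scores being the per-class confidences) could look like this:

import numpy as np

def nms(boxes, scores, iou_thresh=0.7):
    order = np.argsort(scores)[::-1]   # indices sorted by descending confidence
    keep = []
    while order.size > 0:
        i = order[0]                   # box with the highest remaining score
        keep.append(i)
        # IoU between the chosen box and all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # discard boxes that overlap the chosen one above the threshold
        order = order[1:][iou <= iou_thresh]
    return keep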

Intersection over Union (IoU):

The Intersection over Union (IoU) metric, also referred to as the Jaccard index, is essentially a method usually used to quantify the percentage of overlap between the ground truth bounding box (BBox) and the predicted BBox. In NMS, however, we compute the IoU between two predicted bounding boxes instead.

In mathematical terms, IoU can be represented by the following expression:

Intersection over Union (IoU) =
Area(Target ∩ Prediction) / Area(Target ∪ Prediction)
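
For two axis-aligned boxes given as (x1, y1, x2, y2), the formula above translates directly into a small helper function:

def iou(box_a, box_b):
    # intersection rectangle (empty if the boxes do not overlap)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    # union = area of A + area of B - intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)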


RCNN PAPER
MY GITHUB
