Going Deeper with Convolutions (Inception | GoogLeNet)
1. What is an Inception model?
Inception is an image recognition model that has been shown to attain greater than 78.1% accuracy on the ImageNet dataset. The model is the culmination of many ideas developed by multiple researchers over the years.
The model comprises symmetric and asymmetric building blocks, including convolutions, average pooling, max pooling, concatenations, dropouts, and fully connected layers. Batch normalization is used extensively throughout the model and applied to activation inputs. Loss is computed using Softmax.
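As a quick way to try the model, here is a minimal sketch that runs a pretrained Inception-v3 from torchvision; the torchvision weights and the `weights` API (torchvision ≥ 0.13) are assumptions about the reader's setup, not part of the original papers.

```python
import torch
from torchvision import models

# Load a pretrained Inception-v3 (the `weights` API needs torchvision >= 0.13).
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
model.eval()  # disables the auxiliary classifier output used during training

x = torch.randn(1, 3, 299, 299)  # stand-in for a preprocessed 299x299 image batch
with torch.no_grad():
    logits = model(x)  # shape: [1, 1000] ImageNet class scores
probs = torch.softmax(logits, dim=1)  # predictions go through softmax
print(probs.argmax(dim=1))
```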
2. Inception V1:
Stacking many deep convolutional layers in a model caused it to overfit the data. To avoid this, the Inception-V1 model applies multiple filters of different sizes at the same level. Thus, instead of stacking deep layers, the Inception models use parallel layers, making the network wider rather than deeper.
The Inception module depicted above simultaneously performs 1×1 convolutions, 3×3 convolutions, 5×5 convolutions, and a 3×3 max pooling operation.
Thereafter, it concatenates the outputs of all these operations into a single feature map that feeds the next stage. The architecture therefore does not follow the sequential approach in which every operation, such as pooling or convolution, is performed one after the other.
The Inception module with dimension reduction works in a similar manner to the naïve one, with one difference: 1×1 convolutions are applied before the 3×3 and 5×5 convolutions. A 1×1 convolution leaves the spatial dimensions of the feature map unchanged but reduces the number of channels, which cuts the computational cost of the larger convolutions that follow while preserving accuracy.
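For concreteness, here is a minimal PyTorch sketch of an Inception module with dimension reduction; the per-branch channel counts in the usage example are illustrative (borrowed from GoogLeNet's inception(3a) block), not prescribed by this article.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super().__init__()
        # Branch 1: plain 1x1 convolution
        self.b1 = nn.Conv2d(in_ch, ch1x1, kernel_size=1)
        # Branch 2: 1x1 channel reduction, then 3x3 convolution
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, ch3x3red, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch3x3red, ch3x3, kernel_size=3, padding=1),
        )
        # Branch 3: 1x1 channel reduction, then 5x5 convolution
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, ch5x5red, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch5x5red, ch5x5, kernel_size=5, padding=2),
        )
        # Branch 4: 3x3 max pooling, then 1x1 projection
        self.b4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1),
        )

    def forward(self, x):
        # Padding keeps the spatial size identical across branches,
        # so the outputs can be concatenated along the channel axis.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# Example with channel counts borrowed from GoogLeNet's inception(3a) block.
m = InceptionModule(192, 64, 96, 128, 16, 32, 32)
out = m(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28]) -> 64 + 128 + 32 + 32 channels
```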
Inception architecture:
3. Inception-V2:
In the Inception-V2 architecture, the 5×5 convolution is replaced by two stacked 3×3 convolutions. This reduces computation time, because a 5×5 convolution is about 2.78 times more expensive than a 3×3 convolution (25 weights per filter versus 9, and 25/9 ≈ 2.78), while the two stacked 3×3 convolutions cover the same 5×5 receptive field with only 18 weights. Using two 3×3 layers instead of a 5×5 layer thus improves the performance of the architecture.
This architecture also factorizes n×n convolutions into a 1×n convolution followed by an n×1 convolution. As discussed above, a 3×3 convolution can be converted into a 1×3 convolution followed by a 3×1 convolution, which is 33% cheaper in terms of computational complexity than the full 3×3 (6 weights per filter instead of 9); see the sketch below.
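A small PyTorch sketch makes the savings concrete; the channel counts (64 in, 64 out) are arbitrary assumptions used only to count parameters.

```python
import torch.nn as nn

in_ch, out_ch = 64, 64

# 5x5 convolution vs. two stacked 3x3 convolutions (same receptive field)
conv5x5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2, bias=False)
two_3x3 = nn.Sequential(
    nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
)

# 3x3 convolution vs. 1x3 followed by 3x1 (asymmetric factorization)
conv3x3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
asym = nn.Sequential(
    nn.Conv2d(in_ch, out_ch, kernel_size=(1, 3), padding=(0, 1), bias=False),
    nn.Conv2d(out_ch, out_ch, kernel_size=(3, 1), padding=(1, 0), bias=False),
)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(conv5x5), n_params(two_3x3))  # 102400 vs 73728 (~28% fewer)
print(n_params(conv3x3), n_params(asym))     # 36864 vs 24576  (~33% fewer)
```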
To deal with the problem of the representational bottleneck, the filter banks of the module were expanded (made wider) instead of the module being made deeper. This prevents the loss of information that occurs when depth is increased instead.
4. Inception-V3:
Inception-v3 mainly focuses on consuming less computational power by modifying the previous Inception architectures. The idea was proposed in the paper Rethinking the Inception Architecture for Computer Vision, published in 2015 and co-authored by Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, and Jonathon Shlens.
Inception-v3 architecture:
Inception-v3 is similar to Inception-v2, with some updates to the loss function, the optimizer, and batch normalization.
What's new?
These are the main updates in Inception-v3 with respect to Inception-v2:
- The RMSProp optimizer is used for training.
- The 7×7 convolutions in the stem are factorized into stacks of 3×3 convolutions.
- Batch normalization is applied in the auxiliary classifier.
- Label smoothing is added as a regularizing component of the loss function to prevent overfitting (see the sketch after this list).
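As a concrete illustration of the label smoothing update, here is a minimal PyTorch sketch; it relies on the `label_smoothing` argument of `nn.CrossEntropyLoss` (available since PyTorch 1.10), and the batch size is an arbitrary assumption (ε = 0.1 is the value used in the paper).

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # epsilon = 0.1 as in the paper

logits = torch.randn(8, 1000)           # batch of 8, 1000 ImageNet classes
targets = torch.randint(0, 1000, (8,))  # hard integer labels
loss = criterion(logits, targets)       # targets are smoothed internally toward
                                        # a uniform distribution over the classes
print(loss.item())
```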
5. Inception-v4:
In Inception-v4 the network was made deeper, the stem (the initial part of the Inception architecture, before the Inception blocks) was changed, and uniform choices were made for the Inception blocks.
What's new?
- A modified, more elaborate stem.
- Uniform Inception blocks: three module families (Inception-A, Inception-B, and Inception-C).
- Dedicated reduction blocks (Reduction-A and Reduction-B) that change the grid size between the module families.
6. Inception-ResNet v2:
Inspired by the performance of ResNet, residual connections are introduced into the Inception modules.
The input and the concatenated output after several operations must have the same dimensions, so padding is applied in each operation, and a 1×1 convolution is applied at the end to make the number of channels match, as shown in the sketch below.
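Here is a minimal PyTorch sketch of a residual Inception block in this spirit; the branch structure and channel counts are illustrative (loosely modeled on an Inception-ResNet-A block), not the exact Inception-ResNet-v2 specification.

```python
import torch
import torch.nn as nn

class InceptionResBlock(nn.Module):
    def __init__(self, in_ch=320):
        super().__init__()
        # Parallel branches; padding keeps the spatial size unchanged.
        self.b1 = nn.Conv2d(in_ch, 32, kernel_size=1)
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=1),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
        )
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=1),
            nn.Conv2d(32, 48, kernel_size=3, padding=1),
            nn.Conv2d(48, 64, kernel_size=3, padding=1),
        )
        # Final 1x1 convolution restores the input channel count so the
        # residual addition is dimensionally valid.
        self.project = nn.Conv2d(32 + 32 + 64, in_ch, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        branches = torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)
        return self.relu(x + self.project(branches))  # residual (skip) connection

block = InceptionResBlock(320)
print(block(torch.randn(1, 320, 35, 35)).shape)  # torch.Size([1, 320, 35, 35])
```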