
Xception — With Depthwise Separable Convolution

AYOUB KIROUANE

ML Engineer

Published: November 15, 2022

1 / What is Xception?

Xception is a variant of the GoogLeNet architecture proposed in 2016 by François Chollet (the author of Keras). It significantly outperformed Inception-v3 on a huge vision task (350 million images and 17,000 classes). Just like Inception-v4, it merges the ideas of GoogLeNet and ResNet, but it replaces the inception modules with a special type of layer called a depthwise separable convolution (or separable convolution for short). These layers had been used before in some CNN architectures (for example, MobileNets), but they were not as central as in the Xception architecture.

2 / Depthwise Separable Convolution

1. Regular Convolutions:

A regular convolution looks at cross-channel and spatial correlations simultaneously.

2. Depthwise separable convolution:

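A depthwise separable convolution looks at cross-channel and spatial correlations independently, in two successive steps:
depthwise (spatial) convolution: a separate 3x3 convolution applied to each channel.
pointwise convolution: a 1x1 convolution applied across all channels of the depthwise output.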
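
To make the two steps concrete, here is a minimal pure-Python sketch of a depthwise step followed by a pointwise step. It is only an illustration: the function names, the nested-list data layout, stride 1 and the absence of padding and biases are all assumptions made for this example, and it is not how any framework implements the layer.

```python
def depthwise_conv(x, kernels):
    """Spatial step: apply one k x k kernel to each channel separately.

    x: list of C channels, each an H x W list of lists.
    kernels: list of C kernels, each a k x k list of lists.
    No padding, stride 1 (assumptions of this sketch). Like deep learning
    frameworks, this computes a cross-correlation (the kernel is not flipped).
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        h, w = len(channel), len(channel[0])
        out.append([
            [
                sum(channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(k) for dj in range(k))
                for j in range(w - k + 1)
            ]
            for i in range(h - k + 1)
        ])
    return out


def pointwise_conv(x, weights):
    """Channel step: a 1x1 convolution that mixes channels at every pixel.

    weights: list of C_out rows, each holding C_in coefficients.
    """
    h, w = len(x[0]), len(x[0][0])
    return [
        [
            [sum(coef * x[c][i][j] for c, coef in enumerate(row))
             for j in range(w)]
            for i in range(h)
        ]
        for row in weights
    ]


def separable_conv(x, kernels, weights):
    """Depthwise step followed by pointwise step."""
    return pointwise_conv(depthwise_conv(x, kernels), weights)


# Toy input: 2 channels of size 3x3.
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[0, 0, 0], [0, 2, 0], [0, 0, 0]],
]
kernels = [
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # sums the whole 3x3 window
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],  # keeps only the centre pixel
]
weights = [[1, 1], [1, -1], [0, 2]]     # 2 input channels -> 3 output channels

print(separable_conv(x, kernels, weights))  # -> [[[47]], [[43]], [[4]]]
```

The depthwise step never mixes channels (it yields 45 and 2 here), and the pointwise step never looks at neighbouring pixels. This is the sense in which the two kinds of correlation are handled independently.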

3. Modified Depthwise Separable Convolution in Xception:

The modified depthwise separable convolution is a pointwise convolution followed by a depthwise convolution. This modification is motivated by the inception module in Inception-v3, where the 1×1 convolution is done before any n×n spatial convolutions. It is therefore slightly different from the original form. (Here n=3, since 3×3 spatial convolutions are used in Inception-v3.)

Separable convolutions use fewer parameters, less memory and fewer computations than regular convolutional layers, and they often perform better as well, so you should consider using them by default (except after layers with few channels).
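
The parameter saving can be checked with simple arithmetic. The sketch below counts weights for one layer, ignoring biases. The helper names and the example sizes (3x3 kernel, 128 input channels, 256 output channels) are chosen only for illustration.

```python
def regular_conv_params(k, c_in, c_out):
    """Weights in a regular k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution (biases ignored):
    one k x k kernel per input channel, plus a 1x1 convolution."""
    return k * k * c_in + c_in * c_out


k, c_in, c_out = 3, 128, 256  # illustrative sizes, not taken from the paper
regular = regular_conv_params(k, c_in, c_out)
separable = separable_conv_params(k, c_in, c_out)

print(regular)    # -> 294912
print(separable)  # -> 33920
print(round(regular / separable, 2))  # -> 8.69
```

The ratio of separable to regular parameters works out to 1/c_out + 1/k², so with 3x3 kernels and many output channels the separable layer needs roughly nine times fewer weights. For the same output size, the number of multiply-adds shrinks by the same factor.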

4. Two minor differences:

The order of operations: As mentioned, the original depthwise separable convolution, as usually implemented, performs the channel-wise spatial convolution first and then the 1×1 convolution, whereas the modified depthwise separable convolution performs the 1×1 convolution first and then the channel-wise spatial convolution. This is claimed to be unimportant because, in a stacked setting, the only differences appear at the beginning and at the end of the chain of modules.

The presence or absence of non-linearity: In the original Inception module, there is a non-linearity after the first operation. In Xception's modified depthwise separable convolution, there is no intermediate ReLU non-linearity.
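
The claim that the order of operations hardly matters in a stack can be illustrated by writing out the sequence of operations. The sketch below only tracks operation names for three stacked modules. It is a simplification that ignores channel counts, non-linearities and residual connections, and `stacked_ops` is a hypothetical helper.

```python
def stacked_ops(first, second, n_modules):
    """Flatten n_modules separable convolutions into one list of operations."""
    return [first, second] * n_modules


original = stacked_ops("depthwise", "pointwise", 3)
modified = stacked_ops("pointwise", "depthwise", 3)

# Drop the first op of the original stack and the last op of the modified
# stack: the remaining sequences are identical, so the two orderings differ
# only at the two ends of the chain.
same_interior = original[1:] == modified[:-1]
print(same_interior)  # -> True
```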

3 / What does it look like?

Xception stands for "extreme inception": it takes the principles of Inception to an extreme. In Inception, 1x1 convolutions are used to compress the original input, and different types of filters are then applied to each of the resulting depth spaces. Xception reverses this step: it first applies the filters to each depth map and then compresses the input space with a 1x1 convolution applied across the depth (as noted above, the order makes little difference in a stacked setting). This is almost identical to a depthwise separable convolution, an operation that has been used in neural network design since as early as 2014. There is one more difference between Inception and Xception: the presence or absence of a non-linearity after the first operation. In the Inception model, both operations are followed by a ReLU non-linearity, whereas Xception does not introduce any non-linearity between them.

The data first goes through the entry flow, then through the middle flow (which is repeated 8 times), and finally through the exit flow.
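
The three-stage data path can be sketched as a simple composition of functions. In the snippet below the stages are placeholders that only record the order in which they are called. The names `xception_forward` and `make_stage` are hypothetical, and the real flows (stacks of separable convolutions) are not modelled here.

```python
def xception_forward(x, entry_flow, middle_block, exit_flow, middle_repeats=8):
    """Pass x through the entry flow, the repeated middle block, then the exit flow."""
    x = entry_flow(x)
    for _ in range(middle_repeats):
        x = middle_block(x)
    return exit_flow(x)


trace = []


def make_stage(name):
    """Build a placeholder stage that logs its name and returns x unchanged."""
    def stage(x):
        trace.append(name)
        return x
    return stage


result = xception_forward(
    "image",
    make_stage("entry"),
    make_stage("middle"),
    make_stage("exit"),
)
print(trace)  # "entry", then "middle" eight times, then "exit"
```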

The networks were implemented using Google's TensorFlow framework and trained on 60 NVIDIA K80 GPUs each.

The table below shows that Xception outperforms the other models it is compared against on the ImageNet dataset.

Validation accuracy is also higher for Xception than for the Inception model, as shown below.

The graph below shows that Xception performs better with no intermediate non-linearity than with any kind of non-linearity between the two operations.

Xception original paper

MY GITHUB
