Xception — With Depthwise Separable Convolution


1 / What is Xception?

Another variant of the GoogLeNet architecture is also worth noting: Xception, proposed in 2016 by François Chollet (the author of Keras), significantly outperformed Inception-v3 on a huge vision task (350 million images and 17,000 classes). Like Inception-v4, it merges the ideas of GoogLeNet and ResNet, but it replaces the inception modules with a special type of layer called a depthwise separable convolution (or separable convolution for short). These layers had been used before in some CNN architectures, but they were not as central as in the Xception architecture (MobileNets, published shortly afterwards, also builds on them).

2 / Depthwise Separable Convolution


1. Regular Convolutions:

  • look at both channel & spatial correlations simultaneously

2. Depthwise separable convolution:

  • look at channel & spatial correlations independently in successive steps.
  • depthwise (spatial) convolution: a 3x3 convolution applied to each input channel separately.
  • pointwise convolution: a 1x1 convolution across all channels, combining the outputs of the depthwise step.
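
To make the two steps concrete, here is a minimal, framework-free sketch in Python. It assumes stride 1, no padding and no bias, and stores tensors as nested lists shaped [channels][rows][cols]; the function names and the toy data are illustrative, not a library API. The final check shows why the factorization works: the separable result equals a regular convolution whose kernel for output channel o and input channel c is the pointwise weight P[o][c] times the depthwise kernel D[c].

```python
# Minimal sketch of a depthwise separable convolution in plain Python.
# Assumptions: stride 1, no padding ("valid"), no bias, and tensors stored as
# nested lists shaped [channels][rows][cols]. Function names are illustrative.


def depthwise_conv(x, kernels):
    """Spatial step: convolve each input channel with its own k x k kernel."""
    k = len(kernels[0])
    rows, cols = len(x[0]) - k + 1, len(x[0][0]) - k + 1
    return [
        [[sum(kern[i][j] * ch[r + i][c + j] for i in range(k) for j in range(k))
          for c in range(cols)]
         for r in range(rows)]
        for ch, kern in zip(x, kernels)
    ]


def pointwise_conv(x, weights):
    """1x1 step: mix channels at every pixel; weights is [out][in]."""
    rows, cols = len(x[0]), len(x[0][0])
    return [
        [[sum(w[ci] * x[ci][r][c] for ci in range(len(x))) for c in range(cols)]
         for r in range(rows)]
        for w in weights
    ]


def separable_conv(x, depth_kernels, point_weights):
    """Depthwise (per-channel spatial) convolution, then pointwise (1x1)."""
    return pointwise_conv(depthwise_conv(x, depth_kernels), point_weights)


def regular_conv(x, kernels):
    """Regular convolution for comparison; kernels is [out][in][k][k]."""
    k = len(kernels[0][0])
    rows, cols = len(x[0]) - k + 1, len(x[0][0]) - k + 1
    return [
        [[sum(kern[ci][i][j] * x[ci][r + i][c + j]
              for ci in range(len(x)) for i in range(k) for j in range(k))
          for c in range(cols)]
         for r in range(rows)]
        for kern in kernels
    ]


# Toy input: 2 channels of 4x4 pixels.
x = [
    [[1, 2, 0, 1], [0, 1, 3, 1], [2, 0, 1, 0], [1, 1, 0, 2]],
    [[0, 1, 1, 0], [2, 0, 0, 1], [1, 3, 1, 1], [0, 2, 1, 0]],
]
D = [
    [[1, 0, -1], [1, 0, -1], [1, 0, -1]],  # channel 0: vertical-edge filter
    [[0, 1, 0], [1, -4, 1], [0, 1, 0]],    # channel 1: Laplacian filter
]
P = [[1, 1], [1, -1], [2, 0]]  # 3 output channels x 2 input channels

sep = separable_conv(x, D, P)  # shape [3][2][2]

# The same output from a regular convolution with kernel P[o][c] * D[c].
K = [[[[P[o][c] * D[c][i][j] for j in range(3)] for i in range(3)]
      for c in range(2)]
     for o in range(3)]
assert sep == regular_conv(x, K)
print(sep)
```

In this toy case the separable layer stores 2·9 + 3·2 = 24 weights, while the equivalent regular kernel K stores 3·2·9 = 54.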


3. Modified Depthwise Separable Convolution in Xception:


The modified depthwise separable convolution is a pointwise convolution followed by a depthwise convolution. This modification is motivated by the inception module in Inception-v3, where a 1×1 convolution is done first, before any n×n spatial convolutions. It is therefore slightly different from the original depthwise separable convolution. (Here n = 3, since 3×3 spatial convolutions are used in Inception-v3.)
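
One way to see how little the order matters for model size is to count weights for both orderings. The sketch below is a back-of-the-envelope calculation (weights only, no biases) for a layer mapping c_in to c_out channels with a k x k kernel; the helper names are hypothetical.

```python
# Weight counts (no biases) for the two orderings of a separable convolution
# mapping c_in -> c_out channels with a k x k kernel. Names are illustrative.


def params_depthwise_first(c_in, c_out, k=3):
    """Usual form: k x k filter per input channel, then a 1x1 convolution."""
    return k * k * c_in + c_in * c_out


def params_pointwise_first(c_in, c_out, k=3):
    """Modified form: 1x1 convolution first, then k x k per output channel."""
    return c_in * c_out + k * k * c_out


for c_in, c_out in [(64, 64), (64, 128), (128, 64)]:
    print(c_in, c_out,
          params_depthwise_first(c_in, c_out),
          params_pointwise_first(c_in, c_out))
```

Only the small k·k term changes (it scales with c_in in one ordering and with c_out in the other), so when the channel count is unchanged the two orderings have exactly the same number of weights.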

Separable convolutions use fewer parameters, less memory and fewer computations than regular convolutional layers, and in general they even perform better, so you should consider using them by default (except after layers with few channels).
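
The savings are easy to quantify. The sketch below counts weights and multiply-adds for one k x k layer producing an h x w output (stride 1, no biases). The sizes are chosen for illustration: 728 channels is the width used in Xception's middle flow, where the feature maps are 19x19 for 299x299 inputs; the function names are hypothetical.

```python
# Back-of-the-envelope cost of one k x k layer producing an h x w output map
# (stride 1, weights only, no biases). Function names are illustrative.


def regular_cost(c_in, c_out, k, h, w):
    """Return (weights, multiply-adds) for a regular convolution."""
    weights = k * k * c_in * c_out
    return weights, weights * h * w


def separable_cost(c_in, c_out, k, h, w):
    """Return (weights, multiply-adds) for depthwise k x k + pointwise 1x1."""
    weights = k * k * c_in + c_in * c_out
    return weights, weights * h * w


# Illustrative sizes: 728 channels, 3x3 kernels, 19x19 output maps.
regular = regular_cost(728, 728, 3, 19, 19)
separable = separable_cost(728, 728, 3, 19, 19)
print(f"regular:   {regular[0]:,} weights, {regular[1]:,} multiply-adds")
print(f"separable: {separable[0]:,} weights, {separable[1]:,} multiply-adds")
print(f"ratio:     {regular[0] / separable[0]:.2f}x")
```

The ratio is 1 / (1/c_out + 1/k²), so for 3x3 kernels the saving approaches 9x as the number of output channels grows.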

4. Two minor differences:

The order of operations: as mentioned, depthwise separable convolutions as usually implemented first perform a channel-wise spatial convolution and then a 1×1 convolution, whereas the modified depthwise separable convolution performs the 1×1 convolution first and then the channel-wise spatial convolution. This is claimed to be unimportant because, when these operations are stacked, the two orderings differ only at the beginning and at the end of the chain of modules.
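
The stacking argument can be checked in miniature by writing out the sequence of operations for each ordering. This sketch deliberately ignores activations and normalization, which is a simplification; `stack` is a hypothetical helper.

```python
# Ordering argument in miniature: the operations in a stack of n modules for
# each ordering (activations and normalization ignored, as a simplification).


def stack(n, pointwise_first):
    unit = ["pointwise", "depthwise"] if pointwise_first else ["depthwise", "pointwise"]
    return unit * n


modified = stack(4, pointwise_first=True)  # P D P D P D P D
usual = stack(4, pointwise_first=False)    # D P D P D P D P

# Drop the leading pointwise of one stack and the trailing pointwise of the
# other: the remaining sequences are identical.
assert modified[1:] == usual[:-1]
```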

The presence or absence of a non-linearity: in the original Inception module, the first operation is followed by a non-linearity. In Xception's modified depthwise separable convolution, there is NO intermediate ReLU non-linearity.
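
A consequence of dropping the intermediate ReLU is that the pointwise and depthwise steps together form a single linear operation. The sketch below checks additivity, f(a + b) = f(a) + f(b), with and without a ReLU in between. It assumes 1D signals shaped [channels][length] instead of images, purely for brevity; all names and values are illustrative.

```python
# 1D sketch of the modified separable convolution (1x1 mixing, then a
# per-channel size-3 filter) with an optional intermediate ReLU.
# Assumption: 1D signals shaped [channels][length] instead of images.


def pointwise(x, weights):
    """1x1 convolution: weights is [out][in]."""
    return [[sum(w[c] * x[c][t] for c in range(len(x))) for t in range(len(x[0]))]
            for w in weights]


def depthwise(x, kernels):
    """Filter each channel with its own kernel (valid padding, stride 1)."""
    k = len(kernels[0])
    n = len(x[0]) - k + 1
    return [[sum(kern[i] * ch[t + i] for i in range(k)) for t in range(n)]
            for ch, kern in zip(x, kernels)]


def relu(x):
    return [[max(0, v) for v in ch] for ch in x]


def add(p, q):
    return [[u + v for u, v in zip(cp, cq)] for cp, cq in zip(p, q)]


def modified_separable(x, weights, kernels, intermediate_relu=False):
    y = pointwise(x, weights)
    if intermediate_relu:
        y = relu(y)
    return depthwise(y, kernels)


W = [[1, -1], [2, 1]]              # pointwise weights (2 out x 2 in)
KERNELS = [[1, 2, 1], [1, 0, -1]]  # one size-3 filter per channel
a = [[1, 0, 2, 1], [0, 3, 1, 1]]
b = [[-2, 1, 0, 1], [1, 0, -1, 2]]

# No intermediate ReLU: the block is linear, so it is additive.
additive_without_relu = (
    modified_separable(add(a, b), W, KERNELS)
    == add(modified_separable(a, W, KERNELS), modified_separable(b, W, KERNELS))
)
# ReLU in between: additivity breaks for these inputs.
additive_with_relu = (
    modified_separable(add(a, b), W, KERNELS, True)
    == add(modified_separable(a, W, KERNELS, True),
           modified_separable(b, W, KERNELS, True))
)
print(additive_without_relu, additive_with_relu)
```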


3 / What does it look like?


Xception stands for “extreme inception”: it takes the principles of Inception to an extreme. In Inception, 1x1 convolutions compress the original input, and different types of filters are then applied to each of the resulting depth spaces. Xception reverses this order: it first applies a filter to each depth map separately and then compresses the input space with a 1x1 convolution applied across the depth. This method is almost identical to a depthwise separable convolution, an operation that has been used in neural network design as early as 2014. There is one more difference between Inception and Xception: the presence or absence of a non-linearity after the first operation. In the Inception model, both operations are followed by a ReLU non-linearity, whereas Xception doesn't introduce any non-linearity between them.
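
The Xception paper frames this as a spectrum: a regular convolution and a depthwise separable convolution sit at the two extremes, and Inception modules, which split the channels into a few segments, lie in between. The sketch below counts the weights of the spatial step as the number of independent channel segments grows. It assumes, as a simplification, equal-sized segments that each keep their channel count; `spatial_params` is a hypothetical helper.

```python
def spatial_params(channels, segments, k=3):
    """Weights of a k x k convolution over `channels` channels split into
    `segments` equal groups, each convolved independently (no bias)."""
    if channels % segments:
        raise ValueError("channels must be divisible by segments")
    group = channels // segments
    return segments * k * k * group * group


C = 256  # illustrative channel count
for segments in (1, 4, C):  # regular conv, Inception-like split, depthwise
    print(f"{segments:>3} segment(s): {spatial_params(C, segments):,} weights")
```

With one segment per channel, every spatial filter sees exactly one channel, which is the "extreme" case that Xception is named after.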

The data first goes through the entry flow, then through the middle flow (which is repeated eight times), and finally through the exit flow.
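
The middle flow can be written down as plain data to see how much of the network it accounts for. This sketch assumes each middle-flow module is three (ReLU, 3x3 separable convolution with 728 filters) pairs followed by an identity shortcut, as in the paper's architecture diagram; it counts convolution weights only, ignoring biases and batch-normalization parameters, and the helper names are hypothetical.

```python
# Xception's middle flow as plain data (assumed layout: 3 x [ReLU, 3x3
# SeparableConv with 728 filters] + identity shortcut, repeated 8 times).
# Counts convolution weights only: no biases, no batch-norm parameters.


def middle_flow(repeats=8, channels=728, k=3):
    """Return the middle flow as a flat list of (op, detail) tuples."""
    ops = []
    for _ in range(repeats):
        for _ in range(3):
            ops.append(("relu", None))
            ops.append(("sepconv", (channels, channels, k)))
        ops.append(("add_identity_shortcut", None))
    return ops


def separable_weights(c_in, c_out, k):
    return k * k * c_in + c_in * c_out


ops = middle_flow()
sepconvs = [detail for op, detail in ops if op == "sepconv"]
total = sum(separable_weights(*d) for d in sepconvs)
print(f"{len(sepconvs)} separable convs, {total:,} weights")
```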

Xception was implemented in Google's TensorFlow framework and trained on 60 NVIDIA K80 GPUs.

The table below shows that Xception outperforms the models it is compared with (VGG-16, ResNet-152 and Inception-v3) on the ImageNet dataset.


As shown below, Xception's validation accuracy is also higher than that of Inception-v3.


The graph below shows that Xception performs better with no non-linearity between the depthwise and pointwise operations than with any kind of non-linearity (ReLU or ELU) there.

Xception original paper
My GitHub


More articles by AYOUB KIROUANE

