Crash Course: GoogLeNet (2014)
GoogLeNet (2014) aka Inception v1

In this first post of a series, I will highlight the key features and innovations of GoogLeNet, which was presented in the 2014 paper Going Deeper with Convolutions by researchers at Google.

GoogLeNet introduced several key innovations relative to the VGG network of Simonyan and Zisserman. The central one is the Inception module, which approximates sparse connections with small parallel convolutions, enabling much deeper networks.

Unlike the VGG model, which ends with a densely connected block of three fully-connected (FC) layers, the first of which alone accounts for roughly 100M parameters, GoogLeNet drops this large FC block entirely!
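To sanity-check that figure, here is a quick back-of-the-envelope count in Python (my own illustration, assuming the standard VGG-16 layout, where the final 7x7x512 feature map is flattened into a 4096-unit FC layer):

```python
# Rough parameter count for VGG-16's first fully-connected layer
# (assuming the standard 7x7x512 final feature map and a 4096-unit FC layer).
fc1_weights = 7 * 7 * 512 * 4096     # 102,760,448 weights
fc1_biases = 4096
print(f"{fc1_weights + fc1_biases:,}")  # 102,764,544 -> roughly 100M parameters
```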

Each Inception module extracts input features at different scales (1x1, 3x3, and 5x5 convolutions) in parallel and concatenates the results to cluster these multi-scale features, whereas VGG uses only 3x3 filters. A minimal sketch of such a module is shown below.
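Here is a minimal PyTorch sketch of an Inception-style module (my own illustration, not the authors' code). It also includes the parallel pooling branch from the paper and the 1x1 reductions discussed in the next section; the channel widths follow the paper's inception (3a) block, but treat them as illustrative:

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Sketch of an Inception-style block: parallel 1x1, 3x3 and 5x5 convolutions
    (plus a pooling branch) whose outputs are concatenated along the channel axis."""

    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        # 1x1 branch
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, c1, kernel_size=1), nn.ReLU(inplace=True))
        # 1x1 reduction followed by 3x3
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_reduce, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_reduce, c3, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        # 1x1 reduction followed by 5x5
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_reduce, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_reduce, c5, kernel_size=5, padding=2), nn.ReLU(inplace=True))
        # 3x3 max-pool followed by a 1x1 projection
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1), nn.ReLU(inplace=True))

    def forward(self, x):
        # All branches preserve the spatial size, so their outputs
        # can be concatenated along the channel dimension.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1)

# Example with widths like the paper's inception (3a) block (192 input channels):
block = InceptionModule(192, 64, 96, 128, 16, 32, 32)
out = block(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28]) -> 64 + 128 + 32 + 32 channels
```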

Finally, the idea of stacking blocks instead of individual layers has inspired future networks.

How does it reduce the number of parameters in a block?

GoogLeNet achieves a large reduction in total parameters (about 4 million vs 138 million in VGG) through its sparse connection pattern and through dimension reduction before the larger filters are applied.

A 1x1 convolution, often called a 'bottleneck', is used to compress the input channel depth before the 3x3 and 5x5 convolutions, keeping the same number of spatial locations but with fewer feature channels.

This saving reduces both the number of parameters and the computation time, while allowing for wider and deeper networks, as the calculation below illustrates.
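A back-of-the-envelope example makes the saving concrete. Taking the 5x5 branch of the inception (3a) block as an illustration (192 input channels, 32 output channels, with the bottleneck reducing to 16 channels), the direct convolution needs almost ten times as many weights:

```python
# Weights for a direct 5x5 convolution: 192 -> 32 channels
direct = 5 * 5 * 192 * 32                           # 153,600

# Weights with a 1x1 bottleneck first: 192 -> 16 -> 32 channels
bottleneck = 1 * 1 * 192 * 16 + 5 * 5 * 16 * 32     # 3,072 + 12,800 = 15,872

print(direct, bottleneck, round(direct / bottleneck, 1))  # ~9.7x fewer weights
```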

This mechanism also helps improve accuracy! Because GoogLeNet captures details at three different scales, as described above, it can detect both smaller and larger structures in an image much better than VGG, which increases the generalizability of the model.

In place of the large FC block, global average pooling averages each feature map down to a single value, and this compression does not reduce accuracy; only a small linear classifier remains on top of the pooled features.
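As a rough sketch of what this looks like (my own illustration, assuming GoogLeNet's 1024-channel final feature map and the 1000 ImageNet classes), global average pooling collapses each 7x7 feature map to a single value before the small classifier:

```python
import torch
import torch.nn as nn

features = torch.randn(1, 1024, 7, 7)               # final feature map (batch, channels, H, W)
pooled = nn.AdaptiveAvgPool2d(1)(features)          # global average pooling -> (1, 1024, 1, 1)
logits = nn.Linear(1024, 1000)(pooled.flatten(1))   # small 1000-way classifier (~1M parameters)
print(logits.shape)                                 # torch.Size([1, 1000])
```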

For further details, see the original paper, or this Medium post, which goes further into the mathematical details.

Please like and share this with your network, so that I'll be motivated to continue writing blog posts like this.

