Crash Course: GoogLeNet (2014)
GoogLeNet (2014) aka Inception v1

In this first post of a series, I will highlight the key features and innovations of GoogLeNet, which was presented in the 2014 paper Going Deeper with Convolutions by researchers at Google.

GoogLeNet introduced several key innovations relative to the VGG network of Simonyan and Zisserman. The central one is the Inception module, which approximates sparse connections with small parallel convolutions, enabling much deeper networks.

Unlike the VGG model, which ends with a densely connected block of three fully-connected (FC) layers, the first of which alone accounts for roughly 100M parameters, GoogLeNet drops this large FC block entirely!
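To sanity-check that figure, here is a quick back-of-the-envelope count in Python (my own illustration, assuming the standard VGG-16 layout, where the final 7x7x512 feature map is flattened into a 4096-unit FC layer):

```python
# Rough parameter count for VGG-16's first fully-connected layer
# (assuming the standard 7x7x512 final feature map and a 4096-unit FC layer).
fc1_weights = 7 * 7 * 512 * 4096     # 102,760,448 weights
fc1_biases = 4096
print(f"{fc1_weights + fc1_biases:,}")  # 102,764,544 -> roughly 100M parameters
```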

Each Inception module extracts input features at different scales (1x1, 3x3, and 5x5 convolutions) in parallel and concatenates the results to cluster these multi-scale features, whereas VGG uses only 3x3 filters. A minimal sketch of such a module is shown below.
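Here is a minimal PyTorch sketch of an Inception-style module (my own illustration, not the authors' code). It also includes the parallel pooling branch from the paper and the 1x1 reductions discussed in the next section; the channel widths follow the paper's inception (3a) block, but treat them as illustrative:

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Sketch of an Inception-style block: parallel 1x1, 3x3 and 5x5 convolutions
    (plus a pooling branch) whose outputs are concatenated along the channel axis."""

    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        # 1x1 branch
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, c1, kernel_size=1), nn.ReLU(inplace=True))
        # 1x1 reduction followed by 3x3
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_reduce, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_reduce, c3, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        # 1x1 reduction followed by 5x5
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_reduce, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_reduce, c5, kernel_size=5, padding=2), nn.ReLU(inplace=True))
        # 3x3 max-pool followed by a 1x1 projection
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1), nn.ReLU(inplace=True))

    def forward(self, x):
        # All branches preserve the spatial size, so their outputs
        # can be concatenated along the channel dimension.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1)

# Example with widths like the paper's inception (3a) block (192 input channels):
block = InceptionModule(192, 64, 96, 128, 16, 32, 32)
out = block(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28]) -> 64 + 128 + 32 + 32 channels
```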

Finally, the idea of stacking blocks instead of individual layers has inspired future networks.

How does it reduce the number of parameters in a block?

GoogLeNet achieves a large reduction in total parameters (about 4 million vs 138 million in VGG) through its sparse connection pattern and through dimension reduction before the larger filters are applied.

A 1x1 convolution, often called a 'bottleneck', is used to compress the input channel depth before the 3x3 and 5x5 convolutions, keeping the same number of spatial locations but with fewer feature channels.

This saving reduces both the number of parameters and the computation time, while allowing for wider and deeper networks, as the calculation below illustrates.
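A back-of-the-envelope example makes the saving concrete. Taking the 5x5 branch of the inception (3a) block as an illustration (192 input channels, 32 output channels, with the bottleneck reducing to 16 channels), the direct convolution needs almost ten times as many weights:

```python
# Weights for a direct 5x5 convolution: 192 -> 32 channels
direct = 5 * 5 * 192 * 32                           # 153,600

# Weights with a 1x1 bottleneck first: 192 -> 16 -> 32 channels
bottleneck = 1 * 1 * 192 * 16 + 5 * 5 * 16 * 32     # 3,072 + 12,800 = 15,872

print(direct, bottleneck, round(direct / bottleneck, 1))  # ~9.7x fewer weights
```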

This mechanism also helps improve accuracy! Because GoogLeNet captures details at three different scales, as described above, it can detect both smaller and larger structures in an image much better than VGG, which increases the generalizability of the model.

In place of the large FC block, global average pooling averages each feature map down to a single value, and this compression does not reduce accuracy; only a small linear classifier remains on top of the pooled features.
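As a rough sketch of what this looks like (my own illustration, assuming GoogLeNet's 1024-channel final feature map and the 1000 ImageNet classes), global average pooling collapses each 7x7 feature map to a single value before the small classifier:

```python
import torch
import torch.nn as nn

features = torch.randn(1, 1024, 7, 7)               # final feature map (batch, channels, H, W)
pooled = nn.AdaptiveAvgPool2d(1)(features)          # global average pooling -> (1, 1024, 1, 1)
logits = nn.Linear(1024, 1000)(pooled.flatten(1))   # small 1000-way classifier (~1M parameters)
print(logits.shape)                                 # torch.Size([1, 1000])
```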

For further details, see the original paper, or this Medium post, which goes further into the mathematical details.

Please like and share this with your network, so that I'll be motivated to continue writing blog posts like this.

