#14 Coding U-Net Architecture from Scratch
Now that we have a good foundation in Image Segmentation, we will look into another model used for such tasks. U-Net, a convolutional neural network, was proposed in 2015 in a paper by Olaf Ronneberger, Philipp Fischer, and Thomas Brox. It excels at achieving precise segmentation even with limited training data, a common challenge in medical imaging.
The paper builds on the structure proposed by Ciresan, Giusti, Gambardella, and Schmidhuber (2012). The new U-Net model by Ronneberger, Fischer, and Brox outperformed the earlier model at segmenting neuronal structures in electron microscopy (EM) stacks.
The 2012 model trained a network in a sliding-window setup to predict the class label of each pixel by providing a local region (patch) around that pixel. The advantages are that this network can localize, and that the training data in terms of patches is much larger than the number of training images. There are two drawbacks. First, it is slow: the network must be run separately for each patch, and overlapping patches introduce a lot of redundancy. Second, there is a trade-off between localization accuracy and the use of context: larger patches require more max-pooling layers, which reduce localization accuracy, while smaller patches allow the network to see only little context.
U-Shaped
U-Net's name is a true representation of its U-shaped architecture. There is symmetry between the contracting path (encoder) and the expanding path (decoder). The encoder progressively captures image features while reducing spatial resolution. This is achieved through repeated applications of convolutional layers with increasing numbers of filters, followed by max-pooling operations. Imagine a 128 x 128 image fed into the encoder. After passing through convolutional layers and a max-pooling operation, the image size might be reduced to 64 x 64, capturing essential features while discarding less important details.
Encoder
On the left of the U is the encoder. The input image size is 128 x 128. It is passed through two convolutional layers with 64 filters each, followed by a max-pooling layer with a 2 x 2 window and stride 2, which halves the spatial dimensions to 64 x 64.
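The halving done by the max-pooling layer can be sketched in a few lines of NumPy. This is a minimal illustration of 2 x 2 max pooling with stride 2, not the full convolutional block; the channel count of 64 matches the first encoder level described above.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2: halves height and width."""
    h, w, c = x.shape
    # Group pixels into 2x2 blocks and take the maximum of each block.
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

# A 128x128 feature map with 64 channels, as after the first conv block.
features = np.random.rand(128, 128, 64)
pooled = max_pool_2x2(features)
print(pooled.shape)  # (64, 64, 64)
```

Each output pixel keeps only the strongest activation in its 2 x 2 neighbourhood, which is how the encoder discards fine detail while retaining salient features.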
The encoder has downsampled our original image (128 x 128) to a size of (8 x 8).
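Going from 128 x 128 down to 8 x 8 takes four pooling steps, since each step halves the spatial size. The short sketch below tabulates the size at each level; the filter counts are illustrative, assuming the common convention of doubling filters at every level (the paper follows this pattern).

```python
# Spatial size and (illustrative) filter count at each encoder level,
# assuming four 2x2 max-pooling steps and filter doubling per level.
size, filters = 128, 64
levels = []
for level in range(1, 5):
    size //= 2       # each pooling halves height and width
    filters *= 2     # filters conventionally double at each level
    levels.append((level, size, filters))
    print(f"level {level}: {size} x {size} spatial, {filters} filters")
```

The last line printed is level 4 at 8 x 8, matching the downsampled size stated above.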
Connecting Paths: The Bottleneck
A bottleneck layer acts as a bridge between the encoder and decoder. There is no pooling layer here, so the spatial dimensions remain the same. It maintains the spatial resolution reached by the encoder while extracting even more features from the data.
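A "same"-padded convolution is what lets the bottleneck add features without shrinking the map. The naive NumPy sketch below shows only the shape behaviour, not an efficient or trained layer; the 512-to-1024 channel counts are illustrative, following the doubling convention.

```python
import numpy as np

def conv2d_same(x, kernels):
    """Naive 'same' 2D convolution: spatial size preserved, channels change."""
    h, w, c_in = x.shape
    kh, kw, _, c_out = kernels.shape
    pad = kh // 2
    # Zero-pad the borders so the output keeps the input's height and width.
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((h, w, c_out))
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + kh, j:j + kw, :]
            out[i, j] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return out

bottleneck_in = np.random.rand(8, 8, 512)        # 8x8 map from the encoder
kernels = np.random.rand(3, 3, 512, 1024)        # 3x3 kernels, 1024 filters
bottleneck_out = conv2d_same(bottleneck_in, kernels)
print(bottleneck_out.shape)  # (8, 8, 1024)
```

The spatial size stays 8 x 8 while the channel depth grows, which is exactly the bottleneck's role.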
Upsampling and Recovering Resolution
The decoder path takes over from the bottleneck. At each level it upsamples the feature maps to increase their spatial dimensions. However, upsampling is not the only operation at each level. The decoder also has skip connections from the corresponding encoder level: the upsampled features are concatenated with the high-resolution features taken from the encoder at the same level. This merge allows the decoder to recover precise spatial details while retaining the learned features.
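The upsample-then-concatenate step can be sketched as follows. Nearest-neighbour upsampling stands in for the paper's learned up-convolution, and the channel counts are illustrative; the point is how the skip connection merges two feature maps of the same spatial size.

```python
import numpy as np

def upsample_2x(x):
    """Nearest-neighbour upsampling: doubles height and width."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

decoder_features = np.random.rand(8, 8, 1024)    # coming from the bottleneck
encoder_features = np.random.rand(16, 16, 512)   # skip connection, same level

upsampled = upsample_2x(decoder_features)        # (16, 16, 1024)
# Concatenate along the channel axis: spatial sizes must match.
merged = np.concatenate([upsampled, encoder_features], axis=-1)
print(merged.shape)  # (16, 16, 1536)
```

The concatenation stacks channels (1024 + 512 = 1536), so subsequent convolutions at this level see both the coarse decoder features and the fine encoder details.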
Segmentation Output
The final step involves applying a 1x1 convolution to the upsampled features. The number of filters in this convolution layer corresponds to the number of classes you want to segment. For instance, if you want to classify each pixel as belonging to one of 10 different tissue types, you would use 10 filters. This final step produces a segmentation map, assigning a class label (e.g., a specific tissue type) to every pixel in the original image.
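A 1x1 convolution is just a per-pixel linear map over channels, so it can be sketched with a single matrix multiplication. The 64-channel input and 10 classes below are illustrative, matching the tissue-type example above; a real model would follow the logits with a softmax and a loss during training.

```python
import numpy as np

def conv1x1(x, weights):
    """A 1x1 convolution: a per-pixel linear map over channels."""
    return x @ weights  # (H, W, C_in) @ (C_in, n_classes) -> (H, W, n_classes)

features = np.random.rand(128, 128, 64)   # final decoder feature map
weights = np.random.rand(64, 10)          # 10 classes (illustrative)

logits = conv1x1(features, weights)
segmentation_map = logits.argmax(axis=-1)  # class label per pixel
print(logits.shape, segmentation_map.shape)  # (128, 128, 10) (128, 128)
```

The argmax over the class axis turns the per-class scores into the final segmentation map: one label for every pixel of the original image.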
Advantages and Applications
U-Net offers several advantages: