The Next AI Revolution: Self-Supervised Learning

Babies learn how the world works by observation, with remarkably little interaction.




"Self-supervised learning is the cake, supervised learning is the icing on the cake, reinforcement learning is the cherry on the cake" Yann LeCun




Machine Learning

1- Supervised Learning: learning from data with fine-grained, human-annotated labels.

However, data collection and annotation are usually expensive in terms of time and money, and in some domains the annotation process requires special skills. Accordingly, semi-supervised, weakly-supervised, and self-supervised learning methods have been proposed to reduce this cost.

2- Semi-supervised Learning: learning using a small amount of labeled data in addition to a large amount of (easy-to-obtain) unlabeled data.

3- Weakly-supervised Learning: learning with coarse-grained (noisy or inaccurate) labels, which can be collected at a much lower cost.

4- Unsupervised Learning: learning without using any annotations.

5- Self-supervised Learning: a subset of unsupervised learning in which models are explicitly trained with automatically generated labels.

Self-supervised learning empowers us to exploit a variety of labels that come with the data for free.

Transfer Learning

In deep learning, training a neural network from randomly initialized weights is not an easy task. It is more practical to start from a model pre-trained on a source task and then fine-tune it towards the target task. Accordingly, fine-tuning can require far less data (the often-quoted figure is 1000x less) than starting from scratch.

Generally speaking, reusing the early layers of a generic pre-trained model (an ImageNet model, for example) can improve both training speed and model accuracy. However, if the target task is not similar to the source task, the improvement is much smaller.
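To make this concrete, here is a minimal fine-tuning sketch in PyTorch, assuming a recent torchvision; the ResNet-18 backbone and the 10-class target task are illustrative assumptions, not a prescribed setup:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a model pre-trained on a generic source task (ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pre-trained layers so their generic features are reused as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for the target task (10 classes is hypothetical).
model.fc = nn.Linear(model.fc.in_features, 10)

# Fine-tune only the new head; far less data is needed than training from scratch.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

Unfreezing a few of the later backbone layers is a common variant when more target-task data is available.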

When source and target tasks differ, one solution could be Self-Supervised Learning, where a model is trained using labels that come from the data itself, without external (costly) annotation.


Self-supervised learning is widely used in natural language processing (NLP), but far less so in computer vision.


Self-supervised learning in computer vision

In self-supervised learning, the task used for pre-training is known as the “pretext task”. The tasks used for fine-tuning are known as the “downstream tasks”.

Usually, we don’t care much about the performance of the invented (source) task used for pre-training. Rather, we care about the learned intermediate representation, and we hope that this representation can benefit a variety of practical downstream (target) tasks. This is similar to how auxiliary tasks are treated.

For example, images can be rotated at random and a model trained to predict how each input image was rotated (the pretext task). This requires no annotations, and the learned intermediate representations are expected to be beneficial for the downstream tasks.

Choosing a pretext task

The pretext task should be something that, if solved, requires an understanding of the data that is also needed to solve the downstream task. Moreover, it should be something a human could do given that understanding. For example, a pretext task that generates a near-future frame (the next frame or next few frames) of a video is feasible, whereas generating a far-future frame is not. Generally, there is no need to spend too much time crafting the perfect pretext task, and learning multiple tasks at once (multi-task learning) is also possible.

Examples:

Many ideas have been proposed for self-supervised representation learning on images. A common workflow is to train a model on one or more pretext tasks with unlabelled images, and then use an intermediate feature layer of this model to feed a multinomial logistic regression classifier for ImageNet classification. The final classification accuracy quantifies how good the learned representation is.
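The sketch below illustrates this evaluation workflow as a "linear probe": the pretext-trained encoder is frozen, and only a multinomial logistic regression layer is trained on top. The encoder, its feature dimension, and the class count are placeholders for whatever pretext model is being evaluated:

```python
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """Multinomial logistic regression on top of a frozen, pretext-trained encoder."""

    def __init__(self, encoder, feat_dim, num_classes=1000):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False                  # keep the representation fixed
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        with torch.no_grad():
            feats = self.encoder(x)                  # intermediate representation
        return self.classifier(feats.flatten(1))     # only this layer is trained
```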

1- Rotation of an image (Gidaris et al., 2018) is a cheap way to modify an input image while the semantic content stays unchanged. Each input image is first rotated by a random multiple of 90° (0°, 90°, 180°, or 270°). Then, the model is trained to predict which rotation has been applied: a 4-class classification problem.
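A minimal sketch of how these rotation labels come for free, following the idea of the paper rather than its exact implementation (square images are assumed so that all four rotations keep the same shape):

```python
import torch

def rotation_batch(images):
    """Rotate each image by a random multiple of 90 degrees (0, 90, 180, 270).

    images: tensor of shape (B, C, H, W) with H == W.
    Returns the rotated batch and the rotation index (0-3) as the free label.
    """
    labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack(
        [torch.rot90(img, k=int(k), dims=(1, 2)) for img, k in zip(images, labels)]
    )
    return rotated, labels

# Any classifier can then be trained on this 4-class problem:
# logits = model(rotated); loss = F.cross_entropy(logits, labels)
```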


2- The denoising autoencoder (Vincent et al., 2008) learns to recover an image from a version that is partially corrupted or has random noise. "As we increase the noise level, denoising training forces the filters to differentiate more, and capture more distinctive features. Higher noise levels tend to induce less local filters, as expected. One can distinguish different kinds of filters, from local blob detectors, to stroke detectors, and some full character detectors at the higher noise levels."
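A minimal denoising autoencoder sketch in the spirit of the paper; the single hidden layer, flattened 784-dimensional inputs (e.g. 28x28 images), and Gaussian noise level are assumptions for illustration:

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, dim=784, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden, dim), nn.Sigmoid())

    def forward(self, x, noise_std=0.3):
        corrupted = x + noise_std * torch.randn_like(x)  # corrupt the input
        return self.decoder(self.encoder(corrupted))     # reconstruct from corruption

# The training target is the *clean* input: loss = F.mse_loss(model(x), x)
```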


3- Context Encoders: Feature Learning by Inpainting (Pathak et al., 2016), where the network is trained to fill in a missing piece of the image. In this work, an unsupervised visual feature learning algorithm is presented: Context Encoders are convolutional neural networks trained to generate the contents of an arbitrary image region conditioned on its surroundings. To succeed at this task, context encoders need both to understand the content of the entire image and to produce a plausible hypothesis for the missing part(s).
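A simplified sketch of the masking step and the reconstruction objective. The actual paper uses an encoder-decoder with a channel-wise fully connected layer and adds an adversarial loss; `context_encoder` below is a hypothetical network standing in for that architecture:

```python
import torch

def mask_center(images, mask_size=8):
    """Zero out a central square region; the network must fill it from its surroundings."""
    masked = images.clone()
    _, _, h, w = images.shape
    top, left = (h - mask_size) // 2, (w - mask_size) // 2
    masked[:, :, top:top + mask_size, left:left + mask_size] = 0.0
    return masked

# Pretext step:
# masked = mask_center(images)
# recon = context_encoder(masked)                     # hypothetical encoder-decoder
# loss = torch.nn.functional.mse_loss(recon, images)  # the paper adds an adversarial loss
```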


4- Predicting the relative position of random patches from one image (Doersch et al., 2015). As mentioned in the paper: "Given only a large, unlabeled image collection, we extract random pairs of patches from each image and train a convolutional neural net to predict the position of the second patch relative to the first. We argue that doing well on this task requires the model to learn to recognize objects and their parts."
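A simplified sketch of sampling a patch pair and its 8-way relative-position label. The paper also adds gaps and random jitter between patches to block trivial low-level shortcuts; the patch size and grid layout here are assumptions:

```python
import random

# The 8 neighbours of the centre patch as (row, col) offsets; the index is the label.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def patch_pair(image, patch=32):
    """Sample a centre patch and one random neighbour from a (C, H, W) tensor.

    Assumes the image is at least 3 patches wide and tall.
    """
    _, h, w = image.shape
    row = random.randint(1, h // patch - 2)
    col = random.randint(1, w // patch - 2)
    label = random.randrange(8)
    dr, dc = OFFSETS[label]

    def crop(r, c):
        return image[:, r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]

    # The network sees both patches and predicts the 8-class relative position.
    return crop(row, col), crop(row + dr, col + dc), label
```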


5- Validate frame order (Misra et al., 2016). The pretext task is to determine whether a sequence of frames from a video is in the correct temporal order. As mentioned in the paper: "A video imposes a natural temporal structure for visual data. In many cases, one can easily verify whether frames are in the correct temporal order (shuffled or not). Such a simple sequential verification task captures important spatiotemporal signals in videos. We use this task for unsupervised pre-training of a Convolutional Neural Network (CNN)."
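A minimal sketch of building training samples for this verification task. The paper samples frames from high-motion windows and feeds them to a triplet siamese network; here the construction is simplified to a binary in-order/shuffled label:

```python
import random
import torch

def order_verification_sample(frames):
    """Build a (clip, label) pair: label 1 if the frames are in temporal order, else 0.

    frames: tensor of shape (T, C, H, W) holding a short video clip, T >= 3.
    """
    idx = sorted(random.sample(range(frames.size(0)), 3))  # three ordered frame indices
    if random.random() < 0.5:
        return frames[idx], torch.tensor(1)                # correct temporal order
    shuffled = idx[:]
    while shuffled == idx:
        random.shuffle(shuffled)                           # break the temporal order
    return frames[shuffled], torch.tensor(0)

# A CNN shared across the frames embeds each one; a classifier predicts the label.
```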



References and further reading:

Awesome Self-Supervised Learning

Self-supervised learning and computer vision, by Jeremy Howard

Self-Supervised Representation Learning, by Lilian Weng

Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey


Course SSL


Best Regards

Ibrahim Sobh