Yann LeCun: "Energy-Based Self-Supervised Learning"
Yann LeCun gave a talk at the Institute for Pure and Applied Mathematics in December 2019 entitled “Energy-Based Self-Supervised Learning”, a preview of the talk he later gave at ICLR 2020. The 2019 talk is on YouTube (https://www.youtube.com/watch?v=A7AnCvYDQrU&t=45s); I watched it and wrote up the notes below.
A key point is that machine learning's sample efficiency is much, much lower than that of humans or even animals. Supervised learning is quite inefficient, often requiring huge amounts of data and massive power consumption to train state-of-the-art models. Reinforcement learning is even worse! Yann gave the example that it took 20 million self-play games of Go, running on 2,000 GPUs for 14 days, to learn the game at a human level. (See this recent post from me: https://www.dhirubhai.net/posts/blainebateman_ai21-labs-asks-how-much-does-it-cost-to-activity-6662131065588633600-eHj_)
To master Atari games, it takes 83 hours of machine self-play to reach the performance most humans achieve in 15 minutes. So the question is: how do humans and animals learn so quickly?
Yann said a key part of this is learning by observation. Children learn world models with very little input: babies learn face tracking early on, and by about nine months infants have acquired “intuitive physics”. By that age, Yann said, a child can predict what should happen, for example that things fall if you drop them. He then said, “Prediction is the essence of intelligence, in my opinion.” That was the segue into self-supervised learning.
He described self-supervised learning as learning to “predict any part of the input from any other part”: for example, the future from the past, or the masked from the visible. He then talked about the Transformer model introduced by Google Brain, which uses hundreds of millions of parameters and is trained on billions of words. As an example of what such a model can do, you can remove a random 10% of a text and the model will predict the missing words. An interesting aspect is that the prediction is a probability vector over ALL the words in the vocabulary. Yann pointed out that this strategy works well on NLP problems but much less well on images and other problems.
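To make “predict the masked from the visible” concrete, here is a minimal sketch (mine, not from the talk) of masked-token prediction over a toy vocabulary. The random embeddings and context vector stand in for a trained encoder such as a Transformer; all names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and a sentence with one token masked out.
vocab = ["the", "cat", "sat", "on", "mat", "[MASK]"]
sentence = ["the", "cat", "[MASK]", "on", "the", "mat"]
mask_pos = sentence.index("[MASK]")

# Stand-in for a trained encoder: random word embeddings and a random
# "context" vector. A real model (e.g. a Transformer) would compute the
# context at mask_pos from the visible tokens around it.
embed_dim = 8
embeddings = rng.normal(size=(len(vocab), embed_dim))
context = rng.normal(size=embed_dim)

# Score every vocabulary word against the context, then softmax:
# the output is a probability vector over ALL words, which is the
# property Yann highlights for NLP-style self-supervision.
scores = embeddings @ context
probs = np.exp(scores - scores.max())
probs /= probs.sum()

print("P(word | context):", dict(zip(vocab, np.round(probs, 3))))
print("Predicted fill for position", mask_pos, "->", vocab[int(np.argmax(probs))])
```

The point of the sketch is only the shape of the output: a distribution over a finite vocabulary is easy to represent, which is why this recipe works so well for text and less well for continuous data like images.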
Yann then introduced the energy-based paradigm for self-supervised learning. By way of introduction, some key quotes from him: “You DO NOT want to learn distributions! They are bad for you.” “Maximum likelihood sucks!” And “Applying probability theory blindly actually is bad for you.” The intuition for energy models goes like this: say you have something you want to predict. You design an energy function so that energy is low where a prediction is “good” (compatible with the data) and high where it is “bad”. With reference to the image, if the dotted path is what you are trying to predict, you want energy to be low along that path and high everywhere else. In other words, training finds a function that pushes energy “down” along the path and “up” everywhere else, and this picture generalizes nicely to the high-dimensional spaces that matter in most real problems.
Credit: Yann LeCun, YouTube, Dec. 2019
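As a toy illustration of the push-down/push-up idea (my own sketch, not LeCun's actual training procedure), the following uses a simple quadratic energy E(x, y) = ||y - Wx||^2, does gradient descent on the energy of observed (x, y) pairs, and gradient ascent on corrupted pairs while their energy is still below a margin. The hinge rule and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative energy function: E(x, y) = ||y - W x||^2.
# Low energy should mean "y is a good prediction for x".
def energy(W, x, y):
    return float(np.sum((y - W @ x) ** 2))

# Toy data: y is a fixed linear function of x plus a little noise.
A_true = rng.normal(size=(2, 3))
X = rng.normal(size=(100, 3))
Y = X @ A_true.T + 0.05 * rng.normal(size=(100, 2))

W = rng.normal(size=(2, 3))
lr, margin = 0.01, 1.0

for epoch in range(200):
    for x, y_good in zip(X, Y):
        y_bad = y_good + rng.normal(size=2)   # a contrastive "bad" point
        # Push energy DOWN on the observed pair (gradient descent on E)...
        grad_down = -2 * np.outer(y_good - W @ x, x)
        W -= lr * grad_down
        # ...and push it UP on the bad pair (gradient ascent on E),
        # but only while its energy is still below the margin.
        if energy(W, x, y_bad) < margin:
            grad_up = -2 * np.outer(y_bad - W @ x, x)
            W += lr * grad_up

print("energy on a good pair:     ", round(energy(W, X[0], Y[0]), 4))
print("energy on a corrupted pair:", round(energy(W, X[0], Y[0] + 1.0), 4))
```

In this toy run the learned energy ends up low on pairs drawn from the data and noticeably higher on corrupted pairs, which is exactly the “low along the path, high everywhere else” picture from the figure.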
The goal here is to learn things without labeling. Yann quoted Jitendra Malik: “Labels are the opium of the machine learning researcher.” A good portion of the remainder of the talk described how you actually train such a model, and I won’t review that, as a lot of the math is over my head. But consider that today an entire industry has sprung up to label data for people training models. If you are in that industry, the promise of energy-based models trained via self-supervision is clearly disruptive. It is rare to get such clear advance warning of an industry disruption!
A reader commented: "I am a strong supporter of Professor LeCun's arguments in favor of EBMs. I would like to ask your opinion on promising convergence with, or inspiration from, adiabatic optimization toward a breakthrough in EBMs. When I learnt about adiabatic optimization and quantum annealing, I immediately had the feeling that this is the right direction to take up research efforts in."