Yann LeCun: "Energy-Based Self-Supervised Learning"?
Credit: Yann LeCun, YouTube, December 2019

Yann LeCun: "Energy-Based Self-Supervised Learning"

Yann LeCun gave a talk at the Institute for Pure and Applied Mathematics in December 2019 entitled “Energy-Based Self-Supervised Learning”, a preview of the talk he later gave at ICLR 2020. The 2019 talk is on YouTube (https://www.youtube.com/watch?v=A7AnCvYDQrU&t=45s); I watched it and wrote up the following.

A key point is that machine learning is far less sample-efficient than humans or even animals. Supervised learning is quite inefficient, often requiring huge amounts of data and massive power consumption to train state-of-the-art models. Reinforcement learning is even worse: Yann gave the example that it took 20 million self-play games of Go, running on 2,000 GPUs for 14 days, to learn the game at a human level. (See this recent post from me: https://www.dhirubhai.net/posts/blainebateman_ai21-labs-asks-how-much-does-it-cost-to-activity-6662131065588633600-eHj_)

To master Atari games, it takes about 83 hours of machine self-play to reach the performance most humans achieve in 15 minutes, roughly 330 times as long. So the question is: how do humans and animals learn so quickly?

Yann said a key part of this is learning by observation. Children build world models with very little input; babies learn face tracking early on, and infants acquire “intuitive physics” by about 9 months. By that age, Yann said, a child learns to predict what should happen, for example that things fall when you drop them. He then said, “Prediction is the essence of intelligence, in my opinion.” That was the segue into self-supervised learning.

He described self-supervised learning as learning to “predict any part of the input from any other part”: for example, the future from the past, or the masked from the visible. He then discussed the Transformer model introduced by Google Brain, which uses hundreds of millions of parameters and is trained on billions of words. As an example of what such a model can do, you can remove a random 10% of the words in a text and the model will predict the missing ones. An interesting aspect is that for each missing word it produces a probability vector over ALL words in the vocabulary. Yann pointed out that this strategy works well for NLP but much less well for images and other continuous, high-dimensional data, where representing a distribution over all possible completions is hard.
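To make that concrete, here is a minimal sketch of how the hidden state at a masked position becomes a probability vector over the whole vocabulary. The vocabulary size, hidden dimension, and random weights are placeholders of my own, not the actual Transformer; real models are far larger.

```python
# Minimal sketch of masked-word prediction producing a distribution over the
# ENTIRE vocabulary. Sizes and random weights are placeholders for learned
# quantities, not the real model.
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 10_000   # placeholder vocabulary size
hidden_dim = 64       # placeholder hidden-state size at each token position

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins for learned quantities: the encoder's hidden state at a masked
# position, and the output projection onto the vocabulary.
hidden_state = rng.normal(size=hidden_dim)
output_weights = rng.normal(size=(vocab_size, hidden_dim)) * 0.02
output_bias = np.zeros(vocab_size)

# One logit per vocabulary word, then softmax -> a probability for every word.
logits = output_weights @ hidden_state + output_bias
probs = softmax(logits)

print(probs.shape)             # (10000,) -- one probability per word
print(round(probs.sum(), 6))   # 1.0
print(np.argsort(probs)[-5:])  # indices of the 5 most likely fillers
```

Training adjusts the weights so that the true missing word gets high probability; the point is simply that the output is a full distribution, which is feasible over a discrete vocabulary.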

Yann introduced the energy-based paradigm for self-supervised learning. By way of introduction, some key quotes from him: “You DO NOT want to learn distributions! They are bad for you.” And: “Maximum likelihood sucks!” As well as: “Applying probability theory blindly actually is bad for you.” His intuition for energy models: say you have something you want to predict. You design a scalar energy function so that energy is low where a configuration is a “good” (compatible) prediction and high where it is a “bad” one. With reference to the image, if the dotted path is what you are trying to predict, you want energy low along that path and high everywhere else. Training amounts to finding a function that pushes energy “down” along the path and “up” everywhere else. This picture generalizes nicely to the high-dimensional spaces that are of interest in most real problems.

[Figure from the talk: energy is low along the path to be predicted and high everywhere else.]

Credit: Yann LeCun, YouTube, Dec. 2019
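To make the push-down/pull-up intuition concrete, here is a minimal toy sketch. It assumes a quadratic energy around a linear predictor and a simple margin-based contrastive loss; these are illustrative choices on my part, not the training methods covered in the talk.

```python
# Toy sketch of "push energy down on good pairs, push it up elsewhere".
# The energy function, margin loss, and data below are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# A tiny linear predictor f(x) = W x defines the energy of a pair (x, y):
# low energy when y is close to the prediction, high energy otherwise.
W = rng.normal(size=(2, 2)) * 0.1

def energy(x, y, W):
    pred = W @ x
    return float(np.sum((y - pred) ** 2))

def margin_loss(x, y_good, y_bad, W, margin=1.0):
    # Push down the energy of the observed ("good") pair, and push up the
    # energy of a contrastive ("bad") point until it clears the margin.
    return energy(x, y_good, W) + max(0.0, margin - energy(x, y_bad, W))

# Toy data: y_good lies on the path we want low energy along (here y = 2x);
# y_bad is a point off the path.
x = np.array([1.0, -0.5])
y_good = 2.0 * x
y_bad = y_good + np.array([1.5, -1.0])

# Crude numerical gradient descent on W, just to show the effect.
def numerical_grad(f, W, eps=1e-5):
    g = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp, Wm = W.copy(), W.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            g[i, j] = (f(Wp) - f(Wm)) / (2 * eps)
    return g

for step in range(200):
    grad = numerical_grad(lambda w: margin_loss(x, y_good, y_bad, w), W)
    W -= 0.05 * grad

print("E(x, y_good):", energy(x, y_good, W))  # driven toward 0
print("E(x, y_bad): ", energy(x, y_bad, W))   # stays above the margin
```

After a few gradient steps the observed pair sits near zero energy while the off-path point stays above the margin, which is exactly the shaping of the energy surface the figure illustrates.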

The goal here is to learn without labels. Yann quoted Jitendra Malik: "Labels are the opium of the machine learning researcher." A good portion of the remainder of the talk described how to train such a model; I won’t review that here, as a lot of the math is over my head. But consider that an entire industry has sprung up to label data for people training models. If you are in that industry, the promise of energy-based models trained via self-supervision is clearly disruptive. It is rare to get such clear advance warning of an industry disruption!


Debayan Das

Interim CDO@MiiHealth | AI for Medicine 3.0

4y

I am a strong supporter of Professor LeCun's arguments in favor of EBMs. I would like to ask your opinion on a promising convergence with, or inspiration from, adiabatic optimization toward a breakthrough in EBMs. When I learned about adiabatic optimization and quantum annealing, I immediately had the feeling that this is the right direction for research efforts.


