Generating Training Datasets Using Energy Based Models that Actually Scale

Energy-Based Models (EBMs) are one of the most promising areas of deep learning that hasn't yet seen widespread adoption. Conceptually, EBMs are a form of generative modeling that learns the key characteristics of a target dataset and tries to generate similar datasets. While EBMs are appealing because of their simplicity, they have faced many challenges when applied to real-world problems. Recently, AI powerhouse OpenAI published a research paper that explores a new technique for building EBMs that can scale across complex deep learning topologies.

EBMs are typically applied to one of the most complex problems in real-world deep learning solutions: generating quality training datasets. Many state-of-the-art deep learning techniques rely on large volumes of training data, which is impractical to maintain at scale. EBMs can learn the key statistical properties of a training dataset and generate new datasets that follow a similar distribution. EBMs are not the only technique in this area of generative modeling. Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) are also used to address the challenge of dataset generation but, given their simplicity, EBMs present tangible advantages over these alternatives. Unfortunately, EBMs have been very hard to scale in practice. To understand why, we can start by dissecting some of their key characteristics.

Understanding Energy-Based Learning

From some perspectives, one of the main goals of machine learning is to capture dependencies between variables. By capturing those dependencies, a model can answer questions about the values of unknown variables given the values of known variables. EBMs capture dependencies by associating a scalar energy (a measure of compatibility) with each configuration of the variables. In that scheme, inference consists of setting the values of the observed variables and finding values of the remaining variables that minimize the energy. Similarly, learning consists of finding an energy function that assigns low energies to correct values of the remaining variables and higher energies to incorrect values.

EBMs provide a unified framework for many probabilistic and non-probabilistic approaches to learning, particularly for non-probabilistic training of graphical models and other structured models. Because there is no requirement for proper normalization, energy-based approaches avoid the problems associated with estimating the normalization constant in probabilistic models. Furthermore, the absence of the normalization condition allows for much more flexibility in the design of learning machines.
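To make the normalization point concrete, an EBM defines an unnormalized probability through its energy function E_θ. The expression below is the standard textbook formulation rather than anything specific to the OpenAI paper:

```latex
p_\theta(x) \;=\; \frac{\exp\big(-E_\theta(x)\big)}{Z(\theta)},
\qquad
Z(\theta) \;=\; \int \exp\big(-E_\theta(x)\big)\, dx
```

The partition function Z(θ) is typically intractable in high dimensions, and it is precisely the term that energy-based training and sampling can avoid computing.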


The capabilities of EBMs make them an ideal candidate for many deep learning areas such as natural language processing, robotics, or computer vision. However, one of the well-known limitations of EBMs is that they rely on gradient-descent optimization methods that are typically hard to scale to high-dimensional datasets.

Scalable Energy Based Models

To mitigate the limitations of traditional EBMs related to their dependency on gradient-descent methods, OpenAI decided to leverage a technique known as Langevin Dynamics as its main optimization method. Named after French physicist Paul Langevin, this optimization technique draws inspiration from models of molecular systems. Like stochastic gradient descent, Langevin Dynamics is an iterative optimization algorithm, but it adds noise to the stochastic gradient estimator while optimizing an objective function. The main advantage Langevin Dynamics offers over traditional optimization methods is that it can be used in Bayesian learning scenarios, since the method produces samples from a posterior distribution of parameters given the available data.
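As a rough illustration of the mechanic (not OpenAI's actual code, which uses a deep network as the energy function), a Langevin-style update repeatedly takes a small gradient step on the energy and adds Gaussian noise. The toy quadratic energy, step size, and chain length below are assumptions chosen only to keep the sketch self-contained:

```python
import numpy as np

def langevin_sample(grad_energy, x0, step_size=0.01, n_steps=100, rng=None):
    """Draw an approximate sample by noisy gradient descent on the energy.

    Each update follows x <- x - (step_size / 2) * grad E(x) + noise,
    with noise ~ N(0, step_size * I): the classic (unadjusted) Langevin step.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(scale=np.sqrt(step_size), size=x.shape)
        x = x - 0.5 * step_size * grad_energy(x) + noise
    return x

# Toy example: E(x) = 0.5 * ||x||^2, so grad E(x) = x and the resulting
# samples should look roughly like draws from a standard Gaussian.
if __name__ == "__main__":
    samples = [langevin_sample(lambda x: x, x0=np.zeros(2)) for _ in range(5)]
    print(np.round(samples, 2))
```

Longer chains and smaller steps trade compute for sample quality, which is exactly the adaptive-computation property discussed below.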

OpenAI leveraged Langevin Dynamics to perform noisy gradient descent on the energy function and arrive at low-energy configurations. Unlike GANs, VAEs, and Flow-based models, this approach does not require an explicit neural network to generate samples; samples are generated implicitly. OpenAI combined Langevin Dynamics with a replay buffer of past images that is used to initialize the sampling procedure.
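A minimal sketch of the replay-buffer idea follows, assuming the common recipe of initializing most Langevin chains from previously generated samples and the rest from random noise; the buffer capacity and mixing ratio here are illustrative defaults, not the paper's exact settings:

```python
import numpy as np

class ReplayBuffer:
    """Stores past Langevin samples to warm-start future chains."""

    def __init__(self, capacity, sample_shape, rng=None):
        self.rng = np.random.default_rng() if rng is None else rng
        self.capacity = capacity
        self.sample_shape = sample_shape
        self.buffer = []

    def init_batch(self, batch_size, reuse_prob=0.95):
        """Mix buffered samples with fresh uniform noise as chain starts."""
        batch = []
        for _ in range(batch_size):
            if self.buffer and self.rng.random() < reuse_prob:
                batch.append(self.buffer[self.rng.integers(len(self.buffer))])
            else:
                batch.append(self.rng.uniform(-1.0, 1.0, size=self.sample_shape))
        return np.stack(batch)

    def add(self, samples):
        """Store finished samples, evicting a random old one once full."""
        for s in samples:
            if len(self.buffer) >= self.capacity:
                self.buffer.pop(self.rng.integers(len(self.buffer)))
            self.buffer.append(np.array(s))
```

In a training loop, each step would call init_batch, run the Langevin chains from those starting points, and push the finished samples back with add, so later chains begin closer to low-energy regions.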


The combination of EBMs and Langevin Dynamics effectively introduces iterative refinement into EBMs, enabling the generation of higher-quality datasets. This approach brings some very tangible benefits compared to traditional EBM approaches:

1) Simplicity and Stability: The EBM is the only object that needs to be designed and trained. Unlike with VAEs or GANs, there is no need to tune training processes for separate networks to make sure they stay balanced.

2) Adaptive Computation Time: The EBM can run sequential refinement for a long time to generate sharp, diverse samples, or for a short time to produce coarse, less diverse samples.

3) Flexibility of Generation: In both VAEs and Flow-based models, the generator must learn a map from a continuous space to a possibly disconnected space containing different data modes, which requires large capacity and may not even be learnable. EBMs, by contrast, can easily learn to assign low energies to disjoint regions.

4) Adaptive Generation: While the final objective of training an EBM looks similar to that of GANs, the generator is implicitly defined by the probability distribution and automatically adapts as the distribution changes. As a result, the generator does not need to be trained, allowing EBMs to be applied to domains where training a GAN generator is difficult, and also ameliorating mode collapse.

5) Compositionality: Since each model represents an unnormalized probability distribution, models can be naturally combined through product of experts or other hierarchical models.
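The compositionality point has a very direct algebraic form: multiplying the unnormalized distributions of independently trained models is equivalent to summing their energies, so a combined model can be sampled with the same Langevin procedure applied to the summed energy. The identity below is the standard product-of-experts observation, not a formula taken from the paper:

```latex
p_1(x)\, p_2(x) \;\propto\; e^{-E_1(x)}\, e^{-E_2(x)} \;=\; e^{-\big(E_1(x) + E_2(x)\big)}
```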

OpenAI evaluated their EBM architecture using well-known datasets such as CIFAR-10 and ImageNet 32x32. The EBM was able to generate high-quality images in a relatively short period of time. Even more impressive, the EBM showed the ability to combine features learned from one type of image when generating other types of images. The following figure illustrates how the EBM can auto-complete images and morph images from one class (such as truck) to another (such as frog).

[Figure: image completions and cross-class morphs (e.g., truck to frog) generated by the EBM]

One of the most impressive achievements of OpenAI's EBM models was the ability to generalize when evaluated against out-of-distribution datasets. In the initial tests, the EBM method was able to outperform other likelihood models such as Flow-based and autoregressive models. OpenAI also tested classification using conditional energy-based models, and found that the resulting classifier exhibited good generalization to adversarial perturbations. The model, despite never being trained for classification, performed classification better than models explicitly trained against adversarial perturbations. The following figure shows the results of the generalization experiments.

[Figure: results of the out-of-distribution generalization and adversarial robustness experiments]
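The classification trick mentioned above is simple to state: a conditional energy E(x, y) becomes a classifier by scoring every label and picking the one with the lowest energy, since p(y | x) is proportional to exp(-E(x, y)). The sketch below is only an illustration with a made-up energy_fn and toy class prototypes, not the network used in the paper:

```python
import numpy as np

def classify_with_ebm(energy_fn, x, labels):
    """Pick the label whose (x, y) configuration has the lowest energy.

    A softmax over negative energies gives the corresponding class
    probabilities, because p(y | x) is proportional to exp(-E(x, y)).
    """
    energies = np.array([energy_fn(x, y) for y in labels])
    logits = -energies
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return labels[int(np.argmin(energies))], probs

# Toy example: each class is represented by a prototype, and the energy is
# the squared distance to that prototype.
centers = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
energy = lambda x, y: 0.5 * np.sum((x - centers[y]) ** 2)
print(classify_with_ebm(energy, np.array([0.9, 0.1]), labels=[0, 1]))
```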

EBMs are still considered a nascent area in the deep learning ecosystem. The OpenAI optimizations showed that EBMs are able to scale to high-dimensional datasets. The work also demonstrated that implicit generation procedures combined with energy-based models allow for compositionality and flexible denoising and inpainting. Together with the research paper, OpenAI open-sourced an initial implementation of its EBM model as well as the corresponding datasets. This type of work is likely to inspire other researchers to consider EBM techniques as an important method for generating effective training datasets at a fraction of the current cost.

