Knowledge is Everything: Using Representation Learning to Optimize Feature Extraction and Knowledge Quality

Lately, I've been working on a couple of scenarios that have reminded me of the importance of feature extraction in deep learning models. As a result, I would like to summarize some ideas I've outlined before about the principles of knowledge quality in deep learning models and the applicability of representation learning to those scenarios.

Understanding the characteristics of input datasets is an essential capability of machine learning algorithms. Given a specific input, machine learning models need to infer specific features of the data in order to perform some target action. Representation learning, or feature learning, is the subdiscipline of machine learning that deals with extracting features from a dataset or understanding its representation.

Representation learning can be illustrated using a very simple example. Take a deep learning algorithm that is trying to identify geometric shapes in an image.

In order to match pixels to geometric shapes, the algorithm first needs to understand some basic features/representations of the data such as the number of corners. That’s the role of representation learning.
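To make this concrete, here is a minimal, illustrative sketch (not the architecture discussed in this article) of a small convolutional feature extractor in PyTorch. Its early filters typically learn exactly these kinds of low-level representations, such as edges and corners, before any shape classification happens; the layer sizes and the fake input image are assumptions chosen purely for illustration.

```python
import torch
import torch.nn as nn

# A tiny convolutional feature extractor: early filters tend to learn
# low-level representations such as edges and corners, which later layers
# combine into shape-level features. Sizes are illustrative assumptions.
feature_extractor = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # edge-like detectors
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # corner/junction-like detectors
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(4),
    nn.Flatten(),                                # 16 * 4 * 4 = 256 features
)

image = torch.randn(1, 1, 64, 64)    # a fake grayscale image of shapes
features = feature_extractor(image)  # the representation a classifier would consume
print(features.shape)                # torch.Size([1, 256])
```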

Representation learning has been an established discipline in machine learning for decades, but its relevance has increased tremendously with the emergence of deep learning. While traditional machine learning techniques such as classification often deal with mathematically well-structured datasets, deep learning models process data such as images or sounds that lack well-defined features. In that sense, representation learning is a key element of most deep learning architectures.

The central problem of representation learning is to determine an optimal representation for the input data. In the context of deep learning, the quality of a representation is mostly determined by how much it facilitates the learning process. In the real world, the learning algorithm and the underlying representation of a model are directly related.

The No Free Lunch Theorem

If the knowledge representation of a model is tied to its learning algorithm, then selecting the correct representation should be trivial, right? We simply pick the knowledge representation associated with the learning task, and that should guarantee optimal performance. I wish it were that simple. In the journey to find an optimal representation we quickly run into an old friend: the No Free Lunch Theorem (NFLT).

NFLT is one of those mathematical paradoxes that puzzles even the most pragmatic data scientists and technologists. In a nutshell, NFLT states that, averaged over all possible data-generating distributions, every machine learning algorithm has approximately the same error rate when processing previously unobserved points (read my previous article about NFLT). In other words, no machine learning algorithm is universally better than any other across a broad enough set of problems.

In the context of representation learning, NFLT tells us that multiple knowledge representations can be applicable to the same learning task. If that's the case, how can we empirically decide on one knowledge representation vs. another? The answer lies in one of the core, and often ignored, techniques in machine learning and deep learning models: regularization.

Regularization

A core task of machine learning algorithms is to perform well with new inputs outside the training dataset. Optimizing that task is the role of regularization. Conceptually, regularization induces modifications to a machine learning algorithm that reduce the test or generalization error without affecting the training error.
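As a minimal, illustrative sketch of one of the most common regularization techniques, here is an L2 weight-decay penalty added to a training loss in PyTorch. The model, data, and penalty strength below are hypothetical placeholders, not anything prescribed in this article.

```python
import torch
import torch.nn as nn

# Hypothetical model and data used purely for illustration.
model = nn.Linear(10, 1)
inputs, targets = torch.randn(32, 10), torch.randn(32, 1)

criterion = nn.MSELoss()
weight_decay = 1e-3  # strength of the L2 penalty (assumed value)

predictions = model(inputs)
data_loss = criterion(predictions, targets)

# L2 regularization: penalize large weights to reduce the generalization error.
# (Optimizers such as torch.optim.SGD expose the same idea via `weight_decay`.)
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())
loss = data_loss + weight_decay * l2_penalty

loss.backward()  # gradients now include the regularization term
```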

Let's now come full circle and see how regularization is related to representation learning. The relationship is clear: the quality of a knowledge representation is fundamentally tied to its ability to generalize efficiently. In other words, the knowledge representation must be able to adapt to new inputs outside the training dataset. To perform well on new inputs and reduce the generalization error, a representation of knowledge should lend itself to regularization techniques. Therefore, the quality of a representation learning model is directly influenced by its ability to work with different regularization strategies. The next step is to figure out which regularization strategies are specifically relevant to representation learning. That will be the topic of a future post.

Now that we know that regularization is a mechanism to improve the representation of knowledge, the next step is to evaluate the quality of a given representation. Essentially, we are trying to answer a simple question: what makes one knowledge representation superior to another?

Improving Knowledge by Regularization

Just to get the terminology straight, by regularization we are referring to the ability of a model to reduce its test error (generalization error) without impacting its training error. Every knowledge representation has certain characteristics that make it more amenable to specific regularization techniques. Artificial intelligence luminaries Ian Goodfellow and Yoshua Bengio have done some remarkable work in the area of regularization. Based on Goodfellow and Bengio's work, there are a few characteristics that make knowledge representations more efficient when it comes to regularization. I've summarized five of my favorite regularization patterns below:

1 — Disentangling of Causal Factors

One of the key indicators of a robust knowledge representation is that its features correspond to the underlying causes of the training data. This characteristic helps to identify which features in the representation correspond to specific causes in the input dataset and, consequently, helps to better separate some features from others.
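One common way to encourage this kind of disentanglement is the β-VAE objective (Higgins et al.), which up-weights the KL term of a variational autoencoder so that each latent dimension is pushed toward capturing an independent factor of variation. The sketch below is a minimal illustration of that loss under assumed tensor shapes, not a method described in this article.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Minimal beta-VAE objective: reconstruction + beta-weighted KL term.

    A larger beta pressures each latent dimension to capture an independent
    (disentangled) factor of variation. Shapes and beta are illustrative.
    """
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # KL divergence between the approximate posterior N(mu, sigma^2) and N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

# Toy usage with placeholder tensors: with mu = 0 and logvar = 0 the KL term is zero.
mu, logvar = torch.zeros(8, 4), torch.zeros(8, 4)
x = x_recon = torch.randn(8, 32)
print(beta_vae_loss(x, x_recon, mu, logvar))
```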

2 — Smoothness

Representation smoothness is the assumption that the value of a hypothesis doesn't change drastically among points in close proximity in the input dataset. Mathematically, smoothness implies that f(x + εd) ≈ f(x) for a unit-norm direction d and a very small ε. This characteristic allows knowledge representations to generalize better across nearby regions of the input space.
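As a quick numerical illustration of this assumption, we can perturb an input along a random unit direction and verify that the output barely moves. The function f below is a hypothetical stand-in for a learned representation; any smooth function behaves the same way.

```python
import numpy as np

# A stand-in for a learned, smooth representation f; purely illustrative.
def f(x):
    return np.tanh(x @ np.linspace(-1.0, 1.0, x.shape[0]))

rng = np.random.default_rng(0)
x = rng.normal(size=16)

d = rng.normal(size=16)
d /= np.linalg.norm(d)  # unit-norm direction
epsilon = 1e-4          # very small step

# Smoothness assumption: f(x + epsilon * d) ≈ f(x)
print(abs(f(x + epsilon * d) - f(x)))  # close to zero
```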

3 — Linearity

Linearity is a regularization pattern that is complementary to the smoothness assumption. Conceptually, this characteristic assumes that the relationship between some input variables is linear (f(x) = ax + b), which allows the model to make accurate predictions even when there are relatively large variations in the input.
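Here is a minimal sketch of that idea on synthetic data chosen purely for illustration: a linear fit keeps making reasonable predictions even far outside the range it was trained on.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data generated from a noisy linear relationship f(x) = 2x + 3.
x_train = rng.uniform(0, 10, size=50)
y_train = 2.0 * x_train + 3.0 + rng.normal(scale=0.5, size=50)

# Fit f(x) = a*x + b by least squares.
a, b = np.polyfit(x_train, y_train, deg=1)

# Under the linearity assumption, predictions remain accurate even for
# inputs far outside the range seen during training.
x_far = 100.0
print(a * x_far + b)  # close to 2*100 + 3 = 203
```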

4 — Hierarchical Structures

Knowledge representations based on hierarchies are ideal for many regularization techniques. A hierarchy assumes that every step in the network can be explained by the previous steps, which tremendously helps with reasoning through a knowledge representation.
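A minimal sketch of a hierarchical representation is a small stacked network in which each layer builds on the features produced by the previous one; the layer sizes below are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# Each layer re-describes the previous layer's output: raw pixels ->
# low-level features -> mid-level features -> class scores.
hierarchical_model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),  # low-level features
    nn.Linear(256, 64), nn.ReLU(),   # mid-level features built from low-level ones
    nn.Linear(64, 10),               # high-level features / class scores
)

x = torch.randn(1, 784)              # e.g. a flattened 28x28 image
print(hierarchical_model(x).shape)   # torch.Size([1, 10])
```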

5 — Manifold Representation

Manifold learning is one of the most fascinating, mathematically deep foundations of machine learning. Conceptually, a manifold is a connected region of points embedded in a higher-dimensional space. The manifold assumption states that probability mass tends to concentrate in manifolds within the input data. The great thing about manifolds is that they are relatively easy to reduce from high-dimensional structures to lower-dimensional representations, which are easier and cheaper to manipulate. Many regularization algorithms are especially efficient at detecting and manipulating manifolds.
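As an illustrative sketch on synthetic data (all dimensions are assumptions), points that lie near a 2-D linear manifold embedded in 10-D space can be compressed with a simple PCA while preserving almost all of the variance.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: 500 points lying on a 2-D plane embedded in 10-D space,
# plus a small amount of noise -- a toy "manifold".
latent = rng.normal(size=(500, 2))
embedding = rng.normal(size=(2, 10))
X = latent @ embedding + 0.01 * rng.normal(size=(500, 10))

# PCA via SVD: project onto the top-2 principal directions.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_reduced = X_centered @ Vt[:2].T  # 10-D points -> 2-D representation

explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print(f"Variance captured by 2 dimensions: {explained:.3f}")  # ~0.99+
```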

Despite its importance, representation learning remains a relatively little-known discipline in the deep learning space. Understanding the features and representation of the underlying datasets is essential in order to select the best neural network architecture for any given task. The characteristics explained in this article provide a simple framework for thinking about representation learning in the context of deep learning solutions.
