A Neural Model to Learn Language Closer to How Humans Do
Jesus Rodriguez
CEO of IntoTheBlock, Co-Founder of LayerLens, Faktory, and NeuralFabric, Founder of The Sequence AI Newsletter, Guest Lecturer at Columbia, Guest Lecturer at Wharton Business School, Investor, Author.
Natural language understanding (NLU) is one of the disciplines that has been leading the deep learning revolution of the last few years. From basic chatbots to general-purpose digital assistants, conversational interfaces have become one of the most prevalent manifestations of artificial intelligence (AI) influencing our daily lives. Despite the remarkable progress, NLU applications seem to be mostly constrained to task-specific models in which the language representation is tailored to a single task. Recently, researchers from Microsoft published a paper and an implementation of a new technique that can learn language representations across different NLU tasks.
The specialization of NLU models has its roots in something called language embeddings. Conceptually, a language embedding is a process of mapping symbolic natural language text (for example, words, phrases and sentences) to semantic vector representations. Currently, most NLU models rely on domain-specific language embeddings that can't be applied to other NLU tasks. In order to create more general-purpose conversational applications, we need language embedding models that can be reused across different NLU tasks.
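To make the idea concrete, here is a minimal sketch of a language embedding as a lookup from tokens to dense vectors, using PyTorch's `nn.Embedding`. The vocabulary, dimensions, and sentence are toy values chosen purely for illustration and are not tied to any particular model.

```python
import torch
import torch.nn as nn

# Toy vocabulary: a real model would cover tens of thousands of tokens.
vocab = {"<pad>": 0, "the": 1, "kid": 2, "learned": 3, "to": 4, "ski": 5}

# A language embedding maps each symbolic token to a dense semantic vector.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

# Encode the sentence "the kid learned to ski" as a sequence of vectors.
token_ids = torch.tensor([[vocab[w] for w in "the kid learned to ski".split()]])
vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([1, 5, 8]): 1 sentence, 5 tokens, 8-dim vectors
```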
The domain specialization of language embeddings is in direct contrast to how humans master language concepts. Upon learning a specific concept, humans are able to reuse it and apply it to numerous conversations across different topics. For instance, it is far easier for a kid who has just learned to ski to have a conversation about skating than it is for someone who has never been exposed to either activity. If we were modeling those conversations using NLU methods, we would have to train different models on ski and skating terminology in order to have an effective dialog. Developing language embedding models that can be applied across different NLU tasks is one of the top challenges of the current generation of NLU techniques.
The Foundation: Multi-Task Learning and Language Model Pre-Training
Given its relevance, the idea of creating reusable language embeddings has been an active area of research in the NLU space for the last decade. Those efforts have produced two main techniques that represent the foundation of Microsoft's new language learning model: multi-task learning and language model pre-training.
As its name clearly indicates, multi-task learning (MTL) is inspired by human learning activities, where people often apply the knowledge learned from previous tasks to help learn a new task. In the context of language learning, MTL models create embeddings that can be reused across different NLU activities. The challenge with traditional MTL models is that they rely on supervised learning techniques that require large amounts of task-specific labeled data, which is rarely available and difficult to scale.
To mitigate some of the challenges of MTL, researchers looked at the emerging field of semi-supervised learning. Language model pre-training (LMPT) is a technique that can learn universal language representations by leveraging large amounts of unlabeled data. LMPT models are initially trained on unsupervised objectives and then fine-tuned for specific NLU tasks.
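As a rough illustration of what an unsupervised objective looks like, the sketch below sets up a masked-language-modeling batch: a fraction of tokens is hidden and a model would be trained to predict them, so no human labels are required. The mask rate, placeholder ids, and shapes here are illustrative assumptions, not values from any specific paper.

```python
import torch

# Toy batch of token ids (4 sentences, 16 tokens each); ids below 5 are reserved.
token_ids = torch.randint(5, 30000, (4, 16))

# Randomly hide roughly 15% of the positions.
mask = torch.rand(token_ids.shape) < 0.15

inputs = token_ids.masked_fill(mask, 1)       # id 1 plays the role of a [MASK] token
targets = token_ids.masked_fill(~mask, -100)  # -100 is ignored by nn.CrossEntropyLoss

# A language model would now be trained to predict `targets` from `inputs`,
# learning representations from raw text alone.
```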
Together, MTL and LMPT constitute the foundation of Microsoft's new language embedding model. To some extent, Microsoft approached the challenge of creating reusable language embeddings not by inventing a brand new method but by cleverly combining MTL and LMPT in a new neural network architecture that can learn textual representations applicable to different NLU tasks.
MT-DNN
Multi-Task Deep Neural Network (MT-DNN) is a new multi-task network model for learning universal language embeddings. MT-DNN combines strategies from multi-task learning and language model pre-training to achieve embedding reusability across different NLU tasks while maintaining state-of-the-art performance. Specifically, the embeddings learned by MT-DNN focus on four types of language tasks:
· Single-Sentence Classification: Given a sentence, the model labels it using one of the predefined class labels.
· Text Similarity: Given a pair of sentences, the model predicts a real-value score indicating the semantic similarity of the two sentences.
· Pairwise Text Classification: Given a pair of sentences, the model determines the relationship of the two sentences based on a set of pre-defined labels.
· Relevance Ranking: Given a query and a list of candidate answers, the model ranks all the candidates in the order of relevance to the query.
MT-DNN is based on an intriguing neural network architecture that combines general-purpose and task-specific layers. The lower layers are shared across all tasks while the top layers are task-specific. The input X, either a sentence or a pair of sentences, is first represented as a sequence of embedding vectors, one for each word, in l1. Then the transformer-based encoder captures the contextual information for each word and generates the shared contextual embedding vectors in l2. Finally, for each task, additional task-specific layers generate task-specific representations, followed by the operations necessary for classification, similarity scoring, or relevance ranking. MT-DNN initializes its shared layers using language model pre-training, then refines them via multi-task learning.
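The PyTorch sketch below mirrors that layering at toy scale: a shared lexicon encoder (l1) and a small Transformer encoder (l2) feed four task-specific heads, one per task type listed above. It is a simplified illustration under assumed dimensions and module choices, not Microsoft's open-sourced implementation.

```python
import torch
import torch.nn as nn

class MTDNNSketch(nn.Module):
    """Toy sketch of MT-DNN-style shared lower layers plus task-specific top layers."""

    def __init__(self, vocab_size=30000, hidden=128, num_classes=3):
        super().__init__()
        # Shared lexicon encoder (l1): token ids -> word embedding vectors.
        self.lexicon_encoder = nn.Embedding(vocab_size, hidden)
        # Shared Transformer encoder (l2): contextual embeddings for each token.
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # Task-specific top layers, one head per task type.
        self.heads = nn.ModuleDict({
            "classification": nn.Linear(hidden, num_classes),  # single-sentence classification
            "similarity": nn.Linear(hidden, 1),                 # text similarity (real-valued score)
            "pairwise": nn.Linear(hidden, num_classes),         # pairwise text classification
            "ranking": nn.Linear(hidden, 1),                     # relevance score for ranking
        })

    def forward(self, token_ids, task):
        shared = self.transformer(self.lexicon_encoder(token_ids))
        pooled = shared[:, 0]  # use the first token as a simple sentence representation
        return self.heads[task](pooled)
```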
To train the MT-DNN model, Microsoft used a two-stage process. The first stage is based on language model pre-training, in which the parameters of the lexicon encoder and Transformer encoder are learned using two unsupervised prediction tasks: masked language modeling and next sentence prediction. That phase is followed by a multi-task fine-tuning stage in which minibatch-based stochastic gradient descent is used to learn all the parameters of the model and optimize for task-specific objectives.
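The sketch below illustrates only the second stage: multi-task fine-tuning with minibatch SGD, reusing the `MTDNNSketch` class from the previous sketch. Each step draws a minibatch from one task and updates both the shared layers and that task's head. The toy datasets, batch sizes, and loss choices are illustrative assumptions rather than the paper's exact training recipe.

```python
import random
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def toy_loader(num_examples=32, seq_len=16, num_classes=3, regression=False):
    """Random stand-in for a task-specific labeled dataset."""
    tokens = torch.randint(0, 30000, (num_examples, seq_len))
    targets = torch.rand(num_examples) if regression else torch.randint(0, num_classes, (num_examples,))
    return DataLoader(TensorDataset(tokens, targets), batch_size=8, shuffle=True)

model = MTDNNSketch()  # shared layers would already be initialized by stage-1 pre-training
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

tasks = {
    "classification": (toy_loader(), nn.CrossEntropyLoss()),
    "similarity": (toy_loader(regression=True), nn.MSELoss()),
}

for epoch in range(3):
    # Pack the minibatches of all tasks together and shuffle them.
    batches = [(name, batch) for name, (loader, _) in tasks.items() for batch in loader]
    random.shuffle(batches)
    for name, (tokens, targets) in batches:
        loss_fn = tasks[name][1]
        output = model(tokens, task=name)
        if name == "similarity":
            loss = loss_fn(output.squeeze(-1), targets)  # real-valued score vs. target
        else:
            loss = loss_fn(output, targets)              # class logits vs. label
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()  # updates the shared encoder and the active task head
```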
MT-DNN in Action
Microsoft evaluated MT-DNN against different state-of-the-art multi-task language models using three popular benchmarks: GLUE, Stanford Natural Language Inference (SNLI), and SciTail. Among the candidate models, Microsoft included Google's BERT, considered by many the gold standard of language pre-training techniques. In all tests, MT-DNN consistently outperformed the alternative models, showing a tremendous level of efficiency when adapting to new tasks. The following matrix summarizes the results of the GLUE test.
One way to evaluate how universal the language embeddings are is to measure how fast the embeddings can be adapted to a new task, or how many task-specific labels are needed to get a reasonably good result on the new task. More universal embeddings require fewer task-specific labels. The following chart shows how MT-DNN produced language embeddings that were considerably more universal than those produced by BERT for the same task.
Together with the research paper, Microsoft open-sourced a PyTorch-based implementation of MT-DNN. Developers can test this implementation by downloading and running the Docker image that encapsulates the model.
The transition from task-specific to universal language embeddings will be one of the main areas of focus for the next generation of NLU applications. Although still in its nascent stages, techniques such as MT-DNN highlight some of the key ideas needed to create universal language embedding representations that are applicable to many NLU tasks.