Foundational Papers in NLP: Bi-Directional Attention Flow (BIDAF) network - Seo et al., 2016.
Circa 2016, machine comprehension and question answering had become popular tasks in NLP and computer vision. End-to-end neural systems could now achieve promising results by using attention mechanisms to focus on the most relevant parts of the context for answering the question. However, previous attention approaches had some limitations: they typically summarized the context into a fixed-size vector, coupled the attention at each step to the attention computed at previous steps, and/or attended in only one direction, from query to context.
This paper introduced the Bi-Directional Attention Flow (BIDAF) network to address these limitations.
The key aspects were: representing the context at multiple levels of granularity (character, word, and contextual embeddings); a memory-less, bi-directional attention mechanism computed at every time step rather than summarized early into a fixed-size vector; and letting the attended, query-aware representations flow through to a subsequent modeling layer.
Evaluations showed BIDAF achieving state-of-the-art results on the SQuAD question answering dataset. Ablations demonstrated the importance of its components, and analyses provided visualizations and comparisons showing that BIDAF learned suitable representations for strong machine comprehension.
BIDAF represented the context paragraph at different granularity levels using the following layers:
1. Character Embedding Layer (1D CNN): Mapped words to vectors based on character n-grams. CNN outputs were max-pooled to obtain fixed-size character-based word representations.
2. Word Embedding Layer (GloVe): Leveraged pre-trained word vectors to map words to additional embeddings.
3. Contextual Embedding Layer (BiLSTM): Employed a bi-directional LSTM on top of the word embeddings to incorporate contextual cues from surrounding words and model temporal interactions.
4. Attention Flow Layer: Computed a similarity matrix between LSTM-encoded context and query embeddings to capture relevance between each query and context word. Applied context-to-query (C2Q) attention to attend from context to query words and query-to-context (Q2C) attention to identify salient context words. Combined the bidirectional attention representations with the context embeddings to create query-aware context representations (a code sketch of this layer follows the list).
5. Modeling Layer (BiLSTM): Captured interactions between context words conditioned on the query using two layers of bi-directional LSTM over the query-aware representations from the attention flow layer.
6. Output Layer: Predicted answer span start/end positions for QA, or missing words for cloze tasks, based on the BiLSTM representations of the context.
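To make the attention flow layer concrete, here is a minimal PyTorch-style sketch of the computation described in item 4 above. The class and variable names (AttentionFlow, w_sim) are illustrative, not taken from the paper's released code: the similarity matrix S is scored from [h; u; h∘u], C2Q attention is a softmax over query words for each context word, Q2C attention is a softmax over context words of the per-row maximum, and the results are fused into a query-aware representation G.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFlow(nn.Module):
    """Sketch of BIDAF's attention flow layer (illustrative names)."""
    def __init__(self, hidden_dim):
        super().__init__()
        # Trainable similarity: alpha(h, u) = w^T [h; u; h * u], where h, u have width 2d
        self.w_sim = nn.Linear(6 * hidden_dim, 1, bias=False)

    def forward(self, H, U):
        # H: (batch, T, 2d) contextual embeddings of the context paragraph
        # U: (batch, J, 2d) contextual embeddings of the query
        T, J = H.size(1), U.size(1)
        H_exp = H.unsqueeze(2).expand(-1, -1, J, -1)   # (batch, T, J, 2d)
        U_exp = U.unsqueeze(1).expand(-1, T, -1, -1)   # (batch, T, J, 2d)
        # Similarity matrix S: relevance of each query word to each context word
        S = self.w_sim(torch.cat([H_exp, U_exp, H_exp * U_exp], dim=-1)).squeeze(-1)  # (batch, T, J)

        # Context-to-query (C2Q): which query words matter for each context word
        a = F.softmax(S, dim=-1)                       # (batch, T, J)
        U_tilde = torch.bmm(a, U)                      # (batch, T, 2d)

        # Query-to-context (Q2C): which context words are most salient to the query
        b = F.softmax(S.max(dim=-1).values, dim=-1)    # (batch, T)
        h_tilde = torch.bmm(b.unsqueeze(1), H)         # (batch, 1, 2d)
        H_tilde = h_tilde.expand(-1, T, -1)            # (batch, T, 2d)

        # Query-aware context representation G, fed to the modeling BiLSTM
        return torch.cat([H, U_tilde, H * U_tilde, H * H_tilde], dim=-1)  # (batch, T, 8d)
```

For context and query encodings of width 2d, the output G has width 8d per context position, which is then passed to the modeling layer.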
Training minimized a loss defined as the negative log probability of the true start and end positions of the answer span in the context, averaged over all training examples. Minimizing this loss optimized all of the model's components (CNN, LSTMs, attention, and output layers) to predict the correct start and end indices. At test time, the model used the learned distributions to find the optimal answer span: it chose the start and end positions that maximized the product of their predicted probabilities, which can be computed in linear time with a simple dynamic-programming pass.
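As a rough sketch of that objective (the function name and signature are illustrative, assuming per-position logits of shape (batch, T) and gold indices of shape (batch,)), the loss is just the sum of cross-entropy terms on the true start and end indices:

```python
import torch.nn.functional as F

def bidaf_span_loss(start_logits, end_logits, true_start, true_end):
    """Negative log-likelihood of the true start and end positions, averaged over the batch.
    start_logits, end_logits: (batch, T) unnormalized scores over context positions.
    true_start, true_end: (batch,) gold answer-span indices."""
    # cross_entropy = -log softmax probability of the true index
    return F.cross_entropy(start_logits, true_start) + F.cross_entropy(end_logits, true_end)
```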
Overall, training tuned the model parameters to discern likely answer positions, and the test procedure then returned the start and end points with the highest joint probability as the predicted answer span. Defining suitable objectives for both phases let the model learn how to locate answers and efficiently surface the most probable span.
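A minimal sketch of that test-time span search, assuming p_start and p_end are the per-position probability lists produced by the output layer (the function name is illustrative): a single pass keeps track of the best start probability seen so far, so the pair (start, end) with start ≤ end that maximizes the product is found in linear time.

```python
def best_span(p_start, p_end):
    """Pick (start, end) with start <= end maximizing p_start[start] * p_end[end],
    in one linear pass by tracking the most probable start seen so far."""
    best_prob, best_pair = -1.0, (0, 0)
    best_start, best_start_prob = 0, p_start[0]
    for end, p_e in enumerate(p_end):
        if p_start[end] > best_start_prob:
            # A better start candidate for any later end position
            best_start, best_start_prob = end, p_start[end]
        if best_start_prob * p_e > best_prob:
            best_prob, best_pair = best_start_prob * p_e, (best_start, end)
    return best_pair, best_prob
```

For example, best_span([0.1, 0.6, 0.3], [0.2, 0.1, 0.7]) returns ((1, 2), 0.42), i.e. the span from position 1 to position 2.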
The paper conducted experiments on machine comprehension using the SQuAD question answering dataset, which contains over 100k questions paired with context paragraphs in which the answer appears as a span. As previously stated, the BIDAF architecture combined character- and word-level embeddings and passed them through recurrent and attention layers to predict the start and end positions of the answer in the context. The attention mechanism in BIDAF used both context-to-query and query-to-context alignments to obtain query-aware context representations.
The results showed that BIDAF achieved state-of-the-art performance on the hidden SQuAD test set leaderboard, with an Exact Match score of 73.3 and an F1 score of 81.1 using an ensemble of 12 models trained with identical architecture and hyperparameters. This outperformed all previously published approaches. Ablation studies on the dev set without ensembling showed that removing character embeddings, word embeddings, or bidirectional attention, or substituting a more traditional dynamic attention formulation, resulted in significant performance drops. Further analyses provided visualizations and comparisons suggesting that BIDAF learned improved intermediate representations.
Additionally, BIDAF was evaluated on cloze-style reading comprehension using the CNN/DailyMail datasets. With minor modifications to the output layer, it achieved new state-of-the-art single-model performance, even surpassing previous ensemble approaches. The strong results across two machine comprehension formulations demonstrated the effectiveness of BIDAF's hierarchical modeling and attention mechanisms for this task.
Seo, M., Kembhavi, A., Farhadi, A., & Hajishirzi, H. (2016). Bidirectional attention flow for machine comprehension. arXiv preprint arXiv:1611.01603.