Improving Predictions in Language Modelling


Here is something I picked up along the way on how we can improve the predictions of LSTM networks, specifically for Language Modelling, i.e., generating text.

Here are some techniques that help LSTMs perform better at the prediction stage:

  1. Greedy Sampling,
  2. Beam Search,
  3. Word Embeddings: Using word vectors instead of a one-hot-encoded representation of words (a short sketch follows below), and
  4. Using bidirectional LSTMs

NOTE: These optimization techniques are not specific to LSTMs; rather, any sequential model can benefit from them.
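Since word embeddings are not elaborated further in this post, here is a minimal sketch of the idea, assuming TensorFlow/Keras (the post itself does not name a framework); the vocabulary size and embedding dimension are hypothetical values:

```python
# A minimal sketch (assuming TensorFlow/Keras): feed the LSTM a learned
# Embedding layer instead of sparse one-hot vectors.
import tensorflow as tf

vocab_size = 10_000   # hypothetical vocabulary size
embed_dim = 128       # hypothetical embedding dimension

model = tf.keras.Sequential([
    # Maps each word id to a dense 128-d vector instead of a 10,000-d
    # one-hot vector, so similar words can share representation.
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    tf.keras.layers.LSTM(256),
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```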


Now, let's understand them in the context of Language Modelling:

  • Scenario: Let's assume that our LSTM network is trained on a corpus of text data and, given an initial set of words, can predict subsequent words so that the generated text reads like a coherent story.
  • Problem: If we always predict the next word with the highest probability, the LSTM tends to produce very monotonic results. For example, due to the frequent occurrence of stop words (e.g., 'is', 'the'), it may repeat them many times before switching to another word.


Solutions:

  • Greedy Sampling: One way to get around this is to use greedy sampling, where we pick the n most probable predicted words and sample the next word from that set. This helps break the monotonic nature of the predictions.

For example, suppose we have the sentence 'Amit is learning Natural Language Processing'. Given the first word, 'Amit', we want our LSTM network to predict the subsequent words.

If we attempt to choose samples deterministically (always taking the single most probable word), the LSTM might output something like the following: 'Amit is learning is Natural learning'.

However, by sampling the next word from a subset of the vocabulary (the most probable words), the LSTM is forced to vary its predictions and might output the desired sentence with a respectable probability, or something similar, such as: 'Amit is learning Processing Natural Language'.

Although greedy sampling helps add more flavor/diversity to the generated text, it does not guarantee that the output will always be realistic, especially for longer text sequences.
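Here is a minimal numpy sketch of the sampling idea described above: instead of always taking the argmax, keep the n most probable words and sample the next word from that renormalised subset. The array `probs` is assumed to be the softmax output of the trained LSTM over the vocabulary.

```python
import numpy as np

def sample_from_top_n(probs, n=5):
    # Indices of the n most probable words.
    top_ids = np.argsort(probs)[-n:]
    top_probs = probs[top_ids]
    # Renormalise over the subset and draw one word id at random.
    top_probs = top_probs / top_probs.sum()
    return int(np.random.choice(top_ids, p=top_probs))

def sample_deterministic(probs):
    # The purely deterministic choice, for comparison: always the argmax.
    return int(np.argmax(probs))
```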


Next comes:

  • Beam Search: Here, the predictions are found by solving a search problem. Specifically, we predict several steps ahead for multiple candidates at each step, which gives rise to a tree-like structure of candidate word sequences.

The crucial idea of beam search is to keep 'b' candidate outputs simultaneously instead of committing to a single output. We look farther into the future before making a prediction, which usually leads to better results.

Here, 'b' is known as the beam width (or beam size), and the 'b' candidate sequences kept at each step are known as the beam.
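Here is a minimal sketch of that search in Python. The function `predict_probs` is a hypothetical stand-in for the trained LSTM: it takes a sequence of word ids and returns a probability distribution over the vocabulary.

```python
import numpy as np

def beam_search(predict_probs, seed_ids, b=3, steps=5):
    # Each beam entry is (sequence of word ids, cumulative log-probability).
    beams = [(list(seed_ids), 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            probs = predict_probs(seq)
            # Expand each current sequence with its b most probable next words.
            for wid in np.argsort(probs)[-b:]:
                candidates.append(
                    (seq + [int(wid)], score + np.log(probs[wid] + 1e-12))
                )
        # Keep only the b highest-scoring candidate sequences (the beam).
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:b]
    # Return the best sequence found after looking several steps ahead.
    return beams[0][0]
```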


  • Bidirectional LSTMs: Making LSTMs bidirectional is another way of improving the quality of an LSTM's predictions. By this, we mean training the LSTM with the text read in both directions: from the beginning to the end and from the end to the beginning.
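A minimal sketch of this, again assuming TensorFlow/Keras with hypothetical sizes: the Bidirectional wrapper runs one LSTM pass left-to-right and another right-to-left, and concatenates the two before the output layer.

```python
import tensorflow as tf

vocab_size = 10_000   # hypothetical vocabulary size
embed_dim = 128       # hypothetical embedding dimension

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    # Reads the sequence in both directions and concatenates the results.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256)),
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])
```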

Other variants of LSTMs include Peephole connections, GRUs, etc.


If you are interested in delving deeper into these concepts, consider checking out my notebooks:


In one of my previous posts, I shared about neural networks like RNNs, LSTMs, and GRUs, which are commonly used for text data.

Here is a link to the post:


