Improving Predictions in Language Modelling


Here is something I picked up along the way on how we can improve the predictions of LSTM networks, specifically for Language Modelling, i.e., generating text.

Here are some techniques that help LSTMs perform better at the prediction stage:

  1. Greedy Sampling,
  2. Beam Search,
  3. Word Embeddings: Using word vectors instead of a one-hot-encoded representation of words (a short sketch follows below), and
  4. Using bidirectional LSTMs

NOTE: These optimization techniques are not specific to LSTMs; rather, any sequential model can benefit from them.
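Since word embeddings are not elaborated further in this post, here is a minimal sketch of the idea, assuming TensorFlow/Keras (the post itself does not name a framework); the vocabulary size and embedding dimension are hypothetical values:

```python
# A minimal sketch (assuming TensorFlow/Keras): feed the LSTM a learned
# Embedding layer instead of sparse one-hot vectors.
import tensorflow as tf

vocab_size = 10_000   # hypothetical vocabulary size
embed_dim = 128       # hypothetical embedding dimension

model = tf.keras.Sequential([
    # Maps each word id to a dense 128-d vector instead of a 10,000-d
    # one-hot vector, so similar words can share representation.
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    tf.keras.layers.LSTM(256),
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```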


Now, let's understand them in the context of Language Modelling:

  • Scenario: Let's assume that our LSTM network is trained on a corpus of text data and, given an initial set of words, can predict subsequent words so that the generated text reads like a coherent story.
  • Problem: If we always predict the next word with the highest probability, the LSTM tends to produce very monotonic results. For example, due to the frequent occurrence of stop words (e.g., 'is', 'the'), it may repeat them many times before switching to another word.


Solutions:

  • Greedy Sampling: One way to get around this is to use greedy sampling, where we pick the n most probable predicted words and sample the next word from that set. This helps break the monotonic nature of the predictions.

For example, suppose we have the sentence 'Amit is learning Natural Language Processing'. Given the first word, 'Amit', we want our LSTM network to predict the subsequent words.

If we attempt to choose samples deterministically (always taking the single most probable word), the LSTM might output something like the following: 'Amit is learning is Natural learning'.

However, by sampling the next word from a subset of the vocabulary (the most probable words), the LSTM is forced to vary its predictions and might output the desired sentence with a respectable probability, or something similar, such as: 'Amit is learning Processing Natural Language'.

Although greedy sampling helps add more flavor/diversity to the generated text, it does not guarantee that the output will always be realistic, especially for longer text sequences.
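Here is a minimal numpy sketch of the sampling idea described above: instead of always taking the argmax, keep the n most probable words and sample the next word from that renormalised subset. The array `probs` is assumed to be the softmax output of the trained LSTM over the vocabulary.

```python
import numpy as np

def sample_from_top_n(probs, n=5):
    # Indices of the n most probable words.
    top_ids = np.argsort(probs)[-n:]
    top_probs = probs[top_ids]
    # Renormalise over the subset and draw one word id at random.
    top_probs = top_probs / top_probs.sum()
    return int(np.random.choice(top_ids, p=top_probs))

def sample_deterministic(probs):
    # The purely deterministic choice, for comparison: always the argmax.
    return int(np.argmax(probs))
```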


Next comes:

  • Beam Search: Here, the predictions are found by solving a search problem. Specifically, we predict several steps ahead for multiple candidates at each step, which gives rise to a tree-like structure of candidate word sequences.

The crucial idea of beam search is to keep 'b' candidate outputs simultaneously instead of committing to a single output. We look farther into the future before making a prediction, which usually leads to better results.

Here, 'b' is known as the beam width (or beam size), and the 'b' candidate sequences kept at each step are known as the beam.
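Here is a minimal sketch of that search in Python. The function `predict_probs` is a hypothetical stand-in for the trained LSTM: it takes a sequence of word ids and returns a probability distribution over the vocabulary.

```python
import numpy as np

def beam_search(predict_probs, seed_ids, b=3, steps=5):
    # Each beam entry is (sequence of word ids, cumulative log-probability).
    beams = [(list(seed_ids), 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            probs = predict_probs(seq)
            # Expand each current sequence with its b most probable next words.
            for wid in np.argsort(probs)[-b:]:
                candidates.append(
                    (seq + [int(wid)], score + np.log(probs[wid] + 1e-12))
                )
        # Keep only the b highest-scoring candidate sequences (the beam).
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:b]
    # Return the best sequence found after looking several steps ahead.
    return beams[0][0]
```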


  • Bidirectional LSTMs: Making LSTMs bidirectional is another way of improving the quality of an LSTM's predictions. By this, we mean training the LSTM with the text read in both directions: from the beginning to the end and from the end to the beginning.
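A minimal sketch of this, again assuming TensorFlow/Keras with hypothetical sizes: the Bidirectional wrapper runs one LSTM pass left-to-right and another right-to-left, and concatenates the two before the output layer.

```python
import tensorflow as tf

vocab_size = 10_000   # hypothetical vocabulary size
embed_dim = 128       # hypothetical embedding dimension

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    # Reads the sequence in both directions and concatenates the results.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256)),
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])
```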

Other variants of LSTMs include Peephole connections, GRUs, etc.


If you are interested in delving deeper into these concepts, consider checking out my notebooks:


In one of my previous posts, I shared about neural networks like RNNs, LSTMs, and GRUs, which are commonly used for text data.

Here is a link to the post:


