Neural Language Models (NLM) without pain
Ibrahim Sobh - PhD
Senior Expert of Artificial Intelligence, Valeo Group | LinkedIn Top Voice | Machine Learning | Deep Learning | Data Science | Computer Vision | NLP | Developer | Researcher | Lecturer
What is a Language Model?
Why do we need Language Models?
Language Modeling is a subcomponent of many NLP tasks, especially those involving generating text or estimating the probability of text:
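A language model assigns a probability to a whole piece of text by multiplying, via the chain rule, the conditional probability of each word given the words before it. A minimal sketch, using made-up conditional probabilities for illustration only:

```python
import math

# Hypothetical per-word conditional probabilities a language model might
# assign to "students opened their books" (illustrative numbers only).
cond_probs = {
    "students": 0.002,  # P(students)
    "opened": 0.01,     # P(opened | students)
    "their": 0.3,       # P(their | students opened)
    "books": 0.4,       # P(books | students opened their)
}

# Chain rule: P(w1..wn) = product of the conditional probabilities.
sentence_prob = math.prod(cond_probs.values())
print(sentence_prob)  # ~2.4e-06
```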
n-gram Language Models
Using a large amount of text (a corpus), we collect statistics about how frequent different word sequences are, and use these counts to predict the next word. For example, the probability that a word w comes after the three words “students opened their” can be estimated as:

P(w | students opened their) ≈ count(students opened their w) / count(students opened their)
The example above is a 4-gram model. Comparing the resulting estimates for candidate words, we may find that “books” is more probable than “cars” in this context.
Accordingly, arbitrary text can be generated from a language model given starting word(s), by sampling from the output probability distribution of the next word, and so on.
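The counting-and-sampling procedure described above can be sketched as follows; the toy corpus and function names are hypothetical, and a real model would be trained on billions of tokens:

```python
import random
from collections import defaultdict

def build_ngram_counts(tokens, n=4):
    """Count how often each word follows each (n-1)-word context."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts

def sample_next(counts, context):
    """Sample the next word from the model's distribution for this context."""
    dist = counts[tuple(context)]
    words = list(dist)
    weights = list(dist.values())
    return random.choices(words, weights=weights)[0]

# Toy corpus (hypothetical), just large enough to give two continuations.
corpus = "students opened their books and students opened their minds".split()
counts = build_ngram_counts(corpus, n=4)
print(sample_next(counts, ["students", "opened", "their"]))  # 'books' or 'minds'
```

Repeating the sampling step, each time appending the sampled word and shifting the context window, generates arbitrary-length text.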
Language Modeling Toolkits:
For a sense of scale, Google's publicly released web n-gram corpus has the following statistics:
File sizes: approx. 24 GB compressed (gzip'ed) text files
Number of tokens: 1,024,908,267,229
Number of sentences: 95,119,665,584
Number of unigrams: 13,588,391
Number of bigrams: 314,843,401
Number of trigrams: 977,069,902
Number of fourgrams: 1,313,818,354
Number of fivegrams: 1,176,470,663
Examples of 4-gram data:
serve as the incoming 92
serve as the incubator 99
serve as the independent 794
serve as the index 223
serve as the indication 72
serve as the indicator 120
serve as the indicators 45
serve as the indispensable 111
serve as the indispensible 40
...
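From counts like those above, the conditional probability of a continuation is its count divided by the total count for the context. A sketch using only the continuations shown (the full corpus lists many more, so these numbers are illustrative):

```python
# 4-gram counts from the excerpt above, all sharing the context "serve as the".
counts = {
    "incoming": 92, "incubator": 99, "independent": 794, "index": 223,
    "indication": 72, "indicator": 120, "indicators": 45,
    "indispensable": 111, "indispensible": 40,
}

# Relative-frequency estimate, restricted to the continuations listed here.
total = sum(counts.values())
p_independent = counts["independent"] / total
print(round(p_independent, 3))  # 0.497
```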
Sparsity problem: many perfectly reasonable n-grams never occur in the corpus, so their estimated probability is zero; and if the context itself never occurs, the estimate is undefined.
Large storage requirements: we need to store counts for all n-grams observed in the corpus, and this grows rapidly with corpus size and with n.
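A standard mitigation for sparsity is smoothing: reserve a little probability mass for unseen events. A minimal sketch of add-k (Laplace) smoothing, with hypothetical counts:

```python
def laplace_prob(count_ngram, count_context, vocab_size, k=1):
    """Add-k (Laplace) smoothed estimate: unseen n-grams receive a small
    non-zero probability instead of zero."""
    return (count_ngram + k) / (count_context + k * vocab_size)

# Hypothetical counts: "students opened their cars" never occurs (count 0),
# "students opened their books" occurs 400 times out of 1000.
p_cars = laplace_prob(0, 1000, vocab_size=10_000)
p_books = laplace_prob(400, 1000, vocab_size=10_000)
print(p_cars, p_books)  # small but non-zero; "books" remains far more likely
```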
For more information, kindly refer to the article: Probabilistic Language Models
Neural Language Model (NLM)
An NLM usually (but not always) uses a Recurrent Neural Network (RNN) to learn sequences of words (sentences, paragraphs, etc.) and hence can predict the next word.
Advantages:
The RNN can process input of any length, the model size does not grow with longer context, and the same weights are applied at every time step. As depicted, at each step we have a probability distribution over the vocabulary for the next word.
Disadvantages:
Recurrent computation is sequential and therefore slow, and in practice it is difficult for the RNN to use information from many steps back.
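The per-step next-word distribution described above can be sketched with a tiny untrained RNN; the vocabulary, dimensions, and random weights below are all hypothetical, purely to show the shape of the computation:

```python
import math
import random

random.seed(0)

VOCAB = ["students", "opened", "their", "books", "cars"]
HIDDEN = 8

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Untrained toy parameters (embeddings, recurrent and output weights).
E = rand_matrix(len(VOCAB), HIDDEN)  # word embeddings
W = rand_matrix(HIDDEN, HIDDEN)      # hidden-to-hidden weights
U = rand_matrix(len(VOCAB), HIDDEN)  # hidden-to-vocab output weights

def rnn_step(h, word):
    """One RNN step: combine the previous hidden state with the word embedding."""
    x = E[VOCAB.index(word)]
    return [math.tanh(sum(W[i][j] * h[j] for j in range(HIDDEN)) + x[i])
            for i in range(HIDDEN)]

def next_word_distribution(words):
    """Run the RNN over the prefix, then softmax over the vocabulary."""
    h = [0.0] * HIDDEN
    for w in words:
        h = rnn_step(h, w)
    logits = [sum(U[v][i] * h[i] for i in range(HIDDEN)) for v in range(len(VOCAB))]
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return {w: e / total for w, e in zip(VOCAB, exps)}

dist = next_word_distribution(["students", "opened", "their"])
print(dist)  # a probability distribution over the whole vocabulary
```

Since the weights are random, the distribution is near-uniform; training would shape it so that likely continuations receive high probability.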
Evaluating Language Models
Perplexity is the standard evaluation metric for language models. It is defined as the inverse probability of a text, normalized by the number of words, according to the language model. A good language model should give a lower perplexity on a test text: a lower perplexity means the model assigns that text a higher probability.
Moreover, if we have two language models, for example, one for sports and the other for politics, we can use Perplexity to classify a piece of text to be sports or politics based on the lower Perplexity value.
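The definition and the classification idea above can be sketched directly; the per-word probabilities below are made up for illustration:

```python
import math

def perplexity(word_probs):
    """Perplexity = inverse probability of the text, normalized per word:
    exp(-(1/N) * sum(log p)). Lower is better."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

# Hypothetical per-word probabilities two models assign to the same text.
sports_model = [0.2, 0.3, 0.25, 0.1]      # assigns higher probabilities
politics_model = [0.01, 0.05, 0.02, 0.03]

pp_sports = perplexity(sports_model)
pp_politics = perplexity(politics_model)
print(pp_sports < pp_politics)  # True -> classify the text as sports
```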
In short, Language Modeling is the task of predicting what word comes next.
More advanced and related topics, such as neural machine translation, attention, and Transformers, will be discussed in upcoming articles.