Bi-gram Model (Part 2)

The last post described the theory behind the bi-gram model. This article covers the results of the model and their interpretation. The earlier article on the text analysis of the three books showed the similarity in the top words between the training and testing datasets.

Results

The results of the bi-gram model are presented below:

[Chart: average log-likelihood of the bi-gram model]

The results of the uniform model are as follows:

[Chart: average log-likelihood of the uniform model]

The average log-likelihood of the bi-gram model is lower than that of the unigram model. The simpler model can sometimes give a better result. A detailed comparison of the results of the three models is presented below.

[Chart: average log-likelihood of the uniform, unigram, and bi-gram models]

It is clear from the chart that the higher-order n-gram model has a lower average log-likelihood than the simpler, more naive models.
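
As a rough illustration of where these numbers come from, below is a minimal sketch of how an add-s smoothed bi-gram model and its average log-likelihood per bigram could be computed. The tokenisation, function names, and toy data are my own assumptions and not the exact code linked at the end of the article.

```python
from collections import Counter
import math

def train_bigram(tokens):
    """Collect unigram and bigram counts from a list of training tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(unigrams)

def bigram_log_prob(prev, word, unigrams, bigrams, vocab_size, s=1.0):
    """Add-s smoothed conditional probability P(word | prev), in log space."""
    numerator = bigrams[(prev, word)] + s
    denominator = unigrams[prev] + s * vocab_size
    return math.log(numerator / denominator)

def average_log_likelihood(test_tokens, unigrams, bigrams, vocab_size, s=1.0):
    """Average log-likelihood per bigram over the evaluation tokens."""
    pairs = list(zip(test_tokens, test_tokens[1:]))
    total = sum(bigram_log_prob(p, w, unigrams, bigrams, vocab_size, s)
                for p, w in pairs)
    return total / len(pairs)

# Toy usage: the data here is purely illustrative.
train_tokens = "the cat sat on the mat and the cat ate".split()
test_tokens = "the cat sat on the mat".split()
unigrams, bigrams, V = train_bigram(train_tokens)
print(average_log_likelihood(test_tokens, unigrams, bigrams, V, s=1.0))
```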

Impact of 's' on average log-likelihood

[Chart: average log-likelihood of the bi-gram model for different values of s]

The results of this analysis differ slightly from the unigram results: the average log-likelihood first increases up to s = 3 and then starts decreasing.
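
Continuing from the sketch above, a sweep over s that mirrors this chart might look like the following; the particular s values are illustrative assumptions.

```python
# Continuing from the sketch above: vary the smoothing constant s and
# record the average log-likelihood, as in the chart. The s values here
# are illustrative, not necessarily the ones used in the article.
for s in [0.5, 1, 2, 3, 4, 5]:
    ll = average_log_likelihood(test_tokens, unigrams, bigrams, V, s=s)
    print(f"s = {s}: average log-likelihood = {ll:.4f}")
```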

Other Evaluation Metrics

There are other metrics that can be used for evaluation, namely cross-entropy and perplexity. Cross-entropy is the negative of the average log-likelihood, and perplexity is the exponential of the cross-entropy. Both can be implemented with minor changes to the evaluation formula.
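
As a small sketch, assuming the average log-likelihood uses the natural logarithm, the two metrics can be derived from it as follows; the value of avg_ll is a hypothetical placeholder.

```python
import math

def cross_entropy(avg_log_likelihood):
    """Cross-entropy is the negative of the average log-likelihood."""
    return -avg_log_likelihood

def perplexity(avg_log_likelihood):
    """Perplexity is the exponential of the cross-entropy."""
    return math.exp(-avg_log_likelihood)

avg_ll = -5.2                      # hypothetical value, for illustration only
print(cross_entropy(avg_ll))       # 5.2
print(perplexity(avg_ll))          # e ** 5.2, roughly 181.3
```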

With this, I conclude the bi-gram model series. I hope you enjoyed the article. Stay tuned for more!!

Link to the code:






