Bi-gram Model (Part 2)
The last post described the theory behind the bi-gram model. This article focuses on the model's results and their interpretation. The earlier article on the text analysis of three of the books showed that the top words in the training and testing datasets are similar.
Results
The results of the bi-gram model have been presented below:
The results of the uniform model are as follows:
The average log-likelihood of the bi-gram model is lower than that of the unigram model; a simpler model can sometimes produce a better result. A detailed comparison of the results of the three models is presented below.
It is clear from the chart that the higher-order n-gram model has a lower average log-likelihood than the naive one.
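If you want to reproduce this comparison yourself, here is a minimal sketch of the evaluation. It assumes the bi-gram probabilities live in a dictionary keyed by (previous word, word) pairs; the names and the unseen-pair floor are my own illustrative choices, not the exact code from this series.

```python
import math

def average_log_likelihood(test_tokens, bigram_prob, floor=1e-10):
    # Average per-word log-likelihood of a token list under a bi-gram model.
    # `bigram_prob` is assumed to map (previous_word, word) pairs to
    # probabilities; unseen pairs fall back to a tiny floor so log()
    # stays defined.
    pairs = list(zip(test_tokens[:-1], test_tokens[1:]))
    total = sum(math.log(bigram_prob.get(pair, floor)) for pair in pairs)
    return total / len(pairs)
```

The unigram version is analogous: score each word on its own probability instead of conditioning on the previous word, and average in the same way.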
Impact of 's' on average log-likelihood
The results of this analysis differ slightly from the unigram results: the average log-likelihood first increases until s = 3 and then starts decreasing.
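For context, here is a sketch of how s enters the probability estimate, assuming s is the additive-smoothing pseudo-count from the earlier posts; if the series defines s differently, adjust accordingly. The function and variable names are illustrative.

```python
from collections import Counter

def build_counts(tokens):
    # Raw bigram and context (previous-word) counts from a token list.
    bigram_counts = Counter(zip(tokens[:-1], tokens[1:]))
    context_counts = Counter(tokens[:-1])
    return bigram_counts, context_counts

def smoothed_bigram_prob(bigram_counts, context_counts, vocab_size, s):
    # Add-s (additive) smoothing: every possible bigram gets a
    # pseudo-count of s, so unseen pairs keep non-zero probability.
    #   P(w | prev) = (count(prev, w) + s) / (count(prev) + s * vocab_size)
    def prob(prev_word, word):
        num = bigram_counts.get((prev_word, word), 0) + s
        den = context_counts.get(prev_word, 0) + s * vocab_size
        return num / den
    return prob
```

Sweeping s over a range of values and re-scoring the test set with each resulting probability function traces out the curve described above, rising until s = 3 and then falling, under these assumptions.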
Other Evaluation Metrics
There are other metrics that can be used for evaluation, namely cross-entropy and perplexity. The former is the negative of the average log-likelihood, and the latter is the exponential of the cross-entropy. Both can be implemented with minor changes to the evaluation formula.
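As a quick sketch of how small those changes are (using the natural log here, so cross-entropy comes out in nats; switch to log base 2 for bits):

```python
import math

def cross_entropy(avg_log_likelihood):
    # Cross-entropy is the negated average log-likelihood.
    return -avg_log_likelihood

def perplexity(avg_log_likelihood):
    # Perplexity is the exponential of the cross-entropy.
    return math.exp(-avg_log_likelihood)

print(cross_entropy(-6.5), perplexity(-6.5))  # 6.5 and ~665.14
```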
With this, I conclude the bi-gram model articles. I hope you enjoyed the article. Stay tuned for more!
Link to the code: