Use of N-grams for Content Optimization

N-gram analysis can be done manually or with tools. Tools are generally more efficient and give more accurate results.

Here is how we can conduct an N-gram analysis to determine which content is more likely to rank on the SERPs.

For instance, when I searched "What is n-gram analysis?" on Google:

Content (a), positioned first in the featured snippet, defines:

"An n-gram is a collection of n successive items in a text document that may include words, numbers, symbols, and punctuation. N-gram models are useful in many text analytics applications where sequences of words are relevant, such as in sentiment analysis, text classification, and text generation."

Here is the URL of the content “A”: https://www.mathworks.com/discovery/ngram.html#:~:text=An%20n%2Dgram%20is%20a,text%20classification%2C%20and%20text%20generation.

Content (b), ranked 21st on the Google SERP, explains:

"N-grams are simply all combinations of adjacent words or letters of length n that you can find in your source text. For example, given the word fox, all 2-grams (or “bigrams”) are fo and ox. You may also count the word boundary – that would expand the list of 2-grams to #f, fo, ox, and x#, where # denotes a word boundary."

Here is the URL of the content B: https://stackoverflow.com/questions/18193253/what-exactly-is-an-n-gram

So, why did Google rank content (a) higher than (b)?

Let's analyze the two paragraphs using unigrams (single words) to determine which one is more relevant to the query "What is N-gram analysis?"

Unigram Analysis

Paragraph A Unigrams:

  1. An
  2. n-gram
  3. is
  4. a
  5. collection
  6. of
  7. n
  8. successive
  9. items
  10. in
  11. a
  12. text
  13. document
  14. that
  15. may
  16. include
  17. words
  18. numbers
  19. symbols
  20. and
  21. punctuation
  22. N-gram
  23. models
  24. are
  25. useful
  26. in
  27. many
  28. text
  29. analytics
  30. applications
  31. where
  32. sequences
  33. of
  34. words
  35. are
  36. relevant
  37. such
  38. as
  39. in
  40. sentiment
  41. analysis
  42. text
  43. classification
  44. and
  45. text
  46. generation

Paragraph B Unigrams:

  1. N-grams
  2. are
  3. simply
  4. all
  5. combinations
  6. of
  7. adjacent
  8. words
  9. or
  10. letters
  11. of
  12. length
  13. n
  14. that
  15. you
  16. can
  17. find
  18. in
  19. your
  20. source
  21. text
  22. For
  23. example
  24. given
  25. the
  26. word
  27. fox
  28. all
  29. 2-grams
  30. or
  31. “bigrams”
  32. are
  33. fo
  34. and
  35. ox
  36. You
  37. may
  38. also
  39. count
  40. the
  41. word
  42. boundary
  43. that
  44. would
  45. expand
  46. the
  47. list
  48. of
  49. 2-grams
  50. to
  51. #f
  52. fo
  53. ox
  54. and
  55. x#
  56. where
  57. #
  58. denotes
  59. a
  60. word
  61. boundary

Analysis:

Paragraph A consists of 46 unigrams, while Paragraph B contains 61. The term "n-gram" appears twice in Paragraph A but only once in Paragraph B (as "N-grams"). The word "analysis" appears once in Paragraph A and not at all in Paragraph B. Based on this unigram analysis, Paragraph A seems better suited to rank on SERPs for the query "What is N-gram analysis?". However, we will conduct further N-gram analysis to confirm the result.
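For readers who prefer a tool over manual counting, the unigram comparison can be sketched in a few lines of Python. This is only a minimal sketch, not the method of any particular SEO tool. The paragraph strings are reconstructed from the n-gram lists in this article, and the function names (`unigrams`, `term_counts`), the tokenization regex, and the rule that a simple "s" plural counts toward the singular term are all assumptions made for illustration.

```python
import re
from collections import Counter

# Paragraph texts as reconstructed from the n-gram lists in this article.
PARAGRAPH_A = (
    "An n-gram is a collection of n successive items in a text document "
    "that may include words, numbers, symbols, and punctuation. N-gram "
    "models are useful in many text analytics applications where sequences "
    "of words are relevant, such as in sentiment analysis, text "
    "classification, and text generation."
)
PARAGRAPH_B = (
    "N-grams are simply all combinations of adjacent words or letters of "
    "length n that you can find in your source text. For example, given the "
    "word fox, all 2-grams (or “bigrams”) are fo and ox. You may also count "
    "the word boundary – that would expand the list of 2-grams to #f, fo, "
    "ox, and x#, where # denotes a word boundary."
)


def unigrams(text):
    """Lowercase the text and split it into word-like tokens.

    Hyphens and '#' stay inside tokens, so "n-gram", "2-grams", "#f"
    and "x#" each survive as a single unigram (an assumed convention).
    """
    return re.findall(r"[a-z0-9#]+(?:-[a-z0-9#]+)*", text.lower())


def term_counts(text, terms):
    """Count each term, treating a trailing-"s" plural as the same term."""
    counts = Counter(unigrams(text))
    return {term: counts[term] + counts[term + "s"] for term in terms}


query_terms = ["n-gram", "analysis"]
print(term_counts(PARAGRAPH_A, query_terms))  # {'n-gram': 2, 'analysis': 1}
print(term_counts(PARAGRAPH_B, query_terms))  # {'n-gram': 1, 'analysis': 0}
```

The plural rule is deliberately narrow: "N-grams" counts toward "n-gram", but "2-grams" does not, which mirrors how the terms are counted in the analysis above.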

Now, let's analyze the two paragraphs using bigrams (two adjacent words) to determine which one is more relevant to the query "What is N-gram analysis?"

Bigram Analysis

Paragraph A Bigrams:

  1. An n-gram
  2. n-gram is
  3. is a
  4. a collection
  5. collection of
  6. of n
  7. n successive
  8. successive items
  9. items in
  10. in a
  11. a text
  12. text document
  13. document that
  14. that may
  15. may include
  16. include words
  17. words, numbers
  18. numbers, symbols
  19. symbols, and
  20. and punctuation
  21. N-gram models
  22. models are
  23. are useful
  24. useful in
  25. in many
  26. many text
  27. text analytics
  28. analytics applications
  29. applications where
  30. where sequences
  31. sequences of
  32. of words
  33. words are
  34. are relevant
  35. relevant, such
  36. such as
  37. as in
  38. in sentiment
  39. sentiment analysis
  40. analysis, text
  41. text classification
  42. classification, and
  43. and text
  44. text generation

Paragraph B Bigrams:

  1. N-grams are
  2. are simply
  3. simply all
  4. all combinations
  5. combinations of
  6. of adjacent
  7. adjacent words
  8. words or
  9. or letters
  10. letters of
  11. of length
  12. length n
  13. n that
  14. that you
  15. you can
  16. can find
  17. find in
  18. in your
  19. your source
  20. source text
  21. For example
  22. example, given
  23. given the
  24. the word
  25. word fox
  26. fox, all
  27. all 2-grams
  28. 2-grams (or
  29. (or “bigrams”)
  30. “bigrams” are
  31. are fo
  32. fo and
  33. and ox
  34. You may
  35. may also
  36. also count
  37. count the
  38. the word
  39. word boundary
  40. boundary –
  41. – that
  42. that would
  43. would expand
  44. expand the
  45. the list
  46. list of
  47. of 2-grams
  48. 2-grams to
  49. to #f
  50. #f, fo
  51. fo, ox
  52. ox, and
  53. and x#
  54. x#, where
  55. where #
  56. # denotes
  57. denotes a
  58. a word
  59. word boundary

Analysis:

  1. A bigram analysis of both paragraphs shows that Paragraph A contains 44 bigrams, while Paragraph B has 59.
  2. For the search query "What is N-gram analysis?", the primary terms of focus are "n-gram" and "analysis". In Paragraph A, the term "n-gram" appears in three bigrams: "An n-gram", "n-gram is", and "N-gram models". In contrast, Paragraph B contains it in only one bigram, "N-grams are".
  3. The term "analysis" appears in two of Paragraph A's bigrams, "sentiment analysis" and "analysis, text". Paragraph B does not include this term at all.
  4. Although Paragraph B has a higher word count than Paragraph A, Paragraph A is better aligned with the user's intent. This suggests that Paragraph A is more deserving of a higher ranking on SERPs.
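The bigram lists and counts above can be reproduced with a short script. The sketch below assumes the same conventions the lists appear to follow: tokens are split on whitespace, punctuation stays attached to its word, the dash counts as a token, and no bigram crosses a sentence boundary. The helper names (`sentences`, `ngrams`, `grams_containing`) and the paragraph strings (reconstructed from the lists in this article) are illustrative assumptions.

```python
import re

PARAGRAPH_A = (
    "An n-gram is a collection of n successive items in a text document "
    "that may include words, numbers, symbols, and punctuation. N-gram "
    "models are useful in many text analytics applications where sequences "
    "of words are relevant, such as in sentiment analysis, text "
    "classification, and text generation."
)
PARAGRAPH_B = (
    "N-grams are simply all combinations of adjacent words or letters of "
    "length n that you can find in your source text. For example, given the "
    "word fox, all 2-grams (or “bigrams”) are fo and ox. You may also count "
    "the word boundary – that would expand the list of 2-grams to #f, fo, "
    "ox, and x#, where # denotes a word boundary."
)

PUNCTUATION = ".,;:!?()\"'“”"


def sentences(text):
    """Split after sentence-ending punctuation that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def ngrams(text, n):
    """Return n-grams as tuples of whitespace tokens, one sentence at a time."""
    grams = []
    for sentence in sentences(text):
        tokens = sentence.rstrip(".!?").split()
        grams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def grams_containing(grams, term):
    """Keep n-grams in which some token matches the term or its "s" plural."""
    def normalize(token):
        return token.lower().strip(PUNCTUATION)
    wanted = (term, term + "s")
    return [g for g in grams if any(normalize(t) in wanted for t in g)]


bigrams_a = ngrams(PARAGRAPH_A, 2)
bigrams_b = ngrams(PARAGRAPH_B, 2)
print(len(bigrams_a), len(bigrams_b))                # 44 59
print(len(grams_containing(bigrams_a, "n-gram")))    # 3
print(len(grams_containing(bigrams_a, "analysis")))  # 2
print(len(grams_containing(bigrams_b, "n-gram")))    # 1
print(len(grams_containing(bigrams_b, "analysis")))  # 0
```

Note that "analysis" occurs once in Paragraph A but is counted in two bigrams, because a word inside a sentence belongs to both the bigram that ends with it and the bigram that starts with it.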

Now, let's break down the two paragraphs using trigrams (three adjacent words) to determine which one better matches the user's query "What is N-gram analysis?"

Trigram Analysis

Paragraph A Trigrams:

  1. An n-gram is
  2. n-gram is a
  3. is a collection
  4. a collection of
  5. collection of n
  6. of n successive
  7. n successive items
  8. successive items in
  9. items in a
  10. in a text
  11. a text document
  12. text document that
  13. document that may
  14. that may include
  15. may include words
  16. include words, numbers
  17. words, numbers, symbols
  18. numbers, symbols, and
  19. symbols, and punctuation
  20. N-gram models are
  21. models are useful
  22. are useful in
  23. useful in many
  24. in many text
  25. many text analytics
  26. text analytics applications
  27. analytics applications where
  28. applications where sequences
  29. where sequences of
  30. sequences of words
  31. of words are
  32. words are relevant
  33. are relevant, such
  34. relevant, such as
  35. such as in
  36. as in sentiment
  37. in sentiment analysis
  38. sentiment analysis, text
  39. analysis, text classification
  40. text classification, and
  41. classification, and text
  42. and text generation

Paragraph B Trigrams:

  1. N-grams are simply
  2. are simply all
  3. simply all combinations
  4. all combinations of
  5. combinations of adjacent
  6. of adjacent words
  7. adjacent words or
  8. words or letters
  9. or letters of
  10. letters of length
  11. of length n
  12. length n that
  13. n that you
  14. that you can
  15. you can find
  16. can find in
  17. find in your
  18. in your source
  19. your source text
  20. For example, given
  21. example, given the
  22. given the word
  23. the word fox
  24. word fox, all
  25. fox, all 2-grams
  26. all 2-grams (or
  27. 2-grams (or “bigrams”)
  28. (or “bigrams”) are
  29. “bigrams”) are fo
  30. are fo and
  31. fo and ox
  32. You may also
  33. may also count
  34. also count the
  35. count the word
  36. the word boundary
  37. word boundary –
  38. boundary – that
  39. – that would
  40. that would expand
  41. would expand the
  42. expand the list
  43. the list of
  44. list of 2-grams
  45. of 2-grams to
  46. 2-grams to #f
  47. to #f, fo
  48. #f, fo, ox
  49. fo, ox, and
  50. ox, and x#
  51. and x#, where
  52. x#, where #
  53. where # denotes
  54. # denotes a
  55. denotes a word
  56. a word boundary

Analysis:

To compare Paragraphs A and B, we again start from our primary keyword: "What is N-gram analysis?".

In Paragraph A, the term most semantically and lexically related to our keyword is "N-gram". This term appears in three trigrams:

  1. "An n-gram is"
  2. "n-gram is a"
  3. "N-gram models are"

In comparison, Paragraph B mentions "N-gram" only once, as seen in "N-grams are simply".

Another significant term from our keyword is "analysis". In Paragraph A, it appears in three trigrams:

  1. "in sentiment analysis"
  2. "sentiment analysis, text"
  3. "analysis, text classification"

However, Paragraph B does not include this term at all. This evidence suggests that Paragraph A is more aligned with the keyword and is, therefore, more deserving of a higher ranking compared to Paragraph B.

Let's break down the two paragraphs using 4-grams (four adjacent words) to determine which one is more relevant to the query "What is N-gram analysis?"

4-gram Analysis

Paragraph A 4-grams:

  1. An n-gram is a
  2. n-gram is a collection
  3. is a collection of
  4. a collection of n
  5. collection of n successive
  6. of n successive items
  7. n successive items in
  8. successive items in a
  9. items in a text
  10. in a text document
  11. a text document that
  12. text document that may
  13. document that may include
  14. that may include words
  15. may include words, numbers
  16. include words, numbers, symbols
  17. words, numbers, symbols, and
  18. numbers, symbols, and punctuation
  19. N-gram models are useful
  20. models are useful in
  21. are useful in many
  22. useful in many text
  23. in many text analytics
  24. many text analytics applications
  25. text analytics applications where
  26. analytics applications where sequences
  27. applications where sequences of
  28. where sequences of words
  29. sequences of words are
  30. of words are relevant
  31. words are relevant, such
  32. are relevant, such as
  33. relevant, such as in
  34. such as in sentiment
  35. as in sentiment analysis
  36. in sentiment analysis, text
  37. sentiment analysis, text classification
  38. analysis, text classification, and
  39. text classification, and text
  40. classification, and text generation

Paragraph B 4-grams:

  1. N-grams are simply all
  2. are simply all combinations
  3. simply all combinations of
  4. all combinations of adjacent
  5. combinations of adjacent words
  6. of adjacent words or
  7. adjacent words or letters
  8. words or letters of
  9. or letters of length
  10. letters of length n
  11. of length n that
  12. length n that you
  13. n that you can
  14. that you can find
  15. you can find in
  16. can find in your
  17. find in your source
  18. in your source text
  19. For example, given the
  20. example, given the word
  21. given the word fox
  22. the word fox, all
  23. word fox, all 2-grams
  24. fox, all 2-grams (or
  25. all 2-grams (or “bigrams”)
  26. 2-grams (or “bigrams”) are
  27. (or “bigrams”) are fo
  28. “bigrams”) are fo and
  29. are fo and ox
  30. fo and ox. You
  31. and ox. You may
  32. ox. You may also
  33. You may also count
  34. may also count the
  35. also count the word
  36. count the word boundary
  37. the word boundary –
  38. word boundary – that
  39. boundary – that would
  40. – that would expand
  41. that would expand the
  42. would expand the list
  43. expand the list of
  44. the list of 2-grams
  45. list of 2-grams to
  46. of 2-grams to #f
  47. 2-grams to #f, fo
  48. to #f, fo, ox
  49. #f, fo, ox, and
  50. fo, ox, and x#
  51. ox, and x#, where
  52. and x#, where #
  53. x#, where # denotes
  54. where # denotes a
  55. # denotes a word
  56. denotes a word boundary

Analysis:

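As in the previous analyses, Paragraph A contains more relevant 4-grams than Paragraph B. The term "n-gram" appears in three of Paragraph A's 4-grams and "analysis" in four, while Paragraph B has a single 4-gram containing "N-grams" and none containing "analysis". This suggests that Paragraph A is more deserving of a higher ranking on SERPs.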
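The four analyses can also be run in one pass. The following is a rough sketch under stated assumptions: tokens are lowercased whitespace tokens with edge punctuation stripped, n-grams never cross a sentence boundary, the query's content words are taken to be "n-gram" and "analysis", and a trailing-"s" plural counts as a match. The names `PARAGRAPHS`, `QUERY_TERMS`, `sentence_tokens`, `matches`, and `summary` are hypothetical, and the paragraph strings are reconstructed from the lists in this article.

```python
import re

PARAGRAPHS = {
    "A": (
        "An n-gram is a collection of n successive items in a text document "
        "that may include words, numbers, symbols, and punctuation. N-gram "
        "models are useful in many text analytics applications where "
        "sequences of words are relevant, such as in sentiment analysis, "
        "text classification, and text generation."
    ),
    "B": (
        "N-grams are simply all combinations of adjacent words or letters of "
        "length n that you can find in your source text. For example, given "
        "the word fox, all 2-grams (or “bigrams”) are fo and ox. You may also "
        "count the word boundary – that would expand the list of 2-grams to "
        "#f, fo, ox, and x#, where # denotes a word boundary."
    ),
}
QUERY_TERMS = ("n-gram", "analysis")  # assumed content words of the query


def sentence_tokens(text):
    """Return one list of lowercased, punctuation-stripped tokens per sentence."""
    result = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = [t.lower().strip(".,;:!?()“”") for t in sentence.split()]
        result.append([t for t in tokens if t])
    return result


def ngrams(text, n):
    """Build n-grams per sentence by zipping n shifted copies of the tokens."""
    grams = []
    for tokens in sentence_tokens(text):
        grams.extend(zip(*(tokens[i:] for i in range(n))))
    return grams


def matches(text, term, n):
    """Number of n-grams that contain the term or its trailing-"s" plural."""
    return sum(1 for g in ngrams(text, n) if term in g or term + "s" in g)


def summary(max_n=4):
    """Rows of (n, paragraph label, n-gram hits, analysis hits)."""
    rows = []
    for n in range(1, max_n + 1):
        for label, text in PARAGRAPHS.items():
            rows.append((n, label, *(matches(text, t, n) for t in QUERY_TERMS)))
    return rows


for n, label, ngram_hits, analysis_hits in summary():
    print(f"n={n} paragraph {label}: n-gram={ngram_hits} analysis={analysis_hits}")
```

Under these assumptions, Paragraph A's hits for "analysis" rise from 1 to 4 as n grows from 1 to 4, even though the word occurs only once. The rising number reflects overlapping n-grams rather than additional mentions, so counts are best compared between paragraphs at the same n.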

Further N-gram Analysis

When assessing Paragraphs A and B against the keywords "N-gram" and "What is N-gram?", it is also useful to review their search rankings. When I searched for "N-gram" on Google, mathworks.com (the source of Paragraph A) appeared at the top. In contrast, stackoverflow.com (the source of Paragraph B) did not make it into the first 60 results.

More intriguingly, when I searched for “What is N-gram?”, mathworks.com secured the second spot, while stackoverflow.com lagged behind at the 26th position. It's worth noting that even though the stackoverflow.com page URL (https://stackoverflow.com/questions/18193253/what-exactly-is-an-n-gram) contains a close variant of the query, "what-exactly-is-an-n-gram", it still doesn't outperform mathworks.com.

Final Thoughts

So, why can't a website rank higher even with the targeted keyword in its URL? The answer might be simpler than you think: the quality of the content matters immensely.

Mahnoor M.Akram

Software Engineer |Web developer |Graphic Designer |Brand Manager |SEO |SEO Content Writer

1 year ago

But why is such a complicated term, "n-gram", used just for finding out how many times a word is repeated in a piece of content? Because of this term alone, it seems like I'm not getting what exactly your post is describing.

Mahnoor M.Akram

Software Engineer |Web developer |Graphic Designer |Brand Manager |SEO |SEO Content Writer

1 year ago

It seems that n-gram analysis is for finding how many times a specific word appears in a piece of content. But how does it help you in creating a topical map?

Anupam Pathak

SEO Manager at Content Ladder #Seo #searchengineoptimization #onpageseo #technicalseo #seocontentwriter

1 year ago

Helpful!
