Course: Complete Guide to NLP with R


N-grams

- [Instructor] N-grams are a special type of token: they're combinations of consecutive tokens, which you might think of as phrases. The question is, how do you group tokens into phrases? Let's take a look at how to use an n-gram command. In line three, I bring in the tm text mining package, and then in line six, I define some very simple sample text. In line seven, I use the Boost tokenizer to break that text into individual tokens, and we can take a look at the result just by typing in ngram_tokens. This shows that the original text is now a vector of individual words. In line eight, I use the ngrams command, made available by loading the tm package, against ngram_tokens, and I've given it an n of three. Three is the number of tokens I want combined into each n-gram. I'll run line eight, and then we'll use line nine to show the contents of what just resulted. Start with the first item, which was "Brillig…
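To make the idea concrete without the on-screen script, here is a minimal base-R sketch of what building trigrams from a token vector involves. The sample sentence, the whitespace split, and the helper `make_ngrams` are all assumptions made for illustration; they are not the instructor's code. With tm attached, the corresponding calls would presumably be `Boost_tokenizer()` for the split and `ngrams()` (from the NLP package that tm loads) for the grouping.

```r
# Hypothetical sample text; the lesson's actual text is not reproduced here.
some_text <- "the quick brown fox jumps over the lazy dog"

# Split on whitespace into individual word tokens
# (a base-R stand-in for a tokenizer such as tm's Boost_tokenizer).
ngram_tokens <- strsplit(some_text, "\\s+")[[1]]

# Hypothetical helper: return every run of n consecutive tokens as a list
# of character vectors, sliding one token at a time.
make_ngrams <- function(tokens, n) {
  count <- length(tokens) - n + 1
  if (count < 1) return(list())   # fewer than n tokens: no n-grams
  lapply(seq_len(count), function(i) tokens[i:(i + n - 1)])
}

# n = 3 combines three tokens into each n-gram.
trigrams <- make_ngrams(ngram_tokens, 3)

print(trigrams[[1]])     # the first trigram
print(length(trigrams))  # how many trigrams were produced
```

A vector of k tokens yields k - n + 1 n-grams, because each n-gram starts one token after the previous one and the window has to fit inside the vector.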
