Course: Complete Guide to NLP with R

Group tokens

Once you have a tokens object, there are a couple of things you can do with those tokens. We've just covered keeping and removing them. Now we're going to talk about grouping them in different ways. We're looking at code that illustrates this concept. In line three, I bring in the quanteda package and then create a tokens object from our corpus, data_corpus_inaugural. You'll notice that I'm removing numbers, punctuation, and symbols. Let's take a quick look at scTokens, which I can do just by typing it into the console here. You'll see that I have a series of documents with tokens in them. Again, a bag of words. You'll also notice I haven't removed the stopwords, things like "of" and "the." Well, I can replace certain phrases with individual things called compounds. We'll talk about ngrams in a second. But let's look at line 12, where I call tokens_compound. And what I'm saying is take the…
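The on-screen code is not part of this transcript, so what follows is only a minimal sketch of the steps described, not the instructor's actual script. It assumes the quanteda package is installed. The name scTokens and the corpus data_corpus_inaugural come from the narration; the phrase "United States" and the variable name compounded are illustrative assumptions, since the transcript cuts off before naming the phrase that gets compounded.

```r
library(quanteda)

# Build a tokens object from the inaugural-address corpus that ships with
# quanteda, dropping numbers, punctuation, and symbols. Stopwords are kept.
scTokens <- tokens(
  data_corpus_inaugural,
  remove_numbers = TRUE,
  remove_punct   = TRUE,
  remove_symbols = TRUE
)

# Typing the object name prints a preview of each document's tokens.
scTokens

# Replace a multi-word phrase with a single compound token.
# ASSUMPTION: "United States" is a stand-in phrase chosen for illustration.
# phrase() marks the pattern as a sequence of tokens; by default the parts
# are joined with an underscore.
compounded <- tokens_compound(scTokens, pattern = phrase("United States"))

# Show a few occurrences of the compound token in context.
head(kwic(compounded, pattern = "United_States"))
```

With the default settings, the two tokens "United" and "States" become the single token "United_States" wherever they appear next to each other, and everything else in each document is left unchanged.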
