Course: Complete Guide to NLP with R

Stop words

- [Instructor] In natural language processing, when we run statistics on documents, it's important to run those statistics on the significant words in the document. "Stop words" is the term used for words that may not be all that useful, such as "and", "the", and "too". These words are fine for humans, but when it comes down to analyzing the actual meaning of a document, they may not add much to our information and can in fact cloud our results. So it's important to be able to remove those stop words, and tm provides a stop words dictionary to do that. Let's take a look at how it works. In line three, I, of course, bring in the tm package using the library command. Then in line six, I define a vector called myText, which is simply a vector full of words. I'm going to open this up a bit and clear the console. And now in line 13, I'm going to use removeWords, which is a transformation provided…
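
The steps described above (load tm, build a word vector, strip stop words with removeWords) can be sketched roughly as follows. This is a minimal sketch, not the course's exercise file: the contents of myText are made up for illustration because the transcript does not show them, and the names englishStops, cleaned, and significant are hypothetical. It assumes the tm package is installed and uses its stopwords("english") dictionary together with removeWords on a plain character vector.

```r
# Load the tm text-mining package (install.packages("tm") if it is missing)
library(tm)

# Hypothetical word vector; the actual words in the course file are not shown
myText <- c("the", "cat", "and", "the", "hat", "too")

# tm's built-in English stop word dictionary, returned as a character vector
englishStops <- stopwords("english")
head(englishStops)

# removeWords() deletes whole-word matches, leaving "" where a stop word was
cleaned <- removeWords(myText, englishStops)
print(cleaned)

# Drop the empty strings left behind so only the significant words remain
significant <- cleaned[nzchar(cleaned)]
print(significant)
```

One design point to keep in mind: removeWords matches case-sensitively, and the dictionary is lowercase, so a capitalized "The" would likely survive unless the text is lowercased first (for example with tolower()).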

内容