Course: Complete Guide to NLP with R

Stemming and lemmatization

- [Instructor] Stemming and lemmatization are two related concepts. They're both designed to aggregate words to improve statistics on token counts. Let's take a look at how to do both processes and talk a bit about the concerns you might have with stemming and lemmatization. I've set up some sample code, and in lines five, six, and seven I bring in the tidyverse, tidytext, and readtext libraries. Let's start with stemming, using something called SnowballC. SnowballC is a standard stemming library. Let's run the code in line 12 and then take a look at the results of using wordStem, which is part of the SnowballC package. I'm going to click on stemmed, and you'll see a table with three columns: doc_id, the original word, and the stemmed word. Look down at line 13: the original word was "restrictions." SnowballC has converted that to "restrict." If we look through the document anywhere you saw restrictions or restrict or…
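The stemming step described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the instructor's actual script: the token table and its contents are hypothetical, and it uses only base R plus SnowballC rather than the tidyverse, tidytext, and readtext pipeline the course loads.

```r
# Minimal sketch: stem a few hypothetical tokens with SnowballC.
library(SnowballC)

# Hypothetical token table (assumption); the course builds its own
# table from documents it reads in, with a doc_id and a word column.
tokens <- data.frame(
  doc_id = c("doc1", "doc1", "doc2"),
  word   = c("restrictions", "restrict", "restricted"),
  stringsAsFactors = FALSE
)

# wordStem() applies the Snowball stemmer for the given language
# to each element of a character vector.
tokens$stem <- wordStem(tokens$word, language = "english")

# Counting on the stem instead of the raw word aggregates the
# variants into a single row.
stem_counts <- as.data.frame(table(stem = tokens$stem),
                             stringsAsFactors = FALSE)

print(tokens)
print(stem_counts)
```

Because all three variants share one stem, counting on the `stem` column yields a single aggregated count, which is the effect on token statistics that the transcript describes.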
