Course: Complete Guide to NLP with R
Stemming and lemmatization
- [Instructor] Stemming and lemmatization are two related concepts. Both are designed to aggregate words to improve statistics on token counts. Let's take a look at how to do both processes and talk a bit about the concerns you might have with stemming and lemmatization. I've set up some sample code, and in lines five, six, and seven I bring in the tidyverse, tidytext, and readtext libraries. Let's start with stemming, using a package called SnowballC. SnowballC is a standard stemming library. Let's run the code in line 12 and then take a look at the results of using wordStem, which is part of the SnowballC package. I'm going to click on stemmed, and you'll see a table with three columns: doc_id, the original word, and the stemmed word. Look down at line 13: the original word was restrictions, and SnowballC has converted that to restrict. If we look through the document, anywhere you saw restrictions or restrict or…
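As a rough sketch of the stemming step described above, the R code below tokenizes a small in-memory corpus with tidytext's unnest_tokens() and applies SnowballC's wordStem() to each token, giving the same doc_id / word / stem layout as the table in the video. The two sample documents and the tibble standing in for readtext() output are hypothetical; the course's actual files and code are not reproduced here. Counting by word and then by stem shows how stemming aggregates variants such as restrictions, restricted, and restricting into a single token count.

```r
library(dplyr)
library(tidytext)
library(SnowballC)

# Hypothetical stand-in for readtext() output: one row per document,
# with doc_id and text columns (sample sentences are made up).
docs <- tibble(
  doc_id = c("doc1.txt", "doc2.txt"),
  text = c(
    "New restrictions restrict what can be restricted.",
    "Restricting access was one of the earlier restrictions."
  )
)

# One lowercase token per row (unnest_tokens drops the text column),
# plus its stem from wordStem() using the default Porter stemmer.
stemmed <- docs %>%
  unnest_tokens(word, text) %>%
  mutate(stem = wordStem(word))

print(stemmed, n = Inf)

# Token counts before and after stemming: the stem column collapses
# inflected variants of the same root into a single count.
word_counts <- count(stemmed, word, sort = TRUE)
stem_counts <- count(stemmed, stem, sort = TRUE)
print(stem_counts)
```

Here the four separate "restrict" word forms are counted separately in word_counts but should collapse into one "restrict" row in stem_counts. The trade-off is that a stem is not guaranteed to be a dictionary word (the Porter stemmer reduces "studies" to "studi", for example), which is one of the concerns that lemmatization addresses.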
Contents
- How to think like tidytext (1 min 59 s)
- An example: Calculate the most popular terms in a document (3 min 10 s)
- Tokenizing with unnest_tokens() (8 min 19 s)
- Stopwords, punctuation, whitespace, and numbers (6 min 30 s)
- Stemming and lemmatization (5 min 35 s)
- Term frequency with bind_tf_idf() (5 min 54 s)
- Sentiment analysis with sentiments() (4 min 44 s)
- Parts of speech with parts_of_speech() (4 min 32 s)
- Import and export from other NLP packages (2 min 30 s)