Course: Complete Guide to NLP with R
Term frequency with bind_tf_idf( )
- [Instructor] Natural language processing uses a sophisticated statistic called term frequency-inverse document frequency, or TF-IDF. The math behind this statistic is deep and I'm not going to go into it right now. Instead, I'm going to spend time showing you how to compute it. I've set up code, and in lines five, six, and seven I bring in the necessary libraries: tidyverse, tidytext, and readtext. The first thing I want to show you is a standard term frequency, which is what we've been using all along. Term frequency is simply the number of times a word appears in a document. Now, this is a bit different from the term frequency used in TF-IDF. In TF-IDF, term frequency is compared to the number of documents. One of the other things that's important about TF-IDF is that you'll need to know the number of words in a document. In lines 17 and 18, I've provided you with this code, but we…
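The transcript excerpt breaks off before the course's own code appears, so here is a minimal sketch of the workflow it describes: count words per document, total the words in each document, then call `bind_tf_idf()`. The two-document corpus and the names `docs`, `word_counts`, `doc_totals`, and `scored` are assumptions made for illustration, not the instructor's files or variables, and the sketch loads `dplyr` directly rather than the full set of libraries mentioned above. In tidytext, `bind_tf_idf()` computes `tf` as a word's count divided by the total words in its document and `idf` as the natural log of the number of documents divided by the number of documents containing the word.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy corpus (an assumption, not the course's data)
docs <- tibble(
  doc  = c("a", "b"),
  text = c("the cat sat on the mat", "the dog chased the cat")
)

# Standard term frequency: one row per (document, word),
# with n = number of times the word appears in that document
word_counts <- docs %>%
  unnest_tokens(word, text) %>%
  count(doc, word, name = "n")

# Number of words in each document (the denominator of the tf column)
doc_totals <- word_counts %>%
  group_by(doc) %>%
  summarise(total = sum(n))

# bind_tf_idf(tbl, term, document, n) appends tf, idf, and tf_idf columns
scored <- word_counts %>%
  bind_tf_idf(word, doc, n) %>%
  arrange(desc(tf_idf))

print(scored)
```

Words that occur in every document, such as "the" here, get an `idf` of zero and therefore a `tf_idf` of zero, which is why the statistic favors words that are distinctive to a single document.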
Contents
- How to think like tidytext (1 min 59 sec)
- An example: Calculate the most popular terms in a document (3 min 10 sec)
- Tokenizing with unnest_tokens( ) (8 min 19 sec)
- Stopwords, punctuation, whitespace, and numbers (6 min 30 sec)
- Stemming and lemmatization (5 min 35 sec)
- Term frequency with bind_tf_idf( ) (5 min 54 sec)
- Sentiment analysis with sentiments( ) (4 min 44 sec)
- Parts of speech with parts_of_speech( ) (4 min 32 sec)
- Import and export from other NLP packages (2 min 30 sec)