Course: Complete Guide to NLP with R
Tokenizing with unnest_tokens()
- [Instructor] Tokenizing is at the heart of natural language processing, and tidytext provides some useful tools to do it. In this session, let's look at how tokenizing is accomplished with tidytext. Right up front, tidytext uses a function called unnest_tokens() to accomplish this task, and here is some code that does it. In order to run this code, you'll of course need to make sure that your current working directory points to the exercise files, where it can find the Wonderful Wizard of Oz .txt file. In lines five, six, and seven I've shown how to use library() to bring in the tidyverse, tidytext, and readtext. Now I'll need the text of the Wonderful Wizard of Oz file, and in line nine I use readtext() to bring that in and place it into a tibble. Let's take a quick look at that tibble just so we know what we're talking about. You can see that there are two fields and one record. The first field is doc_id, which in this case…
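The code shown on screen in the video isn't reproduced on this page. Below is a minimal sketch of the workflow the instructor describes; the filename "WonderfulWizardOfOz.txt" and the object names are assumptions for illustration, not the course's exact code.

# Minimal sketch of the workflow described above (filename is an assumption).
library(tidyverse)
library(tidytext)
library(readtext)

# readtext() reads the book into a data frame with two fields: doc_id and text.
oz_raw <- readtext("WonderfulWizardOfOz.txt") %>%
  as_tibble()

oz_raw  # one record: doc_id plus the full text of the book

# unnest_tokens() splits the text column into one token (word) per row.
oz_words <- oz_raw %>%
  unnest_tokens(output = word, input = text)

head(oz_words)

Note that by default unnest_tokens() lowercases each token and strips punctuation, which is what makes the later stop-word and term-frequency steps in this course work directly on the resulting word column.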
Contents
- How to think like tidytext (1:59)
- An example: Calculate the most popular terms in a document (3:10)
- Tokenizing with unnest_tokens() (8:19)
- Stopwords, punctuation, whitespace, and numbers (6:30)
- Stemming and lemmatization (5:35)
- Term frequency with bind_tf_idf() (5:54)
- Sentiment analysis with sentiments() (4:44)
- Parts of speech with parts_of_speech() (4:32)
- Import and export from other NLP packages (2:30)