Course: Complete Guide to NLP with R

Tokenizing with unnest_tokens()

- [Instructor] Tokenizing is at the heart of natural language processing, and tidytext provides some useful tools to do that. In this session, let's look at how tokenizing is accomplished with tidytext. Right up front, tidytext uses a function called unnest_tokens() to accomplish this task, and here is some code that does it. To run this code, of course, you'll need to make sure that your current working directory points to the exercise files, where it can find the Wonderful Wizard of Oz text file. In lines five, six, and seven, I've shown how to use library() to bring in tidyverse, tidytext, and readtext. Now I'll need the text of The Wonderful Wizard of Oz, and in line nine I use readtext() to bring that in and place it into a tibble. Let's take a quick look at that tibble just so we know what we're talking about. You can see that there are two fields and one record. The first field is doc_id, which in this case…
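The on-screen code is not part of this transcript, so what follows is only a minimal sketch of the step described above, not the instructor's actual script. It assumes a one-row tibble shaped like the one the transcript describes (a `doc_id` field plus a `text` field). The file name `example.txt`, the sample sentence, and the variable names `oz` and `oz_words` are hypothetical stand-ins, used so the snippet runs without the exercise files.

```r
library(dplyr)     # provides tibble() and the %>% pipe
library(tidytext)  # provides unnest_tokens()

# Hypothetical stand-in for the one-record, two-field tibble described
# in the transcript: a document ID and the full text in a single field.
oz <- tibble(
  doc_id = "example.txt",
  text   = "Dorothy lived in the midst of the great Kansas prairies."
)

# Split the `text` column into one token per row, stored in a new
# column named `word`. By default unnest_tokens() tokenizes by word,
# lowercases the tokens, strips punctuation, and drops the input column.
oz_words <- oz %>%
  unnest_tokens(output = word, input = text)

print(oz_words)
```

With the real exercise file, the `oz` tibble would presumably come from `readtext()` instead of being typed inline. The `unnest_tokens()` call would stay the same.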
