Course: R for Data Science: Lunch Break Lessons

Create and clean a natural language corpus

- [Instructor] In the upcoming series of our weekly sessions, I'd like to talk a bit about natural language processing. Natural language processing applies statistical analysis to text, such as poetry, fiction, or nonfiction, to generate insights into how those documents are created or used. To do this I'll need a collection of documents, so I've used Project Gutenberg to retrieve a set of documents authored by Rabindranath Tagore. In lines 10 through 21, I've created a directory for the works of Rabindranath and downloaded those documents from Project Gutenberg into it. You'll wind up with a directory that looks like this, containing all the Project Gutenberg documents we can get hold of. Once we've got those documents, we need to create a corpus, and a corpus is just a fancy word for a collection of documents. I'm going to use the tm package.…
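The transcript breaks off just as the tm package is introduced. As a hedged sketch of the general idea (not necessarily the instructor's code), the following assumes tm is installed, builds a `VCorpus` from a directory of plain-text files with `DirSource`, and then applies common cleaning steps with `tm_map`. The directory name, file names, and sample text are hypothetical placeholders standing in for the downloaded Project Gutenberg files.

```r
# Minimal sketch: create and clean a corpus with the tm package.
# Assumption: tm is installed, e.g. install.packages("tm").
library(tm)

# Hypothetical stand-in for the folder of Project Gutenberg downloads;
# the file names and text are placeholders, not Tagore's works.
corpus_dir <- file.path(tempdir(), "sample_corpus")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines("The First Sample Document, with 2 numbers and some punctuation!",
           file.path(corpus_dir, "doc1.txt"))
writeLines("A second, shorter sample document.",
           file.path(corpus_dir, "doc2.txt"))

# Each .txt file in the directory becomes one document in the corpus
docs <- VCorpus(DirSource(corpus_dir, encoding = "UTF-8", pattern = "\\.txt$"))

# Common cleaning steps; tolower is a base R function, so it is wrapped
# in content_transformer() to keep the documents as tm objects
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, stripWhitespace)

# Inspect the cleaned text of the first document
print(content(docs[[1]]))
```

Which cleaning steps to apply depends on the analysis: stop-word removal, for example, suits word-frequency work but discards words that matter for tasks sensitive to sentence structure.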
