Course: Introduction to NLP Using R

Understanding the document-term matrix

- [Instructor] You won't spend much time with natural language processing before you run into the concept of a document-term matrix, a term-document matrix, or a document-feature matrix. Let's take a look at the basic concept of a DTM. The text mining package, tm, provides us with tools to create a DTM, so in line two, I bring it into my current R session. Then in line five, I bring in the poet corpus, the same one we've been using in the past. Remember that your current working directory needs to point to a directory that contains the poet corpus. Now I'm all ready to build a document-term matrix, and I've set this up in lines eight through 12. Lines 9, 10, 11, and 12 should look very familiar by now: stop words, remove punctuation, remove numbers, and stemming are all concepts we've talked about. To create a document-term matrix, I use the code starting with line eight, and the command is DocumentTermMatrix. When that…
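
The call described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the instructor's actual script: the two-document corpus built from `docs` is a hypothetical stand-in for the poet corpus, and the line numbers will not match the ones mentioned in the video. The `stemming` option assumes the SnowballC package is installed alongside tm.

```r
library(tm)  # the text mining package

# Assumption: a tiny in-memory corpus stands in for the poet corpus,
# which the course loads from the current working directory.
docs <- c(first  = "The rose is red, the violet is blue.",
          second = "Roses bloom in 1 garden; violets bloom in 2 gardens.")
corpus <- VCorpus(VectorSource(docs))

# Build the document-term matrix: one row per document, one column per term.
# The control list applies the preprocessing steps named in the transcript.
dtm <- DocumentTermMatrix(corpus,
                          control = list(stopwords         = TRUE,
                                         removePunctuation = TRUE,
                                         removeNumbers     = TRUE,
                                         stemming          = TRUE))

# Show the dimensions, sparsity, and a sample of the term counts.
inspect(dtm)
```

Each cell of the resulting matrix holds the number of times a (stemmed) term appears in a document, so most cells are zero once the corpus grows beyond a few documents.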
