Course: Complete Guide to NLP with R
What is tm and why do you need it?
- [Instructor] R is an excellent programming language for statistics and matrix manipulation. Given a table or matrix of organized data, R can return a wealth of insights and visualizations. However, data is rarely stored in clean tables with well-ordered rows and columns. Real data is messy and needs cleaning, a process often called data wrangling.

Human language is a prime example of messy data. Concepts aren't tagged, context is fluid, there are no standardized rules, and there are no reliable indicators to help a computer understand what is being said or how to separate the information from its presentation. This is where natural language processing comes in. Natural language processing (NLP) is a collection of tools and techniques for converting human language into a format that computers can use. You could do this by hand, but it would be painful; it's much easier to use a framework.

In this course, we'll use a package called tm, short for text mining. The tm package is a widely used R package distributed through CRAN. To use it, you'll need to install it in your copy of R. Like all R packages, once you've installed tm, you don't need to install it again. When you want to use the package's features, load it with the library() function. The tm package depends on an additional package called NLP, which provides basic infrastructure for natural language processing. You may notice NLP being installed and loaded along with tm, but you don't need to learn anything more about that particular package.

Once tm is loaded, you can list its contents; this confirms that the package loaded successfully. For documentation on tm, use the help() function with package = tm. As you'll see, tm has quite a few functions. In the next few chapters, we'll break tm down into bite-sized chunks.
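The setup steps described above can be sketched in R as follows. This is a minimal sketch, not the course's own script: the `requireNamespace()` guard (so the one-time install is skipped when tm is already present), the `https://cloud.r-project.org` CRAN mirror, and the use of `ls("package:tm")` to list the package contents are assumptions chosen for illustration.

```r
# One-time installation from CRAN; the NLP dependency is installed automatically.
# The guard skips the download if tm is already installed.
# (Assumption: the cloud.r-project.org mirror; any CRAN mirror works.)
if (!requireNamespace("tm", quietly = TRUE)) {
  install.packages("tm", repos = "https://cloud.r-project.org")
}

# Load (attach) tm in each new R session; this also attaches NLP,
# which tm lists as a dependency
library(tm)

# List the objects that tm exports, confirming the package is loaded
tm_exports <- ls("package:tm")
print(length(tm_exports))
print(head(tm_exports))

# Browse the package documentation index (equivalent to help(package = tm));
# wrapped in interactive() so a batch run does not open a pager
if (interactive()) {
  help(package = "tm")
}
```

When NLP is not already attached, `library(tm)` prints a message such as "Loading required package: NLP", which is the dependency mentioned above being loaded for you.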