Course: Hands-On Natural Language Processing


Data preprocessing for topic modeling

- Now, let's see how all of this works in practice. In this notebook, we will be using the 20 Newsgroups dataset from the open UCI Machine Learning Repository. To get a clean version of this data, we are going to use scikit-learn to import it. As you can see here, in the first cell we start by importing the required Python libraries: sklearn.datasets, pprint, and pandas. We then save the data returned by fetch_20newsgroups into a variable called dataset. In the function call, we specify that the subset of the data retrieved should be "all". We set random_state to 32 to ensure consistent results across runs. And finally, we specify that we want to remove the headers, footers, and quotes that are present in the original dataset. Once we get the data, we can print out its keys with dataset.keys(). The result is a dict_keys object listing data, filenames, target_names, target, and DESCR (the dataset description). Now let's get the…
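The loading step described above can be sketched roughly as follows. This is a minimal sketch, not the notebook's exact cell. It assumes scikit-learn is installed and that the machine can reach the download server the first time it runs, because fetch_20newsgroups downloads the archive on first use and caches it locally (by default under ~/scikit_learn_data). pandas is left out because this step does not use it yet.

```python
# Minimal sketch of loading the 20 Newsgroups data with scikit-learn.
# Assumes scikit-learn is installed and network access is available on the
# first call; later calls reuse the locally cached copy.
from pprint import pprint

from sklearn.datasets import fetch_20newsgroups

# subset="all" returns the train and test documents together,
# random_state=32 makes the shuffled document order reproducible, and
# remove=(...) strips headers, signature footers, and quoted replies so
# that mostly the message body text remains.
dataset = fetch_20newsgroups(
    subset="all",
    random_state=32,
    remove=("headers", "footers", "quotes"),
)

# The return value is a Bunch (a dict subclass), so .keys() lists its fields.
pprint(list(dataset.keys()))

# Each document in dataset.data has a matching integer label in dataset.target,
# and dataset.target_names maps those labels to newsgroup names.
print(len(dataset.data), "documents across", len(dataset.target_names), "newsgroups")
```

Passing the three remove options together keeps later steps such as tokenization and topic modeling focused on what people actually wrote, rather than on email metadata and quoted text repeated from other posts.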

内容