课程: Hands-On Natural Language Processing
Introduction to topic modeling
- Topic modeling is one of the most important and useful techniques used when analyzing large text data sets. It is an unsupervised machine learning technique that automatically finds patterns. And in first topics within text based data, the findings are usually context aware. In other words, the categories of text elements or documents into classes is based not only on text similarity but also on semantic similarity. For example, documents containing words like health, doctor, patient, hospital, will be categorized under a topic called healthcare. And for a topic called famine, you might see words like farm, crops, corn, and wheat. The assumption of topic modeling is that every document comprises a statistical distribution of topics that can be obtained by combining all the distributions for all the topics covered. In other words, the algorithms try to figure out which topics are present in the data set and how strong that presence is. In this course, we are going to focus on two main topic modeling algorithms: LSA for latent semantic analysis and LDA for latent dirichlet allocation. LSA is one of the foundational techniques used for component analysis applied to text data. It is a well-known linear algebra method called singular value decomposition, or SVD. On the other hand, LDA uses a probabilistic method using dirichlet powers, making it less prone to over-fitting. In general, the best practice is to try several different algorithms and see which one works best with your data set. Topic modeling is used in several application, such as gathering insights to improve your product from customer reviews and customer support emails, who cluster millions and millions of news articles and published papers to find common patterns. Topic modeling is used in recommendation engine or information retriever systems to find similar products, topics, or articles related to what you're looking for, making for a better user experience. Another application of topic modeling is for text annotation. One of the most tedious tasks one can do manually, it automatically categorizes your data set into classes, making it easier to collect insight and analyze your information. These are just a few examples of what this powerful technique of topic modeling can do for you and your large text-based data sets.
随堂练习,边学边练
下载课堂讲义。学练结合,紧跟进度,轻松巩固知识。