Quantico: Unsupervised Learning
Quantico offers a wide range of operations that assist in the data science and analytics process. One of its major components is unsupervised learning, which is what I'll cover in this article. I classify unsupervised learning as separate from data wrangling and feature engineering, although its methods can serve both.
I wanted to cover the basics of the Shiny app first: an overview of plotting, data wrangling, feature engineering, and now unsupervised learning. Truth be told, the most exciting parts are yet to come. The machine learning and forecasting articles are coming soon, so hang in there with me while I go over the basics. The ML and forecasting features are enhanced versions of what's available in my AutoQuant package, but the methods discussed here and in previous articles play a pivotal role in supporting their performance.
Unsupervised Learning Methods:
NLP Functions
The suite of NLP functions includes one word2vec method along with a handful of statistical methods. There is definitely room to beef up this suite, so suggestions are welcome! Each of the methods is intended to do the magic behind the scenes automatically and make the resulting variables available to you without any data engineering effort.
The methods include:
Word2Vec: by h2o
This word2vec function will convert any number of text columns to vectors that are useful from a modeling perspective. You can run one column at a time or all of them at once. There are numerous parameters to configure to your liking as well.
Text Summary
Text summary info is useful for a variety of reasons and although it's probably not formally an unsupervised learning method, it is grouped with them for ease of understanding. There are a variety of outputs that come with this function and you can select which ones you'd like to exclude!
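To make the idea concrete, here is a minimal Python sketch of what text summary features can look like. The function name, the four statistics, and the exclude argument are my own illustrative assumptions, not Quantico's actual outputs or interface.

```python
import re

def text_summary(text, exclude=()):
    """Return simple summary statistics for one text value.

    Hypothetical helper for illustration; the statistic names are assumptions.
    """
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    stats = {
        "n_chars": len(text),
        "n_words": len(words),
        "n_sentences": len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
    }
    # Drop any statistics the caller asked to exclude.
    return {name: value for name, value in stats.items() if name not in exclude}

summary = text_summary("Hello world. How are you?", exclude=("n_chars",))
```

Applied to every row of a text column, each remaining key would become one new numeric variable.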
Sentiment
Sentiment comes in one of two flavors: a two-class output (positive or negative) or a three-class output (positive, neutral, or negative). The user selects their preference.
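The two flavors can be sketched with a toy lexicon-based scorer in Python. The word lists, the function name, and the tie-breaking rule are assumptions made purely for illustration; the article does not say which sentiment method Quantico uses.

```python
import re

# Hypothetical mini-lexicon for illustration only; real tools use far larger ones.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "sad"}

def sentiment(text, n_classes=3):
    """Label text using positive-minus-negative word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if n_classes == 2:
        # Two-class flavor: ties (score of zero) fall to "positive" by assumption.
        return "positive" if score >= 0 else "negative"
    # Three-class flavor: a score of zero is reported as "neutral".
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

label_three = sentiment("The box arrived on Tuesday")
label_two = sentiment("The box arrived on Tuesday", n_classes=2)
```

The same text can land in different buckets depending on the flavor, which is why the choice is left to the user.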
Readability
There are many possible readability measures to choose from. For modeling purposes you should iterate through all of them, which Quantico makes easy for you. If you have a specific one in mind, then select just that one.
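As one example of a readability measure, here is a rough Python sketch of the Flesch Reading Ease score. The article does not list which measures Quantico offers, so treat this choice as an assumption, and note that the syllable counter is a crude vowel-group heuristic rather than a dictionary lookup.

```python
import re

def count_syllables(word):
    # Rough heuristic: each run of vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return None  # Undefined for empty text.
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

easy = flesch_reading_ease("The cat sat.")
hard = flesch_reading_ease("Institutional considerations necessitate comprehensive evaluation.")
```

Short sentences with short words score higher, so the first example scores well above the second.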
Lexical Diversity
There are many possibilities here too, and the same approach applies. Choose the ones you don't want and they won't be added to your dataset.
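Two classic lexical diversity measures are the type-token ratio and the root type-token ratio. The Python sketch below computes both and lets you exclude either one. The measure names and the exclude argument are illustrative assumptions rather than Quantico's actual options.

```python
import math
import re

def lexical_diversity(text, exclude=()):
    """Compute simple lexical diversity measures for one text value."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n_tokens = len(tokens)
    n_types = len(set(tokens))  # Number of distinct words.
    if n_tokens == 0:
        return {}
    measures = {
        "ttr": n_types / n_tokens,                  # Type-token ratio.
        "root_ttr": n_types / math.sqrt(n_tokens),  # Root TTR (Guiraud's index).
    }
    return {name: value for name, value in measures.items() if name not in exclude}

diversity = lexical_diversity("The cat and the dog")
```

Here "the" appears twice, so there are 4 distinct words among 5 tokens.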
Anomaly Detection
Anomaly detection currently rests on the isolation forest functionality. There are many others that can be utilized but this is what's available for now. Feel free to request others!
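For readers curious how an isolation forest scores anomalies, here is a small pure-Python sketch of the general technique. It is not Quantico's implementation, and every name and default below is my own assumption. The idea is that random splits isolate unusual points in fewer steps, so a short average path means a high anomaly score.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return {"size": len(rows)}  # Cannot split a constant feature.
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(row, node):
    """Depth at which row reaches a leaf, plus an adjustment for unsplit leaf size."""
    depth = 0
    while "size" not in node:
        node = node["left"] if row[node["feature"]] < node["split"] else node["right"]
        depth += 1
    return depth + average_path_length(node["size"])

def isolation_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Return one anomaly score per row in (0, 1); higher means more anomalous."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    m = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(m))
    trees = [build_tree(rng.sample(rows, m), 0, max_depth, rng) for _ in range(n_trees)]
    norm = average_path_length(m)
    scores = []
    for row in rows:
        mean_depth = sum(path_length(row, tree) for tree in trees) / n_trees
        scores.append(2 ** (-mean_depth / norm))
    return scores
```

Appending the scores as a new column gives downstream models an "unusualness" feature, or you can threshold them to flag rows for review.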
Dimensionality Reduction
Dimensionality reduction is currently done via deep learning autoencoders so as to account for non-linearities in your data. The user can select which layer to return as well as the number of variables to return. This is another area where more methods are welcome. Please reach out with requests!
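To illustrate the autoencoder idea, here is a deliberately tiny pure-Python sketch with a single tanh hidden layer trained by stochastic gradient descent. It is not the deep learning implementation Quantico uses; the function names, the architecture, and the hyperparameters are all assumptions chosen to keep the example short.

```python
import math
import random

def encode(model, x):
    """Map an input row to its hidden-layer (reduced) representation."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(model["w1"], model["b1"])]

def decode(model, h):
    """Map a hidden representation back to the input space."""
    return [sum(w * hj for w, hj in zip(row, h)) + b
            for row, b in zip(model["w2"], model["b2"])]

def reconstruction_error(model, rows):
    """Mean squared reconstruction error per row."""
    total = 0.0
    for x in rows:
        y = decode(model, encode(model, x))
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(rows)

def train_autoencoder(rows, n_hidden, epochs=300, lr=0.01, seed=0):
    """Train a one-hidden-layer autoencoder with plain SGD."""
    rng = random.Random(seed)
    d = len(rows[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(d)]
    model = {"w1": w1, "b1": [0.0] * n_hidden, "w2": w2, "b2": [0.0] * d}
    for _ in range(epochs):
        for x in rows:
            h = encode(model, x)
            y = decode(model, h)
            err = [yi - xi for yi, xi in zip(y, x)]  # Gradient of 0.5 * squared error.
            # Backpropagate through the decoder and the tanh before updating weights.
            grad_h = [sum(err[i] * w2[i][j] for i in range(d)) * (1.0 - h[j] ** 2)
                      for j in range(n_hidden)]
            for i in range(d):
                for j in range(n_hidden):
                    w2[i][j] -= lr * err[i] * h[j]
                model["b2"][i] -= lr * err[i]
            for j in range(n_hidden):
                for i in range(d):
                    w1[j][i] -= lr * grad_h[j] * x[i]
                model["b1"][j] -= lr * grad_h[j]
    return model
```

After training, calling encode on each row returns n_hidden values, which play the role of the reduced variables. A deeper network would offer several hidden layers to choose from.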