Analysis Paralysis: Which Machine Learning Model to Use!?
It is not uncommon for people new to the field of machine learning to be intimidated by the hundreds of statistical/machine learning models and neural network architectures available at their fingertips. We often rack our brains and try out a bunch of algorithms to figure out the best possible solution. My professor often used to say that the best way is to try out all possible models until you find the best one. I lived by his words for the past two years, trying out different kinds of solutions for every dataset I faced. During my journey, I realized that, depending on the problem statement and the data, some models performed better than others, and I slowly developed an intuitive understanding of how different models fit themselves to the data. So here I am, sharing my simple and short answer to the question "Which machine learning model should I use?"
A disclaimer before I get started: this guide is based on my observations, experience, and research; there will always be different and better approaches. As a famous saying in statistics goes, "All models are wrong, but some are useful." Keeping that statement in mind, I hope you find this guide useful.
I categorize machine learning models into four subgroups. I will address the models in each category and discuss the applications for which they are best suited.
Group 1: Linear statistical models
Models that fall in this category:
- Generalized Linear Models: These models are based on probability distributions, and the often-used logistic and linear regressions belong to this category. The ability to accurately model the uncertainty of their predictions is what sets them apart from other statistical models. The most widely used distributions are the Gaussian and the binomial, but GLMs can be built on any distribution in the exponential family. If you are creating a model to predict the number of defects, go ahead and use a Poisson distribution instead of a Gaussian. And if you are dealing with a multi-class classification problem, try a multinomial distribution before jumping to a tree-based model or an SVM.
- Linear Discriminant Analysis: This was one of the first algorithms I learned for multi-class classification. Its ease of use and excellent interpretability make it stand out. The algorithm is based on the Gaussian distribution, and it produces a linear separation boundary.
- Naive Bayes Algorithm: A probabilistic model for classification that is excellent at handling large amounts of categorical data. The multinomial Naive Bayes variant is also used for multi-class classification. A lesser-known fact is that the algorithm's decision boundary is linear, so it often performs similarly to logistic regression.
Advantages: works well even with little data, high interpretability, high speed, predicts uncertainty, resistant to overfitting
Disadvantages: Strong assumption of linearity, low flexibility, high bias, requires data preparation
Application: Macro-economics, scientific research, market research
When to use these models:
- Use these models when you have fewer data points and want to interpret your results, e.g. in a research paper or when analyzing survey forms
- Use these models if you are in the manufacturing/engineering domain, e.g. to predict machine breakdowns, predict defects, or identify bottlenecks. They can also be used to analyze a lot of sensor/IoT data, but as the number of data points increases, a neural network solution will tend to perform better.
- You can also use these models on board-level aggregated data, e.g. optimizing your marketing spend, predicting revenues from customer segments, or forecasting sales
- The ability to come up with a probabilistic estimate is the crux of analytics; analytics without uncertainty is just mathematics. Use these models to quantify uncertainty, e.g. get a confidence interval on your ROI forecast to mitigate risk, estimate the uncertainty in your sales forecast to maintain optimum inventory levels, or figure out the uncertainty in your customer lifetime value.
- Please don't use these models to predict or analyze data at an individual level (micro-economics). Humans are complex beings, so you will mostly face non-linear data and multimodal distributions, e.g. in HR attrition, customer retention, or credit propensity. In all these applications there is a good chance that models like logistic regression or discriminant analysis won't perform well; in such scenarios, tree-based models are your best friend.
- The above point, however, does not apply in the medical field, where individual-level data is often close to linear: for example, the relationship between increasing obesity and heart disease is roughly linear, and such patterns are common. These models rely mainly on Gaussian distributions, and many natural measurements are approximately Gaussian; on the other hand, whenever money is involved, you can encounter all sorts of funny distributions. So keep the underlying distributions in mind when dealing with linear statistical models.
Group 2: Non-linear statistical models
- Support Vector Machines: One of the most elegant models in machine learning and statistics. Ten years ago, these were the only statistical models that gave neck-and-neck competition to neural networks. Their ability to handle non-linearity is what sets them apart from traditional statistical models. The basic concept has existed since the 1960s in the form of optimal margin classifiers, but it was only put into practice in the '90s, when researchers applied what is known as the kernel trick. The kernel trick replaces the dot product of the input vectors in the dual optimization problem with a kernel function, which allows SVMs to find a decision boundary in a very high-dimensional (even infinite-dimensional) space with a limited amount of computation. Knowing how to choose the right kernel, or create your own for different scenarios, is an important skill to develop before you use SVMs.
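Here is a small sketch of the kernel trick in action: on scikit-learn's two-moons toy data, an RBF-kernel SVM separates classes that defeat a linear model. The dataset and parameter choices are illustrative assumptions:

```python
# Sketch: an RBF-kernel SVM drawing a non-linear boundary that a
# linear classifier cannot, on the make_moons toy dataset.
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = LogisticRegression().fit(X_tr, y_tr)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)

# The SVM solves a linear problem in an implicit feature space,
# which appears as a curved boundary in the original input space.
print("linear accuracy: ", linear.score(X_te, y_te))
print("RBF-SVM accuracy:", rbf_svm.score(X_te, y_te))
```

Swapping `kernel="rbf"` for `"poly"` or a custom callable is exactly the kernel-selection skill mentioned above.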
Advantages: ability to handle larger datasets, handles non-linearity, not dependent on strong assumptions, high speed when making predictions
Disadvantages: longer training times, difficulty implementing multi-class classification, low interpretability, steep learning curve (difficult to master and understand)
When to use SVM?
SVMs were once considered better than neural networks, but today neural networks have replaced SVMs in many applications, including image classification, facial recognition, text classification, and speech synthesis. Having said that, there are still applications such as protein sequence detection, fingerprint classification, and many geology problems where SVMs outshine neural networks. An SVM does an outstanding job of detecting and generalizing patterns in the data. Use it on datasets in which the data has a non-linear relationship but follows a specific pattern. In the fields of medicine and geology these patterns are prominent and distinguishable, and hence SVMs perform well.
Please don't use SVMs on data that involves human behavior. Though humans do exhibit patterns, these patterns are generally too random and lack the functional structure an SVM needs.
There is another widely known non-linear statistical model:
- Polynomial Regression: Please don't use this model unless you are completely sure a polynomial relationship exists. It is easily overfitted and can produce unrealistic results on the test set. However, it is one of the best models for explaining the concepts of overfitting and the bias-variance tradeoff.
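The overfitting behavior is easy to demonstrate. Below is a toy sketch (synthetic sine data and degree choices are my own assumptions) in which raising the polynomial degree drives training error down while test error climbs:

```python
# Sketch: polynomial regression as an overfitting demo -- a degree-15
# fit chases the noise, while degree 3 captures the trend.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
x = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.3, 30)
x_test = np.linspace(0, 1, 100).reshape(-1, 1)
y_test = np.sin(2 * np.pi * x_test).ravel()

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree),
                          LinearRegression()).fit(x, y)
    train_err = mean_squared_error(y, model.predict(x))
    test_err = mean_squared_error(y_test, model.predict(x_test))
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, "
          f"test MSE {test_err:.3f}")
```

The gap between training and test error at degree 15 is the bias-variance tradeoff made visible.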
Group 3: Tree-based models
Models that fall in this category:
- Decision Trees: Highly interpretable but, on their own, nearly useless for prediction, so use decision trees with caution. The big tradeoff for their high interpretability is high variance and error when it comes to predictions. It is not uncommon to get an entirely different tree by changing a single data point in the dataset.
- Random Forest: This model overcomes the shortcomings of the decision tree by averaging the results of hundreds or thousands of decision trees. It is fast and simple to implement, with fewer hyperparameters than gradient-boosted trees, and it is resistant to overfitting, which makes it a good baseline when implementing tree-based models.
- Gradient Boosted Trees: One of the algorithms that revolutionized the field, and a go-to solution for many machine learning problems. Like a random forest, it uses hundreds of trees, but it fits each tree to the errors of the previous ones, systematically reducing the error with each subsequent tree. The way the model learns is a form of controlled overfitting: the error on the training set decreases with each iteration. This property makes it flexible enough to learn almost any decision boundary. With the right set of hyperparameters, gradient-boosted trees will outperform a random forest in the vast majority of cases.
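The baseline-then-boost pattern described above can be sketched in a few lines. The synthetic tabular dataset and hyperparameter values are illustrative assumptions, not a tuned recipe:

```python
# Sketch: random forest (averaging) vs. gradient boosting (fitting
# successive trees to residual errors) on synthetic tabular data.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Random forest: hundreds of independent trees, results averaged
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
# Gradient boosting: each tree corrects the errors of the ensemble so far
gb = GradientBoostingClassifier(n_estimators=300, learning_rate=0.1,
                                random_state=0).fit(X_tr, y_tr)

print("random forest accuracy:    ", rf.score(X_te, y_te))
print("gradient boosting accuracy:", gb.score(X_te, y_te))
```

In practice the forest gives you a solid default, and the booster's edge appears only after tuning `learning_rate`, tree depth, and the number of estimators.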
Advantages: ability to handle larger datasets, handles non-linearity, not dependent on strong assumptions, works amazingly well with tabular data
Disadvantages: longer training times, requires a lot of hyperparameter tuning, cannot handle data in anything other than tabular form, requires large datasets
When to use Tree-based models?
- The ability of these models to work on large datasets and handle non-linear relationships makes them ideal for analyzing consumer behavior
- Use them to predict HR attrition, customer retention, and any other task in which the data is driven by human behavior. Yahoo once used these models to rank its search results.
- Tree-based models are non-parametric: they memorize the data they are trained on rather than learning a functional relationship. This makes them similar in spirit to many clustering and segmentation algorithms. Human behavior works the same way, and often the best way to make a prediction is to compare similar behavioral patterns. For example, in fraud detection, to predict whether a person is likely to commit fraud, you look at similar cases that have occurred before.
- Use them when you care more about prediction accuracy than model interpretability.
Group 4: Neural Networks
Neural network architectures have revolutionized the field of machine learning. Neural networks are functional models with a very large number of parameters, which gives them the ability to handle almost any kind of non-linearity in the data. It is important to note that while the theoretical possibilities of a neural network are endless, training is limited by the availability of data and compute. In simple terms, you may often find that the logic of your model is perfect, yet its performance falls short, or you are simply unable to train it.
Advantages: Ability to handle any form of data, Ability to handle any task, supports transfer learning
Disadvantages: slow training speed, requires large amounts of data
Fascinating to learn and frustrating to train, the applications of these models are endless. Keeping this in mind, here are a few tips to do justice to the other machine learning models:
- Use them when none of the above models work or can't be applied to your data
- Avoid using them on smaller datasets and on tabular data
- Sometimes a simpler solution is much better than a complex one, so only proceed with neural networks once you have created a baseline model
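The baseline-first workflow in the last tip can be sketched as follows. The dataset, the logistic-regression baseline, and the small scikit-learn MLP are illustrative assumptions; the point is the comparison, not the specific models:

```python
# Sketch: always establish a simple baseline before reaching for a
# neural network -- only the gap in scores justifies the complexity.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: cheap, interpretable baseline
baseline = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
# Step 2: neural network, adopted only if it clearly beats the baseline
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000,
                    random_state=0).fit(X_tr, y_tr)

print("baseline accuracy:  ", baseline.score(X_te, y_te))
print("neural net accuracy:", net.score(X_te, y_te))
```

If the two scores are close, as they often are on small tabular problems, the simpler model wins on speed, interpretability, and maintenance cost.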
Conclusion:
Machine learning is an ever-developing field with a new model always around the corner. It is extremely important to keep yourself updated and to keep practicing and learning.
In the field of machine learning and AI: "What we know is a drop; what we don't know is an ocean." (Dark)
Thank you for making it to the end. Keep exploring and continue learning!