Artificial Intelligence & Deep Learning
Don't waste too much time training your models! It is essential to limit the time for the model selection process. It is often better to have an imperfect model after three days of development than a perfect model after three years.
In general, it is important to understand that the model selection pipeline may not yield an optimal model for the business problem the ML solution is trying to address. What ultimately matters is how the model performs in production, and a model measured as optimal on offline metrics may not be optimal as measured in online experiments. Offline metrics may not correlate with online metrics. Typically, offline validation metrics are only relevant to machine learning experiments (AUC, log loss, etc.) and may not capture the entirety of the business problem. During A/B testing and in production, the model is usually assessed using metrics more relevant to the business problem (click-through rate, profit margin, etc.).
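As a concrete example of such an offline metric, here is a minimal log-loss sketch on made-up validation data. The number it produces tells you how well the model's probabilities fit held-out labels, but says nothing by itself about an online metric like click-through rate.

```python
import math

def log_loss(y_true, probs):
    """Average negative log-likelihood of the true binary labels."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, probs)) / len(y_true)

# Hypothetical held-out labels and predicted probabilities
y_val = [1, 0, 1, 1, 0]
p_val = [0.9, 0.2, 0.7, 0.6, 0.1]
print(round(log_loss(y_val, p_val), 4))  # lower is better
```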
Junior data scientists and machine learning engineers often ignore the differences between offline experiments and live models. They keep optimizing models on offline metrics for longer than necessary, chasing gains that will not translate to production data and online metrics. That is why it is essential not to over-optimize models at development time.
When we perform a model selection experiment, we must be careful in setting up the experimental design. Ultimately, we need to be able to compare models accurately. "Comparing" means we have a metric that can be used to rank model performance: when that metric says Model A > Model B, the "optimal" model is the one that maximizes it. So, choosing the offline metric that correlates the most with the online metric used for A/B testing is critical. When you design your metric, you cannot think about a "better" model in a vacuum; you need to consider how this metric relates to the business problem you are trying to solve. For example, if only the top-ranked samples matter for the business problem, don't use an offline metric like AUC, since it gives equal weight to every sample.
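The AUC-vs-top-ranked point can be sketched with a toy example (all scores and labels below are made up): Model A wins on AUC, yet Model B puts more positives at the very top of the ranking, which is what a precision@k metric rewards.

```python
def auc(y_true, scores):
    """Probability that a random positive outranks a random negative
    (Mann-Whitney formulation of ROC AUC); ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_at_k(y_true, scores, k):
    """Fraction of positives among the k highest-scored samples."""
    top = sorted(zip(scores, y_true), reverse=True)[:k]
    return sum(y for _, y in top) / k

# Hypothetical validation labels and two models' scores
y = [1, 0, 1, 0, 1, 0, 0, 0]
model_a = [0.6, 0.7, 0.55, 0.5, 0.52, 0.4, 0.3, 0.2]   # higher AUC
model_b = [0.9, 0.3, 0.85, 0.4, 0.1, 0.5, 0.45, 0.35]  # better top of ranking

print(auc(y, model_a), precision_at_k(y, model_a, 2))  # 0.8, 0.5
print(auc(y, model_b), precision_at_k(y, model_b, 2))  # ~0.667, 1.0
```

If the business only acts on the top two recommendations, Model B is the better choice despite its lower AUC; the offline metric must mirror that.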
#MachineLearning #DataScience #ArtificialIntelligence
--
Register for the ML Fundamentals Bootcamp: https://learn.theaiedge.io/.../machine-learning...