Static models in a rapidly changing dynamic world
We always develop a machine learning solution to solve real-life problems. The data that we use to train the models is nothing but static numerical or categorical values. Measuring the model performance and deploying makes us an achiever. But the problems start when the client is not happy with predictions and we start figuring out the root problem. In my data journey, the first rule I learned is to "Never Trust Data". We build models considering the static data and forget that the data changes constantly. Using historical data for model building is good but this historical data becomes outdated at some point and our models still believe certain features are still having a higher priority. This story is about how models that we built stay static when things in real-life change and what steps we can take to make our ml models consistent.
The pressing question we have is are we keeping track of data changes and model performance. The upstream data changes quickly and our model is not built to consider these changes. So consistent data checks should be done on upstream data before feeding into the ml model. Some of the checks I would do are making sure the distribution is consistent, the trend and pattern follows the historical data (that was used for training the model). Distribution similarity checks will help in knowing about changes and we can trigger an alarm if the similarity score is less.
领英推荐
We should also validate model performance in regular intervals and retrain it when needed. Having a real-time monitoring dashboard for tracking the data checks is useful and easy to notify when certain thresholds change. If the model is given a data point that it hasn’t seen before will result in wrongful predictions.?
Comment below your thoughts on how ml model monitoring can be done.