A continuously learning ML/Neural_Network model: is it possible?
So after some long hours of back-testing a model on a stream of real-time data, something interesting happened.
The model's prediction accuracy was pristine, but it didn't last more than an hour or two. I thought there might be some problem with the data being fed into the model ("YES, Garbage In -> Garbage Out"). I started investigating and stumbled upon a concept called "Concept Drift" that explains exactly this phenomenon.
It does describe my problem, and I had accidentally stumbled into a whole research world that is trying to solve it. "Concept Drift" occurs when the model is fed inputs whose characteristics are totally different from the data it was trained on, and it seems inevitable for any model you deploy in a real-life scenario, or at least that's my impression (PS: I am an undergrad student who learned to implement ML/Neural_Net ideas from reading a lot of blog posts, websites and books). There might be a way to overcome this, and companies may well use it in deployment, but after hours and hours of Googling I couldn't find any article that helped me solve the problem.
I was using an #LSTM, and never in my life had I wished so much that such an advanced system would have an "Adaptive Learning" feature that can be turned on/off. You see, once the model is trained, everything is fixed and there is NO MORE LEARNING while it is predicting, which is not really desirable in an ever-changing environment. A baby might like vanilla and despise chocolate today, but that preference might change within a month, or even a week. We can't have a model that requires constant pampering after it is deployed and left out in the wild.
So I thought of making a model that keeps learning while simultaneously making predictions. But there is a basic problem here.
The MinMax Scaler I used while training the model has its min and max parameters set from the training data, but
"WHAT IF THE SCALER HAS TO SCALE SOMETHING WHICH HAS A RANGE THAT IS DIFFERENT FROM THE ONE IT WAS FITTED WITH??"
For example, say the scaler used during training had its min and max parameters set to 0 and 20 respectively, and the values it now has to transform range from -5 to 25. In this scenario the scaler will definitely not perform as intended, leading to a garbage input.
The MinMax Scaler formula, x_scaled = (x - x_min) / (x_max - x_min), proves my point: anything outside the fitted [x_min, x_max] range lands outside [0, 1]. Now you may ask, "IS IT TRUE FOR ALL THE SCALERS?" and my answer is yes, at least when you work on challenging datasets that involve time-series regression, because any scaler whose parameters are fitted once on historical data can be caught off guard the same way.
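To see this in code, here is a quick check with scikit-learn's MinMaxScaler: fit it on data spanning 0 to 20, then transform values spanning -5 to 25. The toy arrays are just made up to mirror the example above.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaler.fit(np.array([[0.0], [20.0]]))     # min and max learned from "training" data: 0 and 20

new_data = np.array([[-5.0], [25.0]])     # live data outside the fitted range
print(scaler.transform(new_data))         # [[-0.25], [1.25]] -- no longer inside [0, 1]
```

The scaler happily produces -0.25 and 1.25, which is exactly the garbage input I was worried about.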
OK, now let's imagine we are making a model for smart predictions on stock prices, and we have a stream of data entering the model. At the start it performs with the intended accuracy, but after a few hours the accuracy drops, and now you have ended up with huge losses because of that degrading accuracy.
What might the problem be here? Could it be the varying data range? The varying standard deviation and variance? The varying trends or seasonality, or something else entirely? It could be anything, but the point is that the model is performing poorly and we need to swap it for another model, which is not ideal after deployment.
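One thing we can at least do is watch the incoming stream for statistical shifts. Below is a rough sketch of that idea; the stream() generator, the stored training statistics and the threshold are all made-up placeholders for illustration, not a proper drift-detection algorithm.

```python
import numpy as np
from collections import deque

TRAIN_MEAN, TRAIN_STD = 10.0, 2.0   # assumed statistics saved from the training data
WINDOW = 200                        # size of the sliding window of live values

def stream():
    """Placeholder for the real-time feed (assumption); deliberately drifted for the demo."""
    while True:
        yield np.random.normal(12.0, 3.0)

recent = deque(maxlen=WINDOW)
for value in stream():
    recent.append(value)
    if len(recent) == WINDOW:
        # Flag a shift when the window mean moves more than one training std away
        shift = abs(np.mean(recent) - TRAIN_MEAN) / TRAIN_STD
        if shift > 1.0:             # threshold chosen arbitrarily for the sketch
            print(f"possible drift: window mean {np.mean(recent):.2f}")
            break
```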
Why not have a model that is simultaneously trained on the same stream of data it is given for prediction? Yes, that can be a good solution, but making a model that can adaptively change its weights during prediction has its own complications. Parallel processing can be an answer, but it requires us as programmers to time the threads accurately, which is daunting once you take unknown, variable latency into account. And what about the data points we miss in real time while the model is training, before the prediction phase starts?
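For what it's worth, here is a minimal sketch of the "predict, then learn" idea with a Keras LSTM: serve the prediction for the newest window, then immediately call train_on_batch on that same window. The next_window() generator and the model dimensions are made-up placeholders, and this single-threaded version sidesteps the threading and latency questions above entirely.

```python
import numpy as np
import tensorflow as tf

TIMESTEPS, FEATURES = 30, 1

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(TIMESTEPS, FEATURES)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

def next_window():
    """Placeholder for the real-time feed (assumption): yields scaled (window, target) pairs."""
    while True:
        yield (np.random.rand(TIMESTEPS, FEATURES).astype("float32"),
               np.random.rand(1).astype("float32"))

source = next_window()
for _ in range(100):                              # pretend we watch 100 ticks of the stream
    window, target = next(source)
    x = window[np.newaxis, ...]                   # shape (1, timesteps, features)
    y = target[np.newaxis, ...]                   # shape (1, 1)
    prediction = model.predict(x, verbose=0)      # serve the prediction first ...
    model.train_on_batch(x, y)                    # ... then update the weights on the same point
```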
Can we trick the model into giving out predictions by keeping it in a training state indefinitely and extracting the raw outputs of its last layer manually? That might be a possible way, but I am no good at it without prior experience in doing so.
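As far as I can tell, with a custom training loop this doesn't even need a trick: the forward pass needed to compute the loss is already the raw output of the last layer, so it can simply be returned as the prediction. A sketch of that, assuming the same model and data shapes as the previous snippet:

```python
import tensorflow as tf

loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_and_predict(model, x, y):
    # One step of the stream: the forward pass we need for the loss
    # doubles as the prediction we want to serve.
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)          # raw output of the last layer
        loss = loss_fn(y, y_pred)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return y_pred, loss
```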
These are only a few challenges that I could come up with as a fresher, and these questions might be silly, but hey, everybody asks a silly question now and then out of pure curiosity during the learning phase. The companies that are built solely on providing ML solutions might hold the key to my questions, but for now I feel like a clueless idiot, even though I have built 10+ ML models and solved a variety of problems. And these are exactly the questions that the 1000+ blogs/websites/YouTube channels that teach beginners how to code ML/ANN/DNN don't touch upon.
Anyways, I guess this post has become too long. If you have made it this far, thank you, and if you feel I would be a good candidate for discussions such as this one, do feel free to reach out.
BYE