Incremental Learning - The Live Machine Learning Model Training Approach
Ashish Patel
Sr AWS AI ML Solution Architect at IBM | Generative AI Expert Strategist | Author Hands-on Time Series Analytics with Python | IBM Quantum ML Certified | 12+ Years in AI | IIMA | 100k+Followers | 6x LinkedIn Top Voice |
“Why is data so essential to every industry?”
Nowadays, people produce enormous amounts of data on social media platforms such as Facebook, LinkedIn, Snapchat, Instagram, and WhatsApp. Industries follow the same pattern: they log every piece of data that may later need to be examined. Data that is generated continuously in real time is known as streaming data (live data). Industries such as healthcare, retail, manufacturing, finance, banking, insurance, education, transportation, supply chain management and logistics, agriculture, energy, government, hospitality, professional services, and sports generate large amounts of data daily in different formats such as text, audio, video, and images. In this article, we will discuss a practical implementation with streaming data.
“What is Online Learning (Incremental Learning)?”
Online learning, also known as streaming learning or incremental learning, is a technique in which incoming data continuously extends the model's existing knowledge, so that the model keeps being trained further. In contrast to the traditional machine learning process, incremental learning refers to learning from streaming data.
Machine learning provides robust solutions to industry, backed by current research. Most industry applications use the batch approach, in which the data is supplied all at once, together with hyperparameters (meta-parameters) for training. The hyperparameters are then tuned so that the model extracts as much knowledge as possible from the data, and the model stops learning once it produces an optimal result. In this approach, the model can be carefully chosen based on the given dataset. Incremental learning, in contrast, refers to continuous model optimization based on continuously incoming data streams. Models of this kind are found in self-driving cars and in robots that behave autonomously.
A Little Maths of Incremental Learning
In supervised learning,
the data D = ((x1, y1), (x2, y2), …, (xm, ym)) consists of inputs x and outputs y.
The task is to infer a model M ≈ p(y|x) from such data. Machine learning often trains on this kind of data in batch mode.
In incremental learning, the data D is not available in advance but arrives over time.
The task is to infer a reliable model Mt after every time step, based only on the example (xt, yt) and the previous model Mt-1.
This is realized by online learning approaches, which use the training samples one by one, without knowing their number in advance, to optimize an internal cost function.
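The update from Mt-1 to Mt can be sketched in a few lines of plain Python. This is only an illustration, assuming a logistic regression model trained with one stochastic gradient step per example; the names sigmoid, predict_proba, and update, the learning rate of 0.1, and the toy stream are all made up for this sketch and are not taken from any library.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(model, x):
    """Return p(y=1 | x) under the current model."""
    z = model["bias"] + sum(model["weights"].get(k, 0.0) * v for k, v in x.items())
    return sigmoid(z)

def update(model, x, y, lr=0.1):
    """One step M_{t-1} -> M_t that uses only the example (x_t, y_t)."""
    error = predict_proba(model, x) - float(y)  # gradient of the log loss w.r.t. z
    for k, v in x.items():
        model["weights"][k] = model["weights"].get(k, 0.0) - lr * error * v
    model["bias"] -= lr * error
    return model

# Hypothetical toy stream: the sign of x1 determines the label.
model = {"weights": {}, "bias": 0.0}
stream = [({"x1": 1.0}, True), ({"x1": -1.0}, False)] * 200
for x_t, y_t in stream:
    model = update(model, x_t, y_t)  # the stream length is never needed in advance

p_pos = predict_proba(model, {"x1": 1.0})
p_neg = predict_proba(model, {"x1": -1.0})
```

Note that the model never stores past examples: everything it knows about the stream is held in the weights and the bias.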
Algorithms That Support Incremental Learning
Incremental learning is easily achieved with a stochastic optimization technique such as online back-propagation. Algorithms with incremental variants include:
- Support vector machine (SVM)
- Radial Basis Function Networks (RBF)
- Learning Vector Quantization (LVQ)
- k-nearest neighbor (k-NN)
- Logistic Regression
- Decision Tree (DT)
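To make one entry of the list above concrete, here is a rough sketch of an online LVQ1-style update, one common formulation of Learning Vector Quantization: the nearest prototype moves toward the sample when their labels agree and away from it otherwise. The function names, the learning rate, the initial prototypes, and the toy stream are assumptions chosen for illustration.

```python
def nearest(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda i: sum((p - v) ** 2 for p, v in zip(prototypes[i][0], x)),
    )

def lvq1_update(prototypes, x, y, lr=0.05):
    """Adjust only the nearest prototype using the single example (x, y)."""
    i = nearest(prototypes, x)
    vec, label = prototypes[i]
    sign = 1.0 if label == y else -1.0  # attract on a match, repel on a mismatch
    prototypes[i] = ([p + sign * lr * (v - p) for p, v in zip(vec, x)], label)
    return prototypes

def predict(prototypes, x):
    """Predict the label of the nearest prototype."""
    return prototypes[nearest(prototypes, x)][1]

# Hypothetical starting prototypes and stream: class "a" near (0, 0), class "b" near (4, 4).
prototypes = [([1.0, 1.0], "a"), ([3.0, 3.0], "b")]
stream = [([0.0, 0.0], "a"), ([4.0, 4.0], "b"), ([0.0, 1.0], "a"), ([4.0, 3.0], "b")] * 50
for x_t, y_t in stream:
    prototypes = lvq1_update(prototypes, x_t, y_t)

label_a = predict(prototypes, [0.2, 0.1])
label_b = predict(prototypes, [3.9, 4.2])
```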
Practical Implementation of Incremental Learning
Creme is a library for online machine learning, also known as incremental learning. Online learning is a machine learning regime where a model learns one observation at a time.
This contrasts with batch learning, where all the data is processed at once. Incremental learning is desirable when the data is too large to fit in memory, or simply when you want to handle streaming data. In addition to many online machine learning algorithms, Creme provides utilities to extract features from a stream of data.
In the following example, we will train a logistic regression model to predict whether the price of electricity will increase or decrease in the subsequent 30 minutes. We will use real-world electricity price data from New South Wales, Australia. The dataset can be streamed using the fetch_electricity() function from the datasets module. After installing Creme, the notebook below shows what the first observation looks like:
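Feature scaling is a good example of a statistic that can be maintained without holding the data in memory. The sketch below keeps a running mean and variance with Welford's algorithm and uses them to standardize a value. The RunningScaler class is hypothetical and written only for illustration; it is not Creme's implementation, although a streaming standard scaler could plausibly work along these lines.

```python
import math

class RunningScaler:
    """Running mean and variance of one feature, updated one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        # Welford's update: constant memory, single pass over the stream.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return self

    def transform(self, value):
        # Standardize with the statistics seen so far (population variance).
        variance = self.m2 / self.n if self.n > 0 else 0.0
        return (value - self.mean) / math.sqrt(variance) if variance > 0 else 0.0

scaler = RunningScaler()
for v in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    scaler.update(v)

scaled = scaler.transform(9.0)  # mean is 5 and standard deviation is 2, so this is 2.0
```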
Installation of Creme: https://creme-ml.github.io/install.html
Notebook code:
# installation
> !pip install creme
Collecting creme
Downloading https://files.pythonhosted.org/packages/25/7f/11df4db8cdc957fc3134c9ac18d2f6446e9810416cd376ca7ed777e3c091/creme-0.4.4-cp37-cp37m-win_amd64.whl (558kB)
Requirement already satisfied: scipy>=1.3.0 in c:\users\prompt\anaconda3\envs\prompt\lib\site-packages (from creme) (1.3.3)
Collecting scikit-learn>=0.21.2
Using cached https://files.pythonhosted.org/packages/9d/10/1dd2e3436e13402cc2b16c61b5f7407fb2e8057dcc18461db0d8e3523202/scikit_learn-0.22-cp37-cp37m-win_amd64.whl
Requirement already satisfied: numpy>=1.16.4 in c:\users\prompt\anaconda3\envs\prompt\lib\site-packages (from creme) (1.17.4)
Requirement already satisfied: joblib>=0.11 in c:\users\prompt\anaconda3\envs\prompt\lib\site-packages (from scikit-learn>=0.21.2->creme) (0.14.1)
Installing collected packages: scikit-learn, creme
Found existing installation: scikit-learn 0.20.4
Uninstalling scikit-learn-0.20.4:
Successfully uninstalled scikit-learn-0.20.4
Successfully installed creme-0.4.4 scikit-learn-0.22
ERROR: pyod 0.7.5.1 has requirement scikit-learn<=0.21.*,>=0.19.1, but you'll have scikit-learn 0.22 which is incompatible.
In [2]:
from creme import datasets
In [3]:
X_y = datasets.fetch_electricity()
In [4]:
x, y = next(X_y)
In [5]:
x
Out[5]:
{'date': 0.0,
 'day': 2,
 'period': 0.0,
 'nswprice': 0.056443,
 'nswdemand': 0.439155,
 'vicprice': 0.003467,
 'vicdemand': 0.422915,
 'transfer': 0.414912}
In [6]:
y
Out[6]:
True
In [7]:
from creme import datasets
from creme import linear_model
from creme import metrics
from creme import optim
from creme import preprocessing
In [8]:
X_y = datasets.fetch_electricity()
In [9]:
model = preprocessing.StandardScaler()
In [10]:
model |= linear_model.LogisticRegression(optimizer=optim.SGD(.1))
In [11]:
metric = metrics.Accuracy()
In [12]:
for x, y in X_y:
    y_pred = model.predict_one(x)      # Make a prediction
    metric = metric.update(y, y_pred)  # Update the metric
    model = model.fit_one(x, y)        # Update the model
In [13]:
print(metric)
Accuracy: 0.894642
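The loop above predicts on each observation before learning from it, so every prediction is made on data the model has not yet seen. A library-free sketch of this test-then-train evaluation is shown below. The MajorityClass baseline and the progressive_accuracy helper are hypothetical names introduced for this example; they only mimic the predict_one / fit_one pattern used in the notebook.

```python
from collections import Counter

class MajorityClass:
    """Toy online model: always predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        # Before any label has been seen there is nothing to predict.
        return self.counts.most_common(1)[0][0] if self.counts else None

    def fit_one(self, x, y):
        self.counts[y] += 1
        return self

def progressive_accuracy(stream, model):
    """Score each prediction first, then let the model learn from the example."""
    correct = total = 0
    for x, y in stream:
        y_pred = model.predict_one(x)  # 1. predict on unseen data
        correct += int(y_pred == y)    # 2. update the metric
        total += 1
        model.fit_one(x, y)            # 3. only now learn from (x, y)
    return correct / total if total else 0.0

# Tiny made-up stream; the features are left empty because the baseline ignores them.
stream = [({}, True), ({}, True), ({}, False), ({}, True)]
accuracy = progressive_accuracy(stream, MajorityClass())
```

Because the score is accumulated over the whole stream, it also includes the early mistakes made while the model had seen very little data.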
More examples with the Creme package:
- A quick overview of batch learning
- A hands-on introduction to incremental learning
- Bike-sharing forecasting (regression)
- Building a simple time series model
- The art of using pipelines
- Debugging a pipeline
- Handling uncertainty with quantile regression
Here are a few resources if you want to do some reading:
- Online learning – Wikipedia
- What is online machine learning? – Max Pagels
- Introduction to Online Learning – USC course
- Online Methods in Machine Learning – MIT course
- Online Learning: A Comprehensive Survey
- Streaming 101: The world beyond batch
- Machine learning for data streams
- Data Stream Mining: A Practical Approach
References:
1. https://www.researchgate.net/publication/224720096_Overview_of_Some_Incremental_Learning_Algorithms
2. https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2016-19.pdf