Is it going to rain? A Pyomo-SVM Model
Alireza Soroudi, PhD
Lead Data Scientist @ bluecrux || SMIEEE || Optimization expert || Healthcare management || Lab Digitalization || Power and Energy systems || Developer || Author / Speaker || (views are mine)
Is it going to rain tomorrow?
This is the question we ask ourselves on a daily basis. It affects our schedules, the way we dress, our mood, coffee time, and many other things.
Now, is it going to rain tomorrow? To be or not to be? That is the question.
The easiest way to answer this question is to open your weather forecast app (or your window) and check. We usually trust these apps (or some of them), but how do they do it?
The math behind a simple data-driven version of these predictions is not so complicated: HISTORY is the key.
Historical data acts as our fortune teller. Support Vector Machines (SVMs) are supervised classifiers that can be used for this purpose.
The concept is described as follows:
Suppose historical data is available for a number of days (N):
The features (j) of row (i) can be anything such as (but not limited to):
Now supervised learning comes into action. An expert has to label the data. But how? In the 'To Rain or Not To Rain' question it is simple: we check whether it rained. If it did, the row gets the label y_i = 1; if it did not, it gets y_i = -1. This adds a new label column to the data.
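As a minimal sketch of this labeling step, the helper below turns raw daily records into a feature matrix X (one row per day i, one column per feature j) and a label list y. It assumes each record is a Python dict, that the rain column stores the strings "Yes"/"No", and that missing values appear as None, an empty string, or "NA"; the function name `build_dataset` and these conventions are hypothetical, not taken from the original post.

```python
# Hypothetical helper: build X (features) and y (+1 rain / -1 no rain) from daily records.
# Assumption: records are dicts, the label column holds "Yes"/"No",
# and missing entries are None, "" or "NA".
MISSING = (None, "", "NA")
LABELS = {"Yes": 1, "No": -1}  # rain -> +1, no rain -> -1


def build_dataset(records, feature_names, label_name):
    X, y = [], []
    for rec in records:
        values = [rec.get(f) for f in feature_names]
        label = rec.get(label_name)
        # Skip days with an unknown/missing label or any missing feature value.
        if label not in LABELS or any(v in MISSING for v in values):
            continue
        X.append([float(v) for v in values])  # one row per day i
        y.append(LABELS[label])               # new label column y_i
    return X, y
```

Dropping incomplete days is the simplest choice; imputing missing values instead would keep more days for training.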
Some sample data is as follows:
The idea is to find a line (or curve) that can separate the red and blue points from each other.
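Writing the separating line as sum_j w_j x_j + b = 0, with one weight w_j per feature and a constant b, a day x lands on the "rain" side when that sum is positive and on the "no rain" side when it is negative. The sketch below assumes w and b have already been obtained from the optimization; the function names are illustrative.

```python
# Sketch: classify a day x with an already-trained linear separator (w, b).
def decision_value(w, b, x):
    """Signed score sum_j w_j * x_j + b; its sign gives the predicted label."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(w, b, x):
    # A point exactly on the line (score 0) is assigned +1 here by convention.
    return 1 if decision_value(w, b, x) >= 0 else -1
```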
W_j are the weight factors and b is the constant (offset) of the line; all of them are decision variables to be determined. The objective is to maximize the distance between the two blue straight lines (the separators). If the data is linearly separable, Formulation A works. Otherwise, Formulation B, a modified version of A, adds slack variables (epsilon) so that the problem does not become infeasible.
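For reference, the textbook hard-margin (A) and soft-margin (B) linear SVMs can be written as below; we assume the original formulations follow this standard form, and the penalty weight C on the slacks is our own notation. The two separators are sum_j w_j x_j + b = +1 and = -1, which are 2/||w|| apart, so maximizing that distance is equivalent to minimizing (1/2) sum_j w_j^2.

```latex
\textbf{(A)}\quad \min_{w,\,b}\ \tfrac{1}{2}\sum_{j} w_j^{2}
\quad \text{s.t.}\quad y_i\Big(\sum_{j} w_j x_{ij} + b\Big) \ge 1, \qquad i = 1,\dots,N

\textbf{(B)}\quad \min_{w,\,b,\,\varepsilon}\ \tfrac{1}{2}\sum_{j} w_j^{2} + C\sum_{i}\varepsilon_i
\quad \text{s.t.}\quad y_i\Big(\sum_{j} w_j x_{ij} + b\Big) \ge 1 - \varepsilon_i,\quad \varepsilon_i \ge 0, \qquad i = 1,\dots,N
```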
As you can see, the SVM formulation is quadratic in the objective function and linear in the constraints.
Pyomo can be used to formulate and solve this quadratic program.
A portion of the data is used for training (obtaining w, b, and epsilon) and the rest is used for testing.
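A small sketch of this split using only the standard library. The helper name and the 80/20 default are illustrative assumptions, not values from the post.

```python
import random


def train_test_split(X, y, train_fraction=0.8, seed=0):
    """Randomly split rows into training and test portions (hypothetical helper)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    cut = int(round(train_fraction * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return (
        [X[i] for i in train], [y[i] for i in train],  # used to obtain w, b, epsilon
        [X[i] for i in test], [y[i] for i in test],    # held out for testing
    )
```

Because weather records are ordered in time, a chronological split (training on earlier days and testing on later ones) is a reasonable alternative that keeps future days out of the training set.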
It is interesting to see that it works! Let's look at some results.
As observed, some features have larger weights W_j (in absolute value). This means they have more influence on the prediction result (this statement is more accurate if all features are scaled to the same range).
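One simple way to score a trained (w, b) on the test portion is sketched below; the function name and the choice of metrics are illustrative. Besides overall accuracy it reports recall on rainy days, since when rain is the minority class a model can reach high accuracy while rarely predicting rain.

```python
def evaluate(w, b, X_test, y_test):
    """Compare sign(w.x + b) with the +1/-1 labels of the test days."""
    tp = tn = fp = fn = 0
    for x, label in zip(X_test, y_test):
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        predicted = 1 if score >= 0 else -1
        if predicted == 1 and label == 1:
            tp += 1  # rain predicted, rain observed
        elif predicted == -1 and label == -1:
            tn += 1  # dry predicted, dry observed
        elif predicted == 1:
            fp += 1  # false alarm
        else:
            fn += 1  # missed rain
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "rain_recall": tp / (tp + fn) if tp + fn else float("nan"),
        "confusion": (tp, fp, fn, tn),
    }
```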
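To make the weights comparable, a common approach is to z-score each feature column before training and then rank features by |W_j|. The sketch below uses Python's `statistics` module; the helper names are hypothetical. The returned (mean, std) pairs should be reused to scale the test days the same way.

```python
import statistics


def standardize_columns(X):
    """Z-score each feature column; returns scaled rows and per-column (mean, std)."""
    stats = []
    for col in zip(*X):
        mu = statistics.mean(col)
        sd = statistics.pstdev(col) or 1.0  # constant column: avoid division by zero
        stats.append((mu, sd))
    scaled = [[(v - mu) / sd for v, (mu, sd) in zip(row, stats)] for row in X]
    return scaled, stats


def rank_features(w, names):
    """Order features by |w_j|, largest first; the sign only gives the direction."""
    return sorted(zip(names, w), key=lambda nw: abs(nw[1]), reverse=True)
```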
Some concluding remarks:
The following issues will influence the accuracy of the SVM:
Some other Applications:
Subscribe to the Newsletter to access upcoming posts, and follow #pyomo4all for more!
The data used in this post is available from https://www.kaggle.com/datasets/jsphyg/weather-dataset-rattle-package