Is it going to rain? A Pyomo-SVM Model

Is it going to rain tomorrow?

This is the question we ask ourselves on a daily basis. It affects our schedules, the way we dress, our mood, coffee time, and many other things.

Now, is it going to rain tomorrow? To be or not to be? This is the question.

The easiest way of answering this question is to open your weather forecast app (or your window) and check. We usually trust these apps (or some of them), but how do they do it?

The math behind these weather predictions is not so complicated. HISTORY is the key.

The historical data is used as a fortune teller. Support Vector Machines (SVMs) are supervised classifiers that can be used for this purpose.

The concept is described as follows:

Suppose historical data is available for a number of days (N):

[Image: table of historical data, one row i per day (i = 1, ..., N) and one column j per feature]

The features (j) of row (i) can be anything, for example (but not limited to):

  • Wind speed in the morning and afternoon
  • Wind direction
  • Cloudy or Sunny?
  • Season
  • Temperature

Now supervised learning comes into action: an expert has to label the data. But how? For the 'Rain or not to rain' question it is simple. We check whether it rained on day i: if it did, the row gets the label y_i = 1; if it did not, it gets the label y_i = -1. This adds a new label column to the data.
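The labeling rule fits in a couple of lines. A minimal sketch, assuming the history stores the measured rainfall of each day in millimetres and that any amount above a hypothetical threshold (0 mm by default) counts as a rainy day:

```python
def label_days(rain_mm, threshold=0.0):
    """Return y_i = 1 for days on which it rained and y_i = -1 otherwise."""
    return [1 if r > threshold else -1 for r in rain_mm]

rain_mm = [0.0, 3.2, 0.0, 11.5, 0.4]   # illustrative values, not real data
y = label_days(rain_mm)
print(y)
```

Raising the threshold (for example, ignoring drizzle) changes which days count as rainy, so it is part of the expert's labeling decision.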

Some sample data is as follows:

[Image: sample labeled data, the two classes shown as red and blue points]
[Image: SVM Formulations A and B]

The idea is to find a line (or curve) that can separate the red and blue points from each other.

Here w_j are the weight factors and b is the constant term (intercept) of the line; all of them are decision variables to be determined. The objective is to maximize the distance between the two blue straight lines (the separators). If the data is linearly separable, Formulation A works. Otherwise Formulation A has no feasible solution, and Formulation B, a modified version of A, avoids this infeasibility by adding slack variables (epsilon) that let some points fall inside the margin or on the wrong side of the line.

As you can see, the SVM formulation is quadratic in the objective function and linear in the constraints, so it is a quadratic programming (QP) problem.
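For reference, the standard textbook hard-margin and soft-margin SVM problems are written out below; we assume they correspond to the article's Formulations A and B. Here x_ij is feature j of day i, M (our notation) is the number of features, and C > 0 is a penalty weight on the slacks that the text does not mention. The two separators are the lines where the sum of w_j x_ij plus b equals +1 and -1; their distance is 2/||w||, so maximizing it is equivalent to minimizing half the sum of the squared weights, which is where the quadratic objective comes from.

```latex
\textbf{(A)}\quad \min_{w,b}\ \tfrac{1}{2}\sum_{j=1}^{M} w_j^2
\quad\text{s.t.}\quad y_i\Big(\sum_{j=1}^{M} w_j x_{ij} + b\Big) \ge 1, \qquad i = 1,\dots,N

\textbf{(B)}\quad \min_{w,b,\epsilon}\ \tfrac{1}{2}\sum_{j=1}^{M} w_j^2 + C\sum_{i=1}^{N} \epsilon_i
\quad\text{s.t.}\quad y_i\Big(\sum_{j=1}^{M} w_j x_{ij} + b\Big) \ge 1 - \epsilon_i, \quad \epsilon_i \ge 0, \qquad i = 1,\dots,N
```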

Pyomo can be used to formulate this quadratic program and solve it with a suitable solver.

[Image: Pyomo implementation of the SVM model]
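To see what training produces without installing a QP solver, the sketch below swaps in a different technique: full-batch subgradient descent on the unconstrained form of the standard soft-margin problem, half the sum of w_j squared plus C times the sum over days of max(0, 1 - y_i(w·x_i + b)). It has the same optimum as the constrained form because, at the optimum, each slack epsilon_i equals that hinge term. This is not the article's Pyomo/QP approach; the penalty C, learning rate, epoch count and toy data are all assumptions.

```python
def train_svm(X, y, C=1.0, lr=0.01, epochs=2000):
    """Soft-margin linear SVM trained by full-batch subgradient descent on
    0.5*sum(w_j^2) + C*sum_i max(0, 1 - y_i*(w.x_i + b)).
    Returns the weights w, the constant b and the slacks epsilon_i."""
    m = len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = w[:], 0.0          # gradient of the quadratic term
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                # hinge term is active: the point is inside the margin or misclassified
                for j in range(m):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    # slack of each day: how far it falls short of the margin (0 if it satisfies it)
    eps = [max(0.0, 1 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
           for xi, yi in zip(X, y)]
    return w, b, eps

# toy, linearly separable data (illustrative only, not the article's data)
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -2], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b, eps = train_svm(X, y)
print(w, b)
```

With a constant step size the iterates keep oscillating slightly around the optimum; for this toy set they should stay close to w ≈ (0.25, 0.25) and b ≈ 0, the max-margin line fixed by the two closest points (2, 2) and (-2, -2). A QP solver called through Pyomo would return that optimum directly.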

A portion of the data is used for training (obtaining w, b and epsilon), and the rest is used for testing.
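A minimal sketch of the split-and-test step. It assumes the rows are in chronological order and uses a hypothetical 80/20 split (the article does not state its split). A day is predicted as rainy when w·x + b > 0; the data and weights below are illustrative placeholders, not trained values.

```python
def split(X, y, train_fraction=0.8):
    """Chronological split: the first part of the history trains, the rest tests."""
    k = int(len(X) * train_fraction)
    return X[:k], y[:k], X[k:], y[k:]

def predict(w, b, x):
    """SVM decision rule: +1 (rain) if w.x + b > 0, else -1 (no rain)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1

def accuracy(w, b, X, y):
    """Fraction of days whose predicted label matches the true label."""
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)

# illustrative data and weights (not from the article)
X = [[2, 2], [3, 3], [-2, -2], [-3, -2], [2, 3], [-1, -3], [1, 2], [-2, -1], [3, 1], [-1, 0]]
y = [1, 1, -1, -1, 1, -1, 1, -1, 1, -1]
X_train, y_train, X_test, y_test = split(X, y)
w, b = [0.25, 0.25], 0.0   # stand-ins for the values obtained from X_train, y_train
print(accuracy(w, b, X_test, y_test))
```

Splitting chronologically mimics forecasting the future from the past; shuffling before splitting is the other common choice, and which one to use is part of the "which part of the data is used" question in the concluding remarks.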

It is interesting to see that it works! Let's look at some results:

[Image: results, including the weight obtained for each feature]

As can be observed, some features have larger weights w_j (in absolute value). This means they have more influence on the prediction (this statement is only reliable if all features are scaled to a comparable range).
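The scaling caveat matters because a weight's size depends on the feature's units: if wind speed is recorded in km/h instead of m/s, its values are 3.6 times larger and its weight becomes 3.6 times smaller for the same separating line. A minimal sketch, assuming z-score standardization (one common choice; the article does not say which scaling it uses): standardize each feature column before training, then rank the features by |w_j|. The feature names, data and weights are illustrative.

```python
import statistics

def standardize(X):
    """Z-score each column: subtract its mean and divide by its (population) standard deviation."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]   # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def rank_features(names, w):
    """Order features from most to least influential by absolute weight."""
    return sorted(zip(names, w), key=lambda p: abs(p[1]), reverse=True)

names = ["wind_speed", "temperature", "cloudy"]
X = [[10.0, 15.0, 1.0], [20.0, 25.0, 0.0], [30.0, 5.0, 1.0]]
Z = standardize(X)      # Z, not X, would be passed to the SVM training step
w = [0.4, -1.3, 0.9]    # hypothetical weights learned on standardized features
print(rank_features(names, w))
```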

Some concluding remarks:

The following issues will influence the accuracy of the SVM:

  • How to select the training set (large/small portion, which part of the data is used).
  • Which features should be considered when training w, b, and epsilon?
  • What kernel should be used (linear or non-linear)?
  • How to clean the data (data quality)?
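The kernel question is about how the similarity between two days is measured. A linear kernel gives the straight-line separator described above, while a non-linear kernel such as the Gaussian (RBF) kernel lets the separator become a curve. The two functions below are the standard definitions; the value of gamma is an arbitrary assumption. Using a non-linear kernel typically means solving the dual form of the SVM rather than Formulations A and B as written.

```python
import math

def linear_kernel(x, z):
    """K(x, z) = x . z, which yields a straight-line (hyperplane) separator."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2), which yields curved separators."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

x, z = [1.0, 2.0], [2.0, 0.0]
print(linear_kernel(x, z), rbf_kernel(x, z))
```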

Some other applications:

[Image: other applications]

Subscribe to the newsletter to have access to upcoming posts, and follow #pyomo4all for more!

Alireza Soroudi, PhD

Lead Data Scientist @ bluecrux || SMIEEE || Optimization expert || Healthcare management || Lab Digitalization || Power and Energy systems || Developer || Author / Speaker || (views are mine)
