Is it going to rain? A Pyomo-SVM Model

Alireza Soroudi, PhD

Lead Data Scientist @ bluecrux || SMIEEE || Optimization expert || Healthcare management || Lab Digitalization || Power and Energy systems || Developer || Author / Speaker || (views are mine)

Published: April 21, 2022

Is it going to rain tomorrow?

This is the question we ask ourselves on a daily basis. It affects our schedules, the way we dress, our mood, coffee time, and many other things.

Now, is it going to rain tomorrow? To be or Not to be? This is the question.

The easiest way of answering this question is to open your weather forecast app (or your window) and check it. We usually trust these apps (or some of them) but how do they do it?

The math behind these weather predictions is not so complicated. HISTORY is the key.

The historical data is used as a fortune teller. A Support Vector Machine (SVM) is a supervised classifier that can be used for this purpose.

The concept is described as follows:

Suppose a set of historical data is available for a number of days (N):

The features (j) of each row (i) can be anything, including (but not limited to):

Wind speed in the morning and afternoon
Wind direction
Cloudy or Sunny?
Season
Temperature

Now supervised learning comes into action. An expert should label the data. But how? It is simple for the 'Rain or not To Rain' question: if it rained, we label the row y_i = 1, and if it did not rain, we label it y_i = -1 (so a new label column is added to the data).
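The labeling rule can be sketched in a few lines of Python. This is only an illustration: the function name `label_row` and the "Yes"/"No" encoding of the raw observation are assumptions, not taken from the article's data.

```python
def label_row(rained: str) -> int:
    """Map a rain observation to an SVM label: +1 if it rained, -1 otherwise."""
    return 1 if rained.strip().lower() == "yes" else -1


# Hypothetical observations, one per day (assumed "Yes"/"No" encoding)
observations = ["Yes", "No", "No", "Yes"]

# The new label column y_i that is added to the data
labels = [label_row(r) for r in observations]
```

Using +1/-1 rather than 1/0 matters here, because the SVM constraints multiply each row's score by its label.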

Some sample data is as follows:

The idea is to find a line (or curve) that can separate the red and blue points from each other.

W_j are the weight factors and b is the constant term of the line (all are decision variables to be determined). The objective is to maximize the distance between the blue straight lines (the separators). If the data is linearly separable, then Formulation A works. Otherwise, Formulation B is a modified version of A that tries to avoid the infeasibility.
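For reference, the standard textbook forms of a hard-margin (A) and a soft-margin (B) linear SVM can be written as below. This is a sketch that may differ in notation from the article's own formulations: x_{ij} (feature j of row i) and the penalty weight C are assumed symbols. The distance between the two separators is 2/||W||, so maximizing it is equivalent to minimizing the sum of the squared weights.

```latex
% Formulation A (hard margin), assuming the data is linearly separable
\begin{aligned}
\min_{W,\,b}\quad & \frac{1}{2}\sum_{j} W_j^2 \\
\text{s.t.}\quad & y_i\Big(\sum_{j} W_j x_{ij} + b\Big) \ge 1, \qquad i = 1,\dots,N
\end{aligned}

% Formulation B (soft margin), with slack \epsilon_i and an assumed penalty C > 0
\begin{aligned}
\min_{W,\,b,\,\epsilon}\quad & \frac{1}{2}\sum_{j} W_j^2 + C\sum_{i=1}^{N}\epsilon_i \\
\text{s.t.}\quad & y_i\Big(\sum_{j} W_j x_{ij} + b\Big) \ge 1 - \epsilon_i, \qquad i = 1,\dots,N \\
& \epsilon_i \ge 0, \qquad i = 1,\dots,N
\end{aligned}
```

In B, a row that cannot be placed on the correct side of its separator takes a positive epsilon_i instead of making the problem infeasible.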

As you can see, the SVM formulation is quadratic in the objective function and linear in the constraints.

Pyomo can be used to formulate and solve this quadratic program.

A portion of the data is used for training (obtaining W, b, and epsilon) and the rest is used for testing.
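The split and the test step can be sketched as follows. This is an illustration under assumptions: the helper names, the 70% default training share, and the rule "predict rain when the score is non-negative" are not taken from the article.

```python
import random


def train_test_split(rows, labels, train_fraction=0.7, seed=0):
    """Shuffle row indices reproducibly, then split into training and testing parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(train_fraction * len(idx))
    train, test = idx[:cut], idx[cut:]
    return (
        [rows[i] for i in train], [labels[i] for i in train],
        [rows[i] for i in test], [labels[i] for i in test],
    )


def predict(w, b, x):
    """Classify one row: +1 (rain) if sum_j w_j x_j + b >= 0, else -1 (no rain)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


def accuracy(w, b, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(w, b, x) == yi for x, yi in zip(rows, labels))
    return hits / len(rows)
```

Here `w` and `b` stand for the values obtained from training; `accuracy` would then be called on the held-out test rows only.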

It is interesting to see that it works! Let's see some results.

As observed, some features have higher values of W. This means that they are more influential on the prediction result (this statement is more accurate if all features are uniformly scaled).
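One common way to put features on a uniform scale is z-score standardization, sketched below. This is an assumed choice; the article does not say which scaling, if any, was applied, and `standardize_columns` is a hypothetical helper.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Rescale each feature column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    # A constant column has zero deviation; divide by 1.0 instead to avoid an error
    sigmas = [pstdev(c) or 1.0 for c in columns]
    return [[(x - mu) / s for x, mu, s in zip(row, mus, sigmas)] for row in rows]


# Two hypothetical features on very different scales
raw = [[1.0, 10.0], [3.0, 30.0]]
scaled = standardize_columns(raw)
```

After scaling, a larger absolute weight is easier to read as a more influential feature, since no feature dominates merely because of its units.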

Some concluding remarks:

The following issues will influence the accuracy of the SVM:

How to select the training set (large/small portion, which part of the data is used).
What features should be considered in the training of W, b, and epsilon?
What kernel should be used (linear or non-linear)?
How to clean the data (data quality)?

Some other Applications:

Subscribe to the Newsletter to have access to the upcoming posts and follow #pyomo4all for more!

Hossein Karimianfard

Electrical Engineering

2 years ago

Dear Alireza Soroudi; First of all thank you for this valuable information, I want to do this optimization in Julia with "JuMP". Are sample data available?

Justin Barker

Optimizing the Last Mile using Data Science | MS - Operations Management

2 years ago

Great newsletter. Easy to read and simple examples for the Pyomo library.

Yasin Pezhmani

PhD Power Systems

2 years ago

Interesting

Alireza Soroudi, PhD

2 years ago

The data is available from https://www.kaggle.com/datasets/jsphyg/weather-dataset-rattle-package
