ML and MIP Gurobi Webinar Follow Up


I recently had the pleasure of collaborating with Dr. Daniel Espinoza on a Gurobi webinar discussing Machine Learning and optimization. The webinar was intended as an introduction to the topic, presenting a series of basic machine learning operations as the data preparation step to an optimization problem. The code and data were based on a consulting project by Professor David Simchi-Levi. Of course, the code was simplified and the data set shrunken and altered for purposes of anonymization and exposition.

The history of the webinar is a story unto itself. Daniel and I collaborated on a pair of closely related presentations for the INFORMS 2017 conference, where the material was presented to around 60 people. We then turned the content into a 50-minute webinar, which was attended by a vastly larger audience. I admit to a bit of stage fright, but except for a minor snafu with audio quality on my end, I think Daniel and I acquitted ourselves respectably.

 In addition to thanking Gurobi for the opportunity, I thought I’d use this blog entry to discuss feedback and perhaps answer a few questions.

Goldilocks problem

Feedback was roughly evenly split between those who thought the material was too advanced and those who thought it was too simple, which probably means we did something right. But a few complaints along these lines were more specific.

The code smelled funny

I’m personally of the opinion that the quest to achieve universal harmony via rigid normalization of coding styles is neither realistic nor productive. The best way to simplify code that looks odd is to rewrite it yourself, and the best way to rewrite code is to experiment with a reproducible code sample. By providing a Jupyter notebook and associated data set, our presentation enabled interested parties to do just that. The webinar itself moved quickly, as we felt our primary goal was to orient people to the code, rather than fully explain it.

 The work lacked originality

The idea that Machine Learning can be used as a data preparation step isn’t terribly original, nor did we present it as such. We felt this topic represented a good introduction to the combination of predictive and prescriptive analytics, particularly for people whose expertise lies in just one area.

Moreover, the amount of public data and code examples covering this combined ground seems quite skimpy, as most examples just discuss one or the other. Hopefully our contribution to the set of free and public examples advances the knowledge base of the community as a whole.

People looking for a more sophisticated connection between Machine Learning and optimization will probably need to study specific case studies of bespoke pipelined analytics projects. As it happens, Professor Simchi-Levi has published a paper on just such a topic. Hopefully he can present this material to a future webinar audience.

 Finally, an ancillary goal of the webinar was to demonstrate the power of the Opalytics Cloud Platform. We at Opalytics do feel that our ability to instantly deploy predictive and prescriptive apps, which can then be easily connected to each other, is in fact quite original. It’s possible there are other companies demonstrating something comparable in public videos, but we haven’t found them.

 The readme was actually required reading

The download originally reflected a snapshot of the GitHub-hosted directory and, as such, didn’t include the .xlsx files needed to execute the notebook. Instead, a readme file directed you to a Python script that generates the .xlsx files from the provided .json files. There were a few reasons for archiving the files this way.

In general, binary files like .xlsx are less useful than human-readable files like .json when incorporated into a source code control system. This becomes particularly apparent when you provide continuous integration support for your analysis code with test suite validation, which is part of the functionality enabled by packages like unittest. I’d be happy to blog and/or webinar on this topic in more detail if people are interested. In my opinion, the ease with which you can safety-net your code with continuous integration is one of the great advantages of Python.
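To make this concrete, here is a minimal sketch of the kind of unittest-based validation a CI server could run on every commit. The record layout and the `validate_history` helper are hypothetical stand-ins, not the actual schema from the webinar's data set:

```python
import unittest


def validate_history(rows):
    """Check that every historical record has the fields a fitting step
    would expect, with sane values. (Hypothetical schema for illustration.)"""
    required = {"product", "price", "demand"}
    return all(
        required <= set(row) and row["price"] > 0 and row["demand"] >= 0
        for row in rows
    )


class TestHistory(unittest.TestCase):
    def test_good_rows_pass(self):
        rows = [{"product": "A", "price": 9.5, "demand": 120}]
        self.assertTrue(validate_history(rows))

    def test_negative_price_fails(self):
        rows = [{"product": "A", "price": -1.0, "demand": 120}]
        self.assertFalse(validate_history(rows))


# A CI service would simply run `python -m unittest` against files like
# this on every push, so a bad .json commit fails fast.
```

Because the inputs live in version control as .json, a diff on the data is as reviewable as a diff on the code.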

 The Machine Learning code failed to perform magic

 There were some raised eyebrows about the nature of the predictions made by the ML logic, and how this played into the optimization results. There were two distinct issues that raised concerns.

 The first was that our Random Forest estimator did not consistently predict that lowered prices would result in larger demand. This was partly a consequence of our historical table being fairly small, as it was shrunk for purposes of publication. It was also a consequence of the existence of other independent variables that drive customer demand. The reality of messy real world data is that historically, diminished price isn’t perfectly associated with increased demand, and might even be sometimes associated with reduced demand. Such behavior patterns will inevitably show up in our predictions, even if they are counter to our intuitive expectations of what would happen in the magical world where we get to experience the different outcomes of each of our decisions before actually making them.
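A toy example can make this vivid. This is not the webinar's Random Forest; it is a deliberately crude nearest-neighbor predictor on invented data, where demand is mostly driven by a hidden factor rather than price, so the fitted model can predict *less* demand at a lower price:

```python
# (price, hidden_factor, demand) -- demand here is driven mostly by the
# hidden factor (say, a holiday flag), not by price. Invented data.
history = [
    (10.0, 0, 100),
    (9.0, 0, 95),   # price dropped, demand dropped anyway
    (8.0, 1, 180),
    (7.5, 0, 90),
]


def predict_demand(price, history):
    """1-nearest-neighbor on price: return the demand of the historical
    record whose price is closest to the quoted price."""
    return min(history, key=lambda rec: abs(rec[0] - price))[2]


print(predict_demand(9.9, history))  # 100 near full price
print(predict_demand(9.1, history))  # 95: lower price, *lower* prediction
```

Any predictor fit to such a history, however sophisticated, will faithfully reproduce whatever price-demand relationship the data actually contains.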

 The second concern involved the simulation results, which revealed that our optimal solution was somewhat likely to violate the maximum investment constraint. Of course this isn’t terribly surprising when you realize that Gurobi did its job well, and found a solution that was very close to exceeding this KPI restriction. Our optimal solution very nearly filled its knapsack to perfection. When placed under simulation, this knapsack became smaller roughly half the time, and thus our optimal solution was infeasible by a similar ratio. A longer webinar that was more oriented to stochastic optimization would explore the appropriate coping strategies in greater detail. Gurobi has real expertise in this area, and I wouldn’t be surprised if just such a webinar was forthcoming.
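The arithmetic behind "infeasible roughly half the time" is easy to sketch. Assuming a hypothetical plan that spends essentially the whole budget, symmetric noise on realized spend pushes it over the cap in about half of the simulated scenarios:

```python
import random

random.seed(0)

budget = 100.0
planned_spend = 99.9   # hypothetical optimal plan: knapsack nearly full

# Realized spend = plan + symmetric noise; the cap is breached whenever
# the noise exceeds the tiny slack the optimizer left behind.
trials = 10_000
violations = sum(
    planned_spend + random.gauss(0, 2.0) > budget
    for _ in range(trials)
)
print(violations / trials)  # close to 0.5
```

The tighter the optimizer packs the knapsack, the closer this violation rate creeps toward a coin flip, which is exactly why stochastic optimization techniques deliberately leave slack.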

Finally, there was an interesting suggestion that the randomization be incorporated into the prediction itself. Since the fitting process seems to generate such a broad diversity of predictor objects, why not generate a range of random forests, and then return the average of their predictions as the prediction step of the predict-prescribe pipeline?

My response to this is twofold. First, since Random Forest is itself an ensemble method, it seems unlikely that a collection of forests would outperform a single forest of the same collective size. More generally, this collection-of-forests idea, or any other creative approach to predictors, would need to be examined by the same cross-validation process we used to select Random Forests in the first place. With Machine Learning, you must always return to the data, and apply a dispassionate analytical process to whichever set of tools and algorithms are under consideration.
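The "collection of forests" point can be shown with a toy model, where each "tree" is just the mean of a bootstrap resample (a crude stand-in, not scikit-learn's Random Forest). Averaging the predictions of several equal-sized forests is algebraically identical to predicting with one pooled forest of all their trees:

```python
import random

random.seed(42)
data = [3.0, 5.0, 7.0, 9.0, 11.0]  # invented training targets


def tree(data):
    """Stand-in for one tree: predict the mean of a bootstrap resample."""
    sample = [random.choice(data) for _ in data]
    return sum(sample) / len(sample)


def forest(data, n_trees):
    """Stand-in for a forest: a list of its trees' predictions."""
    return [tree(data) for _ in range(n_trees)]


forests = [forest(data, 50) for _ in range(10)]   # 10 forests of 50 trees
avg_of_forests = sum(sum(f) / len(f) for f in forests) / len(forests)
pooled = [t for f in forests for t in f]           # one forest of 500 trees
one_big_forest = sum(pooled) / len(pooled)
print(abs(avg_of_forests - one_big_forest) < 1e-9)  # True: same estimator
```

So the proposal reduces to growing a bigger forest, which is already a tuning knob the cross-validation step can evaluate directly.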
