Connect OPL to Python with ticdat
FYI: Subsequent to this post, I've worked on combining AMPL and Python and found that approach works even better, thanks to the existence of the amplpy package. This is now my recommendation for those interested in using "runnable LaTeX" with Python. Please read more here: https://t.co/mH4EXkiRfn
I’ve written a few blog posts on the topic of the relationship between Python and data science. My enthusiasm is such that I’ve even been accused of being evangelical on this subject. While I do admit that I find the logical consistency of Python to be appealing, I think the factual evidence is now clear and overwhelming - if Python isn’t part of your skill-set, then data science is leaving you behind.
That said, I’m willing to admit that perhaps my “always use Python!” tone has been somewhat procrustean. In particular, when this post was shared on a CPLEX users group, I received a lot of interesting and thoughtful feedback. More than a few people spoke quite eloquently in defense of IBM’s Optimization Programming Language (OPL). While I find the set logic and syntax of Python to be quite intuitive, it’s entirely reasonable to admit that this is partly a matter of taste. For those who find OPL to be the right tool for this specific job, who am I to tell them otherwise?
Moreover, even granting a mass awakening to the one true religion of Python, there is no miraculous cure for currently existing OPL code. I think it’s safe to say that some sort of automated OPL-to-Python rewriting tool isn’t in the offing. This leaves organizations with legacy suites of useful OPL models with only two real choices - rewrite them by hand, or connect them to Python via some type of data adaptor.
Although IBM doesn’t offer a Python-to-OPL adaptor as part of the CPLEX Optimization suite, people like Alex Fleischer have been very helpful in offering guidance as to how such a library might work. The more I studied this suggestion, the more it spoke to me. Alex demonstrates a method by which Python could be used to replace not the .mod files that people find so endearing, but rather the repetitious and little-loved .dat files that are used to empower OPL with data I/O. “Data glue” is a natural use case for Python, and the ticdat package I helped build was designed specifically to connect optimization logic with data. I quickly became obsessed with the idea of using ticdat to generalize Alex’s diet-specific example to any OPL model.
Luckily for me, I encountered a few people who found this idea appealing as well. Diego Olivier Fernandez Pons and Josh Woodruff are both OPL experts with an interest in connecting Python to OPL. By lending their technical expertise to the ticdat team, we were able to create a general purpose data adaptor between Python and OPL. This working group put a great deal of care and attention into transforming Alex’s one-off example into the easy-to-understand and industrial grade opl_run subroutine, all distributed for free under ticdat’s open-source license.
We have published the example models here and instructional videos here. I hope you play around with any and all of these examples. Although Opalytics does present instant-apps-for-OPL as a commercial partnership opportunity, the opl_run function itself is our attempt to enrich the optimization community at large.
In addition to the readmes and docstrings on GitHub, I’d like to use the remainder of this blog post as a technical survey of how and why to best use opl_run.
First, let’s itemize a few points that will help you get oriented.
1. In order to use opl_run, you first need to do two things.
- Install the ticdat Python package. If you’ve done this already, bear in mind you want version 0.2.7 or later.
- Execute the oplrun_setup.py script in the appropriate directory.
2. Following the guidance of Alex’s diet example, opl_run creates auto-generated files for the purpose of passing data back and forth to the OPL engine. Specifically, a foo.py file will auto-generate three files.
- ticdat_foo.mod - this is a data-free OPL file that defines the data structures and variables that will contain the input data. Your foo.mod file will need to include ticdat_foo.mod and reference the variables it defines.
- temp.dat - this is a .dat file that can be used to populate the variables defined in ticdat_foo.mod
- ticdat_foo_output.mod - this is an OPL file that defines the data structures and variables that will contain the solution data. This file also defines the writeOutputToFile() subroutine that needs to be invoked by foo.mod to create the results.dat file. Following a successful OPL engine solve, this results.dat file will be parsed by the opl_run code so as to return the solution data as a Python data structure.
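To make the temp.dat/results.dat hand-off concrete, here is a toy sketch of the mechanics in plain Python. This is emphatically not ticdat's actual implementation - the function names and the restricted .dat subset handled here are mine - but it illustrates the round trip opl_run automates: Python tables are serialized into OPL tuple-set syntax, and a .dat file written by OPL is parsed back into Python.

```python
# Toy illustration (NOT ticdat's real code) of the .dat hand-off that
# opl_run automates: serialize Python tables to OPL tuple-set syntax,
# then parse such text back into Python tables.
import re

def write_dat(tables):
    """Render {table_name: [row_tuple, ...]} as OPL .dat tuple-set text."""
    def literal(v):
        # Quote strings; render numbers as-is. (A real serializer would
        # also escape embedded quotes and commas.)
        return '"%s"' % v if isinstance(v, str) else repr(v)
    lines = []
    for name, rows in tables.items():
        body = ", ".join("<%s>" % ", ".join(literal(v) for v in row)
                         for row in rows)
        lines.append("%s = {%s};" % (name, body))
    return "\n".join(lines)

def read_dat(text):
    """Parse the restricted .dat subset produced by write_dat."""
    tables = {}
    for name, body in re.findall(r"(\w+)\s*=\s*\{(.*?)\};", text, re.S):
        rows = []
        for tup in re.findall(r"<(.*?)>", body):
            row = []
            for field in tup.split(","):
                field = field.strip()
                # Quoted fields are strings; everything else is numeric.
                row.append(field[1:-1] if field.startswith('"')
                           else float(field))
            rows.append(tuple(row))
        tables[name] = rows
    return tables
```

For example, `write_dat({"foods": [("hamburger", 2.49)]})` yields the text `foods = {<"hamburger", 2.49>};`, and `read_dat` recovers the original table from it. The real opl_run, of course, handles the full schemas you declare and the quirks of OPL's syntax.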
3. Bear in mind that the foo.mod file needs to be written with references to ticdat_foo.mod and ticdat_foo_output.mod. However, these ticdat_foo*.mod files are autogenerated with each solve process. This creates something of a chicken and egg problem, for which we have devised the following coping techniques.
- My favorite strategy when writing the foo.mod file is to follow the “so what if it breaks” approach outlined below.
- --> First create the foo.py file so as to define the input and output schemas and the simple solve() subroutine that uses opl_run.
- --> Create a skeletal foo.mod file that does nothing more than contain the mandatory ticdat_foo*.mod include statements and a writeOutputToFile() reference.
- --> Execute foo.py with empty.sql as the -i named argument, where empty.sql is an empty file.
- --> The resulting invocation of opl_run will create the appropriate ticdat_foo*.mod files before throwing some sort of exception. As these are the exact same files that would have been created with real input data, they can safely be used to inform the writing of foo.mod.
- That said, for those who find such deliberate exception throwing unpleasant, we have also included create_opl_mod_text and create_opl_mod_output_text in the public interface for ticdat.
4. Finally, I’d like to highlight some of the reasons why opl_run might prove useful to you.
- OPL doesn’t address machine learning. Predictive and prescriptive analytics ought to go together like peas and carrots. Lucky for us, Python has a rich suite of open sourced machine learning code. Connect OPL to Python, and you’re well on your way to ML-optimization synergy.
- Dirty data is a real problem. One of the core functionalities enabled by ticdat is the ability to define data integrity rules in an efficient manner, and validate that those rules are obeyed before passing the data to the optimization engine. Although technically these data integrity definitions are optional, in practice I think you will find them invaluable.
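To make the dirty-data point concrete, here is a plain-Python sketch of the two most common integrity rules - foreign keys and numeric ranges - applied to diet-style tables. ticdat's actual API is declarative (you state the rules and it finds the violations); this hypothetical find_violations function just illustrates what such checks catch before data reaches the engine.

```python
# Plain-Python illustration (not the ticdat API) of foreign-key and
# range checks run on input data before it reaches the OPL engine.
def find_violations(foods, nutrition_quantities):
    """foods: {food_name: cost}
    nutrition_quantities: {(food, category): quantity}
    Returns a list of (rule, food, category) violation records."""
    violations = []
    for (food, category), qty in nutrition_quantities.items():
        if food not in foods:  # foreign-key check: food must exist
            violations.append(("dangling food", food, category))
        if not (isinstance(qty, (int, float)) and qty >= 0):
            # range check: quantities must be non-negative numbers
            violations.append(("bad quantity", food, category))
    return violations
```

For instance, a nutrition_quantities row referencing a food absent from the foods table, or carrying a negative quantity, would each show up as a violation record - far better to see that before the solve than to debug a mysteriously infeasible model after it.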
- Python opens up a gateway to all sorts of data connectivity options that OPL doesn’t necessarily provide. For example, OPL withdrew Microsoft Access support in a recent release, whereas ticdat supports Microsoft Access by leveraging the pypyodbc and pyodbc packages.
- OPL was never intended to be a comprehensive general purpose language. Nobody has ever suggested that OPL could be the only language an optimization-focused application developer would ever need. Since you will surely need to incorporate at least one other programming language into your prescriptive analytics solution, why not use Python? Once your optimization logic has been rendered, you might find Python to be the only (other) language you’ll ever need.