Aging code
Thomas Schmelzer
Portfolio construction and technology @ ADIA | Commodities and LS Equities | Visiting Scholar at Stanford.
Code does not age well --- in particular if written in Python.
This year I had the pleasure to conduct research on a paper on the critical line algorithm by Marcos Lopez de Prado and David Bailey. The paper is here. I appreciate when academics publish code (fragments). My PhD supervisor Lloyd Nick Trefethen is a big proponent of this approach. He published an entire collection of his gems in the paper Ten digit algorithms.
Code is (for me) a lot more expressive and clearer than any mathematical poetry academics and practitioners come up with. The ultimate truth is in the code, not in the paper. Prof. Alexander Lipton may disagree with me. Plenty of papers end up with an unhealthy mismatch between the two. Code makes a paper stronger...
The paper by de Prado and Bailey does a good job here and the code is embedded in very detailed comments. However, the code did not age well. I failed to get it running. The code is approximately 10 years old.
I moved from Matlab to Python in 2011, pushed by developer legend Marcin Snarski. I first went back to my old code from the very same period. My old code is available on GitHub. My old code isn't exactly in perfect shape either. Sometimes I go back to my old code at night and address the biggest problems...
Today, I can only offer suggestions on how to mitigate the effect. The problem with Python is that the environment and the actual code are often treated as separate entities. Because of this weakness I fell in love with containers. There we can embed both code and environment together in the same unit. This works great for apps, and the resulting image stays unchanged for ages.
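Where a container is not an option, a smaller step in the same direction is to record the environment next to the results it produced. The following is only a minimal sketch under stated assumptions: the function name `environment_snapshot` and the package names passed to it are made up for illustration, and the standard library's `importlib.metadata` (Python 3.8 or later) is used to look up the installed versions.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Return the interpreter version and installed package versions as a dict.

    `packages` is a list of distribution names chosen by the caller
    (hypothetical here); a package that is not installed maps to None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "packages": versions,
    }


if __name__ == "__main__":
    # Example package names only; list whatever your code actually imports.
    print(json.dumps(environment_snapshot(["numpy", "scipy"]), indent=2))
```

Storing such a snapshot with every set of results does not stop the code from aging, but it tells a future reader which environment the numbers came from.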
Some may refuse containers, and they don't make sense if you just develop a package used in a bigger application. In this case, you may want to drown your package in tests. Those tests will help to identify the point where your package eventually falls over, and it will. My first tests check whether the code can reproduce the results stated in the paper... There should always be a paper somewhere with your code. Even if it's not intended for publication.
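A test that reproduces a published result could look roughly like the sketch below. Everything in it is an assumption for illustration: `min_variance_weights` is a toy stand-in for the package under test (not the critical line algorithm), and the reference weights are hand-computed for this toy case rather than taken from any paper. The test is written in pytest style but also runs as a plain function call.

```python
import math


def min_variance_weights(variances):
    """Fully invested minimum-variance weights for uncorrelated assets.

    With a diagonal covariance matrix the weights are proportional to
    the inverse variances. Toy stand-in for the real package code.
    """
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]


# In a real project these numbers would be copied from a table in the paper.
# Here they are hand-computed for variances (0.04, 0.01): 25/125 and 100/125.
REFERENCE_WEIGHTS = [0.2, 0.8]


def test_reproduces_reference_weights():
    weights = min_variance_weights([0.04, 0.01])
    # The portfolio must be fully invested.
    assert math.isclose(sum(weights), 1.0, abs_tol=1e-12)
    # Each weight must match the reference up to a small tolerance.
    for got, expected in zip(weights, REFERENCE_WEIGHTS):
        assert math.isclose(got, expected, abs_tol=1e-9)
```

Comparing with a tolerance instead of exact equality matters here: a new version of a numerical library may change the last digits without changing the result in any meaningful way.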
Today there are also tools like Dependabot that update your packages following a schedule you define. I use it for most of my packages. Dependabot does not blindly update dependencies. It generates a pull request, which triggers all the aforementioned tests. My little bot goes wild every Sunday...
But leaving all technical helpers aside, we need to accept that code needs love and attention. We should append "But it worked 10 years ago" to "But it worked on my computer"...
There's a lot of hype around "reproducible science", but without addressing the aging issue it will fail even with the very best intentions.
I wish you all a wonderful Christmas and a great start to 2024...
Thomas
P.S.: If you are interested in the critical line algorithm: GitHub. With code and Dependabot in action...
Portfolio construction and technology @ ADIA | Commodities and LS Equities | Visiting Scholar at Stanford.
1 month ago: Today I am using Renovate rather than Dependabot. Both are good tools, though.
Founder & CEO SimpleAccounts.io at Data Innovation Technologies | Partner & Director of Strategic Planning & Relations at HiveWorx
8 months ago: Thomas, great insights! Thanks for sharing!
Two Sigma, ADIA, AQR, IBKR | quant for every investor | trunc.ai
1 year ago: Love people who publish code as well.
Strategy-Analytics-Investments | "Not everything that can be counted counts, and not everything that counts can be counted." by William Bruce Cameron
1 year ago: Whatever happened to the Universal Compiling System? Also a very relevant issue: some of those legacy codes may have an application for quantum computing.
Unidel Chaired Professor of Mathematical Sciences at the University of Delaware
1 year ago: More than once I've found that a published code does something not explained in, or even explained differently in, the prose of its related paper. For all its drawbacks, I will give MATLAB some credit. I've maintained the SC Toolbox since 1994, and only once did the language introduce breaking changes (in graphics). That is a remarkable achievement!