
Backtesting in the clouds

Thomas Schmelzer

Portfolio construction and technology @ ADIA | Commodities and LS Equities | Visiting Scholar at Stanford.

Published: February 24, 2025

Imagine a scenario where you want to facilitate backtesting for multiple strategies, written by various developers, some of whom may use different programming languages or environments. This presents a few challenges, primarily around consistency, version control, and keeping the backtest process easy to use for everyone involved. Let's break down the concerns and potential solutions:

Problem: Different developers may prefer or be limited to specific programming languages (e.g., Python, R, C++, Julia, etc.). You can't expect everyone to use Python, and if you were to build a Python-only solution, it would alienate those outside of the ecosystem.

We follow an API-first design: clients in different languages communicate with the backtest engine over gRPC. The tool we are going to introduce makes it easy and fast to create such APIs. The central backend could be written in Python, and clients (in Python, R, Java, etc.) communicate with it by sending input data and receiving results.
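To make the API-first idea concrete, here is a minimal sketch of such a contract. It is not the author's implementation: JSON over an in-process function call stands in for the gRPC transport, and the names `run_backtest` and `handle` as well as the payload fields `prices` and `weights` are hypothetical. The point is only that the client needs to know the message format, not the engine's code.

```python
import json


def run_backtest(prices, weights):
    """Toy engine (assumption): hold fixed weights, return per-period portfolio returns."""
    returns = []
    for prev, curr in zip(prices, prices[1:]):
        asset_returns = [c / p - 1.0 for p, c in zip(prev, curr)]
        returns.append(sum(w * r for w, r in zip(weights, asset_returns)))
    return returns


def handle(request_bytes):
    """Server side: decode the request, run the engine, encode the response."""
    request = json.loads(request_bytes.decode("utf-8"))
    returns = run_backtest(request["prices"], request["weights"])
    return json.dumps({"returns": returns}).encode("utf-8")


# Client side: any language that can build this payload can call the engine.
payload = json.dumps({
    "prices": [[100.0, 50.0], [110.0, 50.0], [110.0, 55.0]],  # rows = dates, columns = assets
    "weights": [0.5, 0.5],
}).encode("utf-8")

response = json.loads(handle(payload).decode("utf-8"))
print(response["returns"])  # two per-period returns, one for each price change
```

In a real deployment the call to `handle` would be a network request, and the payload would be a typed binary message rather than JSON, but the separation between client and engine is the same.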

Problem: Even if everyone uses Python, there's still the issue of dependency management, version control, and compatibility. Different users might install different versions of your package or its dependencies, leading to inconsistent results.

Version management has been simplified a great deal by Dependabot and Renovate, but both still require a level of discipline and computational hygiene that is often not met in practice. Needless to say, your backtesting engine may come with various other dependencies that are a potential source of conflict.

Our central backend operates on a server within a container. It’s completely isolated from the code of your strategy. If you still refuse containers let me remind you that it is 2025 :-)

Problem: Since backtests will be run by different individuals at different times, tracking this information (who ran it, when it was run, etc.) becomes important for auditing and ensuring reproducibility.

The backtest service would come with an integrated logging and versioning system. Every time a backtest is initiated, it can log all relevant metadata (when it ran, which version, who ran it). The data is stored in a centralized database, which can be queried or visualized. A simple interface (maybe a dashboard) could be created for users to track and analyze backtest histories, metrics, and changes. This could help to support backtest counting, following an idea by Marcos Lopez de Prado.
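A minimal sketch of such a metadata log, assuming SQLite as a stand-in for the centralized database. The table name `backtests`, its columns, and the helper functions are hypothetical; the input hash is one possible way to tie a logged run to the exact data it received.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open (or create) the log database; in-memory by default for experiments."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS backtests ("
        "id INTEGER PRIMARY KEY, strategy TEXT, username TEXT, "
        "engine_version TEXT, started_at TEXT, input_hash TEXT)"
    )
    return conn


def log_backtest(conn, strategy, username, engine_version, request_bytes):
    """Record who ran which strategy, when, on which engine version and inputs."""
    conn.execute(
        "INSERT INTO backtests "
        "(strategy, username, engine_version, started_at, input_hash) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            strategy,
            username,
            engine_version,
            datetime.now(timezone.utc).isoformat(),
            hashlib.sha256(request_bytes).hexdigest(),
        ),
    )
    conn.commit()


def count_backtests(conn, strategy):
    """Number of backtests ever run for a strategy (the basis for backtest counting)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM backtests WHERE strategy = ?", (strategy,)
    ).fetchone()
    return row[0]


conn = open_log()
log_backtest(conn, "trend", "alice", "0.1.0", b"payload-1")
log_backtest(conn, "trend", "bob", "0.1.0", b"payload-2")
print(count_backtests(conn, "trend"))  # prints 2
```

Because the log is written by the service rather than by each user, every run is counted, including the ones nobody would report voluntarily.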

Problem: Creating a modern server based on Apache Flight is time-consuming and challenging.

To create a basic server for your own experiments I have created numpy-flight:

https://pypi.org/project/numpy-flight/

To support the rapid creation of such servers I have also created a [qcradle](https://pypi.org/project/qCradle/) template, e.g.

uvx qcradle https://github.com/tschm/server

Then follow the instructions to create a server that comes complete with the required dependencies, a Dockerfile, some tests and basic documentation. Once you have the server in a container, it’s trivial to fire it up with any of the established cloud providers.

Conclusion

The idea of using centralized services is not new, but it is especially relevant where results need to be reproduced, metadata has to be logged and several languages have to be supported.

The approach sketched here addresses the core challenges of backtesting in a multi-language environment:

gRPC bridges the gap between different programming languages.
Containerization ensures consistent environments and eliminates dependency conflicts.
Logging and versioning systems ensure that backtest results are auditable and reproducible.
Tools like numpy-flight and qcradle make it easier for developers to set up and experiment with backtesting environments.

Tushar Chhabhaiya

SpeedBot | Expert Quant | Algorithmic Trading | HFT | Risk Management

20 hours ago

Backtesting on the cloud is a game-changer! The scalability and speed of cloud-based infrastructure allow for running multiple strategies in parallel, reducing time-to-market for trading models. Plus, with GPU/TPU acceleration, complex simulations become more efficient. Curious to know—how do you handle data latency and API limits when scaling backtests?

Jared Broad

CEO/Founder at QuantConnect | Democratizing quant trading | Disrupting algorithmic trading with OS innovation

2 days ago

This sounds a lot like the LEAN Engine container. And the cloud hosted variant is QuantConnect which has the data required.


Olusegun Solomon Oluwajebe

MD, MSc(Financial Engineering), EMBA Managing Partner at Enthusia Consulting Ventures | CEO at Medanchor Limited

2 days ago

Interesting

Chris Tinker

Founding partner - Libra Investment Services

3 days ago

Replication issues could certainly be improved via such a process too.

Arun Muralidhar

5 days ago

We did this 20 years ago with AlphaEngine(R) (Mcube Investment Technologies - www.mcubeit.com) and Khaled Balama liked it (based on sage advice he received from the late Kirit Patel), but some of your colleagues were not happy and killed it after he left...LOL. Quite ironic that you bring this up 20 years later...



