Estimating live trading confidence levels from multiple walk forward generation

In the previous articles,

https://www.dhirubhai.net/pulse/whats-threshold-concern-when-strategys-pl-remains-study-landolfi/

and

https://www.dhirubhai.net/pulse/from-sharpe-ratio-max-drawdown-numerical-approach-francesco-landolfi/

I focused on translating assumptions about the Sharpe ratio of a trading strategy into measurable metrics that are most relevant in real-time trading, such as drawdown and time to recovery from losses. These metrics can be used to implement an equity control strategy that provides guidance on when to turn off or revisit an underperforming strategy. However, these considerations are entirely independent of the specific model used and are based on an underlying assumption about the distribution of returns, which is then used to generate equity line samples.

To provide model-specific guidance for live trading, a different approach is needed, one that does not assume anything about the statistical properties of the model itself but derives them from the calibration/validation process. Commonly used techniques for setting expectations of a model in live trading are cross-validation and the De Prado et al. overfitting test (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2326253). These techniques are elegant and clever, but they do not respect the arrow of time, which can lead to inflated out-of-sample results.

For example, the standard cross-validation technique performs a series of calibrations/validations in which, given a set of data, the division between the training set and the test set takes the following form:

[Image: standard cross-validation train/test splits]

See https://medium.com/python-in-plain-english/validating-a-parametric-trading-system-calibrated-through-a-genetic-algorithm-with-python-87a17f66f6e for details.

The key issue with this approach is that it does not represent how trading works in reality, where time always flows in the same direction and testing inevitably follows training.
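To make the issue concrete, here is a minimal Python sketch (scikit-learn and the toy yearly index are my assumptions, not part of the original article) showing that standard K-fold splits routinely train on years that come after the test fold:

```python
import numpy as np
from sklearn.model_selection import KFold

years = np.arange(2010, 2023)  # toy yearly index
for train_idx, test_idx in KFold(n_splits=4).split(years):
    print("train:", years[train_idx], "| test:", years[test_idx])
# In every fold except the last, the training set contains years that
# lie *after* the test years, i.e. the model is calibrated on the future.
```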

A Time-Aware Validation Technique

The approach used here to extract statistics from training and validation is as follows (a minimal sketch of the window enumeration follows the list):

1) Determine the minimum (m) and maximum (M) length of the training set.

2) Choose a single year to serve as the test set.

3) Select all n_plets (sets of n consecutive years) that come before the test year and have a length m ≤ n ≤ M, to create the set of training sets.

4) Calibrate the model for each n_plet in the training sets.

5) Repeat steps 2-4 for every year.
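Here is a minimal Python sketch of the enumeration in step 3. The function name and toy parameters are my own, and I assume a window may end anywhere before the test year; anchoring every window to end at the year just before the test year is a simple variant.

```python
def training_windows(test_year, first_year, m, M):
    """All runs of n consecutive years lying entirely before test_year,
    with m <= n <= M."""
    windows = []
    for n in range(m, M + 1):
        for start in range(first_year, test_year - n + 1):
            windows.append(list(range(start, start + n)))
    return windows

# Example: every 2- or 3-year training window available before test year 2016.
for w in training_windows(2016, first_year=2011, m=2, M=3):
    print(w, "-> test:", 2016)
```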

Please refer to the accompanying images below for an example of this procedure for both the 2022 and 2021 test sets.

[Image: training/test windows for the 2022 test set]
[Image: training/test windows for the 2021 test set]


Also, to clarify, below is a table that compares in-sample and out-of-sample metrics, such as the Sharpe ratio, number of trades, etc. In this instance, the out-of-sample year is 2016, and the minimum in-sample window is 10 years, while the maximum in-sample window is 19 years.


[Table: in-sample vs out-of-sample metrics (Sharpe ratio, number of trades, etc.) for test year 2016, with in-sample windows of 10 to 19 years]
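For reference, a common way to compute the Sharpe ratio entries of such a table is sketched below; daily returns and a zero risk-free rate are my assumptions.

```python
import numpy as np

def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualised Sharpe ratio, assuming a zero risk-free rate."""
    r = np.asarray(daily_returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)
```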

From Validation to Equity Control

After completing all the defined calibrations, we can observe the shape of the equity lines of our model for each test set. For every year there will be n lines, where n is the number of non-equivalent best solutions from the calibration. These solutions may maximize the in-sample Sharpe ratio or any other objective function.

[Image: out-of-sample equity lines for a single test year]


Combining these equity lines, we obtain a set of entirely out-of-sample equity lines.

[Image: combined out-of-sample equity lines across all test years]
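The article does not spell out how the per-year lines are stitched together. One plausible approach, since enumerating every branching combination grows multiplicatively with the number of years (see the comment at the end of this article), is to sample random chains, as in this sketch; all names and parameters here are mine:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_oos_paths(yearly_lines, n_paths=1000):
    """Chain one randomly chosen per-year equity segment per test year.

    `yearly_lines` maps each test year to a list of equity arrays, each
    normalised to start at 1.0 and all of equal length.  Random sampling
    is an assumption of this sketch: it avoids enumerating every
    branching combination of segments.
    """
    years = sorted(yearly_lines)
    paths = []
    for _ in range(n_paths):
        level, chunks = 1.0, []
        for y in years:
            seg = yearly_lines[y][rng.integers(len(yearly_lines[y]))]
            chunks.append(level * np.asarray(seg, dtype=float))
            level = chunks[-1][-1]  # next year starts where this one ended
        paths.append(np.concatenate(chunks))
    return np.vstack(paths)
```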



Now, here comes the crucial point. We have generated a sample space of out-of-sample equity lines, and each sample is, by definition, derived solely from observations of past data.

From the distribution of out-of-sample lines, we can define best, worst, and expected scenarios by looking at the quantiles.

[Image: quantile-based best, expected, and worst scenario equity lines]
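Continuing the sketch above, pointwise quantiles across the sampled paths give the scenario curves; the 5%/50%/95% levels are my illustrative choice, not the article's:

```python
import numpy as np

# `paths` is the (n_paths x n_days) array from the combination sketch above.
worst    = np.quantile(paths, 0.05, axis=0)   # pessimistic scenario
expected = np.quantile(paths, 0.50, axis=0)   # median scenario
best     = np.quantile(paths, 0.95, axis=0)   # optimistic scenario
```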


These scenarios can be compared with live trading results to help assess the appropriate time to discontinue the model.

For example, if the live trading drawdown is worse than the maximum drawdown of the worst-case scenario as defined above, then the strategy needs to be reassessed.
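As a minimal sketch of that rule (assuming positive equity curves; `live_equity` and the `worst` curve from the quantile sketch above are assumed inputs):

```python
import numpy as np

def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction."""
    equity = np.asarray(equity, dtype=float)
    peaks = np.maximum.accumulate(equity)
    return ((peaks - equity) / peaks).max()

# Reassessment rule from the text: a live drawdown worse than the
# worst-case scenario's maximum drawdown triggers a review.
if max_drawdown(live_equity) > max_drawdown(worst):
    print("Live drawdown exceeds the worst-case scenario: reassess the strategy.")
```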

I hope you enjoyed this article! Please like and/or comment to let me know your thoughts!

Thank you

Francesco

Emlyn Flint

Derivatives & Quant Research, Peresec | Adj. Associate Professor, UCT

1y

Thanks Francesco for an interesting article. There are definite similarities here with what Patrick Burns proposed in the mid-2000s with out-of-sample random portfolios (i.e. unoptimised trading strategies) rather than optimised oos portfolios, for creating better benchmarking distributions for a strategy. One question: when you combine the equity lines, how do you deal with the multiplicative branching issue when joining the periods? For example, 2016 graph has 7 lines and 2017 has 9 lines. Each 2016 line can attach to any of the 9 lines in 2017, leading to 63 total possibilities. Now assuming the lines grow by 1 each year, that means 630 lines by 2018, then 6 930 by 2019, 83k by 2020, 1.1m by 2021, and finally 15.1 million lines by end of 2022! The lines explode quickly, and the explosion rate is linked to the total number of periods. What happens if instead we'd used 6m periods? If I assume that the possible lines per 6m period are the same as your yearly graphs above (7 lines for 1H16 and 7 lines for 2H16, etc.), then combining all the possibilities from 2016 to 2022 (14 periods) leads to 229 trillion possible branching paths!
