Decoding the concept of probability of backtest overfitting: a step-by-step guide with Python scripts and visual aids.

The goal of this article is twofold:

1) to simplify the concepts behind the very popular paper "The Probability of Backtest Overfitting" (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2326253) by Bailey (https://www.dhirubhai.net/in/david-h-bailey-72b471/), Borwein (https://www.dhirubhai.net/in/jonathan-borwein-2157a835/), de Prado (https://www.dhirubhai.net/in/lopezdeprado/) and Zhu (https://www.dhirubhai.net/in/qiji-zhu-1669b71a/), and to present them in a way that is easily understandable to readers,

2) to visualize these concepts to further aid in comprehension.

Many scientific articles rely on complicated notation and specialized terminology, which can be a barrier for readers who are unfamiliar with the field. By breaking down the ideas into simple, step-by-step explanations and providing Python code for readers to replicate the process, I hope to make the concepts more accessible. Additionally, by visualizing the concepts, I aim to make them even easier to grasp.

Defining the model and collecting data:

The first ingredient is a Parametric Model, which means that the strategy is a function of the data sets and of n parameters.

These parameters have predefined values that determine the model’s combination space, which is represented by all possible n_plet combinations [p1, p2, …, pn].

Each of these unique n_plets generates a specific trading signal, and every signal corresponds to a particular value of the fitness function, which we select to calibrate the model.

Calibrating the model involves identifying the n_plet that maximizes the fitness function across the parameter space.

For instance, a very basic seasonal model in which, each day, I open a trade at a specific hour and close it at another specific hour is described by 2 parameters: the entry hour and the exit hour. The parameter space is formed by all the pairs (T_entry, T_exit).

If we assume the instrument trades 23 hours each day, there are 23*22 = 506 possible pairs, or 2_plets.
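As a minimal sketch (assuming the 23 tradable hours are labelled 0 to 22 and that entry and exit hours must differ), the parameter space can be enumerated like this:

from itertools import permutations

# All ordered (entry_hour, exit_hour) pairs with entry != exit: 23 * 22 = 506
two_plets = list(permutations(range(23), 2))
print(len(two_plets))  # 506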

The initial step is to generate a Matrix that encompasses all the time series of returns associated with the parametric space. The number of columns in this Matrix is equivalent to the total possible parameter combinations (e.g., 23*22 in the previous simple example). Meanwhile, the number of rows is determined by the number of timestamps in the return time series.

For instance, suppose we have ten years of price observations, and each year has 250 trading days. In that case, the number of rows in the Matrix would be (250*10) - 1 = 2499 (one observation is lost in the transition from prices to returns).
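A minimal sketch of such a Matrix, here filled with artificial returns just to fix the shapes (in a real application each column would hold the return series produced by one (T_entry, T_exit) pair):

import numpy as np
import pandas as pd

# One column per (entry, exit) pair, one row per return timestamp
two_plets = [(t_in, t_out) for t_in in range(23) for t_out in range(23) if t_in != t_out]
n_obs = 250 * 10 - 1  # ten years of daily prices -> 2499 returns

rng = np.random.default_rng(0)
SUMMARY = pd.DataFrame(
    rng.normal(0.0, 0.01, size=(n_obs, len(two_plets))),  # placeholder returns
    columns=pd.MultiIndex.from_tuples(two_plets, names=["t_entry", "t_exit"]),
)
print(SUMMARY.shape)  # (2499, 506)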



Defining the CSVC.

CSVC stands for combinatorially symmetric cross-validation.

The CSVC procedure involves dividing the dataset into 2N equal parts. Half of them are used for calibrating the model, and the other half is used to evaluate the performance of the strategy on unseen data. This procedure is repeated for every possible way in which N parts can be chosen among the 2N parts; the number of such ways is given by the binomial coefficient C(2N,N), as described here: https://en.wikipedia.org/wiki/Combination

To illustrate this concept, let’s consider an example where the dataset is divided into 6 parts. In this case, 3 parts will be used for the in-sample/training/calibration set, and the remaining 3 parts will be used for the out-of-sample/testing/validation set. The number of ways in which 3 parts can be selected from a set of 6 is given by the binomial coefficient C(6,3) = 20. The figure below shows all 20 possible combinations:

[Figure: the 20 possible partitions; red = in-sample parts, green = out-of-sample parts]
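A sketch of how these partitions can be enumerated (the same itertools approach is used in the appendix script):

from itertools import combinations

n_parts = 6
all_parts = set(range(n_parts))

# Every way of picking 3 in-sample parts out of 6; the remaining 3 form the out-of-sample set
partitions = [(set(in_sample), all_parts - set(in_sample))
              for in_sample in combinations(range(n_parts), n_parts // 2)]
print(len(partitions))  # C(6,3) = 20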


On each partition, the target function is calculated for all the possible parameter combinations, both on the red (in-sample) and on the green (out-of-sample) areas.

To make this clear, let’s picture the methodology for a model with 5 possible parameter combinations. Each combination gives rise to a strategy and hence to an equity line. The target function is calculated separately on the dataset intervals corresponding to the red areas and on those corresponding to the green areas.

[Figures: the 5 equity lines, with the target function evaluated separately on the red (in-sample) and green (out-of-sample) intervals]

Assuming that the framework is now clear, the next step is to rank the parameter combinations based on the value of the target function in both the in-sample and out-of-sample datasets.

[Table: the out-of-sample ranks OSRik, for in-sample rank i and partition k]

In the table, OSRik is the out-of-sample rank of the parameter combination that ranks ith in sample, for the kth in-sample/out-of-sample partition.

Considering only the first row of the previous table, we end up with a list of out-of-sample ranks corresponding to the best in-sample solution. In other words, we have a list that tells us, for every partition of the dataset, how the parameter combination that maximizes the target function in sample performs out of sample compared to all the other parameter combinations.

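As a toy illustration with made-up Sharpe values for 5 parameter combinations on a single partition, the out-of-sample rank of the in-sample winner can be read off like this:

import pandas as pd

# Hypothetical target-function (Sharpe) values on one partition
sharpe_is = pd.Series({"p1": 1.8, "p2": 0.4, "p3": 1.1, "p4": -0.2, "p5": 0.7})
sharpe_os = pd.Series({"p1": -0.5, "p2": 0.9, "p3": 0.3, "p4": 0.1, "p5": 1.2})

best_is = sharpe_is.idxmax()                       # "p1" wins in sample
os_ranking = sharpe_os.sort_values(ascending=False).index
print(best_is, os_ranking.get_loc(best_is))        # p1 4 -> worst of the 5 out of sample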

The idea is that if a model is not overfit, the parameter combination that performs best in sample will not consistently rank low compared to the other parameter combinations when tested out of sample.

The next step consists in using the list of out-of-sample rankings defined above to assess a “probability of overfitting”.

An Example:

Let's apply the idea above to a set of 5000 artificially generated return series, each comprising 1000 observations. If this were a real-world model, it would mean that there are 5000 possible parameter combinations and that the dataset covers a span of 1000 days.


Let us then decide to divide the 1000 timestamps into 16 parts. The number of ways to choose 16/2 = 8 elements from a set of 16 is given by the binomial coefficient C(16,8) = 12870. This means the number of in-sample/out-of-sample partitions in our example is 12870.

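The count is easy to verify with the standard library:

from math import comb

print(comb(16, 8))  # 12870 in-sample/out-of-sample partitions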

To represent the vector of out-of-sample rankings OSRik in a more mathematically elegant way, we can use the logit function.


Specifically, if the rank of the best in-sample parameter combination in the out-of-sample set is in the bottom half, then the logit value is negative; otherwise, it is positive.
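A sketch of the transformation as implemented in the appendix script below (note that this variant differs from the formula ln(w/(1-w)) used in the source paper, but its sign behaves the same way):

import numpy as np

def logit(os_rank, n_combinations):
    # Relative out-of-sample rank of the in-sample winner: 0 = best, -> 1 = worst
    rel_rank = os_rank / n_combinations
    # Positive when the winner lands in the top half (rel_rank < 0.5), negative otherwise
    return np.log(1.0 / (0.5 + rel_rank))

print(logit(0, 5000))     # best possible rank  -> ~ +0.69
print(logit(4999, 5000))  # worst possible rank -> ~ -0.41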

Plotting a histogram of the logit values helps us visualize the model's overfitting. If the histogram is concentrated on the negative side, the model is likely overfit; if it is concentrated on the positive side, the in-sample winner tends to keep outperforming out of sample, which is the behaviour of a model that is not overfit.

In our specific example, the histogram looks like this:

No alt text provided for this image

To calculate the probability of overfitting, we simply divide the number of partitions where the logit value is less than zero by the total number of partitions.
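In code, given the list of logit values from all the partitions, this is a one-liner (a sketch using numpy):

import numpy as np

def probability_of_overfitting(logits):
    # Fraction of partitions in which the in-sample winner ranks in the
    # bottom half out of sample (negative logit)
    return float(np.mean(np.asarray(logits) < 0.0))

print(probability_of_overfitting([0.3, -0.1, -0.4, 0.2]))  # 0.5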

Appendix: Python script.

In this section you will find the Python script that allows you to calculate the probability of overfitting, given:

  1. A dataframe of returns where each column represents a time series of returns for a given parameter combination of the model -> labelled as SUMMARY
  2. The number of parts into which the dataset is split -> labelled as n_partitions

from itertools import combinations
import multiprocessing
import time

import matplotlib.pyplot as plt
import bottleneck as bn
import numpy as np
import pandas as pd

class oft:

    def __init__(self, SUMMARY, n_partitions):
        # Initialize instance variables
        self.SUMMARY = SUMMARY  # A pandas dataframe: one column of returns per parameter combination
        self.n_partitions = n_partitions  # An even integer: the number of parts the dataset is split into
        self.len_sample = len(self.SUMMARY.columns)  # The number of parameter combinations
        self.oft_combinations = self.create_IS_OS_combinations()  # An iterable of (IS, OS) index-set tuples
        self.logits = run_simulation(self.calc_logit, self.oft_combinations)  # One logit per partition
        self.show_results()  # Plot and save the logit histogram

    def create_IS_OS_combinations(self):
        # Create index sets for in-sample (IS) and out-of-sample (OS) combinations
        combs_IS = list(combinations(range(self.n_partitions), self.n_partitions // 2))
        combs_OS = [list(set(range(self.n_partitions)) - set(comb)) for comb in combs_IS]
        # Split the rows of the dataframe into contiguous partitions
        splitted = np.array_split(self.SUMMARY.index, self.n_partitions)
        # Combine the partitions into IS and OS index sets
        OS = [np.concatenate([splitted[i] for i in comb]) for comb in combs_OS]
        IS = [np.concatenate([splitted[i] for i in comb]) for comb in combs_IS]
        # Return an iterable of tuples containing the IS and OS sets
        return zip(IS, OS)

    def find_OS_rank_of_best_IS(self, tuple_IS_OS):
        # Given an IS and OS combination, find the OS rank of the best IS combination
        IS, OS = tuple_IS_OS
        # Rank all parameter combinations by Sharpe ratio on the IS and on the OS set
        ranked_sharpe_IS = self.rank(self.calc_sharpe, self.SUMMARY, IS)
        ranked_sharpe_OS = self.rank(self.calc_sharpe, self.SUMMARY, OS)
        # Find the position of the best IS combination in the ranked OS set
        position_of_best_IS_in_OS = np.where(ranked_sharpe_OS.index == ranked_sharpe_IS.index[0])[0][0]
        return position_of_best_IS_in_OS

    def rank(self, function, SUMMARY, mappa):
        # Apply a function to each column (parameter combination) restricted to the given
        # index set, and return the results sorted from best to worst
        combs = SUMMARY.loc[mappa].T.values
        return pd.DataFrame([function(i) for i in combs]).sort_values(by=0, ascending=False)

    def calc_logit(self, k):
        # Given an IS and OS combination, calculate the logit
        os_rank = self.find_OS_rank_of_best_IS(k)
        rel_os_rank = os_rank / self.len_sample
        # Note: this variant differs from the paper's ln(w/(1-w)), but the sign behaves
        # the same way: positive if the IS winner lands in the top half out of sample
        logit = np.log(1 / (0.5 + rel_os_rank))
        return logit

    def calc_sharpe(self, pl):
        # Per-period Sharpe ratio: sum(pl)/sqrt(N*sum(pl**2) - sum(pl)**2) is
        # algebraically equivalent to mean(pl)/std(pl)
        N = len(pl)
        summa = bn.nansum(pl)
        if summa != 0:
            return summa / np.sqrt(N * bn.ss(pl) - summa ** 2)
        return 0.0  # keep the ranking well-defined for flat equity lines

    def show_results(self):
        df = pd.DataFrame(self.logits)
        fig, ax = plt.subplots(figsize=(15, 5))
        df.hist(bins=400, ax=ax)
        ax.set_xlabel('LOGIT OF THE OUT-OF-SAMPLE RANK OF THE BEST PARAMETER COMBINATION')
        ax.set_ylabel('number of occurrences')
        ax.set_title("Logit Histogram")
        # Save the figure to a file with the desired resolution
        plt.savefig('histogram_2.png', dpi=500)


def wrapper(args):
    # Unpack a [function, argument] pair produced below and call function(argument)
    return args[0](*args[1:])


def run_simulation(funct, combs):
    # Evaluate funct on every IS/OS combination in parallel
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    start_time = time.time()
    print("Calibrating")
    results = pool.map(wrapper, [[funct, comb] for comb in combs])
    pool.close()
    print("--- %s seconds ---" % (time.time() - start_time))
    return results
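A minimal usage sketch, assuming the script above is saved as a single file (the artificial SUMMARY below just mimics the 5000-combinations example; the __main__ guard is needed because multiprocessing re-imports the module in its worker processes, and with 12870 partitions the run takes a while):

if __name__ == "__main__":
    # 5000 artificial return series of 1000 observations each, as in the example above
    rng = np.random.default_rng(42)
    SUMMARY = pd.DataFrame(rng.normal(0.0, 0.01, size=(1000, 5000)))

    model = oft(SUMMARY, n_partitions=16)
    logits = np.asarray(model.logits)
    print("Probability of overfitting:", np.mean(logits < 0))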
Sergey Frolov

Quantitative Researcher

1 month

How did you come up with this elegant logit representation? In the source article the authors use ln(rank/(1-rank)). I checked several more implementations of this method and they use the formula from the source article. Now I see that the result is quite similar for both formulas. Thank you

Martin Mayer-Krebs, CQF

Quantitative Portfolio Manager

1 year

Very interesting article Francesco Landolfi. Thanks for writing it! Could it be applied to k-fold cross-validation in general and not only to CSVC in particular?

Carlos Castro

Quant Intern @ MassPRIM - MSc. Finance, MSc. Data Science

1 year

How is it that using OOS periods earlier in time than the in-sample ones does not introduce look-ahead bias?

Piotr Pomorski, PhD, CFA

Machine Learning & AI | Data Science | Quant Research

1 year

Insightful, thanks! Isn’t it dangerous though to perform calibration/validation/testing on all possible window combinations? Won’t that result in de facto parameter overfitting over the data used?
