Measuring the value-added of algorithmic trading strategies
Standard performance statistics are insufficient and potentially misleading for evaluating algorithmic trading strategies. Metrics based on prediction errors mistakenly assume that all errors matter equally. Metrics based on classification accuracy disregard the magnitudes of errors. And traditional performance ratios, such as Sharpe, Sortino, and Calmar, are affected by factors outside the algorithm, such as asset class performance, and assume normally distributed returns. Therefore, a new paper proposes a discriminant ratio (‘D-ratio’) that measures an algorithm's success in improving risk-adjusted returns versus a related buy-and-hold portfolio. Roughly speaking, the metric divides the algorithm's annual return by a value-at-risk metric that does not rely on normality and then divides the result by the same ratio for the buy-and-hold portfolio. The metric can be decomposed into the contributions of return enhancement and risk reduction.
For the full post and references to the underlying paper, please view the Systemic Risk and Systematic Value site.
Popular algorithm performance metrics
“We reviewed 190 articles presenting either several ML and DL algorithms aiming at predicting future asset returns or RL algorithms proposing investment strategies. The performance metrics found in the analysed articles are very diverse…
Error-based metrics estimate the performance of an algorithm in measuring the error in prediction between the effective return computed ex-post and the value predicted by the algorithm. These metrics include mean squared error (MSE), mean absolute error (MAE) and evolutions thereof…
Accuracy-based metrics measure the accuracy of the class assigned by the algorithm to the predicted return compared to the class of the effective return computed ex-post. The classification can be binary with two classes (positive expected return vs negative expected return, or investment vs no investment) or more complex…These metrics are based on confusion matrices…and include…accuracy, F1, precision or recall.
Investment-based metrics measure the results derived from an investment strategy proposed by the algorithm with buy-hold-sell signals. These metrics can be subdivided into [two types].
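As a rough illustration of the first two families, the sketch below computes MSE and MAE on predicted returns and then the confusion-matrix metrics for a binary positive vs non-positive classification. The function names and the choice of "return above zero" as the positive class are assumptions made for this example, not definitions taken from the paper.

```python
def error_metrics(pred, actual):
    """Error-based metrics on predicted vs realised returns."""
    n = len(actual)
    mse = sum((p - a) ** 2 for p, a in zip(pred, actual)) / n
    mae = sum(abs(p - a) for p, a in zip(pred, actual)) / n
    return {"mse": mse, "mae": mae}


def classification_metrics(pred, actual):
    """Accuracy-based metrics; the positive class is 'return > 0' (an assumption)."""
    pairs = list(zip(pred, actual))
    tp = sum(1 for p, a in pairs if p > 0 and a > 0)
    fp = sum(1 for p, a in pairs if p > 0 and a <= 0)
    fn = sum(1 for p, a in pairs if p <= 0 and a > 0)
    tn = sum(1 for p, a in pairs if p <= 0 and a <= 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Neither function looks at the profit or loss that would result from acting on the predictions, which is the gap the investment-based metrics are meant to fill.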
Why popular performance metrics are misleading
“Error-based metrics are among the most popular ones with 187 occurrences in the 190 reviewed articles. Error-based metrics are used in any domain as soon as regressions are involved, but for the specific task considered, error-based metrics suffer from two severe weaknesses:
“Accuracy-based metrics…focus on a different criterion: the right or wrong classification or the right or wrong investment decision. But accuracy-based metrics might miss the magnitude of the relative gain from a good decision versus the magnitude of a loss from a bad decision.”
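The point about magnitudes can be made concrete with a toy example. The numbers below are invented for illustration and are not taken from the paper: two predictors classify the sign of the return correctly on three days out of four, yet one misses a large loss and the other only misses a small gain.

```python
# Made-up daily returns: three small gains and one large loss.
actual = [0.01, 0.01, 0.01, -0.05]

pred_a = [0.01, 0.01, 0.01, 0.01]    # wrong only on the large loss
pred_b = [-0.01, 0.01, 0.01, -0.01]  # wrong only on one small gain


def accuracy(pred, actual):
    """Share of days on which the predicted sign matches the realised sign."""
    hits = sum((p > 0) == (a > 0) for p, a in zip(pred, actual))
    return hits / len(actual)


def strategy_return(pred, actual):
    """Compound return from investing only on days with a positive prediction."""
    equity = 1.0
    for p, a in zip(pred, actual):
        if p > 0:
            equity *= 1 + a
    return equity - 1


for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, accuracy(pred, actual), round(strategy_return(pred, actual), 4))
```

Both predictors score 75% accuracy, but predictor A loses money while predictor B makes money, because the single error of A falls on the day that matters most.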
Insights from an empirical exercise
“We prove the inefficiency of the error-based and accuracy-based metrics …We apply several AI regression algorithms: (i) multi-layer perceptron (MLP), (ii) Long Short-Term Memory neural networks (LSTM), (iii) residual neural networks (ResNet), (iv) Support Vector Machine (SVM) and (v) a decision tree-based algorithm “eXtreme Gradient Boosting” (XGB) to 28 stocks of the Dow Jones. We use different hyper-parameters with each algorithm to generate 980 series of daily returns. We use 20 years history of daily prices: 15 years are used to train our algorithms and 5 years (1260 days) for testing as out-of-sample data.”
“We compute the MSE, RMSE, MAE (mean absolute error) and MAPE (mean absolute percentage error) of the regressions. We benchmark each of the 980 series with the ‘back-trading’ of a perfectly informed agent that invests when the return is positive and doesn’t invest when the return is negative or zero. We compute R, R2, accuracy, F1, precision & recall and Matthew’s correlation coefficient.”
“We apply the following investment strategy: if the predicted return of the next day is positive, we invest for one day, otherwise we take no open position. In each case, the model integrates direct transactions costs of 0.10% per transaction applied to the value of the transaction. From that investment strategy and assuming a risk-free rate at 0.0%, we compute the annual return (RoI), the volatility (Vol), the yearly maximum drawdown (MDD) in percentage of the investment and the Sharpe, Sortino and Calmar ratios.”
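The strategy described in the quote can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes 252 trading days per year, charges the 0.10% cost as a deduction from the daily return whenever the position changes, measures maximum drawdown over the whole sample rather than per year, and uses one common convention for the downside deviation in the Sortino ratio. All function and variable names are hypothetical.

```python
import math

TRADING_DAYS = 252  # assumption: trading days per year
COST = 0.001        # 0.10% per transaction, as stated in the quote


def strategy_returns(pred, actual, cost=COST):
    """Daily net returns: long for one day when the predicted return is positive."""
    out, position = [], 0
    for p, a in zip(pred, actual):
        target = 1 if p > 0 else 0
        traded = abs(target - position)  # 1 when entering or exiting, else 0
        out.append(target * a - traded * cost)
        position = target
    return out


def performance(returns, days=TRADING_DAYS):
    """Annual return, volatility, max drawdown and ratios with a 0% risk-free rate."""
    n = len(returns)
    equity, peak, mdd = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        mdd = max(mdd, 1 - equity / peak)
    annual_return = equity ** (days / n) - 1
    mean = sum(returns) / n
    vol = math.sqrt(sum((r - mean) ** 2 for r in returns) / (n - 1)) * math.sqrt(days)
    # Downside deviation: root mean square of negative returns, annualised.
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / n) * math.sqrt(days)
    nan = float("nan")
    return {
        "annual_return": annual_return,
        "volatility": vol,
        "max_drawdown": mdd,
        "sharpe": annual_return / vol if vol else nan,
        "sortino": annual_return / downside if downside else nan,
        "calmar": annual_return / mdd if mdd else nan,
    }
```

A call such as `performance(strategy_returns(pred, actual))` returns the investment-based metrics for one prediction series. A position still open on the last day is not charged an exit cost in this sketch.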
“With the error-based metrics, we expect a negative correlation with the return, Sharpe, Sortino and Calmar ratios: the lower the error, the better the expected result. In italic, the metrics that are positively correlated. Against expectations for efficient metrics, correlations are positive, except between MAPE and the risk/return performance metrics, but not significantly different from 0 at 5% significance, as illustrated with the p-values. MAPE is the only metric whose correlation is negative and significantly so.”
The issues with Sharpe and Sortino ratios
Sharpe and Sortino ratios suffer from two important issues…
Proposal for an algorithm performance metric
“The objective of…[trading] algorithms…is to optimize the expected return of investments under the constraint of the risks generated by the investment. Our analysis will therefore?focus on the ability of metrics to provide a good proxy for the ability of an algorithm to achieve the objective of improving the risk-adjusted return.”
“We propose a new performance metric that improves the risk measurement and which has the ability to compare the efficiency of algorithms over time and across assets.”
The overall formula is:
D-ratio = 1 + (R[algo] – R[B&H]) / Abs(R[B&H])
where
R[algo] is the risk-adjusted return ratio of the algorithm
R[B&H] is the risk-adjusted return ratio of buy and hold
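The formula above translates directly into a few lines of code. The function name `d_ratio` and the error raised for a zero benchmark ratio are choices made for this sketch. The input values in the usage lines are invented.

```python
def d_ratio(r_algo, r_bh):
    """D-ratio = 1 + (R[algo] - R[B&H]) / Abs(R[B&H])."""
    if r_bh == 0:
        raise ValueError("the buy-and-hold ratio must be non-zero")
    return 1 + (r_algo - r_bh) / abs(r_bh)


print(d_ratio(0.9, 0.6))   # algorithm beats a positive benchmark
print(d_ratio(0.2, -0.4))  # algorithm beats a negative benchmark
print(d_ratio(0.6, 0.6))   # no value added
```

When R[B&H] is positive, the expression reduces to R[algo] / R[B&H]. The absolute value in the denominator keeps the ratio above 1 for an outperforming algorithm even when the buy-and-hold ratio is negative, where a plain quotient would flip the sign.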
The D-ratio can be decomposed to assess whether the added value of the algorithm is more linked to the improved expected return or to the risk reduction ability.
D-ratio = D-return ratio * D-VaR ratio
where
D-return ratio = D-ratio / D-VaR ratio
D-VaR ratio = CF-VaR[B&H] / CF-VaR[algo]
[The] D-return ratio evaluates the ability of the algorithm to increase the expected return. If D-return is above 1, the algorithm outperforms the buy & hold strategy for its expected return. Otherwise, the Buy & Hold strategy is return-wise more efficient than the algorithm.
If D-VaR is above 1, the algorithm outperforms the buy & hold strategy for its risk management, as the CF-VaR of the Buy & Hold is greater than the CF-VaR of the algorithm.
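The decomposition can be sketched end to end under some assumptions that the text above does not spell out. The sketch reads CF-VaR as a Cornish-Fisher value-at-risk, which adjusts the normal quantile for skewness and excess kurtosis. The 1% tail level, the daily horizon, the 252-day annualisation and all function names are assumptions made for illustration. The risk-adjusted return ratio is taken as annual return divided by CF-VaR, following the summary at the top of the post.

```python
import math
from statistics import NormalDist, mean, pstdev


def cf_var(returns, alpha=0.01):
    """Cornish-Fisher VaR as a positive loss (alpha is an assumed tail level)."""
    n = len(returns)
    mu, sigma = mean(returns), pstdev(returns)
    skew = sum((r - mu) ** 3 for r in returns) / (n * sigma ** 3)
    ex_kurt = sum((r - mu) ** 4 for r in returns) / (n * sigma ** 4) - 3
    z = NormalDist().inv_cdf(alpha)
    # Cornish-Fisher expansion of the quantile around the normal quantile z.
    z_cf = (z + (z ** 2 - 1) * skew / 6
            + (z ** 3 - 3 * z) * ex_kurt / 24
            - (2 * z ** 3 - 5 * z) * skew ** 2 / 36)
    return -(mu + sigma * z_cf)


def annual_return(returns, days=252):
    """Compound annualised return; 252 trading days is an assumption."""
    return math.prod(1 + r for r in returns) ** (days / len(returns)) - 1


def decompose(algo_returns, bh_returns):
    """Return (D-ratio, D-return ratio, D-VaR ratio)."""
    var_algo, var_bh = cf_var(algo_returns), cf_var(bh_returns)
    r_algo = annual_return(algo_returns) / var_algo
    r_bh = annual_return(bh_returns) / var_bh
    d = 1 + (r_algo - r_bh) / abs(r_bh)
    d_var = var_bh / var_algo
    d_return = d / d_var
    return d, d_return, d_var
```

For example, an algorithm that simply holds half the buy-and-hold exposure every day halves the CF-VaR, so its D-VaR ratio is 2, and the D-return ratio then shows how much return was given up for that risk reduction. The sketch assumes both CF-VaR values are positive and does not handle degenerate samples.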