If your pricing models do not run fast enough
Thomas Obitz, MSc, FRM
Market and Model Risk Transformation - Hands-on FRTB and AI Expert
Approximating Pricing Functions by Neural Networks vs RBFs
One day in 2017, I attended a talk at an industry conference on approximating XVA pricing with neural networks. Having some background in approximation theory, I wondered how well this would work, and whether it was a good idea at all. Surprisingly, there were loads of theoretical papers on what neural networks can approximate (in short, "everything"), but I did not come across a single paper examining the approximation quality on a real-world pricing function.
This question expanded into a thesis comparing the approximation behaviour of artificial neural networks ("ANNs") to that of radial basis functions ("RBFs"), the leading multi-dimensional approximation method from functional analysis.
And the result was, as always: you can make it work either way. But with a bit of mathematical insight, RBFs may yield significantly better results at a fraction of the effort. And no, throwing a random neural net with an arbitrary topology and activation function at a problem is usually not the best solution.
Why approximation of pricing functions is “en vogue”
XVA calculation – which is notoriously computation intensive – is not the only reason why approximation is a rather "hot" topic at the moment. FRTB has caused an explosion of computational demand in market risk calculation. Furthermore, for a given compute capacity, it can be more accurate to calculate prices at higher precision at a limited number of support points and interpolate between them, rather than running the pricer at lower precision for every point where a price is needed. Put differently: a bit of approximation theory can save many millions in hardware investment or cloud CPU cost.
Approximation – the “classical” way
Most of us have come across polynomial approximation; many of us know that it behaves in a fairly unimpressive way unless we use Chebyshev points as supports. Extending this approach to multiple dimensions does not look very compelling, among other reasons because the grid needed to support the approximation quickly becomes so large that it is faster to price the derivative directly than to populate every grid point. A few theoretical issues (such as the Mairhuber–Curtis theorem) get in the way as well. So multi-dimensional approximation requires a more sophisticated approach.
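As a quick one-dimensional illustration of why the support points matter so much (a sketch in NumPy, not taken from the thesis): interpolating Runge's classic example 1/(1 + 25x²) on equispaced supports blows up near the interval boundaries, while the same degree on Chebyshev supports behaves nicely.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Runge's function: the textbook example where equispaced polynomial
# interpolation misbehaves near the interval boundaries.
f = lambda x: 1.0 / (1.0 + 25.0 * x**2)

n = 21                                  # number of support points
x_eq = np.linspace(-1.0, 1.0, n)        # equispaced supports
x_ch = C.chebpts2(n)                    # Chebyshev points of the second kind

# Degree n-1 interpolants, fitted in the Chebyshev basis for stability.
p_eq = C.chebfit(x_eq, f(x_eq), n - 1)
p_ch = C.chebfit(x_ch, f(x_ch), n - 1)

x_test = np.linspace(-1.0, 1.0, 1001)
err_eq = np.max(np.abs(C.chebval(x_test, p_eq) - f(x_test)))
err_ch = np.max(np.abs(C.chebval(x_test, p_ch) - f(x_test)))
print(f"max error, equispaced supports: {err_eq:.3f}")
print(f"max error, Chebyshev supports:  {err_ch:.5f}")
```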
Figure 1: Approximation by radial basis functions
Radial basis functions are the "pocket knife" that functional analysis provides for this problem. Introduced in the 1930s and booming since the 1970s and 1980s, they are a powerful tool for approximation in multi-dimensional spaces. The idea is as simple as it is intuitively compelling: as shown in the picture, the approximation is supported by a number of smooth "bumps" which collectively approximate the surface (it is obviously a bit more complicated than that, and how to optimise this approximation is still an area of active research). Small modifications, such as the small moves of the support points which I describe in my thesis, can improve the precision of the method by orders of magnitude.
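To make the idea concrete, here is a minimal hand-rolled sketch (not the thesis code) of a Gaussian RBF interpolation of a toy two-dimensional surface. The shape parameter lam plays the role of the "lambda" discussed in the results below, and the tiny ridge term is only there to keep the linear system well-behaved.

```python
import numpy as np

def gaussian_kernel(X, C, lam):
    """Matrix of Gaussian "bumps" exp(-||x - c||^2 / (2 lam^2)) between the
    evaluation points X and the bump centres C."""
    d2 = np.sum((X[:, None, :] - C[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * lam ** 2))

def fit_rbf(X_sup, y_sup, lam, ridge=1e-10):
    """Interpolation weights: solve (K + ridge*I) w = y at the support points.
    The tiny ridge only guards against an ill-conditioned kernel matrix."""
    K = gaussian_kernel(X_sup, X_sup, lam)
    return np.linalg.solve(K + ridge * np.eye(len(K)), y_sup)

def eval_rbf(X, X_sup, w, lam):
    return gaussian_kernel(X, X_sup, lam) @ w

# Toy 2-D surface standing in for a pricing function.
target = lambda X: np.sin(3.0 * X[:, 0]) * np.exp(-X[:, 1])

rng = np.random.default_rng(0)
X_sup = rng.uniform(0.0, 1.0, size=(200, 2))   # scattered support points
w = fit_rbf(X_sup, target(X_sup), lam=0.1)

X_new = rng.uniform(0.0, 1.0, size=(1000, 2))
mse = np.mean((eval_rbf(X_new, X_sup, w, lam=0.1) - target(X_new)) ** 2)
print(f"out-of-sample MSE: {mse:.2e}")
```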
Approximation by neural networks
We train a neural network either on the outputs of a pricing model, or even on market prices themselves, and we obtain a highly efficient pricing engine producing “good enough” prices. Too good to be true? Indeed. Let’s unpick the complexities, and how to deal with them.
First of all – the activation function. Sigmoid is bad, ReLU is great, but in the end they all do the same thing, so pick one by trial and error (forgive me, "hyper-parameter grid search" sounds much better), and all is good, right? No. Nothing could be further from the truth. The activation function carries the interpolation, and if it is not smooth, neither will the interpolation be. Specifically, the ReLU function (i.e. max(0, x)) is not differentiable at 0, and the neural network will produce nothing but a linear spline. Is that all? And why, then, is everybody so excited about it? Well, if you are doing a grid search, you will use a limited number of iterations, so you will find the activation function which converges fastest, not the one which results in the best fit. The pattern is consistent: ReLU converges quickly, but the final result is not that good. Sigmoid converges slowly, but given enough time, it works much better. You want both? Try Swish.
Figure 2: After 1000 iterations: Swish far ahead
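For reference, this is what the three activation functions compared above look like in plain NumPy. The point is that Swish, x·sigmoid(x), is smooth everywhere yet behaves like ReLU for large inputs (recent TensorFlow/Keras versions ship it as the built-in activation "swish").

```python
import numpy as np

def sigmoid(x):
    # Smooth, but saturates: gradients vanish for large |x|, hence slow training.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Piecewise linear with a kink at 0: a ReLU network produces a linear spline.
    return np.maximum(0.0, x)

def swish(x, beta=1.0):
    # x * sigmoid(beta * x): smooth everywhere, yet close to ReLU away from 0.
    return x * sigmoid(beta * x)
```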
Second – network complexity. A fully connected ("dense") neural network with 10 by 10 nodes in three layers has more than 30,000 individual parameters you need to calibrate. You only have 500 data points? Good luck. However, following a fairly recent result of Mhaskar and Poggio, a neural network whose depth is aligned with the calculation tree of the pricing function can provide surprisingly good out-of-sample performance. Magic? No, maths.
Technical challenges add to the complexity. It is always helpful to sanity-check your results: there were constellations in which TensorFlow seemed to degrade to 16-bit precision. And if you are trying to cram a massive tensor operation into the memory of your GPU, a bit of basic linear algebra will come in quite handy.
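A minimal sanity check along these lines (a sketch, assuming TensorFlow 2.x / Keras): pin the default float type to double precision before building the model, then verify what the weights and outputs are actually computed in.

```python
import numpy as np
import tensorflow as tf

# Pin the default dtype for Keras layers to double precision; pricing-surface
# errors at the 1e-4 level are easily swamped by low-precision arithmetic.
tf.keras.backend.set_floatx("float64")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),                    # e.g. (spot, vol) inputs
    tf.keras.layers.Dense(64, activation="swish"),
    tf.keras.layers.Dense(64, activation="swish"),
    tf.keras.layers.Dense(1),
])

# Sanity checks: which precision do the weights and the outputs really use?
print(model.layers[0].dtype)                             # expect 'float64'
print(model(np.zeros((1, 2), dtype=np.float64)).dtype)   # expect float64
```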
If you have made it this far, I hope I have built up enough credibility with you to take a swipe at my pet peeve, the universal approximation theorem. It states that a one-layer neural network with sigmoid activation can approximate any continuous function to any level of accuracy, and it is quoted in every presentation on machine learning for pricing at least once (without being used any further). The result is as intuitive as it is meaningless: a sigmoid takes values between zero and one, and what Cybenko's proof does is (more or less) show that if you string enough of these sigmoids together, you can follow the ups and downs of any arbitrary output function to arbitrary precision. That is not that impressive.
However, there are much more interesting results establishing much better bounds (some of them even independent of the dimension of the problem!) for a broad range of activation functions. Very recent results explore the role of the depth of the network in the out-of-sample performance of its predictions. For me, one of the most exciting findings is the link mentioned above between the structure of the calculation tree of a pricing function and the optimal depth of the network approximating it.
Approximating derivatives pricing functions – the proof of the pudding…
My thesis uses our old friend Black-Scholes as a toy example. It is rather smooth, apart from a kink at the strike as the vola goes to zero. This kink is what actually makes it interesting, and both RBFs and neural networks run into difficulty in that region, as the red spots in Figure 3 show.
Let's start with the most interpretable results: approximating Black-Scholes over a two-dimensional price/volatility grid gives an MSE of 0.004 for the Gaussian RBF approximation vs 0.013 for the best neural network configuration. The maximum squared error is 0.77 vs 0.58. That is a win for RBFs on MSE, and almost a draw on maximum error (with some advantage for the NNs).
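For completeness, this is the kind of target surface in question (a sketch with assumed parameter choices, not the exact grid from the thesis):

```python
import numpy as np
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call: the surface to be approximated,
    here viewed as a function of spot S and volatility sigma."""
    sqrt_T = np.sqrt(T)
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt_T)
    d2 = d1 - sigma * sqrt_T
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

# 2-D price/volatility grid, with strike, rate and maturity held fixed.
S = np.linspace(50.0, 150.0, 50)
sigma = np.linspace(0.01, 0.5, 50)
SS, VV = np.meshgrid(S, sigma)
surface = bs_call(SS, K=100.0, T=1.0, r=0.02, sigma=VV)

# As sigma * sqrt(T) -> 0 the surface collapses onto the kinked payoff
# max(S - K, 0): exactly the region where both RBFs and ANNs struggle.
```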
Figure 3: Approximating Black Scholes
Looking at compute performance, though, RBFs win hands down: finding the optimal lambda (the only configuration parameter of the RBF approximation) takes about two seconds on a standard GPU, and fitting an individual approximation takes milliseconds (it is basically a matrix inversion). In contrast, optimizing the hyper-parameters of the neural network takes several hours, plus a few minutes to train each individual instance.
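A hedged sketch of what such a lambda search amounts to, using SciPy's RBFInterpolator (where the shape parameter is called epsilon) on a toy surface: fit once per candidate value, time it, score on a validation set and keep the best.

```python
import time
import numpy as np
from scipy.interpolate import RBFInterpolator

# Toy stand-in for pricer outputs on scattered support points.
rng = np.random.default_rng(1)
X_train = rng.uniform(0.0, 1.0, size=(400, 2))
X_val = rng.uniform(0.0, 1.0, size=(200, 2))
f = lambda X: np.sin(3.0 * X[:, 0]) * np.exp(-X[:, 1])
y_train, y_val = f(X_train), f(X_val)

best = None
for eps in [2.0, 4.0, 8.0, 16.0, 32.0]:        # candidate shape parameters
    t0 = time.perf_counter()
    rbf = RBFInterpolator(X_train, y_train, kernel="gaussian",
                          epsilon=eps, smoothing=1e-12)  # tiny ridge for stability
    fit_ms = 1e3 * (time.perf_counter() - t0)
    mse = np.mean((rbf(X_val) - y_val) ** 2)
    print(f"epsilon={eps:4.1f}  fit={fit_ms:6.1f} ms  val MSE={mse:.2e}")
    if best is None or mse < best[1]:
        best = (eps, mse)

print("chosen shape parameter:", best[0])
```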
In higher dimensions, RBF approximation still produces (somewhat) reasonable results, with an MSE of 0.03 on a strike/vola/time grid, and a (not quite practical) MSE of 0.21 in four dimensions (with the interest rate added). The neural networks were so slow that they were not practical on the hardware available.
In a nutshell: In terms of precision, RBFs have an advantage. In terms of speed, they are light years ahead of neural networks.
It is worth mentioning that RBF networks combine both approaches. Their approximation quality is probably a topic for the next project…
Optimizing results
In terms of approximation quality, there are a number of approaches for improving the precision of the RBF approximation by about two orders of magnitude; that is something to try once I find a bit of time alongside my day job.
But more importantly: the role of the GPU cannot be over-estimated. A mid-range GPU (NVIDIA 2070S) accelerates the training of a neural net by a factor of two to four (compared to an Intel i7 processor). The RBF approximation, however, which relies heavily on matrix operations, gains a factor of 50 to 60. Porting the Python code of the RBF approximation to the GPU (using cuPy) took about one afternoon, including all memory optimizations and the slicing of the matrices to fit into the GPU memory. That was probably less than what it took me to convince TensorFlow to cooperate with my GPU…
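A hedged sketch of what that port amounts to, applied to the hand-rolled Gaussian RBF fit shown earlier (the memory-motivated slicing of the kernel matrix is omitted here): swap NumPy for cuPy around the kernel build and the solve, and copy data on and off the device explicitly.

```python
import numpy as np
import cupy as cp  # drop-in replacement for the parts of the NumPy API used here

def fit_rbf_gpu(X_support, y_support, lam):
    """Gaussian RBF fit with the kernel build and the solve running on the GPU."""
    X = cp.asarray(X_support)                 # host -> device copies
    y = cp.asarray(y_support)
    d2 = cp.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = cp.exp(-d2 / (2.0 * lam ** 2))        # kernel matrix built on the GPU
    w = cp.linalg.solve(K, y)                 # dense solve on the GPU
    return cp.asnumpy(w)                      # device -> host copy of the weights

# Usage: same call pattern as the NumPy version shown earlier.
rng = np.random.default_rng(0)
X_sup = rng.uniform(0.0, 1.0, size=(2000, 2))
y_sup = np.sin(3.0 * X_sup[:, 0]) * np.exp(-X_sup[:, 1])
w = fit_rbf_gpu(X_sup, y_sup, lam=0.05)
```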
In summary
Yes, neural networks are hot. And indeed, they can be used as approximators without much knowledge of approximation techniques. However, classical methods from functional analysis will often perform better and be far more compute-efficient in accelerating a pricing or risk management platform. So it is a good idea to try these methods out before investing in more hardware. Or to talk to an expert. Looking forward to hearing from you...
Further Reading
Thomas Obitz, Multivariate approximation using radial basis functions vs using artificial neural networks with specific attention to derivatives pricing, 2020
Mhaskar and Poggio, Deep vs. shallow networks: An approximation theory perspective, 2016