Deep Reinforcement Learning for minimizing portfolio variance 2

This post should be read as an extension of my previous post (link).

In this post I will dive deeper into the mechanics of Reinforcement Learning, going through how the agent is able to optimize his behavior. Further, I will implement some of my own suggested improvements from the previous post. Specifically, I will implement a Convolutional Neural Network (CNN) architecture, and I will allow the agent to use additional input parameters when optimizing his actions.

The conclusion I reach is that both network strategies (the old FC and the new CNN) are able to outperform a benchmark strategy, even though they choose very different investment strategies.

How to optimize agent behavior

Reinforcement Learning is one of the three primary paradigms of Machine Learning (the other two being supervised learning and unsupervised learning); Deep Reinforcement Learning (DRL) combines it with deep neural networks. What differentiates it from the other two fields is the use of an agent/environment setup. In DRL, an agent's actions affect the environment, which in return gives the agent a reward for his actions. The agent is concerned with optimizing the cumulative reward. This means that the agent can make suboptimal short-term decisions to reach better long-term performance.

But how does the agent go about improving his decisions in the first place? Let's revisit the network architecture from the previous post in Figure 1. At time t, the agent has the State of the world, S_t, available. This is a [3x30] matrix containing the AT1, Crossover, and Bund returns for the previous 30 trading days. If we look at the other end of the network, it outputs an Action, A_t, which is a [1x2] matrix containing the Crossover and Bund weights.

Figure 1


If we were doing a supervised learning optimization, we would try to map some input to a known, labelled output. Imagine looking at housing prices. In this situation we could try to map some inputs (size, age, condition, location, etc.) to the sales prices. This supervised learning optimization would be done by minimizing the cumulative squared difference between the model's estimated price, y_hat, and the actual price, y, over all i of N observations. This is a type of loss function and can be written as below. If the cumulative loss is zero, the model maps the input to the labelled output perfectly.

$$\text{Loss} = \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2$$

In our DRL case, we do not have a labelled output that we can tell the model to map to. Instead, we want it to minimize portfolio variance. To achieve this, we add another step after the model outputs the Action A_t: we create an expression for the portfolio variance, see the equation below. The 'Market ret' contains the Bund and Crossover returns for time t. This variance is the reward, R_t+1, the agent receives after choosing his Action A_t, and it is this expression we want to minimize. In order for the agent to be able to see what has happened in the past, I add the variance from the previous period (effectively, this means that the agent minimizes the cumulative variance).

$$R_{t+1} = \mathrm{Var}_t = \mathrm{Var}\!\left(r_{AT1,t} + A_t \cdot \text{Market ret}_t\right) + \mathrm{Var}_{t-1}$$
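As a rough sketch of how such a reward could be computed in code (the post does not show the implementation, so the function name, the inclusion of the AT1 leg, and the squared-return variance proxy below are all my assumptions):

```python
import tensorflow as tf

def variance_loss(actions, market_ret, at1_ret, prev_var):
    """Sketch of the variance reward (names and bookkeeping are assumptions).

    actions:    [batch, 2] hedge weights (Crossover, Bund) from the network
    market_ret: [batch, 2] Crossover and Bund returns at time t
    at1_ret:    [batch]    AT1 return at time t (the position being hedged)
    prev_var:   [batch]    variance carried over from the previous period
    """
    # Portfolio return: the AT1 position plus the weighted hedge returns.
    port_ret = at1_ret + tf.reduce_sum(actions * market_ret, axis=1)
    # Squared return as a zero-mean variance proxy, accumulated over time.
    return tf.square(port_ret) + prev_var
```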

So this is all fine, but how does one go about finding the values that minimize the above expression? The ML model shown in Figure 1 has approximately 1,800 weight parameters available to optimize over. You could iteratively set all the weights at random and search for the optimal solution, but this would be hopelessly slow. The solution I have used is Backward Propagation of Errors (Backprop).

Backprop is a method that lets you see the instantaneous effect of an incremental change to each weight parameter on the loss function. This allows you to alter all the weights incrementally in the right direction.

The general way Backprop works is that you feed the input to the model and let it pass through the neural network, with the weights initially set randomly. This is called the forward pass. Now you have an initial value for the variance, which is probably suboptimal due to the random weights. Then you adjust each weight to lower the variance function slightly. This is called the backward pass, because we move from the variance back through the network to adjust the weights. Now we are back where we started, but with weights altered slightly in the right direction. This is a single loop of the algorithm, which is then iteratively repeated.
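In TensorFlow terms, a single loop might look like the minimal sketch below, assuming a Keras `model` and the hypothetical `variance_loss` from above:

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

@tf.function
def train_step(model, state, market_ret, at1_ret, prev_var):
    with tf.GradientTape() as tape:
        actions = model(state, training=True)               # forward pass
        loss = tf.reduce_mean(
            variance_loss(actions, market_ret, at1_ret, prev_var))
    # Backward pass: gradients of the loss w.r.t. every weight parameter.
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```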

In Figure 2 below, I have shown an illustrative network architecture with only one hidden layer containing one neuron. This is only an illustration, missing some key details, but it conveys how the optimization works. I have zoomed in on what happens inside the neuron. In the forward pass, the state of the world S_t gets scaled with the weight W_1, which yields Z_1. It is the W_1 weight we want to optimize to lower the variance. Then we perform a non-linear, differentiable transformation, f(Z_1), which is at the core of ML optimization; I have used a Rectified Linear Unit (ReLU) activation function. This yields the output of the network, A_t, which we use to calculate Var_t.

Figure 2


Because the variance function is produced through a sequence of differentiable operations, we can use the chain rule to derive an expression for the partial derivative of the variance with respect to the weight parameter. This partial derivative is called the gradient, and it shows which way to alter each weight (up or down) to lower the portfolio variance.

$$\frac{\partial \mathrm{Var}_t}{\partial W_1} = \frac{\partial \mathrm{Var}_t}{\partial A_t} \cdot \frac{\partial A_t}{\partial Z_1} \cdot \frac{\partial Z_1}{\partial W_1}$$

After changing the weight, we have completed a loop in the backprop algorithm, and we can now start a new forward pass, with the updated weights.
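To make this concrete, here is a tiny numeric version of the one-neuron example from Figure 2, with a simple squared output standing in for the full variance expression (illustrative numbers, not the author's values):

```python
# One forward/backward loop for the single-neuron network in Figure 2.
s_t, w_1 = 0.5, 0.8            # input and randomly initialized weight

z_1 = w_1 * s_t                # forward: scale input with the weight
a_t = max(z_1, 0.0)            # forward: ReLU activation, A_t = f(Z_1)
var = a_t ** 2                 # stand-in "variance" to minimize

# Backward: chain rule dVar/dW1 = dVar/dA_t * dA_t/dZ_1 * dZ_1/dW_1
d_var_d_a = 2.0 * a_t
d_a_d_z = 1.0 if z_1 > 0 else 0.0   # ReLU derivative
d_z_d_w = s_t
grad = d_var_d_a * d_a_d_z * d_z_d_w

w_1 -= 0.1 * grad              # nudge the weight in the right direction
```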

The hope is that the weight optimization settles at an optimum. The real work is often the trial-and-error process of testing different network architectures and hyperparameter settings to see if you can find a better optimum and better out-of-sample performance for your trained network.

This was a brief and mathematically superficial walkthrough of how the agent actually goes about optimizing his behavior.

Convolutional Neural Networks

I will also touch briefly on Convolutional Neural Networks in this post; it is a wide topic, but one that is interesting for multivariate time-series analysis, and therefore for portfolio optimization.

CNNs revolutionized image recognition in ML because they were able to drastically reduce the number of weight parameters you need to optimize over. Even small pictures can have thousands of pixels, and the pixel count grows quadratically with resolution. It quickly becomes infeasible to use fully connected neural network architectures for image recognition.

What CNNs do, as the name suggests, is use a convolution to transform the data. A convolution is a mathematical operation where you sequentially evaluate part of the input to create part of the output. It's like a small prism that you hold over part of the input to get part of the output. When you have systematically evaluated all of the input with this prism, you have created an output that depends only on the prism you used – and the number of parameters in the prism is quite low.

In Figure 3, you can see an illustration of a convolution. The larger, light blue 5x5 matrix is the input data; the smaller, dark blue 3x3 matrix on top is the convolution. The green 3x3 matrix to the right is the output after the input has been convolved. You can also see the sequential nature of the operation, starting from the top left of the input to create the top left of the output.

This also illustrates how convolutions keep the number of estimation parameters low: no matter how large the input is, the dark blue convolution will only ever have 3x3 parameters we need to estimate. We could slide this 3x3 dark blue convolution across a 10,000x10,000-pixel image, for example.
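A minimal NumPy sketch of the operation in Figure 3 – a 3x3 kernel slid across a 5x5 input with stride 1 and no padding – might look like this:

```python
import numpy as np

def convolve2d(x, kernel):
    """Slide the kernel across the input (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output element is a weighted sum of one input patch.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)  # light blue 5x5 input
k = np.ones((3, 3)) / 9.0                     # dark blue 3x3 convolution
print(convolve2d(x, k).shape)                 # -> (3, 3), the green output
```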

This might seem too simple to actually work, which is somewhat true. CNN architectures used in state-of-the-art image recognition often combine convolutions with fully connected layers (and other types of transformations) in complex ways. But the basic idea is still the same: convolutions drastically reduce the number of estimation parameters – and it works.

Figure 3 (Source)


The reason why I spend some time introducing CNNs is their use in multivariate time-series analysis. Instead of feeding the CNN a square image, you could feed it a matrix of returns. For example, 10 securities with 100 observations each would be a 100x10 matrix. The operation above would be exactly the same, apart from the dimensions. Like in image recognition, CNNs get really useful when your input data gets really large. Imagine feeding the network 10 years of daily observations for thousands of securities – a fully connected architecture would probably struggle at some point.
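As a hedged sketch of how such a returns matrix could be fed to a convolutional layer in Keras (illustrative shapes only, not the setup used later in this post):

```python
import tensorflow as tf

# 100 daily observations x 10 securities, treated as a one-channel "image".
inputs = tf.keras.Input(shape=(100, 10, 1))
x = tf.keras.layers.Conv2D(8, kernel_size=(5, 3), activation="relu")(inputs)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 1))(x)
x = tf.keras.layers.Flatten()(x)
outputs = tf.keras.layers.Dense(2)(x)  # e.g. two portfolio weights
model = tf.keras.Model(inputs, outputs)
```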

To give an intuitive explanation of what a convolutional layer does: you could see it as searching for a specific feature in the input that repeats itself. When you stack many convolutional layers on top of each other, your network is able to locate a range of different features in the input data.

When we use a convolution on multivariate time-series data, the hope is that each layer is able to capture some important feature in the data. One layer could be looking for what happens when volatility spikes; another could be looking for decouplings, where one of the inputted time-series starts to deviate from its normal relationship with the market. And so on.

Ideally, after having trained the network, it will be able to recognize if some of these complex features appear in the future. History of course does not repeat itself, but if it rhymes, the network could detect it and have an idea of what worked in the past. The real strength is that the network can recognize features in the data that would be impossible to recognize with human perception, as the features can be scaled, twisted, or inverted across this multidimensional space.

Data

As an improvement to the previous analysis, I have increased the number of time-series that I input into the model. As a result, the network can see a larger part of the market when finding the optimal investment strategy. The hope is that this larger input dataset provides additional useful information.

The input data now consists of 12 time-series: iTraxx Main spread, iTraxx Main total return, iTraxx Crossover spread, iTraxx Crossover total return, Bund futures total return, AT1 spread, AT1 total return, SXXP index total return, DAX index total return, VIX index absolute changes, US ISM index levels, and European 5-year zero-coupon inflation swap absolute changes.

I have used 80% of the data for training (05JAN2015 to 28JAN2021), 10% for validation (29JAN2021 to 02NOV2021), and 10% for testing (03NOV2021 to 04AUG2022). I have trained the model on the training data, but chosen the model configuration that minimizes the validation loss.

Source: Bloomberg Finance L.P.
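A chronological 80/10/10 split like this can be sketched as follows (assuming the observations are already ordered by date):

```python
def chronological_split(data, train_frac=0.8, val_frac=0.1):
    """Split time-ordered data without shuffling, so the test set
    remains a true out-of-sample period."""
    n = len(data)
    i = int(n * train_frac)
    j = int(n * (train_frac + val_frac))
    return data[:i], data[i:j], data[j:]
```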

Network architecture

I have set up the analysis slightly differently this time: I have allowed the network to allocate between 3 securities when minimizing the portfolio variance. The network has the iTraxx Main, iTraxx Crossover, and Bund futures available as hedging instruments.

For the Fully Connected (FC) network I have used the same overall architecture as in the last post, with 30 days of historical returns. With the updated input and output dimensions, it looks as follows.

Figure 4

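As a hedged Keras sketch of a fully connected network with these input and output dimensions (the hidden-layer sizes below are my assumptions, not the exact configuration from Figure 4):

```python
import tensorflow as tf

# 12 time-series x 30 historical days in, 3 hedge weights out.
fc_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(12, 30)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(3),  # Main, Crossover, and Bund weights
])
```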

For the CNN I have used the following network architecture, with 50 days of historical returns. I have not covered how filters, kernels, strides, and MaxPooling layers work, but they are hyperparameters and transformations in the CNN. Further, I have not covered how the dimensions come to be as they are; it is a result of reshaping the 'normal' [12x50] input to fit the TensorFlow convolution setup. You can find more info here (link).

Figure 5

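Again only as a sketch, a small Keras CNN of this general shape could look as below (the filter counts, kernel sizes, strides, and pooling settings are placeholder assumptions, not the values from Figure 5):

```python
import tensorflow as tf

# The [12x50] input reshaped to a one-channel "image" of 50 days x 12 series.
cnn_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(50, 12, 1)),
    tf.keras.layers.Conv2D(16, kernel_size=(5, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 1)),
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3),  # Main, Crossover, and Bund weights
])
```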

Model result

The graphs are a bit busier now than last time, but I found comparison easiest with everything plotted on the same graph. In Figure 6 below, I have shown the total return of the constituents and the portfolios on the left, and the iTraxx Main, iTraxx Crossover, and Bund weights on the right. FC is the fully connected network, CONV is the convolutional network. I have only shown the validation and test sets, as the training set graph contained too much data to meaningfully deduce anything from it.

If we start by looking at the portfolio return for each network strategy on the left, we see that they are quite similar, keeping within a couple of percentage points of total return of each other at all times. They are of course both driven by the returns on the AT1 securities, but nonetheless, each network strategy has produced a somewhat similar outcome. This is especially interesting when we look at the weight composition of each network strategy on the right; they are remarkably different.

Both network strategies have used Bunds only sparsely, but the convolutional network in particular hardly utilizes Bunds at all. This fits well with the AT1 being a credit-risk-intensive security, containing relatively little interest rate risk. If we look at the FC network strategy, it has chosen a strongly correlated relationship between the iTraxx Main and Crossover, being short in both and moving the weights in a kind of scaled tandem. The convolutional network, on the other hand, has chosen an inverse relationship between the iTraxx Main and Crossover, where it is short the Crossover and long the Main in a much more volatile fashion.

Intuitively, I would prefer the FC network strategy, as it seems more 'straightforward' and less volatile in its composition. My fear with the CONV strategy is that this "long Main, short Xover" strategy could suddenly blow up, because you rely on a more complex Main/Xover dynamic in order to lower the AT1 volatility.

Figure 6


Model performance

To gauge the performance of my two network strategies, I need to compare them to a benchmark strategy. I have used the same approach as in the last post; my benchmark strategy is the fixed-weight optimum from the training set. If this is unclear, please see the previous post for a more detailed explanation. In Figure 7 below you can see the performance in the training set.

The strategies presented in Figure 7 have the following annualized standard deviations: FC 7.93%, CONV 8.03%, and fixed-weight optimum benchmark 8.03%. The weights of the fixed-weight optimum are: Xover -78.2%, Bund -6.5%, and Main -37.7%.
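For reference, annualized standard deviations like these are typically computed from daily returns as below (my assumption; the post does not show the calculation):

```python
import numpy as np

def annualized_std(daily_returns, trading_days=252):
    # Scale daily volatility by the square root of trading days per year.
    return np.std(daily_returns) * np.sqrt(trading_days)
```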

We can see that the network strategies match or outperform the benchmark in the training set. We also see that the volatility of the network strategies is less far below the benchmark than in the previous post. This is not necessarily a bad thing, as a very small training-set standard deviation could indicate overfitting. I am interested in producing an all-weather strategy that performs well out of sample.

Figure 7


Figure 8 below shows the validation set. We can see that both of my strategies produce a lower annualized standard deviation than the benchmark strategy.

Figure 8


In Figure 9 below you can see the test set: the true out-of-sample test for my strategies. In the previous post my FC network was not able to outperform the benchmark strategy; this has changed. Now, both network strategies produce an annualized standard deviation that is lower than the benchmark's. The standard deviations are: FC 6.80%, CONV 6.83%, Benchmark 7.23%.

Figure 9


Concluding remarks

The results presented here show that Deep Reinforcement Learning can indeed outperform a simple static benchmark strategy when tasked with lowering portfolio variance. The FC strategy in particular seems reasonably logical and stable – both important qualities if one were to move forward with implementing the strategy.

Compared to the previous post, the major change was that I added more input variables to the network, which gives it greater visibility when optimizing its behavior. The results presented here indicate that increasing the input improves network performance, regardless of architecture type.

I also gave a brief introduction to how convolutional neural networks function. The CNN architecture allows us to increase the amount of inputted data considerably. How would the network perform if I inputted 5,000 or 10,000 security prices? The only hindrance is the amount of data one has available. Inputting huge amounts of data into a DRL analysis has shown extraordinary capabilities when it comes to portfolio allocation (link).

Feel free to reach out with any questions, comments, or suggestions to the above analysis.

Further work

- Even though my strategies include trading costs of 25 bps, it could be interesting to make the network strategies easier to implement operationally. An idea I would like to look at is to make the trading less frequent, perhaps restricting the network to changing the weights on a weekly or monthly basis – while still giving it daily visibility on the data. This would be easier to rebalance in real life than daily trading.

- Another interesting next step would be to change the universe and start looking at listed equities. It could be interesting to create an investment strategy for ETFs that is investable for retail investors.
