How Statistical Arbitrage-based Algorithmic Trading combined with ML can be a winner in Developing Economies
INTRODUCTION
From the outside, Algorithmic Trading Strategies sound extremely fancy and complicated. However, they are nothing but a set of rules used on a Stock Exchange to automate the execution of orders without any human intervention. An Algorithmic Trading Strategy requires a quantitative model that runs on live market data such as price and volume, and aims to deliver a positive Alpha at the end of the trade.
All algorithmic trading strategies can be broadly classified as:
1. Momentum/Trend-based strategies - Follow the current Market momentum. In simple words: buy high and sell higher, or sell low and buy back lower.
2. Market Making strategies - Market Makers provide liquidity by quoting both Buy & Sell prices, thereby profiting on the Bid-Ask spread.
3. Arbitrage strategies - These are event-driven strategies, and are thus neutral to market movement. Arbitrage can be across different Stock Exchanges, geographies, currencies etc.
4. Statistical Arbitrage strategies - Also known as Pairs Trading strategies; such arbitrage opportunities arise from temporary mispricing between the two stocks in a pair.
This document focuses on the Statistical Arbitrage Strategy: we formulate our own model and apply it to the Indian Stock Markets.
Statistical Arbitrage, or the Pairs Trading Strategy, can be highly effective, especially in developing markets that combine high liquidity with significant arbitrage opportunities. The strategy is based on the concept of Mean Reversion of a pair. Stocks that exhibit historical co-movement in prices are paired on the basis of fundamental similarities. When one stock outperforms the other, the outperformer is sold short and the other stock is bought long, in the expectation that the short-term divergence will end in convergence. One of the major advantages of this strategy is that it is Beta neutral and hedges market risk.
Let us understand this with the following example:
It is known from historical data that IndusInd Bank and Axis Bank share prices are highly correlated. If IndusInd Bank share prices rise by 2%, we would expect, on the basis of that history, Axis Bank share prices to rise by more than 1%. If they have not risen yet, Axis Bank shares can be bought in anticipation of the catch-up.
METHODOLOGY
Below, we’ll formulate our own strategic model and quantify the stability using R programming.
We’ll apply this strategy in the context of Indian markets to the Nifty200 Index companies, which represent 80%-90% of the Indian Stock Market. Historical data for these tickers is obtained from the beginning of 2014 till date; the long history increases the number of data points and adds to the robustness of the model. This is just a demonstration of the application of this strategy, and it assumes that stocks are allowed to be sold short. Complications might arise in the Indian markets, where SEBI mandates the use of the Securities Lending and Borrowing System for short selling.
This data is divided into 2 parts:
a) In-Sample data – To understand the pattern and derive the model. All data points from the start of 2014 through to the end of 2017 were used.
b) Out-of-Sample data - To test the model. All data points from 2018 till date were used
Establishing Statistical Significance
Before designing the model, it is essential to determine which stocks will be eligible for trading. The data is first cleaned for uniformity. Only tickers with a complete set of data points over the time horizon and a minimum daily trading volume of 15 million INR are eligible for trading.
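The eligibility screen can be sketched as follows (Python here, not the author's R code). The data layout, the function name `eligible_tickers` and the reading of the 15 million INR threshold as average daily traded value (price times shares) are all assumptions made for illustration.

```python
from statistics import mean

MIN_DAILY_TURNOVER_INR = 15_000_000  # 15 million INR, from the text


def eligible_tickers(data, expected_days):
    """Return tickers with a full history and enough average daily turnover.

    `data` maps ticker -> list of (close_price, shares_traded) tuples, one per
    trading day; a missing day is represented by None (assumed layout).
    """
    keep = []
    for ticker, rows in data.items():
        # Completeness: one valid observation for every trading day.
        if len(rows) != expected_days or any(r is None for r in rows):
            continue
        # Liquidity: average daily traded value (price x shares) in INR.
        turnover = mean(price * volume for price, volume in rows)
        if turnover >= MIN_DAILY_TURNOVER_INR:
            keep.append(ticker)
    return sorted(keep)
```

If the threshold is meant to hold on every single day rather than on average, `mean` would be replaced by `min`.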
Now that the data has been checked for uniformity and liquidity, we form all possible pairs of the remaining stocks and test each pair for co-integration, i.e. whether the two price series share a stable long-run relationship so that the spread between them is stationary. For this purpose, we use an ADF (Augmented Dickey-Fuller) test function available in R to check for stationarity in the time-series. The null hypothesis of the test is that a unit root is present in the sample data, so a small p-value is evidence of stationarity. We retain only the stock pairs with a co-integration p-value below 0.015 and reject the rest.
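To make the idea concrete, here is a minimal two-step (Engle-Granger style) sketch in Python rather than R. It is a simplification, not the procedure used for the results in this article: it runs a Dickey-Fuller regression without lag terms and returns the test statistic, not a p-value. Turning the statistic into a p-value needs tabulated critical values, which a library ADF routine takes care of. All function names are illustrative.

```python
import math
import random


def ols(x, y):
    """Least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def df_t_stat(series):
    """Dickey-Fuller t-statistic (no lags, no constant) for a series.

    Fits diff_t = rho * level_{t-1} + u_t and returns rho / se(rho).
    Strongly negative values argue against a unit root.
    """
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    sxx = sum(e * e for e in lagged)
    rho = sum(e * d for e, d in zip(lagged, diffs)) / sxx
    resid = [d - rho * e for e, d in zip(lagged, diffs)]
    sigma2 = sum(u * u for u in resid) / (len(diffs) - 1)
    return rho / math.sqrt(sigma2 / sxx)


def cointegration_t_stat(x, y):
    """Step 1: regress y on x. Step 2: unit-root test on the residual spread."""
    a, b = ols(x, y)
    spread = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return df_t_stat(spread)


if __name__ == "__main__":
    # Synthetic pair: x is a random walk, y tracks 2*x plus stationary noise.
    random.seed(0)
    x = [100.0]
    for _ in range(499):
        x.append(x[-1] + random.gauss(0, 1))
    y = [2.0 * xi + random.gauss(0, 1) for xi in x]
    print(cointegration_t_stat(x, y))  # strongly negative for this pair
```

The more negative the statistic, the stronger the evidence that the spread is stationary, which corresponds to a smaller p-value in the ADF output.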
The final filter used for data cleaning is correlation. A simple R function provides the correlation value between all the pairs, and only pairs with correlation > 97.5% are retained for our model. This leaves us with a total of 65 pairs of stocks to form our model.
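A minimal sketch of the correlation filter, written in Python instead of R. The dictionary layout and function names are assumptions; only the 97.5% threshold comes from the text.

```python
from itertools import combinations
from math import sqrt

MIN_CORRELATION = 0.975  # threshold quoted in the text


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(prices, threshold=MIN_CORRELATION):
    """Keep every ticker pair whose price correlation exceeds the threshold.

    `prices` maps ticker -> list of closing prices on the same dates.
    """
    kept = []
    for a, b in combinations(sorted(prices), 2):
        r = pearson(prices[a], prices[b])
        if r > threshold:
            kept.append((a, b, r))
    return kept
```

With roughly 200 tickers this loop evaluates about 20,000 candidate pairs, which is still cheap for daily data.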
The following is a scatter plot of some of the remaining pairs of ticker data, showing that the price points are highly correlated over time:
Building a Trading Model
For the remaining price ratios, we formulate a z-score table to understand the distribution and deviations from the mean. We will be using a time horizon of 60 days to calculate the moving average and Standard deviations.
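The z-score calculation can be sketched as below (in Python, whereas the analysis itself was done in R). Two details are assumptions: the 60-day window includes the current day, and the sample standard deviation is used.

```python
from statistics import mean, stdev

WINDOW = 60  # look-back in trading days, as stated in the text


def rolling_zscore(price_a, price_b, window=WINDOW):
    """Z-score of the price ratio a/b against its trailing window.

    Returns a list aligned with the inputs; entries are None until a full
    window of history is available, or when the window has zero variance.
    """
    ratio = [a / b for a, b in zip(price_a, price_b)]
    scores = [None] * len(ratio)
    for t in range(window - 1, len(ratio)):
        sample = ratio[t - window + 1 : t + 1]
        sd = stdev(sample)  # sample standard deviation of the window
        if sd > 0:
            scores[t] = (ratio[t] - mean(sample)) / sd
    return scores
```

A z-score of +2 then means the ratio is two standard deviations above its own 60-day average, i.e. the first stock is relatively expensive against the second.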
The following is a graph of some of the remaining pairs which shows the z-score distribution with +2/-2 standard deviations:
The most important step in formulating a trading strategy is deciding on the Buy and Exit signals. Based on the distribution charts, we formulate the following trading model:
· A Primary Sell signal is generated for the trading pair if the pair is trading between 2 and 2.25 Standard Deviations above the mean
· The Primary sell signal effectively triggers orders to short the relatively expensive stock and simultaneously go long on the relatively cheaper stock in the pair
· For a Primary Sell, we will invest 75% of the allocated Risk capital
· A Secondary Sell signal is triggered for the remaining 25% of the Risk capital when the z-score crosses 2.25 Standard Deviations above the mean
· Analogous Primary Buy and Secondary Buy signals are triggered when the pair is trading below -2 and -2.25 Standard Deviations from the mean, respectively
· Exit signals for the trades are triggered when the pair’s z-score crosses 0
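The rules above can be sketched as a small state machine (Python, not the author's R code). Exposure is expressed as a signed share of the Risk capital: negative means short the pair, positive means long the pair. Two behaviours are assumptions not spelled out in the rules: a flat book that first sees a z-score beyond 2.25 deploys the full 100% at once, and a position is never reduced before the exit signal.

```python
PRIMARY, SECONDARY = 2.0, 2.25           # z-score thresholds from the text
PRIMARY_WEIGHT, FULL_WEIGHT = 0.75, 1.0  # share of Risk capital deployed


def next_exposure(z, current):
    """Return the signed share of Risk capital to hold after observing z."""
    # Exit: the z-score has crossed zero against an open position.
    if (current < 0 and z <= 0) or (current > 0 and z >= 0):
        return 0.0
    if z >= SECONDARY:
        target = -FULL_WEIGHT      # Primary + Secondary Sell
    elif z >= PRIMARY:
        target = -PRIMARY_WEIGHT   # Primary Sell
    elif z <= -SECONDARY:
        target = FULL_WEIGHT       # Primary + Secondary Buy
    elif z <= -PRIMARY:
        target = PRIMARY_WEIGHT    # Primary Buy
    else:
        return current             # between thresholds: hold
    if current == 0:
        return target
    # Scale in only; never shrink a position before the exit signal.
    return min(current, target) if current < 0 else max(current, target)


def run_signals(zscores):
    """Apply the rules to a z-score series; None entries leave exposure as is."""
    exposure, path = 0.0, []
    for z in zscores:
        if z is not None:
            exposure = next_exposure(z, exposure)
        path.append(exposure)
    return path
```

For example, `run_signals([0.5, 2.1, 2.3, 1.0, -0.1])` opens 75% short at 2.1, adds the remaining 25% at 2.3, holds at 1.0 and exits at -0.1.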
Back-testing and Optimization
One of the most important steps in building a trading model is back-testing it across a wide range of data points. The pairs are back-tested on the data available from 2014. The following table provides the final output of the overall model:
The following equity curve plots the returns of the entire strategy:
We can observe that the out-of-sample results are lower than the in-sample statistics. They nevertheless still represent exceptional Risk-Adjusted returns, higher than those of many instruments. The model gives an overall CAGR of 20.6%, with an out-of-sample CAGR of 13.5%. The strategy also has a low drawdown, i.e. the largest peak-to-trough decline in the equity curve is small, which allows flexibility in the use of leverage.
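For reference, CAGR and maximum drawdown can be computed from an equity curve as in the sketch below (Python rather than R). The 252 trading days per year and the function names are assumptions; drawdown is measured in the conventional way, as the largest peak-to-trough decline of the curve.

```python
def cagr(equity, periods_per_year=252):
    """Compound annual growth rate of an equity curve sampled once per period."""
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1 / years) - 1


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

Running both functions separately on the 2014-2017 and 2018-onward segments of the curve gives the in-sample versus out-of-sample comparison.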
An overall Sharpe Ratio of 5.3 is derived, using a Risk-free rate of 6.47% (the current 10-year Government of India Sovereign Bond Rate). This indicates an attractive rate of Risk-adjusted returns.
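A minimal sketch of an annualised Sharpe Ratio calculation (in Python, not the original R). It assumes daily simple returns, 252 trading days per year and a Risk-free rate spread evenly across days; the article does not state which conventions were used for the figure above.

```python
from math import sqrt
from statistics import mean, stdev

RISK_FREE_ANNUAL = 0.0647  # 10-year Government of India bond rate quoted in the text


def sharpe_ratio(returns, risk_free_annual=RISK_FREE_ANNUAL, periods_per_year=252):
    """Annualised Sharpe Ratio from periodic simple returns."""
    rf_per_period = risk_free_annual / periods_per_year
    excess = [r - rf_per_period for r in returns]
    return mean(excess) / stdev(excess) * sqrt(periods_per_year)
```

The numerator is the average excess return per period, the denominator its volatility, and the square-root factor scales the ratio to a yearly figure.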
SCOPE OF MACHINE LEARNING
The trading model described above relies on a fixed look-back period (60 days in this case) to calculate the Moving average for the z-score. The trading signals, trading boundaries and Stop-loss boundaries are all designed on the basis of this z-score and its Standard deviations.
However, what if the pair does not follow the Mean reversion model?
Figure: A Traditional Pairs Trading Strategy
The above figure shows a traditional Pairs Trading Strategy of the kind implemented above. The blue line is the spread of two co-integrated stocks, the red lines are the trading boundaries, and the green lines are the stop-loss boundaries. When the spread reaches a trading boundary, the portfolio is opened, and it is closed only when the spread returns to the average. However, losses are incurred when the spread reaches a stop-loss boundary after the portfolio is opened and does not return to the average. Furthermore, if the spread has not reverted to the mean by the end of the trading window, the portfolio is closed by force; this is called the exit position of the portfolio.
To overcome this drawback, deep reinforcement learning can be used to make the strategy adjust its boundaries to market conditions.
Optimized pairs trading strategy using machine learning
The Pairs Trading Strategy can be optimized using a reinforcement learning method known as a Deep Q-Network (DQN). In the DQN, two hidden layers are set up, and the number of neurons in each is set, through trial and error, to half of the input size.
A pairs-trading system can make a profit if the spread touches the threshold and returns to the average such that the portfolio is closed in each trading window. On the other hand, if the trading boundary is touched and the stop-loss boundary is reached, the system tries to minimize losses by stopping trades. If the spread touches the trading boundary but fails to return to the average, the strategy may end up with a profit or a loss. In this study, the pairs-trading strategy is therefore considered as a kind of game; closing a portfolio yields a positive reward and a portfolio that reaches its stop-loss threshold yields a negative reward. Although an exited portfolio may possibly generate a positive profit, there is also a possibility that losses will occur and it is therefore set to yield a negative reward.
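The reward scheme of this game can be sketched as follows, in Python. Only the signs of the rewards come from the description above. The magnitudes, the zero reward for a window with no trade, the use of a z-scored spread and all names are hypothetical. The agent's action is assumed to be the choice of the boundary pair (`trade_at`, `stop_at`) for each trading window.

```python
from enum import Enum


class Outcome(Enum):
    CLOSED = "closed"        # spread reverted to the mean after opening
    STOP_LOSS = "stop_loss"  # spread hit the stop-loss boundary
    EXITED = "exited"        # window ended with the position still open
    NO_TRADE = "no_trade"    # trading boundary never touched


# Hypothetical magnitudes; the text only fixes the signs.
REWARDS = {
    Outcome.CLOSED: 1.0,
    Outcome.STOP_LOSS: -1.0,
    Outcome.EXITED: -0.5,
    Outcome.NO_TRADE: 0.0,
}


def classify_window(spread_z, trade_at, stop_at):
    """Label one trading window for a z-scored spread and two boundaries."""
    side = 0  # 0 = flat, +1 = opened above the mean, -1 = opened below it
    for z in spread_z:
        if side == 0:
            if abs(z) >= trade_at:
                side = 1 if z > 0 else -1  # open; stop is checked from the next bar
        elif side * z >= stop_at:
            return Outcome.STOP_LOSS
        elif side * z <= 0:
            return Outcome.CLOSED
    return Outcome.EXITED if side else Outcome.NO_TRADE


def reward(spread_z, trade_at, stop_at):
    """Reward the agent would receive for choosing these boundaries."""
    return REWARDS[classify_window(spread_z, trade_at, stop_at)]
```

A DQN would then learn, from the market state at the start of each window, which boundary pair has the highest expected reward.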
Figure: Steps for Proposed Pairs Trading Strategy with Machine Learning DQN Method
The above Machine Learning DQN method can be applied to traditional Pairs Trading Strategies to reduce the uncertainty, thereby making the portfolio optimized and extremely efficient, increasing the bottom line.
A study of the application of this DQN method to the Pairs Trading Strategy has shown a steadily increasing average of Q-values, which indicates that the DQN agent is learning well.
KEY FINDINGS
- When compared with other instruments such as the Nifty200 Index, Gold and a highly rated Mutual Fund, this strategy provides exceptional returns of 20.68% and far outperforms them. The out-of-sample CAGR of 13.4% also well exceeds the Nifty200 Index.
- Machine Learning can help further improve the efficiency and performance of this strategy, resulting in an increased CAGR.
- There are significant opportunities available to be seized in developing economies. The share markets in such geographies provide high levels of liquidity and greater opportunities for arbitrage. Additionally, these markets are not yet tapped by advanced algorithmic trading, unlike the advanced markets of the US and the Eurozone, where such arbitrage opportunities are few. Mutual Fund and Hedge Fund houses operating in Developing Economies such as LatAm and South Asia are increasingly adopting Analytics and Algorithmic Trading. The same approach can be replicated across other instrument classes such as Indices, Futures and Options, as well as Crypto Currency trading.