Deep Reinforcement Learning in Trading
Reinforcement learning is a rapidly advancing technology, inspired by behaviorist psychology, concerned with how agents take actions in an environment so as to maximize some notion of cumulative reward. It is the key driver behind AlphaGo, self-driving cars and many other Artificial Intelligence (AI) applications.
Thanks to open-source frameworks like OpenAI Gym and DeepMind Lab, anyone can create cutting-edge AI agents and test them on thousands of games and real-life scenarios, whether it be playing Super Mario or teaching a robot to walk.
In this simple example our agent (Mario) sees an opponent, and the optimal action is to jump over it; meanwhile, if Mario hits a block or moves forward toward the end of the level, it gets a reward (points or advancing to the next level). The only way to win the game is to collect as much reward as possible without getting killed. How can an agent do this? That's where reinforcement learning comes into play. I won't go into the details and make this article boring, but in short it combines ideas from supervised and unsupervised learning with optimization techniques borrowed from control theory. While supervised learning gives us the power of function approximation, reinforcement learning gives us policies, which tell the agent what actions to take.
The agent (Mario) is our AI. During the learning process the agent observes the environment (the game screen in pixels) and takes random actions in the beginning, and these random actions result in some rewards (from scoring points, rescuing the princess, advancing to the next level...) or in getting killed. Say the agent tests out all the buttons at the start of the game and finds that the "right" button gives it the maximum reward, since it moves forward in the game. Similarly, after each episode of interaction with the environment, our agent learns new maneuvers.
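This observe-act-learn loop is exactly what OpenAI Gym standardizes. Below is a minimal sketch of that loop on Gym's classic CartPole task, with a purely random policy standing in for a trained agent (assuming the classic Gym reset/step API, not the newer Gymnasium one):

```python
import gym

# Minimal agent-environment interaction loop with a random policy as a placeholder.
env = gym.make("CartPole-v1")

for episode in range(5):
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()             # a trained agent would pick its best action here
        state, reward, done, info = env.step(action)   # environment returns the next state and a reward
        total_reward += reward
    print(f"Episode {episode}: total reward = {total_reward}")

env.close()
```

Over many episodes, a learning agent gradually replaces the random action choice with whatever maximizes the cumulative reward.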
This mechanism of learning might sound familiar, because we have experienced it throughout our own existence.
A child (a.k.a. the agent) learns to walk through many failures, but after each failure it is the dopamine and serotonin rush inside its brain, invoked by the curiosity of seeing other two-legged agents walking around, that motivates (rewards) the child to repeat the action until it nails it. Even as we grow up, all our learning, whether at school or at home, has been incentivised by some reward-penalty structure.
It's not just humans; look around and you can see this learning process in all living beings. Reinforcement learning has adopted its principles from nature and neuroscience.
The brain's neuroplasticity is a well-recognized fact: our brain is plastic, and the physical connections between neurons change throughout an individual's life, mainly through learning and experiences from interaction with the external world. Similarly, in reinforcement learning our agent has a brain, built using neural networks. The agent's interaction with the environment changes the weights (or connections) between these neurons, giving it the ability to take actions that maximize the reward.
Well, why not stock markets? If we can build an agent to play a game, let's create an agent to trade the market. However, the market, unlike a game, is filled with randomness and complexity. In a game the states are deterministic: if Mario sees an opponent coming its way, the rational thing to do is jump over it, and the outcome is predictable. In the stock market that is not the case; states are random in nature. For example, an agent might have learned a particular policy from a pattern that occurred in the historical data during training, but it is not necessarily true that the next time this pattern occurs in the market the outcome will be the same; the market carries a lot of uncertainty. But just like with price prediction in general, we don't have to be right all the time. When fused with the right risk management techniques (like position sizing, diversification...), a strategy can be profitable in the long run even if it is not always right.
During my final semester at school, I decided to try reinforcement learning in trading. Long story short, I used a Double-Dueling DQN algorithm (an improvement over the original DQN from the paper "Human-level control through deep reinforcement learning", published by Google's DeepMind to play Atari 2600 games) to design an agent that trades in a single-stock environment.
With a few tweaks and examples from OpenAI's Gym game environments and some other open-source projects (a trading environment framework), I was able to set up a single-security environment that could ingest historical market data for training and testing.
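To give a flavour of what such an environment looks like, here is a stripped-down, hypothetical single-security environment in the Gym style; the class name, long-only position handling and fee logic are simplified assumptions rather than the exact implementation from my repository, and its observation mirrors the simple state described in the first experiment below:

```python
import gym
import numpy as np
from gym import spaces

class SingleStockEnv(gym.Env):
    """Toy single-security environment: the agent steps through a historical price series."""

    def __init__(self, prices, trading_fee=0.001):
        super().__init__()
        self.prices = np.asarray(prices, dtype=np.float32)
        self.trading_fee = trading_fee
        self.action_space = spaces.Discrete(3)          # 0 = hold, 1 = buy, 2 = sell
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)

    def reset(self):
        self.t = 1
        self.position = 0          # 0 = flat, 1 = long (long-only for brevity)
        self.entry_price = 0.0
        return self._state()

    def _state(self):
        return np.array([self.prices[self.t - 1], self.prices[self.t],
                         self.entry_price, self.position], dtype=np.float32)

    def step(self, action):
        reward = 0.0
        price = self.prices[self.t]
        if action == 1 and self.position == 0:          # open a long position
            self.position, self.entry_price = 1, price
            reward -= self.trading_fee * price
        elif action == 2 and self.position == 1:        # close the position, realize P&L
            reward += (price - self.entry_price) - self.trading_fee * price
            self.position, self.entry_price = 0, 0.0
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self._state(), reward, done, {}
```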
First experiment:
The state that the agent receives at each step would be:
[price(t-1), price(t), entry price, position]
- price(t) = current price
- price(t-1) = price one time period earlier
- entry price = position entry price
- position = current position (can be long, short or flat)
The agent can take three actions – Buy, Sell or Hold
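With this four-element state and three discrete actions, the Q-network behind a dueling DQN can be sketched as follows. This is a minimal, hypothetical PyTorch version for illustration, assuming a single hidden layer; the actual network and hyperparameters in my project differ:

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Dueling architecture: separate value and advantage streams, combined into Q-values."""

    def __init__(self, state_dim=4, n_actions=3, hidden=64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s): how good the state is
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a): how good each action is

    def forward(self, state):
        x = self.feature(state)
        v = self.value(x)
        a = self.advantage(x)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)

# Example: Q-values for a single state [price(t-1), price(t), entry price, position]
net = DuelingQNetwork()
q = net(torch.tensor([[101.2, 100.8, 0.0, 0.0]]))
print(q)  # one Q-value per action: hold, buy, sell
```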
However, the agent wasn't learning much from just two price points. When a trader looks at a price chart, they look at 100, 1,000 or the entire price series before taking a position. So why not feed hundreds of past price points? To keep things simple, I decided instead to think in terms of a trader: what does a proprietary trader consider before making a trading decision? Most of the time raw price is just too complicated to decipher, so traders use functions of price and volume commonly known as technical indicators. I decided to stick with three oscillators, which are already stationary and only need scaling before being fed into the neural network.
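As an example of such an indicator pipeline, here is a minimal sketch that computes RSI (one common oscillator, using a simple moving-average variant rather than Wilder's smoothing) and rescales it to [0, 1] before it goes into the network; the price series here is synthetic and purely for illustration:

```python
import numpy as np
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index: a bounded (0-100) oscillator of price momentum."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(period).mean()
    loss = (-delta.clip(upper=0)).rolling(period).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)

# Oscillators are already bounded/stationary, so a simple rescale to [0, 1] is enough
prices = pd.Series(np.random.default_rng(0).normal(0, 1, 300).cumsum() + 100)
scaled_rsi = rsi(prices) / 100.0
```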
The design of the trading environment was such that only realized P&L was passed on as reward, but in the real world a trader also looks at unrealized P&L to decide when to close a position. To imitate this, unrealized return was also included as one of the state features.
New State
[ADX(t), RSI(t), CCI(t), Vol(t), position, unrealized return]
- ADX = Average Directional Index
- RSI = Relative Strength Index
- CCI = Commodity Channel Index
- Vol = Volume
Reward: Realized P&L - Trading Fee - Time Fee
To emulate the real world and to control the agent's actions, I also included a commission and a Time Fee (like the rollover rate in Forex, or the time value of money in general). These rewards are what control and optimize the agent's behavior during the training phase.
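Putting this reward structure into code, the per-step reward can be sketched along these lines; the fee magnitudes are placeholders, not the values used in training:

```python
def step_reward(realized_pnl: float,
                trading_fee: float = 0.0,
                time_fee: float = 0.0) -> float:
    """Reward passed to the agent each step: realized P&L net of costs.

    trading_fee is charged only on steps where a trade is executed;
    time_fee is a small penalty applied every step for holding capital or a position,
    analogous to a rollover rate in Forex.
    """
    return realized_pnl - trading_fee - time_fee

# Example: closing a position for a 1.5 gain, paying 0.1 commission and 0.02 time fee
print(step_reward(1.5, trading_fee=0.1, time_fee=0.02))  # 1.38
```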
This new state-reward structure improved the performance of the agent multifold. But whatever the agent learned wasn't transferable across different securities; in other words, what works for Apple doesn't always work for Microsoft.
The agent was trained and tested on the different stocks in the S&P 500 between 2013 and 2018 with a 3:1 train-test split, and the average results were better than a buy-and-hold strategy over the same period for the respective stocks.
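For reference, a chronological 3:1 split (no shuffling, so the test period always follows the training period) can be done along these lines; the file layout, column names and exact cut-off are assumptions for illustration:

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, train_fraction: float = 0.75):
    """Split a time-indexed price DataFrame into train/test sets without shuffling."""
    df = df.sort_index()
    cut = int(len(df) * train_fraction)   # 3:1 split -> first 75% of the history for training
    return df.iloc[:cut], df.iloc[cut:]

# Example with daily bars between 2013 and 2018 (hypothetical CSV)
# prices = pd.read_csv("AAPL.csv", index_col="Date", parse_dates=True)
# train, test = chronological_split(prices.loc["2013":"2018"])
```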
There are many improvements that I would make to this project, starting with
- Prioritized experience replay instead of experience replay
- LSTM network instead of Vanilla Neural Network
- Convolutional Neural Network (a bit counter-intuitive since we already have the price, but worth testing to see if it can overcome the memory loss that results from forcing stationarity, an idea from Dr. Marcos Lopez de Prado)
- A3C (Asynchronous Advantage Actor-Critic) algorithm instead of DDDQN.
- Curiosity-driven Exploration by Self-supervised Prediction
- Multi-agent algorithms and many more....
A typical quant strategy workflow involves several risk, portfolio and control steps after the predictive modelling and before execution. Most of the time these modules are loosely coupled and require frequent calibration and optimization over the strategy's life cycle. But with reinforcement learning, we can engineer these constraints directly into the reward and state structure.
A simple example could be an agent managing a portfolio of stocks. A common requirement among investors is a portfolio with a beta close to 0 (uncorrelated with the market), and a popular way to achieve this is a long-short, factor-based portfolio. Using reinforcement learning, we can build a similar portfolio by penalizing the agent during training whenever its beta drifts outside a range close to zero.
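As a rough sketch of how that penalty could be engineered into the reward (the beta band, estimation method and penalty weight here are illustrative assumptions, not a tested configuration):

```python
import numpy as np

def beta_penalized_reward(portfolio_returns: np.ndarray,
                          market_returns: np.ndarray,
                          step_pnl: float,
                          beta_band: float = 0.1,
                          penalty_weight: float = 1.0) -> float:
    """Reward = step P&L minus a penalty whenever portfolio beta drifts outside +/- beta_band."""
    beta = np.cov(portfolio_returns, market_returns)[0, 1] / np.var(market_returns, ddof=1)
    penalty = max(0.0, abs(beta) - beta_band) * penalty_weight
    return step_pnl - penalty
```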
Supervised and unsupervised learning have made great strides in trading, but I believe the next big thing is reinforcement learning; its potential is enormous because it is the closest of these technologies to general intelligence.
To learn more, check out my report and repository.