From Cleaning Floors to Clearing Profits: How AI Brains in Vacuums Could Revolutionize Your Forex Trading Strategy
At the core of reinforcement learning (RL), a machine learning approach in which an AI learns autonomously, through trial and error, to make future decisions, lies a method called Q-learning.
Everyday examples of where this type of AI is deployed include robotic vacuum cleaners, the optimization of ads you see on platforms like Google, the training of non-human players in video games, self-driving cars, recommendation systems used by the likes of Netflix or YouTube, and online dynamic pricing strategies.
For me, it has been the easiest AI technique to understand, implement, and then tweak to optimize performance while developing an AI agent to trade forex currencies.
Forex currency trading is the largest and most liquid financial market in the world, with an estimated daily trading volume exceeding $7.5 trillion. This immense size comes from the global nature of currency exchange, involving participants such as governments, central banks, corporations, and retail investors. By comparison, the stock markets (the New York Stock Exchange, NASDAQ, and others) have a combined global capitalization of around $95 trillion but a much smaller daily trading volume, roughly $200 billion to $300 billion. Bond markets are larger than stock markets in terms of market capitalization, estimated at $130 trillion, though their daily trading volumes are also smaller; U.S. Treasury bonds are the most actively traded, reaching daily volumes of around $500 billion to $1 trillion. Commodities markets, covering oil, gold, and agricultural products, see estimated daily trading volumes of around $100 billion. In short, the forex market dwarfs the others in daily trading volume, making it highly liquid and accessible around the clock.
It is a market rich in easily accessible data and with enough noise in the data to be a real challenge for anyone interested in training an artificial intelligence to understand that particular world.
When RL training is complete, you end up with a two-dimensional Q-score (or rule) table to which we ascribe the persona of an independently acting "agent." It is worth noting that in the AI world, "agent" is a common designation for an AI, even though the processes used to create them vary widely.
At the moment, my two currency trading agents, working in tandem, have enjoyed such a level of success during their backtesting and early demo (monopoly money) phases that I can now imagine a time when they will be the beginnings of a virtually staffed Wise trading company.
The major steps involved in creating an RL AI agent include:
· Obtaining good-quality data on which to train the agent, plus separate data for testing the trained agent(s).
· Describing the environment in which the agent will work.
· Developing the rewards and penalties applied to the agent while it learns.
· Defining the action options the agent can take.
· Coding a couple of well-understood algorithms, such as epsilon-greedy action selection and the Bellman update equation, along with their hyperparameters (a minimal sketch follows this list).
· Coding the learning and testing frameworks.
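To make that bullet about the well-understood algorithms concrete, here is a minimal Python sketch of epsilon-greedy action selection and the Bellman update. The dictionary-based Q-table, function names, and the ALPHA and GAMMA values are illustrative choices for this article, not my actual trading code.

```python
import random

# Illustrative hyperparameter values, not the ones used for the trading agents.
ALPHA = 0.1    # learning rate: how far each update moves the score
GAMMA = 0.95   # discount factor: how much future rewards count

def choose_action(q_row, epsilon):
    """Epsilon-greedy: with probability epsilon pick a random action (explore),
    otherwise pick the action with the best score so far (exploit)."""
    if random.random() < epsilon:
        return random.choice(list(q_row))     # explore
    return max(q_row, key=q_row.get)          # exploit

def bellman_update(q_table, state, action, reward, next_state):
    """One Q-learning step: nudge Q(state, action) toward the reward plus the
    discounted value of the best action available in the next state."""
    best_next = max(q_table[next_state].values()) if next_state in q_table else 0.0
    q_table[state][action] += ALPHA * (reward + GAMMA * best_next - q_table[state][action])
```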
The learning process is actually very simple to understand.
At first, the agent is simply a blank two-dimensional table. The first column describes the environment at a given point in time and is called the 'state' column. Then there is one additional column for each possible action the agent could take in the state described by that row.
In the case of the robotic vacuum cleaner mentioned previously, the state is the vacuum’s current location in relation to walls, furniture, etc. The possible actions could be to go forward, turn left or right, or stop.
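A tiny, hypothetical slice of such a Q-table might look like this in Python; the state encoding, action names, and scores are invented purely for illustration.

```python
# Each key is a simplified "state" reported by the sensors; each value holds
# one score per possible action. All numbers here are invented.
q_table = {
    # (obstacle directly ahead?, floor ahead already covered?)
    (True,  False): {"forward": -1.0, "turn_left": 0.4, "turn_right": 0.6, "stop": -0.2},
    (False, False): {"forward":  0.9, "turn_left": 0.1, "turn_right": 0.0, "stop": -0.5},
    (False, True):  {"forward": -0.3, "turn_left": 0.5, "turn_right": 0.2, "stop": -0.4},
}

# A trained agent simply takes the best-scoring action for its current state:
state = (False, False)
best_action = max(q_table[state], key=q_table[state].get)   # -> "forward"
```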
At the beginning of the learning process, each time the agent encounters a new state, it guesses an action at random. The cell for that action in that state's row is then given a numeric reward or penalty based on how well the action moved the agent toward its goal.
For our vacuum agent, bumping into a wall or traversing a part of the floor that it has already covered are not ideal outcomes, so some degree of penalty could be applied toward the score for the chosen action for that state. Covering new ground results in a reward for the action.
By repeating this process hundreds, thousands, or even hundreds of thousands of times for a single agent (each repetition is called an episode), the Q-score table starts to take shape. The agent becomes more intelligent as it experiences more states. As the episodes accumulate and the table grows with experience, the randomness of the agent's action decisions diminishes, and more and more, the agent uses the maximum action score for a particular state to determine what to do next.
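One common way to taper that randomness is to decay the exploration rate (epsilon) a little after every episode. The decay factor, floor, and episode count below are illustrative values, not the ones I eventually settled on.

```python
epsilon = 1.0        # start fully random
epsilon_min = 0.01   # never stop exploring entirely
decay = 0.999        # shrink a little after every episode

for episode in range(100_000):
    # ... run one full training episode, choosing actions epsilon-greedily ...
    epsilon = max(epsilon_min, epsilon * decay)
```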
The action's score is continuously incremented or decremented depending on how well that action for a particular state progressed toward the goal(s).
The art in Q-learning lies in how one describes the state, designs the rewards and penalties, fine-tunes the decay rate of random action selection, and maintains an optimistic persistence, knowing that while many agents may fail, eventually you can develop one that works. The science, on the other hand, is in the precise coding and the construction of unbiased training and testing frameworks.
Because of the randomness in the training process, training thousands of agents creates thousands of different agents with varying capabilities for the same set of states and rewards.
My first useful tweak—and I tried many—was evolution. The laptop-based code could pick a random number of episodes to train an agent. Once training was completed, the agent would be tested on more recent, separate test data and scored for its capability.
I would go to bed, leaving a couple of laptops running the code over and over again, and wake the next morning to thousands of agents, each scored for its performance.
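The overnight loop looked roughly like the sketch below. Here train_agent and evaluate_agent are placeholder stand-ins for my real training and backtesting routines, and the episode range and agent count are illustrative.

```python
import random

def train_agent(train_data, episodes):
    """Placeholder: run Q-learning for `episodes` episodes and return a Q-table."""
    return {}

def evaluate_agent(agent, test_data):
    """Placeholder: backtest the trained agent on separate, more recent data
    and return a single performance score."""
    return random.random()

def train_and_score(train_data, test_data):
    episodes = random.randint(5_000, 50_000)    # random training length per agent
    agent = train_agent(train_data, episodes)
    return agent, evaluate_agent(agent, test_data)

train_data, test_data = [], []   # placeholders for the real price histories

# Left running overnight: thousands of candidate agents, each with a score.
results = [train_and_score(train_data, test_data) for _ in range(2_000)]
results.sort(key=lambda pair: pair[1], reverse=True)
top_agents = results[:20]        # the best performers feed the next step
```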
Randomly selecting pairs of the best-scoring agents, I would create child agents through an evolution process, producing a new generation that again consisted of thousands of agents. The very best of these would be used to create the next generation. With each generation, I could typically improve the top performance outcomes, though the gains diminished each time. By about the fourth generation, I was finding great-great-grandchild agents that were star pupils.
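To illustrate what one such evolution step can look like, here is a plausible crossover-and-mutation sketch over dictionary-based Q-tables; it conveys the general idea rather than my exact method, and the mutation parameters are illustrative.

```python
import random

def crossover(parent_a, parent_b, mutation_rate=0.01, mutation_scale=0.1):
    """Build a child Q-table: for each state, copy the row of action scores from
    one randomly chosen parent, then occasionally nudge a score as a mutation."""
    child = {}
    for state in set(parent_a) | set(parent_b):
        if state in parent_a and state in parent_b:
            source = parent_a if random.random() < 0.5 else parent_b
        else:
            source = parent_a if state in parent_a else parent_b
        row = dict(source[state])                 # copy the parent's action scores
        for action in row:
            if random.random() < mutation_rate:
                row[action] += random.uniform(-mutation_scale, mutation_scale)
        child[state] = row
    return child
```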
I will publish more details for those interested in currency trading and/or Q-learning in a separate post. For now, suffice it to say that backtesting, with all the shortcomings anyone experienced in testing trading robots knows, has suggested that these agents can multiply a $1,000 starting balance roughly sixfold over the testing period from January 2024 to September 2024.
Early live demo trading, again with its known limitations compared to actual live agent/robot trading, is showing the same positive trading capability from the agents.