How I design high-frequency trading systems and their architecture. Part III
Ariel Silahian
Global Leader in Electronic Trading & High-Frequency Trading Systems | Hands-On Expertise & Executive Leadership in Market Infrastructure
This is the last part of the three-article series I've been writing. In this part, I'm going to explain what I've found to be the best approach to ultra-low-latency systems.
Even though I've been focused on trading systems, this can be applied to any low-latency system: communications, audio, video, etc.
So, the pattern I use is the following:
Busy/wait, or spinning: this is not usually categorized as a pattern; in fact, it is often considered an anti-pattern and is generally not recommended. But when you are designing low-latency systems, we don't care how elegant the code is or whether it follows conventional good practices. We only care about latency.
The process sits in a tight loop waiting for something, and that loop consumes 100% of a CPU core's cycles. In our case, we will be reading market data from our limit order book module, and if certain strategy criteria are met, we will send the orders to execute that trade. This is by far the fastest way to consume data produced by other modules.
But that's not all: this kind of process largely avoids cache misses and CPU context switches, something I talked about in my last article.
The following is a basic sketch of how it works.
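A minimal sketch of such a loop (the `TopOfBook` struct, the version counter, and the spread condition below are illustrative, not from the original article):

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical top-of-book snapshot published by the feed handler.
struct TopOfBook {
    double bid;
    double ask;
};

// Illustrative strategy criterion: fire when the spread is tighter
// than a tenth of a pip (the threshold is made up).
bool spread_is_tight(const TopOfBook& b) {
    return b.ask - b.bid < 0.0001;
}

std::atomic<bool>     running{true};
std::atomic<uint64_t> book_version{0};  // bumped by the producer after each update
TopOfBook             book{0.0, 0.0};   // written by the single producer thread

// The busy/wait loop: it never sleeps or blocks, it just spins,
// re-checking the version counter. 100% of one core, minimal latency.
void strategy_loop() {
    uint64_t last_seen = 0;
    while (running.load(std::memory_order_relaxed)) {
        uint64_t v = book_version.load(std::memory_order_acquire);
        if (v == last_seen)
            continue;                   // nothing new yet: keep spinning
        last_seen = v;
        if (spread_is_tight(book)) {
            // send_order(...);         // strategy criteria met
        }
    }
}
```

The producer writes `book` first and then bumps `book_version` with release semantics, so the acquire load in the loop guarantees the consumer sees the completed update.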
But not everything is as good as it seems: busy/wait processes are very hard to design and can be dangerous for overall performance, since a spinning thread can consume an entire CPU core and drag down the whole system's performance.
Now, the key to using busy/waits in our systems is to set thread affinity to a specific CPU core. That means we tell the operating system to run this busy/wait process on a single CPU core (core 1, 2, 3, etc.), and we can "pin" as many processes as we have CPU cores. If we don't do this, then because of how the thread scheduler works, spinning threads will end up consuming the entire CPU.
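On Linux, pinning the calling thread to one core can be done with `pthread_setaffinity_np`. A minimal sketch (the helper name `pin_current_thread_to_core` is my own; compile with `-pthread`):

```cpp
#include <pthread.h>
#include <sched.h>

// Pin the calling thread to a single CPU core (Linux-specific API).
// Returns true on success.
bool pin_current_thread_to_core(int core_id) {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(core_id, &cpuset);
    // After this call the scheduler will only run this thread on core_id,
    // so the busy/wait loop owns the core and is never migrated.
    return pthread_setaffinity_np(pthread_self(),
                                  sizeof(cpu_set_t), &cpuset) == 0;
}
```

In practice you would also keep the OS and other processes off the pinned cores (e.g. with the `isolcpus` kernel parameter), so the spinning thread truly has the core to itself.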
When using these methods, the threading model, I/O model, and memory management should be designed to work together to achieve the best overall performance. This goes against the OOP principle of loose coupling, but it's necessary to avoid the runtime cost of dynamic polymorphism.
Of course, you still need to take care of synchronization (locks) where it is needed. My approach is to design my data structures in a way that requires very little synchronization.
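One common way to keep synchronization to a minimum is a lock-free single-producer/single-consumer ring buffer between the feed thread and the strategy thread. A minimal sketch, not production code (real implementations also pad the indices to separate cache lines):

```cpp
#include <atomic>
#include <cstddef>

// Lock-free SPSC ring buffer: the feed thread pushes, the strategy
// thread pops. No mutex is needed because each index is written by
// exactly one thread.
template <typename T, size_t N>   // N ideally a power of two so % is a mask
class SpscRing {
    T buf_[N];
    std::atomic<size_t> head_{0};  // advanced only by the consumer
    std::atomic<size_t> tail_{0};  // advanced only by the producer
public:
    bool push(const T& v) {
        size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N)
            return false;          // full
        buf_[t % N] = v;
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {
        size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire))
            return false;          // empty
        out = buf_[h % N];
        head_.store(h + 1, std::memory_order_release);
        return true;
    }
};
```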
In conclusion, this is the most latency-sensitive part of our system, and using this technique will give you the best latency.
6. Position & Risk Management
All orders sent by the strategy should be consolidated into positions, so you can keep track of your open/closed orders and, most importantly, your exposure to the market. Ideally, your strategy should keep a flat exposure, but certain strategies, like market making, may allow a controlled exposure (holding inventory).
From per-position stop losses and overall-exposure limits to portfolio management, the risk module is an important piece that interacts with your strategy and monitors, in real time, all open positions and the overall exposure to the market.
The following are some popular risk management rules:
- Position limit: cap the position in a specified instrument, or the sum of all positions across instruments for a specified product.
- Single-order limit: cap the volume of a single order. Sometimes a lower bound is also enforced, meaning your order quantity must be a multiple of a minimum lot size.
- Money control: ensure the margin required by all positions does not exceed the account balance.
- Illegal price detection: ensure the price is within a reasonable range, e.g. it does not exceed the exchange's price limits and is not too far from the current price.
- Self-trading detection: ensure that orders from different strategies cannot trade against each other.
- Order cancellation rate: track order cancellations and ensure the rate does not exceed the exchange's limit.
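Several of the rules above can be sketched as a single pre-trade check. All the limits, field names, and thresholds here are illustrative, not any exchange's actual values:

```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Hypothetical pre-trade risk limits.
struct RiskLimits {
    int64_t max_position;   // position limit per instrument
    int64_t max_order_qty;  // single-order upper limit
    int64_t lot_size;       // order qty must be a multiple of this
    double  max_price_dev;  // max fractional deviation from last price
};

enum class Reject { None, OrderSize, LotSize, Position, Price };

// Returns the first rule the order violates, or Reject::None if it may go out.
Reject pre_trade_check(const RiskLimits& lim,
                       int64_t current_pos, int64_t order_qty,
                       double order_px, double last_px) {
    if (order_qty > lim.max_order_qty)
        return Reject::OrderSize;                 // single-order limit
    if (order_qty % lim.lot_size != 0)
        return Reject::LotSize;                   // lot-size multiple
    if (std::abs(current_pos + order_qty) > lim.max_position)
        return Reject::Position;                  // position limit
    if (std::fabs(order_px - last_px) / last_px > lim.max_price_dev)
        return Reject::Price;                     // illegal price detection
    return Reject::None;
}
```

In a latency-sensitive path these checks are simple integer/float comparisons, so the cost of running them on every order is negligible.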
Also, within this module, you may want to analyze how capital is allocated across strategies or trades. Many studies have shown that a sound allocation strategy can lower the volatility of your returns, and it is good insurance if things go wrong.
7. Monitoring systems
Since we are building a fully automated system that must be able to open and close positions within milliseconds, we must put proper monitoring systems in place to control the overall operation.
Imagine a human realizing that a strategy is not doing what it should, or that a venue is not providing prices as it should. By the time we stop the system, unrecoverable losses may already have been made.
How many minutes would it take this person to shut the system down? Five minutes? One minute?
We could have thousands of bad open orders within that time frame. Scary!
That's why we need to put monitoring systems in place to check, among other things:
- Overall PnL: if there is, say, a flash crash, our system must be able to close all open positions and shut itself down.
- Connectivity between venues: making sure no venue has been disconnected, and activating the reconnection logic when one has.
- Latencies: say a switch starts to fail and you begin receiving data with delays. You would never notice until you analyze the logs. We need to monitor latencies between venues to verify timely data delivery and alert us if there is an issue.
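As a rough illustration of the latency-monitoring point, one could keep an exponentially weighted moving average of venue-to-receipt latency per venue and alert when it drifts past a threshold. Names and thresholds are hypothetical, and this assumes venue and local clocks are synchronized (e.g. via PTP):

```cpp
#include <cstdint>

// Tracks a smoothed feed latency for one venue and flags drift.
class LatencyMonitor {
    double ema_us_ = 0.0;
    bool   primed_ = false;
    static constexpr double kAlpha = 0.1;  // EMA smoothing factor (illustrative)
public:
    // venue_ts_us / recv_ts_us: microseconds since epoch on the venue's
    // clock and ours. Returns true when the smoothed latency exceeds
    // alert_us, i.e. an alert should be raised.
    bool on_tick(int64_t venue_ts_us, int64_t recv_ts_us, double alert_us) {
        double lat = double(recv_ts_us - venue_ts_us);
        ema_us_ = primed_ ? kAlpha * lat + (1.0 - kAlpha) * ema_us_ : lat;
        primed_ = true;
        return ema_us_ > alert_us;
    }
    double ema_us() const { return ema_us_; }
};
```

The smoothing matters: a single delayed packet should not trip the alarm, but a failing switch that adds delay to every packet should.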
If you have read these articles, I would like to hear from you and discuss new or different approaches. Please share!
Ariel Silahian
https://www.sisSoftwareFactory.com/quant
https://twitter.com/sisSoftware
Keywords: #hft #quants #forex #fx #risk $EURUSD $EURGBP $EURJPY #trading