How to speed up TCA in Python
Somewhat abusing a quotation by Dickens, coding is the best of times and also the worst of times. The worst of times are those hours spent debugging what appears to be innocuous code that throws an exception for some totally inexplicable reason. When you find the problem, it is usually something incredibly trivial. Of course, a bug is always obvious after you've discovered it, but never beforehand.
In between the bugs, the frustration and the tears (admittedly, the last part of that tricolon is there purely for exaggerated literary effect), there is the question of optimization. As Donald Knuth has noted, "premature optimization is the root of all evil". The priority is making your code work. However, once it works, if it is very slow and is likely to be executed repeatedly, it is worth asking how you can speed it up. At the same time, we also want our code to be readable, to make maintenance easier. Open sourcing a project forces you to make your code as elegant as possible, given that you know lots of folks will be looking at it!
We recently open sourced tcapy, Cuemacro's transaction cost analysis (TCA) library (download it from GitHub). It is one of the first open source libraries for TCA. Most solutions tend to be closed source, and building your own internal TCA library is likely to cost many hundreds of thousands of dollars once maintenance costs are included. Use tcapy and save hundreds of thousands of dollars!
Essentially, tcapy takes in large amounts of market tick data and your own trade/order data. It then combines the two to calculate various statistics that tell you how much you are paying for your trading activity. It allows you to compare different liquidity providers, trading styles, algos etc. We are faced with several time-consuming steps:
- IO intensive: Loading large amounts of data from disk is slow
- Compute intensive: Making calculations on large amounts of data is slow
- Compute intensive: Generating graphical output on large amounts of data is slow
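Before optimizing any of these steps, it helps to measure which one actually dominates the runtime. The sketch below is not tcapy's API: the names `timed`, `load_ticks` and `slippage_bp` are hypothetical, the tick data is synthetic, and the sign convention for slippage (negative means the trade was worse than the mid price) is an assumption made for illustration. It times a stand-in for the loading step and a toy version of the calculation step, using only the Python standard library.

```python
import bisect
import time
from contextlib import contextmanager

timings = {}  # stage name -> accumulated wall-clock seconds


@contextmanager
def timed(stage):
    """Add the wall-clock duration of the enclosed block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


def load_ticks(n):
    """Stand-in for the IO step: build n synthetic (timestamp, mid) ticks."""
    return [(t, 1.10 + 0.0001 * (t % 7)) for t in range(n)]


def slippage_bp(ticks, trades):
    """Signed slippage in basis points against the latest mid at or before each trade.

    ticks  : list of (timestamp, mid), sorted by timestamp
    trades : list of (timestamp, side, price), side is +1 for a buy, -1 for a sell
    Assumed convention: negative values mean the trade was worse than mid.
    """
    times = [t for t, _ in ticks]
    results = []
    for ts, side, price in trades:
        i = bisect.bisect_right(times, ts) - 1  # index of last tick with time <= ts
        if i < 0:
            raise ValueError("trade occurs before the first tick")
        mid = ticks[i][1]
        results.append(side * (mid - price) / mid * 1e4)
    return results


with timed("load"):
    ticks = load_ticks(100_000)

# Hypothetical trades: a buy slightly above mid and a sell slightly above mid.
trades = [(50_000, +1, 1.1002), (75_000, -1, 1.1001)]

with timed("compute"):
    costs = slippage_bp(ticks, trades)

# Report the slowest stage first.
for stage, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{stage}: {seconds:.4f}s")
```

Timing each stage separately shows where the effort is worth spending, which keeps the optimization from being premature in Knuth's sense.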
We also face constraints....
The rest of the article is available on the Cuemacro website.