Speeding up tick data calculations in Python
Time, time, time. At the current moment, we likely have a bit more time than usual. Despite this, it's unlikely any of us actively want to wait longer for code to execute! Last week, I wrote about libraries for working with large datasets, like Dask and Vaex, or using databases like kdb+/q. This week, I'll continue the theme, given all the feedback and suggestions I've received about that article. This time, the focus is on Python tools which can be useful for speeding up calculations, plus other tips and tricks you can use with tick data, which aren't necessarily Python specific (thanks to @ewankirk for a few of these in reply to my original tweet, in particular with respect to tricks with tick data).
Cython - https://cython.org/
Python is an interpreted language, so it doesn't need to be compiled before it runs. The flip side is that it tends to be slower than compiled languages. Cython allows you to "compile" some of your code. Essentially, you write special Python-like code that is converted into C and statically compiled down to machine code. You can also "release the GIL" with Cython, allowing true parallelization of your code. Many Python libraries, such as pandas, use Cython to speed up computation.
If you take the time to annotate your code with Cython type declarations and rewrite it accordingly, you can speed it up further. How much faster it gets depends on how much of your Python code Cython can convert. If you want to play around with Cython, it's easy to do so in Jupyter notebooks: load the extension with %load_ext cython, then add %%cython at the top of a code cell, and the code in that cell will be compiled.
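As a minimal sketch of what such a cell might contain (the function and the tick-data flavour are my own illustration, not from the article), here is a typed Cython function in a %%cython cell, assuming %load_ext cython has already been run in an earlier cell:

```python
%%cython
# Hypothetical example: a typed loop that Cython can compile to a tight C loop.
def vwap(double[:] prices, double[:] volumes):
    """Volume-weighted average price over typed memoryviews."""
    cdef double pv = 0.0
    cdef double v = 0.0
    cdef Py_ssize_t i
    for i in range(prices.shape[0]):
        pv += prices[i] * volumes[i]
        v += volumes[i]
    return pv / v if v != 0.0 else 0.0
```

The cdef declarations and typed memoryview arguments (which accept NumPy float64 arrays directly) are what let Cython avoid Python object overhead inside the loop.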
Numba - https://numba.pydata.org/
Numba is similar to Cython, in that it can convert....
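As a hedged sketch of what typical Numba usage looks like (the function below is my own illustration, not from the article; the try/except fallback simply keeps the snippet runnable if numba isn't installed):

```python
# Numba JIT-compiles numeric Python functions to machine code at call time.
def sum_squares(n):
    # A tight numeric loop: slow in pure Python, fast once JIT-compiled.
    total = 0.0
    for i in range(n):
        total += i * i
    return total

try:
    from numba import njit  # assumption: numba is installed
    sum_squares = njit(sum_squares)
except ImportError:
    pass  # fall back to the plain Python version

print(sum_squares(1_000))  # → 332833500.0
```

Unlike Cython, no separate compilation step or special syntax is needed; the same plain Python function is compiled the first time it is called.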
To read the rest of the article on the Cuemacro website, please click here!