Speed up Python data access by 30x & more

Let's say you send a letter from London to Tokyo. How long would it take to get a reply? At the bare minimum, it takes 12 hours for the letter to fly there, and then another 12 hours for a reply to fly back, so at least a day (and this is ignoring the time it takes for your letter to be read, the time it takes to write a reply, the time it takes to post it and so on). We could of course use faster means of communication like the phone or e-mail. Whilst the delay would be much lower, it would still be at least a few hundred milliseconds.

Whenever you are analysing market data in Python, or indeed any other language, a lot of time is spent loading data, even before you do any computations or statistical analysis. Just as with our letter example, the data you are trying to access is often across a network, so it takes time to fetch it before you can put it into your computer's RAM. The difficulty is that every time you change your Python code to adjust your analysis, whatever you loaded into memory is lost once the script has finished running. So the next time you run it, you have to go through the whole process of loading the data again, even though it's precisely the same dataset. In my Python market data library findatapy, I've written a wrapper for arctic (my code here), which has been open sourced by Man-AHL. It takes pandas DataFrames, which can hold market data, compresses them heavily and sends them to MongoDB for storage. Compressing the data reduces the amount of disk space MongoDB needs to store it. Also, because the compression is done locally, it takes a load off the network when the data is sent to and from your computer.
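To give a flavour of the storage side, below is a minimal sketch of writing and reading a DataFrame with arctic directly, assuming a MongoDB instance running on localhost; the library name 'fx', the key 'fx_minute_data' and the toy DataFrame are purely illustrative, and findatapy wraps this sort of call for you.

    import pandas as pd
    from arctic import Arctic

    # Connect to a MongoDB instance (assumed to be running on localhost)
    store = Arctic('localhost')

    # Create a library to hold our market data (name is illustrative)
    store.initialize_library('fx')
    library = store['fx']

    # A toy DataFrame standing in for minute FX data
    df = pd.DataFrame({'EURUSD': [1.10, 1.11, 1.12]},
                      index=pd.date_range('2017-01-01', periods=3, freq='T'))

    # arctic compresses the DataFrame locally before sending it to MongoDB
    library.write('fx_minute_data', df)

    # Reading it back returns a versioned item; the DataFrame itself is in .data
    item = library.read('fx_minute_data')
    print(item.data)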

As a bit of an experiment, I used my library findatapy (via arctic) to access 1 minute data from 2007 to the present day for 12 G10 FX crosses, which is stored on my MongoDB server. The output of this query amounts to around 40 million observations. The Python code also joins together all the time series and aligns them, which takes a bit of time. In total it took around 58 seconds to load all this FX data across my network and align it into a single dataset ready to be number crunched. My MongoDB setup is far from optimal, and the database I was accessing was across a wifi network, rather than a wired gigabit network etc. If every time I rerun my Python script I have to go through this 58 second process to get a dataset, it's going to seriously slow down market analysis, which is often an iterative process.

Luckily, there are lots of tricks you can use to make this faster. One solution is to cache the data in our local RAM in such a way that it is still available even if we have to restart the process. We can use Redis to do this, which is a simple in-memory database (basically a key/value store). Once we've loaded up the data, we simply push it to Redis to store temporarily. Whenever we need it, we just pull it from Redis! When we fetch this large dataset via Redis, it takes under 2 seconds, nearly 30 times quicker! Why is it so much quicker? We list some reasons below...
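Here is a minimal sketch of that caching step: pushing a DataFrame into Redis and pulling it back, assuming a Redis server on localhost and using pickle for serialisation (the key name and toy DataFrame are illustrative, and findatapy's own cache may serialise the data differently).

    import pickle
    import pandas as pd
    import redis

    # Connect to a local Redis server (assumed to be running on the default port)
    r = redis.StrictRedis(host='localhost', port=6379, db=0)

    # A toy DataFrame standing in for our large FX dataset
    df = pd.DataFrame({'EURUSD': [1.10, 1.11, 1.12]},
                      index=pd.date_range('2017-01-01', periods=3, freq='T'))

    # Serialise the DataFrame and push it into Redis under a key of our choosing
    r.set('fx_minute_data', pickle.dumps(df))

    # Later (even after restarting our Python process) pull it straight back from RAM
    df_cached = pickle.loads(r.get('fx_minute_data'))
    print(df_cached)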

Read the rest of the article on the Cuemacro website here

Jose Antonio Carpio

Director of Technology and Transformation (CTO) at Abanca Gestión de Activos | Professor of AI in Finance

8y

Great article. I've been using arctic for around one year now. I haven't used Redis, but instead a local mongo replica. I think your setup is the fastest.

Marco Jean Aboav, PhD

CEO @ Etna Research - Frontier AI for public capital markets | @macro_fintech on X

8y

great stuff!

Amit Sinha

Alternatives, Private Markets and Multi Asset investing

8y

Another example of how open source is enabling folks with financial skills to get things done faster and better - good post Saeed Amen
