Speed up Python data access by 30x & more
Let's say you send a letter from London to Tokyo. How long would it take to get a reply? At the bare minimum, it takes 12 hours for the letter to fly there, and another 12 hours for a reply to fly back, so at least a day (and that's ignoring the time it takes for your letter to be read, for a reply to be written, for it to be posted and so on). We could of course use faster means of communication like the phone or e-mail. Whilst the delay would be much lower, it would still be at least a few hundred milliseconds.
Whenever you are analysing market data in Python, or indeed any other language, a lot of time is spent loading data, even before you do any computations or statistical analysis. Just as with our letter example, the data you are trying to access is often across a network, so it takes time to fetch it before you can put it into your computer's RAM. The difficulty is that every time you change your Python code to tweak your analysis, whatever you loaded into memory is lost once the script has finished running. So the next time you run it, you have to go through the whole process of loading the data again, even though it's precisely the same dataset.

In my Python market data library findatapy, I've written a wrapper for arctic (my code here), which has been open sourced by Man-AHL. It takes pandas DataFrames, which can hold market data, compresses them heavily and sends them to MongoDB for storage. Compressing the data reduces the amount of disk space MongoDB needs to store it. Also, because the compression is done on the client side, less data has to travel over the network when it is sent back to your computer.
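To give a flavour of what this looks like in practice, below is a minimal sketch of storing and reading a DataFrame with arctic directly (rather than through the findatapy wrapper). It assumes a MongoDB instance running on localhost; the library name 'fx', the symbol 'EURUSD_1min' and the toy DataFrame are purely illustrative.

```python
# Minimal sketch: storing a pandas DataFrame in MongoDB via arctic.
# Assumes MongoDB is running on localhost; 'fx' and 'EURUSD_1min' are
# illustrative names, not ones used by findatapy itself.
import pandas as pd
from arctic import Arctic

store = Arctic('localhost')          # connect to MongoDB
store.initialize_library('fx')       # one-off: create a library
library = store['fx']

# a toy stand-in for 1 minute market data
df = pd.DataFrame({'mid': [1.0945, 1.0947, 1.0950]},
                  index=pd.date_range('2017-01-03 09:00', periods=3, freq='T'))

library.write('EURUSD_1min', df)     # compressed locally, then pushed to MongoDB
item = library.read('EURUSD_1min')   # fetched and decompressed locally
df_back = item.data
```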
As a bit of an experiment, I used my library findatapy (via arctic) to access 1 minute data from 2007 to the present day for 12 G10 FX crosses, which is stored on my MongoDB server. The output of this query amounts to around 40 million observations. The Python code also joins together all the time series and aligns them, which takes a bit of time. In total it took around 58 seconds to load all this FX data across my network and align it into a single dataset, ready to be number crunched. My MongoDB setup is far from optimal, and the database I was accessing was across a wifi network, rather than a wired gigabit network. If every time I rerun my Python script I have to go through this 58 second process to get a dataset, it's going to seriously slow down the process of market analysis, which is often an iterative one.

Luckily, there are lots of tricks you can use to make this faster. One solution is to cache the data in local RAM in such a way that it is still available even if we have to restart the Python process. We can use Redis to do this, which is a simple in-memory database (basically a key/value store). Once we've loaded up the data, we simply push it to Redis for temporary storage. Whenever we need it again, we just pull it from Redis, as in the sketch below. When we fetch this large dataset via Redis, it takes under 2 seconds, nearly 30 times quicker! Why is it so much quicker? We list some reasons below...
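Here is a rough illustration of that caching pattern, assuming a Redis server on localhost and pickle for serialisation (pyarrow or msgpack would also work and may well be faster); the key name and the load_fx_data_from_arctic helper are hypothetical, standing in for whatever slow query you want to avoid repeating.

```python
import pickle

import pandas as pd
import redis

# Assumes a Redis server running locally on the default port 6379.
r = redis.StrictRedis(host='localhost', port=6379, db=0)


def cache_dataframe(key, df, expiry_seconds=3600):
    """Serialise a DataFrame and push it into Redis with an expiry."""
    r.set(key, pickle.dumps(df), ex=expiry_seconds)


def fetch_dataframe(key):
    """Pull a DataFrame back out of Redis, or return None on a cache miss."""
    payload = r.get(key)
    return pickle.loads(payload) if payload is not None else None


# Usage pattern: try the cache first and only go back to MongoDB/arctic on a
# miss. load_fx_data_from_arctic() is a hypothetical loader standing in for
# the slow 58 second query described above.
#
# df = fetch_dataframe('fx_1min_g10')
# if df is None:
#     df = load_fx_data_from_arctic()
#     cache_dataframe('fx_1min_g10', df)
```

The key point is that the heavy lifting (pulling 40 million rows over the network and aligning them) happens once, after which the in-memory Redis copy serves every subsequent rerun of the script.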
Read the rest of the article on the Cuemacro website here