Using vaex to do Python calcs with 30x speed up

Within finance, time series data is the bedrock of most analysis. If we're using Python, what are the best ways to analyse this data? In this article, I discuss the various time series libraries available in Python. In particular, I focus on the vaex library for dealing with large time series datasets, and compare its speed with Dask.

Python libraries for working with time series

Below, I've listed a few libraries we can try in Python if we're dealing with time series. Note that it isn't an exhaustive list, and there are lots of other time series style libraries which I haven't included, such as Modin.

  • Pandas - This is the most popular time series library and I use it a lot! However, when your datasets are very large (bigger than memory), you need to batch your calculations yourself.
  • Dask - This is a library for parallel computing with task scheduling. It has Dask DataFrames, which look like Pandas DataFrames to the user, but they can be much bigger than memory. Underneath, Dask handles all the batching and the construction of a computation graph for us.
  • NumPy - This is the main library for working with arrays in Python. Whilst it isn't designed purely for time series, we can use NumPy arrays to represent time series, and computations can be quicker in pure NumPy than in Pandas.
  • TensorFlow - Whilst TensorFlow is primarily a library for machine learning, recent versions have a NumPy-like interface, which makes it easy to use in place of NumPy. It can also target the GPU.
  • Vaex - We'll talk about that shortly!
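
To make the Pandas and NumPy bullets a bit more concrete, here is a minimal sketch (not the code from the article's notebook). It computes the same mean three ways: on a whole Pandas Series, in manual batches, and on the underlying NumPy array. The synthetic price series, the `batched_mean` helper and the batch size are all assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic minute-level "price" series (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=100_000, freq="min")
prices = pd.Series(100.0 + rng.standard_normal(len(index)).cumsum(), index=index)


def batched_mean(series, batch_size):
    """Mean computed one batch at a time, keeping only a running sum and count.

    Hypothetical helper: it mimics what we would do by hand when the data
    is too large to process in a single pass.
    """
    total, count = 0.0, 0
    for start in range(0, len(series), batch_size):
        batch = series.iloc[start:start + batch_size]
        total += batch.sum()
        count += len(batch)
    return total / count


# The same statistic three ways: whole Series, manual batches, raw NumPy array
mean_pandas = prices.mean()
mean_batched = batched_mean(prices, batch_size=10_000)
mean_numpy = prices.to_numpy().mean()

print(mean_pandas, mean_batched, mean_numpy)
```

The three numbers should agree up to floating point rounding. Here the whole series fits in memory, so the batching is only for show; with a genuinely large dataset each batch would be read from disk instead, which is the bookkeeping that Dask takes off our hands.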

There are all sorts of tips and tricks we can use to speed up Python and the tools above, without having to resort to rewriting all our Python in another faster language like C...

The rest of the article is available on the Cuemacro website.

Laurent Bilke

CEO and Head of Research at Alternative Macro Signals | NLP & ML applied to Macro | Economics and Monetary Policy | Digital Innovation Enthusiast |

3 years ago

Impressive result, thanks for sharing Saeed!

William Smith

Senior Developer at Beaufort Energy

3 years ago

Useful post. If you don't already, you should post on medium.com and earn some money. The quality of articles on medium is often very high, like yours.

Raymond Troy

Senior Product Manager - GE Air and Water Solutions - HVAC Unitary

3 years ago

Thanks Saeed, looks very interesting!

Saeed Amen

Turnleaf Analytics / Visiting Lecturer at QMUL

3 years ago

Here's the Jupyter notebook with the code I used for vaex and Dask: https://github.com/cuemacro/teaching/blob/master/pythoncourse/notebooks/vaex_example.ipynb. It also has links to my tcapy library, which shows how to download the FX tick data needed to run the computation.
