Making Python parallel: a few points

If you want code to run super quick, you could write it in C++. If you want to code something super quick, you could write it in Python. Is there a middle ground where you can get Python code to execute quickly, without having to write everything in C++? One way to speed up code is to make it run in parallel. With Python, the way to make code truly parallel is to kick off new processes, which gets around the limitations of the GIL (global interpreter lock). You can also write specific bottlenecks in Cython, a superset of Python which is translated into C code and then compiled, so that it runs quickly. There's also Numba, a just-in-time compiler built on LLVM, which can be used to speed up certain Python code.

My colleague, Shih-Hau Tan, has done work to benchmark various computations using Numba and Dask (which can be seen as a distributed version of pandas). Below I've written a few points to consider when trying to make Python code parallel.
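As a rough illustration of kicking off new processes, here is a minimal sketch using the standard library's concurrent.futures module. The function names (count_primes, count_primes_parallel) and the input sizes are my own assumptions for the example; they are not taken from any benchmark mentioned in this article.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """Count primes below `limit` by trial division (deliberately CPU-bound)."""
    count = 0
    for candidate in range(2, limit):
        is_prime = True
        divisor = 2
        while divisor * divisor <= candidate:
            if candidate % divisor == 0:
                is_prime = False
                break
            divisor += 1
        if is_prime:
            count += 1
    return count


def count_primes_parallel(limits, max_workers=None):
    """Spread the count_primes calls across a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as the inputs.
        return list(executor.map(count_primes, limits))


if __name__ == "__main__":
    # The guard matters: on platforms that start workers by re-importing
    # this module, unguarded code would run again inside every worker.
    limits = [50_000, 60_000, 70_000, 80_000]  # illustrative sizes only
    print(count_primes_parallel(limits, max_workers=4))
```

Each worker is a separate Python interpreter with its own GIL, so the four calls can genuinely run at the same time on a multi-core machine. The cost is that arguments and results have to be pickled and sent between processes, so this tends to pay off only when each task does a reasonable amount of work.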

Which libraries can I use to make Python truly parallel?

The simplest library is Python's threading library. Whilst this can speed up tasks which involve a lot of blocking IO, threads within a single process still share the GIL, so for truly parallel processing we need, as mentioned, libraries which work across different processes. In Python these libraries include multiprocessing and several variants (which differ in the way they "pickle" objects inside each process - I have found multiprocess a bit more forgiving...
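To make the IO-bound case concrete, here is a minimal sketch of threads overlapping blocking waits, using only the standard library. The fake_download function is a hypothetical stand-in that sleeps rather than touching a network, and the names and delays are assumptions chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name: str, delay: float) -> str:
    """Hypothetical stand-in for a blocking IO call; it only sleeps."""
    time.sleep(delay)  # sleeping releases the GIL, as real blocking IO does
    return f"{name}: done"


def fetch_all(names, delay=0.2, max_workers=4):
    """Run fake_download for every name on a pool of threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fake_download, name, delay) for name in names]
        # Collect results in submission order.
        return [future.result() for future in futures]


if __name__ == "__main__":
    start = time.perf_counter()
    print(fetch_all(["a", "b", "c", "d"]))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

With four threads, the four waits overlap, so the total time should be close to a single delay rather than four of them. Swapping the sleep for a CPU-heavy loop would remove most of that gain, which is the point at which process-based libraries become the better fit.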

The rest of the article is available on the Cuemacro website.

You can surely use CuPy (https://github.com/cupy/cupy) for GPU acceleration based on the NumPy syntax, connect your Python interface with Boost.Python and voila! You can see some examples here: https://github.com/dendisuhubdy/boost-python-examples

Jamil Raza

Time series, graphs, causality - modelling & inference.

6 years ago

If you want to code something super quick and you want it to run super quick as well, why not write it in Julia?

Yasser M.

AI x Investment Research

6 years ago

The methods you've discussed are usually limited to a single machine, whereas cloud-based approaches can achieve parallelism across multiple machines. For the latter, I typically choose Apache Spark or AWS Lambda, depending on my use case.
