Europython 2018 - Day 3
Today was the first conference day proper, after two days devoted to trainings. As there was some rush at the registration desk, I started by volunteering some time and helped hand out conference T-shirts. I did not attend the keynote; I am no big fan of such speeches anyway.
Python in scientific computing: what works and what doesn't
Michele Simionato is a software developer for the Global Earthquake Model foundation, where he supports the scientists doing, well, earthquake research. He works in a typical scientific environment, meaning distributed computing with complex processing under CPU, memory and data-transfer constraints.
In that context, even the most robust packages can run into sizing issues. For example, he has recently hit problems with numpy structured arrays. He uses h5py to access HDF5 files, and the migration from HDF5 1.8 to 1.10 was a "debacle".
He uses geospatial packages, which can sometimes exhibit different behaviors on different platforms.
Communication between machines is handled by celery/rabbitmq, in an atypical use case: long messages instead of many small ones. He has experienced some issues and has started experimenting with zmq (his "plan B"), which gives better results. He will not touch the parts of the existing codebase that do not experience any problem ("if it's not broken, why fix it?").
He has no dask experience yet, but is open to experimenting.
He does not use the following acceleration techniques:
- C extensions: because of the gcc dependency, and the lack of C skills in his organisation. Most of his use cases for C extensions are covered by a clever use of numpy.
- Cython: more usable for him than the C extensions, but the speedup he achieves is not impressive enough.
- numba: it delivers a speedup in a parallelised environment, but his code is already parallelised, so there is little benefit and a risk of oversubscription.
- Intel Python distribution: he fears a vendor lock-in, and his tests showed a 20% slowdown.
His conclusion is "Algorithmic solutions are better than technical solutions".
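The "clever use of numpy" replacing C extensions usually means rewriting Python-level loops as whole-array operations. A minimal sketch of the idea (the function names and the toy computation are mine, not from the talk):

```python
import numpy as np

def distances_loop(points, origin):
    # Python-level loop: one interpreter iteration per point, slow at scale.
    return [((p[0] - origin[0]) ** 2 + (p[1] - origin[1]) ** 2) ** 0.5
            for p in points]

def distances_vectorized(points, origin):
    # The same computation as whole-array operations: the loop runs inside
    # numpy's C code, with no extension module to build or maintain.
    diff = points - origin
    return np.sqrt((diff ** 2).sum(axis=1))
```

For large arrays the vectorized version is typically orders of magnitude faster, without introducing a compiler dependency.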
Reliability in distributed systems
Jiri Benes is a technical leader at kiwi.com. They use Python heavily in production, and he explained some techniques used to improve the resilience of their services:
- use of time-outs and circuit breakers in the code, by using specific function decorators
- "let it crash" philosophy: no over-engineering to prevent a crash, and no hiding of errors.
- development of logging, monitoring (including Application Performance Monitoring) and alerting systems
- definition of escalation procedures in case of alerts or incidents.
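A circuit breaker as a function decorator can be sketched in a few lines of pure Python (a simplified illustration of the pattern, not kiwi.com's actual implementation):

```python
import functools
import time

class CircuitOpen(Exception):
    """Raised when the breaker refuses to call the wrapped function."""

def circuit_breaker(max_failures=3, reset_after=30.0):
    """After max_failures consecutive errors, fail fast for reset_after seconds."""
    def decorator(func):
        state = {"failures": 0, "opened_at": None}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if state["opened_at"] is not None:
                if time.monotonic() - state["opened_at"] < reset_after:
                    # Fail fast instead of hammering a service that is down.
                    raise CircuitOpen(f"{func.__name__} disabled after repeated failures")
                # Reset period elapsed: half-open, allow one trial call.
                state["opened_at"] = None
                state["failures"] = 0
            try:
                result = func(*args, **kwargs)
            except Exception:
                state["failures"] += 1
                if state["failures"] >= max_failures:
                    state["opened_at"] = time.monotonic()
                raise
            state["failures"] = 0  # any success closes the circuit again
            return result
        return wrapper
    return decorator
```

A service call would then simply be decorated with `@circuit_breaker(max_failures=2, reset_after=60)`, which fits the "let it crash" philosophy: errors still propagate, they are just not retried blindly.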
Using Pandas and Dask to work with large columnar datasets in Apache Parquet
Peter Hoffmann is a senior developer for Blue Yonder, a company providing data-science services (such as sales forecasting) to mass retailers.
Blue Yonder uses a typical Python ecosystem (Jupyter, Dask, Pandas) in a distributed computing environment (Apache Aurora, Apache Mesos). The exchange of data between different ecosystems is difficult, so they rely on Parquet files to exchange data with the "external" world, accessed through the pyarrow and fastparquet libraries. This approach has replaced native access to their databases and has increased the speed of data transfers.
Fuzzy Matching - Smart Way of Finding Similar Names Using Fuzzywuzzy
Cheuk Ting Ho is a data scientist working for the Hotelbeds group. She has to work with company names, and needs to deduplicate (frequently badly spelled) entries.
She uses the "fuzzywuzzy" library, which calculates the Levenshtein distance between words. The Levenshtein distance is "the number of deletions, insertions and substitutions" necessary to transform one word into another. The shorter the distance between two words, the greater the similarity. The library includes several variations, including the possibility to work on sets of several words through tokenization.
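The distance itself is a short dynamic-programming computation. A pure-Python sketch of it (fuzzywuzzy delegates this to an optimized implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Number of deletions, insertions and substitutions turning a into b."""
    # prev[j] holds the distance between a[:i-1] and b[:j] (the previous row).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[len(b)]

# The closer to 0, the more similar the names:
levenshtein("Hotelbeds", "Hotel Beds")  # small distance despite the spelling difference
```

fuzzywuzzy turns this raw distance into a 0-100 similarity score, which is easier to threshold for deduplication.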
I use this library for word comparison (Hotelbeds is not the only company to have issues with company names), and the presentation gave me some ideas for additional use cases.
What's New in Python 3.7
Stephane Wirtel is an independent Python developer, expert and contributor. Python 3.7 was released last month, and he explained the most important changes in the new version:
- breakpoints: an elegant new interface to the debuggers. Setting a breakpoint is now done with the built-in breakpoint() function, which is no longer debugger-specific.
- dataclasses: a class decorator that allows the definition of complex data structures in an elegant way, with excellent performance (for old blokes like me, it looks very much like a C struct).
- new time functions with nanosecond resolution, such as time.time_ns(). This had become necessary because of the speed increase of the systems.
Migrating a mission-critical service to asyncio
Hrafn Eiriksson works as a software developer for Smarkets, a fintech company.
Asyncio is a relatively new (since 3.4) part of Python that enables asynchronous programming. This kind of service was already provided by third-party libraries, but it is now integrated into the language.
Hrafn has migrated a mission-critical micro-service to asyncio. His experience is that it is important to trace all dependencies before starting the migration. Debugging can be difficult, especially when synchronous code is called within an asynchronous function. But the final result is worth the effort: the large performance improvement allowed them to remove 80% of the servers, with the associated reduction in cost!
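The pitfall he mentions - a blocking, synchronous call inside a coroutine stalls the whole event loop - is typically handled by pushing the blocking call onto a thread pool. A minimal sketch (the function names are made up, not from the talk):

```python
import asyncio
import time

def legacy_lookup(user_id: int) -> str:
    # Hypothetical blocking function from the pre-asyncio codebase.
    time.sleep(0.1)  # stands in for a blocking network or database call
    return f"user-{user_id}"

async def handle_request(user_id: int) -> str:
    loop = asyncio.get_running_loop()
    # Calling legacy_lookup() directly here would block every other coroutine;
    # run_in_executor moves it to a worker thread so the loop keeps running.
    return await loop.run_in_executor(None, legacy_lookup, user_id)

async def main():
    # Serve several requests concurrently despite the blocking dependency.
    return await asyncio.gather(*(handle_request(i) for i in range(3)))

print(asyncio.run(main()))
```

With the blocking calls isolated this way, a service can be migrated incrementally, one dependency at a time, instead of rewriting everything at once.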