Optimize Your Data Workflow: Discover the Fastest Python Engine for Excel to DataFrame

The Task

I needed to convert many Excel files into DataFrames, and I needed some features that are only available in pandas.

The problem

The issue was that the default pandas Excel-to-DataFrame function was taking far too long, so the goal was to reduce the runtime of the code.

The solution

The solution was to change the engine that pandas was using. For .xlsx files, pandas reads Excel with openpyxl by default (xlsxwriter, by contrast, is a write-only engine and cannot read files). Luckily, there is another option that speeds up reading considerably: calamine. The speedup comes from calamine being written in Rust, a compiled language, whereas openpyxl is pure Python.
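As a minimal sketch of the engine switch, assuming pandas 2.2 or later with the python-calamine package installed: the helper names `load_excel` and `load_many`, and the folder layout, are hypothetical and only there for illustration. The one call that matters is `pd.read_excel(..., engine="calamine")`.

```python
from pathlib import Path


def load_excel(path, engine="calamine", **kwargs):
    """Read one Excel file into a DataFrame with the chosen pandas engine.

    Assumes pandas >= 2.2 and the python-calamine package are installed.
    """
    # Imported here so the helpers can be defined even where pandas is absent.
    import pandas as pd

    return pd.read_excel(Path(path), engine=engine, **kwargs)


def load_many(folder, pattern="*.xlsx", engine="calamine"):
    """Read every matching workbook in a folder into a dict of DataFrames."""
    files = sorted(Path(folder).glob(pattern))
    return {f.name: load_excel(f, engine=engine) for f in files}
```

Calling `load_many("reports")` (a hypothetical folder name) would return one DataFrame per workbook, keyed by file name. Passing `engine="openpyxl"` instead falls back to the slower default reader, which makes it easy to compare the two.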

Openpyxl vs Calamine

For this test I used an Excel sheet with 111,742 rows and 23 columns (the data comes from "Crimes - 2001 to Present" on the City of Chicago Data Portal).

After importing the modules in Python, I converted the Excel file using both engines: first openpyxl and then calamine.
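The comparison can be reproduced with something like the sketch below. It is not the original benchmark code: `time_call` and `compare_engines` are hypothetical helper names, and it assumes pandas 2.2 or later with both openpyxl and python-calamine installed.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def compare_engines(path, engines=("openpyxl", "calamine")):
    """Time pd.read_excel on the same file once per engine."""
    # Imported here so time_call stays usable without pandas installed.
    import pandas as pd

    timings = {}
    for engine in engines:
        _, timings[engine] = time_call(pd.read_excel, path, engine=engine)
    return timings
```

Running `compare_engines("crimes.xlsx")` (a hypothetical file name) returns a dict of seconds per engine. A single run per engine is noisy, so repeating the measurement a few times and taking the best or median time gives a fairer picture.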


In this test openpyxl took 31.7 s, while calamine took only 8.0 s. That cuts the runtime by roughly 75% (about 4x faster), which shows the benefit of using calamine for Excel conversion. For more information on calamine and what it's capable of, check out its documentation (python-calamine · PyPI).
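For anyone who wants to check the arithmetic behind those two timings, here is a small sketch. The `speedup` function is a hypothetical helper, and the inputs are the 31.7 s and 8.0 s figures reported above.

```python
def speedup(baseline_s, faster_s):
    """Return (speedup factor, fraction of runtime saved)."""
    factor = baseline_s / faster_s
    reduction = 1 - faster_s / baseline_s
    return factor, reduction


# Timings from the test above: openpyxl 31.7 s, calamine 8.0 s.
factor, reduction = speedup(31.7, 8.0)
print(f"{factor:.1f}x faster, {reduction:.0%} less time")
# prints "4.0x faster, 75% less time"
```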

Eduardo Lopez Sanchez

Bachelor of Science in Computer Science

5 months

Very insightful post will need to give this engine a look.

Kristian Correa

Software Engineer | CS Student @ FIU | 2024 ShellHacks Winner (Vanguard)

5 months

Just goes to show how fast Rust is as a language.
