Optimize Your Data Workflow: Discover the Fastest Python Engine for Excel to DataFrame
The Task
I needed to convert many Excel files into DataFrames, and I had to stay within pandas because I relied on features that only pandas provides.
The problem
The issue was that pandas' default Excel-to-DataFrame function was taking far too long, so the goal was to reduce the runtime of the code.
The solution
The solution was to change the engine that pandas was using. By default, pandas reads .xlsx files with openpyxl. Luckily, there is another option that is substantially faster: calamine. The speedup comes from calamine being written in Rust, a compiled language, whereas openpyxl is pure Python.
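As a minimal sketch of the engine switch: pandas accepts engine="calamine" in read_excel from version 2.2 onward, provided the python-calamine package is installed. The helper names pick_engine and read_workbooks, and the idea of reading every .xlsx file in one folder, are illustrative assumptions, not the code used for this post.

```python
from importlib.util import find_spec
from pathlib import Path


def pick_engine():
    """Return "calamine" if python-calamine is installed, else "openpyxl"."""
    # python-calamine installs the importable module `python_calamine`.
    return "calamine" if find_spec("python_calamine") is not None else "openpyxl"


def read_workbooks(folder, engine=None):
    """Read every .xlsx file in `folder` into a dict of DataFrames.

    Hypothetical helper: keys are file names, values are DataFrames
    built from the first sheet of each workbook.
    """
    import pandas as pd  # imported lazily so pick_engine() works without pandas

    engine = engine or pick_engine()
    return {
        path.name: pd.read_excel(path, engine=engine)
        for path in sorted(Path(folder).glob("*.xlsx"))
    }
```

Falling back to openpyxl keeps the code working on machines where python-calamine is not installed; the only cost there is the slower read.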
Openpyxl vs Calamine
For this test I used an Excel sheet with 111,742 rows and 23 columns (you can find it on the Crimes - 2001 to Present | City of Chicago | Data Portal).
After importing the modules in Python, I converted the Excel file with each engine, first openpyxl and then calamine.
In my run, openpyxl took 31.7 s, while calamine took only 8.0 s. That cuts the runtime by roughly 75%, or about a 4x speedup, which shows the benefit of using calamine for Excel conversion. For more information on calamine and what it's capable of, check out its documentation (python-calamine · PyPI).
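A comparison like this can be reproduced with a small timing sketch. The helper names time_call, compare_engines, and speedup are assumptions made for illustration, and the timings you get will depend on your machine and file.

```python
import time


def time_call(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_engines(path, engines=("openpyxl", "calamine")):
    """Time pd.read_excel on the same file once per engine.

    Requires pandas 2.2+ plus openpyxl and python-calamine to be installed.
    """
    import pandas as pd  # imported lazily so the timing helper stays stdlib-only

    timings = {}
    for engine in engines:
        _, seconds = time_call(pd.read_excel, path, engine=engine)
        timings[engine] = seconds
    return timings


def speedup(slow_seconds, fast_seconds):
    """Return how many times faster the fast run was."""
    return slow_seconds / fast_seconds


# The numbers reported above: 31.7 s with openpyxl vs 8.0 s with calamine.
ratio = speedup(31.7, 8.0)        # just under 4x
reduction = 1 - 8.0 / 31.7        # fraction of runtime saved, about three quarters
```

A single run per engine is a rough measurement; repeating each read a few times and taking the best time gives a steadier comparison.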