Power BI - Excel vs CSV vs MySQL - a Real Case

Power BI - Excel vs CSV vs MySQL - a Real Case

In this short article, I want to comment on a real case of performance issue and how we solved it.

Case

We received from a client an Excel file containing a data sample to develop a new BI application.

As the first activity, we started to analyze the content, columns, data formats, nothing special, and we connected the Excel file as a Power BI Desktop application data source. Again, nothing different from what we did before in a similar situation.

However, we noticed that the data load on Power BI was slow, very slow.

The Excel file has a usual design, columns with labels and values, and about 82,000 records. The file size is 11 MB.

Could this be the reason?

But we have models much bigger than this.

The notebook used was a Ryzen 7 with 16 threads and 32 GB of memory, enough power to do the task easily.

Analyzing the issue

We made tests with interesting results:

  1. Excel file is slow. Even removing all steps from Power Query process, it is really slow for this volume;
  2. Converting to a CSV, the file size doubled, 22 MB, but the same process ran almost instantaneously;
  3. We loaded the data to a MySQL database located in an AWS Instance in the USA, and the result was similar to loading the CSV file.
  4. We also put the Excel file in a SharePoint folder, but we had the worst results.

Conclusion

We finished the day with the clear idea that the problem with Excel was the file's opening and reading process. Considering the compression level, the CSV file with the same content is double the size, the uncompressing process could be consuming much time.

On the other hand, the MySQL connection had an incredible performance if we think about the complexity of establishing a database connection and collecting data across the Atlantic Ocean (I'm in Portugal).

So, I finished with the tip:

To perform better, avoid connecting big Excel files. Convert them to CSV or use a database.


要查看或添加评论,请登录

社区洞察

其他会员也浏览了