Challenging Excel – How to reduce workbook size and improve calculation speed

This is not an exhaustive guide to optimising a workbook, but I want to share the major findings I came across recently while working on a 200 MB Excel-based financial model.

To give some background: we were challenged to prepare a data-intensive financial model, a monthly model covering 6 years and containing 100,000 rows of data. We had to use a sales data book to prepare the financials, which meant formulas referencing very long ranges, and as a result the workbook took a long time to calculate.

Hence, we tried challenging some of the basic principles (best practices) of financial modelling to overcome this situation, and it worked. Among the many small changes, the top 5 points that changed the scenario are:

1) Use of OFFSET

2) Use of a daisy chain

3) Hardcode the calculated values using macros (with an option to paste the formulas back)

4) Separate input sheets based on input classification

5) Use of Tables / Pivots / Slicers to prepare summaries

These days it is calculation speed, more than file size, that matters most. You can live with a large file, but if calculation speed is affected by the size, you need to find an optimised solution.

Before going into the points above in detail, let's run through a basic checklist to make sure we are genuinely left with no other option than to challenge the financial modelling principles.

a) First and foremost, check that the last used cell (end cell) is set correctly throughout the model. You can check the 'How to reduce file size' document for more details (link), use the 'Inquire' tab's 'Clean Excess Cell Formatting' feature, or both.

b) There should be no external links in your workbook.

c) Avoid conditional formatting in such a large financial model.

d) Avoid objects, and make sure there are no hidden objects.

e) Calculation mode should be set to 'Manual'.

f) The workbook format should be '.xlsb' instead of the default '.xlsx' or '.xlsm', as this also reduces the file size.

g) There should be no circularity in the model.

h) There should be no hidden worksheets, rows or columns in your workbook.

i) Check named ranges and data validation to ensure no errors or unnecessarily long formulas are present.

(If any of the points above needs to be discussed in more detail, please leave a comment.)
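For anyone who prefers to apply a few of these settings programmatically, below is a minimal VBA sketch covering items b), e) and f). It assumes the model is the active workbook and that the new file name simply swaps a '.xlsx' extension for '.xlsb'; adjust it to your own setup.

    Sub ApplyBasicChecklist()
        ' Item e) - switch calculation to Manual so the model only recalculates on demand
        Application.Calculation = xlCalculationManual

        ' Item b) - flag any external workbook links
        Dim links As Variant
        links = ActiveWorkbook.LinkSources(xlExcelLinks)
        If Not IsEmpty(links) Then
            MsgBox UBound(links) & " external link(s) found - please remove them."
        End If

        ' Item f) - save the workbook in the binary .xlsb format (xlExcel12)
        ActiveWorkbook.SaveAs _
            Filename:=Replace(ActiveWorkbook.FullName, ".xlsx", ".xlsb"), _
            FileFormat:=xlExcel12
    End Sub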

Some of these items may be justified in particular scenarios. Still, they should be kept in the model only when genuinely unavoidable, because in roughly 90% of cases working through this checklist will reduce the file size and improve the calculation speed.

Now let's move back to the topic: the top 5 points that will help improve workbook performance.

1) Use of OFFSET: Because the actual precedents cannot be traced, it is generally said that OFFSET should not be used in financial models. If OFFSET is used, we can never properly check for unused calculations and inputs, or see which value is driving a given output.

However, in data-intensive modelling, if a range covers 100,000 rows you cannot really trace which cell drives your output anyway. For example, if a SUMIF spans 100,000 rows, you cannot directly see where the condition matches and which cells are being picked up; you need a manual cross-check to trace the precedents.

Hence, in such cases, if we use OFFSET so that we sum only the cells where the condition actually holds, calculation time is reduced.

For example, suppose you have 10 stores, each selling 1,000 products, and you want the monthly revenue per store. We can either use SUMIF, which checks the condition against all 10,000 cells in each of the 72 monthly columns (12 months x 6 years), or we can simply sum the relevant 1,000 cells with no condition to check at all.

We can sort the data by store name and use OFFSET to calculate the desired result, as sketched below.
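The original screenshot is not reproduced here, so below is a simplified sketch of the two approaches. The layout is an assumption: store names in A2:A10001, one month's revenue in D2:D10001, the data sorted by store with exactly 1,000 rows per store, and the 10 store names listed in G2:G11 of the summary.

    SUMIF approach (checks all 10,000 rows for every store and every month):
        =SUMIF($A$2:$A$10001, $G2, D$2:D$10001)

    OFFSET approach (find each store's first row once, then sum only its 1,000-row block):
        H2:  =MATCH($G2, $A$2:$A$10001, 0)
        I2:  =SUM(OFFSET(D$1, $H2, 0, 1000, 1))

Note that the MATCH is evaluated only once per store, whereas SUMIF re-checks the condition on every one of the 10,000 rows for each of the 72 monthly columns.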

Therefore, on large data sets we can use OFFSET to keep calculation speed under control. It also improves overall workbook responsiveness: because only a single cell is referenced instead of 10,000, an operation such as inserting a row or moving a sheet that previously took 2 minutes now takes about a second. Try it and feel the difference.


2) Use of a daisy chain: Best practice says that a calculation or reference should link directly to the source cell rather than through a chain of links. For example, suppose we have to build 10 different revenue lines, each calculated as Price x Quantity x Conversion rate. You may have 10 different prices and 10 different quantities but a single conversion rate. Best practice says the conversion rate in every calculation block should be linked to that single source cell, rather than picked up from another revenue block's cell.

The benefit of that approach is that when the user presses 'Ctrl + [' they land directly on the source cell and can change the input there. Otherwise, they would have to trace back through the chain up to 10 times.

In our case, however, linking 10,000 cells to a single source cell makes that sheet (and that cell) heavy: inserting a row forces Excel to update the reference in every one of the linked cells. In such models we have to compromise on traceability and use a daisy chain instead. With a daisy chain, inserting a row means only one linked cell has to be updated instead of thousands, which improves workbook performance. A sketch of the daisy chain is shown below.

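The original illustration is not reproduced here, so here is a minimal sketch of the two approaches. The cell references are assumptions: the conversion rate input sits in Inputs!C5, and each revenue block picks it up in a cell of its own (C10, C20, C30, and so on).

    Best-practice direct links (every block points back to the single source cell):
        Block 1, C10:  =Inputs!$C$5
        Block 2, C20:  =Inputs!$C$5
        Block 3, C30:  =Inputs!$C$5
        ... thousands of cells all referencing Inputs!$C$5

    Daisy chain (each block points to the previous block, so only the first link touches the source):
        Block 1, C10:  =Inputs!$C$5
        Block 2, C20:  =C10
        Block 3, C30:  =C20
        ... inserting a row now forces only one dependent reference to update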

3) Hardcode the calculated values: This one requires a bit of VBA knowledge, although plenty of ready-made code is available if you search for how to copy and paste values. Let me share the logic first.

In our case we have 100,000 rows across 72 columns, i.e. 7,200,000 cells to calculate, and in a financial model the logic runs from revenue all the way through to the financial statements. So you may end up calculating many multiples of those 7,200,000 cells.

We might want to keep the formulas live all the time, but then every calculation or sensitivity analysis can mean waiting a long time (10-60 minutes) for such a heavy model to recalculate.

Therefore, we need to use judgement about where values can be calculated once and then pasted as values, so that the rest of the model is not weighed down by those formulas.

For example, in the illustration under the first point, we can paste the range D20:F23 as values. We just need a macro that copies the formula from cell B31, pastes it into the range, calculates the model, and then converts the range to values, so that the user faces no slowdown while working on other parts of the model.

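A minimal sketch of such a macro is shown below, using the cell references from the example above (master formula in B31, target range D20:F23 on the active sheet). The references and the routine name are illustrative only; adapt them to your model, and ideally keep a companion routine that pastes the formulas back when you need to recalculate.

    Sub RefreshAndHardcode()
        Dim ws As Worksheet
        Set ws = ActiveSheet

        ' 1. Copy the master formula from B31 and paste it as formulas into the target range
        ws.Range("B31").Copy
        ws.Range("D20:F23").PasteSpecial Paste:=xlPasteFormulas
        Application.CutCopyMode = False

        ' 2. Recalculate (the model normally sits on Manual calculation)
        Application.Calculate

        ' 3. Overwrite the formulas with their calculated values so the rest of the model stays light
        ws.Range("D20:F23").Value = ws.Range("D20:F23").Value
    End Sub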

This not only speeds up calculation but also drastically reduces file size. In our case, a 200 MB model came down to about 50 MB (much easier to send as well), because the long-range formulas live in only one cell instead of 7,200,000.

If you are not familiar with macros, please feel free to connect with me for the code.

4) Separate input sheets: In my view, this is the simplest idea to explain.

Modelling standards (and industry expectations) say that all inputs should be kept in one place, so that no assumptions are missed when the workbook is audited. In our case, though, we might have revenue, cost, quantity, price, tax and so on for each of the 100,000 items. If we keep all the inputs on the same tab, we end up with either 600,000 rows or 100,000 rows spread over many columns. Either way the worksheet becomes so large that inserting a row, extending the timeline or performing any calculation becomes painful.

Instead, we can create separate sheets for each input category (revenue, cost, quantity, price, tax, etc.) plus one Global Input tab that lists all the input areas. The user can then navigate easily and update inputs without any single worksheet becoming oversized.

Easy to do, right?
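One possible way to build the Global Input tab is a short macro that lists every input sheet and adds a hyperlink to it, as in the sketch below. The 'Global Input' sheet name and the 'Input_' naming convention are assumptions; use whatever convention your model follows.

    Sub BuildGlobalInputIndex()
        Dim ws As Worksheet
        Dim idx As Worksheet
        Dim r As Long

        Set idx = ThisWorkbook.Worksheets("Global Input")   ' assumed index sheet
        r = 2

        For Each ws In ThisWorkbook.Worksheets
            ' Assumed convention: every input sheet is named "Input_<category>"
            If Left(ws.Name, 6) = "Input_" Then
                idx.Hyperlinks.Add Anchor:=idx.Cells(r, 1), Address:="", _
                    SubAddress:="'" & ws.Name & "'!A1", TextToDisplay:=ws.Name
                r = r + 1
            End If
        Next ws
    End Sub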

5) Use of Pivots / Tables / Slicers: There is plenty of material online on how to use slicers, tables and pivots, so I will focus only on why we should use them even though modelling best practice says we should avoid pivots.

• First, why should we avoid pivots / slicers?

The answer is simple: you cannot reliably build calculation logic on top of pivots. As soon as the filter changes, your values may change, and any formula pointing at the pivot area will pick up whatever values currently sit between the filtered cells (other than the subtotals). Hence, users need to be very cautious when working with pivots / slicers.

• When should we use them?

Again, a simple answer: when we do not need to do any calculation with them. We can use pivots purely to present outputs.

So what is the point of this discussion?

The point is that when we use heavy functions such as VLOOKUP, INDEX/MATCH, SUMIF or SUMPRODUCT just to extract calculated data for presentation, the model's calculation time increases. Pivots and slicers are built-in functionality that provide the same summarised information without adding to calculation time.
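As a small illustration of the difference, assume the raw data sits on a 'Data' sheet, a pivot on a 'Summary' sheet is anchored at A3 with a 'Revenue' value field and a 'Store' row field, and $B4 holds the store name being reported (all of these names are assumptions):

    Heavy lookup straight onto 100,000 rows of raw data:
        =SUMIF(Data!$A$2:$A$100001, $B4, Data!$D$2:$D$100001)

    Presentation-side retrieval from a pivot that has already summarised the data:
        =GETPIVOTDATA("Revenue", Summary!$A$3, "Store", $B4)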

Just make sure you build a separate data sheet with 'start range' and 'end range' markers, so that even if a user applies a formula by mistake, your output remains correct.


These may seem like very small points, but they help a lot in optimising an Excel workbook. We just have to judge, case by case, whether the benefit outweighs the disadvantage.

In the end, based on my experience, I can say there is no perfect solution in Excel. You can learn something new in Excel every day, and you can always find an alternative to your problem (just as in life). All you need to do is think about it, search for a solution, implement it and check the output. If it still isn't solved, you can contact me anytime: drop me a message on LinkedIn or email me at [email protected]

You can also consider using expert tools such as 'EMF Pro' to resolve a few of these problems.

Please share your views, any alternative solutions, or any ideas we could work on together.

HAPPY LEARNING :)

Comments

Very well noticed, Mayank. Thanks for sharing. The size of an Excel file depends on its ingredients, e.g. sheets, cells, cell contents and formatting, objects, formula algorithms, macros, etc. Each of these elements contributes towards the size of the file, and each has its own weighting. I wonder whether these could form an equation to arrive at the file size. If so, we could pinpoint the parts of the model which are heavy; e.g. a cell with a long-range SUMIF would be heavier than a cell with an input value. Not sure whether this is practically possible, but it could help dig the bug out of the model.

Craig Hatmaker (Microsoft MVP | BXL | 5g Modeling Founder), 6 years ago:

I'd be very interested in seeing benchmarks, especially because these suggestions are situational: you are using them when performance is a problem, and you are improving performance at the expense of other factors, especially traceability. As you point out, "We will never [be] able to check unused calculations and Inputs...".

Alternatives not explored: Section 1's diagram could use grouping and subtotals, which would greatly simplify development, reduce risk by eliminating several formulas, and would seem to improve performance because of fewer formulas. A benchmark would confirm that.

"Why not pivots": I agree pivots are best used for the presentation side and I personally do not use them in calculations. That said, the reasoning does not apply: "cannot do any calculations or apply logics using pivots". Actually, we can add calculated fields to pivots (though limited), and Power Pivot adds a very rich function library. For normal pivots, all the functionality required to aggregate that table is provided, so this does not apply.

"Change the filter": if we allow users the ability to change anything in our calculations, then this applies to everything, not just pivots. Since we are sacrificing things for performance, pivots would greatly improve performance as well as greatly simplify the aggregation process.

"Separate input sheets": under this section you describe inputs for 100,000 items. Hopefully that is a download using Power Query; otherwise, the chance of entry error is huge. And if we are using Power Query, we might be able to perform some calculations and aggregations during the download process, which would greatly reduce the formula count and the associated risk and performance lag.

Dhruva Poonia (Project Finance & Financial Modelling Expert | Head of Project Finance at Tamasuk), 6 years ago:

Well done, Mayank. Really interesting and useful ideas. I think many of us come up against such problems, and some of us find creative solutions, but most of the time we don't go through the process of writing down the learnings and sharing them with others. Good to see you have done it. Keep it up.

Peter Bartholomew (Technical Fellow at NAFEMS), 6 years ago:

I agree that pin-pointing the range you wish to aggregate can be more effective than hitting the entire range with SUMIFS. The SUMIFS functions are remarkably fast though, so I would typically only switch if I needed to aggregate 5-10% or less of the data at a time. On small workbooks I would use OFFSET because the syntax is clear, e.g. =OFFSET(Array, 0, 0, , n) returns the first n columns of the table/array. The trouble with OFFSET is that it is a volatile function, so recalculation keeps getting triggered even when nothing has changed to affect it. The alternative is to use INDEX, e.g. =INDEX(Array, 0, 1) : INDEX(Array, 0, n), which is non-volatile.

Michał Puchałowicz (Controlling Manager), 6 years ago:

Good job. I knew most of the points, but using the .xlsb format was new for me.
