My Array Formula is Slow...
First things first. Yes I know Power BI exists. Yes I know Power BI rocks. I also know from experience that many companies are still locked into older versions of Excel and don't have access to Power BI. If the only useful comment you make is to tell me to use Power BI expect to be called a cock-womble!
Swiftly moving on...
I hear time and time again that array processing formula are pants because they're slow. Well if you point one formula at half a million records and ask a half dozen Boolean questions per row, yeah of course it is, over 3 million processes per formula is going to take it's sweet time in what is usually a 32bit environment.
I've put together a simple workbook with just over 600k records of reported murder and manslaughter cases in the US between 1980 & 2014 (freely available to download on that internet thingamy here). A very scary number but a good source to showcase a couple of alternative methods to reduce that calculation time when producing your metrics.
The three methods I've used in my example are:
- An array formula pointing at my entire 600k+ records
- A near identical array formula working in conjunction with Advanced Filter
- Using SQL's Pivot & Transform method via ADO
Download the file and have a test of each of the methods by clicking on the Option buttons on the summary tab and see how they compare for speed. Feel free to take a look at the VBA that drives the Advanced Filter & ADO methods. Code exists in the ThisWorkbook, Summary, Methods & clsADO VB Components.
Please be aware that Excel will hang for a few seconds on workbook open as it saves a copy of the data sheet to your local AppData folder here C:\Users\%username%\AppData\Local to demonstrate the ADO/SQL method - the file should be removed on Workbook close but do check as the raw data in xlsx format is fairly sizeable.
Please also make sure you force a couple of recalculations by using the drop-downs to get a real feel for how quick the SQL is. The first time it runs it can hang for a couple for seconds, from there on in it's pretty damn quick.
The example workbook can be downloaded from my Google Drive here.
As all of our machines have different specs, I'd be interested in how each of the methods perform on your machines, let me know in the comments.
Senior Data Architect / Engineer
7 年I'll always be a womble! (Hopefully a nice cuddly womble though...)
BSc at Enersea
7 年Very cool. Good to see the different options and the way you set the worksheet up. All options worked for me and the difference in speed is huge. From just under a minute, to 2 second to almost instant. The pivot-table Dennis Tuttle proposed worked very fast as well, plus is intuitive for most users and gives you more filter options. /offtopic Over 65% of all cases were using any form of firearm. And still people want to keep them freely available to all [baffled]
Manager, Analytics Solutions Group at BDK, Inc.
7 年Hello Michael, the difference in using AdvancedFilter with the array formula is pretty dramatic. Sumproduct alone took 46.75 seconds, and adding advancedfilter dropped that to 1.3 seconds. The ADO function didn't work for me though. For all tests I used year 2000, male, sheriff in Maryland. Looks like the final demo isn't filtering correctly. Because my 'thing' is pivottables I also created a simple pivot to compare speed. The pivot seems as fast or faster than your other methods and gives me the same results as the first 2 examples. I accept your point of working with data too large for pivottables and NOT having access to Power Pivot, so I will study this from a learning perspective. But I often work with this volume of data in basic pivottables. Sometimes things can get slow, but generally when the target data is too large I look for ways to summarize on the host and provide options for selecting exports with subsets of data. Not perfect, but so far still 99% appropriate for my consumers.
Senior Executive Logistics at VNR Seeds Private Limited
7 年CoolPost :)