Pivot Tables are HORRIBLE!
Sufyaan Kazi
Head of Customer Engineering, working with Series A/B organisations in Fintech, Martech, Adtech, Video and Gaming
This article and its contents are my own, they do not represent the views of my current, prior (or future) employer (s) … However, full disclosure -> I do work for Google at the time of writing this article and so may (do) express biased views !!
OK - At first, in case it's not clear, I hate and have always hated Pivot Tables. They're ugly, they are hard to navigate and can't handle complexity. In this article - I used some free public data and ask the question:
"Show me the numbers in July, Feb, and Mar since 2012 for only the 'Advertising' Business Unit, but only when the Account contains the word 'Expense' "
I started using Lotus 123, hmm I think when I was about 15 or 16, then I progressed to MS Excel, I used to help my uncle who was an accountant. More recently I've switched to Google Sheets given I'm a Google employee but am also a big fan of Workspace. However, this isn't a "which is the best spreadsheet tool article". Simply put - I just hate pivot tables no matter which spreadsheet I use. This became more relevant when I discovered the FILTER method. But, why Filter?
So, let's start with some sample data I got from here, I made a copy of the Sales Data example here.
The reason I use filter is because I can have better control of how data is summarised. This example spreadsheet has 352 rows of data, with 17 columns. It's actually much smaller than the data I tend to analyse, as that is usually large data sets from Big Query Connected Sheets.
Simply; Filter is used to filter our columns and rows of data from a large data set. For example in one filter I could fetch 1 column of data for 10 of my rows, that match a specific condition.
Try to remember that Filter kinda "only" takes two arguments (if you stick to this I find it easier to write a filter formula):
1) A column of data you want returned
2) A condition to use to filter or restrict the rows you want to get back
So , at first glance it seems Filter is horrible not pivot tables. But wait .... You use filter basically to give you some rows and columns from source data.
I hate and have always hated Pivot Tables
BUT, Filter is far more powerful than this. The reason is .... it actually takes more than two arguments.
In reality, the way it works is:
Example 1
So ... why does this improve things ... well because you can now ask more intelligent questions.
For example, "show me the numbers in July, Feb, and Mar for every year but only for the 'Advertising' Business Unit, but only when the Account contains the word 'Expense'.
Can you really really do this with a Pivot table? Yes possibly, but the output is really clunky and difficult to use, and has loads of collapsible sections.
To write this as a Filter you do this:
Immediately, notice a few things. We are free to pick columns in a different order that they appear in the source data, and we are using many conditions, this is different to standard countif or sumif type functions, and much neater than the IFS() function. This is what it looks like (don't worry, I'll simplify it later):
=filter({'Financials Raw Data'!L:L,'Financials Raw Data'!G:G,'Financials Raw Data'!H:H},'Financials Raw Data'!B:B="Advertising",regexmatch('Financials Raw Data'!A:A,"Expense"))
If you've never used a Filter method below, you are thinking this looks really complicated, but it's not.
Think about it slowly, the curly braces first arg; is the selection of columns to pull only, the remainder arguments are a series of conditions.
When you think about it, the Filter method is actually more like a classic SQL clause, perhaps more so than using the QUERY function (See Example 3 to hear why I think Filter is better than QUERY). This example is a bit like:
"select, July, Feb, Mar from mydata where business_unit = 'Advertising' and account like '%Expense%'"
You can see this filter in cell A2 in the Example1 tab of my trix.
Example 2 - Ok - but so What?
Now I have got the raw data for three months over time, I could do some additional summarisation. e.g. What if I just want the sum (or average) for each year of these conditions?
I could do this by ....... separating my formula into three filters, one for each column of data and wrap each one in a sum or average
OR
I could simply add 1 extra filter condition, which was the year. I could then use column headings for the years and row headings for the months I am interested in, like so:
领英推荐
the Filter method is actually more like a classic SQL clause
So what is different in this example?
transpose(sort(unique('Financials Raw Data'!D2:D)))
regexmatch('Financials Raw Data'!$A:$A,$A1)
'Financials Raw Data'!$D:$D=$B1
Example 3 - This still seems messy
You're probably right. However, if you have done enough formulas in Excel, Sheets or a.n.other, you will know that this will happen if you always refer to columns using coordinates or cell references. So, let's use names:
So now our formulas are much neater, easier to read and self documenting. You can't do this with QUERY !!
This has improved two of my formulas:
transpose(sort(unique(filter(Year,isnumber(Year)))))
sum(filter(Jul,BusinessUnit="Advertising",regexmatch(Account,$A1),Year=$B1))
What about bad data?
Yes, sometimes we may have bad data or even no data - so how do we stop things falling over with a #N/A error?
The answer is the simple iferror function, which quietly catches any error, e.g. iferror(sum(filter(Jul,BusinessUnit="Advertising",regexmatch(Account,$A9),Year=$B9)))
Common Errors
Filter is great, but you are going to abuse it, I do sometimes and so you will typically get one of the following errors (See the Common Errors tab)
Example 4 - Putting all this together
In the last example, I sum all the data for every year that matches specific conditions, and then count the matching rows based on a ranged numeric value. this tells us for example, that Advertising Sales was between 200 and 200M in 11 years, but then in one year it was between 400 & 500M.
sum(filter({Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec},Year=B$2,BusinessUnit=$A3,Account="sales"))
counta(iferror(filter($B3:$M3,$B3:$M3>B$9,$B3:$M3<C$9)))
Since a filter may return a result, I wrapped it in an error first, so the counta method doesn't crash. I used counta rather than count, since it's more accurate in this usecase.
Anyway, I hope you enjoyed my ramblings. While I work for Google, I am not in anyway responsible or connected in anyway to Sheets, I just love using the product !
-- Suf