Pivot Tables are HORRIBLE!
https://www.northportsolutionsllc.com/post/what-are-pivot-tables

Pivot Tables are HORRIBLE!

This article and its contents are my own, they do not represent the views of my current, prior (or future) employer (s) … However, full disclosure -> I do work for Google at the time of writing this article and so may (do) express biased views !!


OK - At first, in case it's not clear, I hate and have always hated Pivot Tables. They're ugly, they are hard to navigate and can't handle complexity. In this article - I used some free public data and ask the question:

"Show me the numbers in July, Feb, and Mar since 2012 for only the 'Advertising' Business Unit, but only when the Account contains the word 'Expense' "

I started using Lotus 123, hmm I think when I was about 15 or 16, then I progressed to MS Excel, I used to help my uncle who was an accountant. More recently I've switched to Google Sheets given I'm a Google employee but am also a big fan of Workspace. However, this isn't a "which is the best spreadsheet tool article". Simply put - I just hate pivot tables no matter which spreadsheet I use. This became more relevant when I discovered the FILTER method. But, why Filter?

Source:

So, let's start with some sample data I got from here, I made a copy of the Sales Data example here.

The reason I use filter is because I can have better control of how data is summarised. This example spreadsheet has 352 rows of data, with 17 columns. It's actually much smaller than the data I tend to analyse, as that is usually large data sets from Big Query Connected Sheets.

Simply; Filter is used to filter our columns and rows of data from a large data set. For example in one filter I could fetch 1 column of data for 10 of my rows, that match a specific condition.

Try to remember that Filter kinda "only" takes two arguments (if you stick to this I find it easier to write a filter formula):

1) A column of data you want returned

2) A condition to use to filter or restrict the rows you want to get back

So , at first glance it seems Filter is horrible not pivot tables. But wait .... You use filter basically to give you some rows and columns from source data.

I hate and have always hated Pivot Tables

BUT, Filter is far more powerful than this. The reason is .... it actually takes more than two arguments.

In reality, the way it works is:

  1. The first argument can be more than one column, simply nest the columns in curly braces to form one argument
  2. You can then have as many conditions you want. The conditions don't need to be perfect matches either.

Example 1

So ... why does this improve things ... well because you can now ask more intelligent questions.

For example, "show me the numbers in July, Feb, and Mar for every year but only for the 'Advertising' Business Unit, but only when the Account contains the word 'Expense'.

Can you really really do this with a Pivot table? Yes possibly, but the output is really clunky and difficult to use, and has loads of collapsible sections.

To write this as a Filter you do this:

  1. Argument one; wrapped in curly braces will be the columns we want, i.e. the sales columns for July, Feb, and Mar
  2. The second, and third arguments are all conditions, the first one filters for Accounting, the third is to match for text that contains 'expense'

Immediately, notice a few things. We are free to pick columns in a different order that they appear in the source data, and we are using many conditions, this is different to standard countif or sumif type functions, and much neater than the IFS() function. This is what it looks like (don't worry, I'll simplify it later):

=filter({'Financials Raw Data'!L:L,'Financials Raw Data'!G:G,'Financials Raw Data'!H:H},'Financials Raw Data'!B:B="Advertising",regexmatch('Financials Raw Data'!A:A,"Expense"))        
If you've never used a Filter method below, you are thinking this looks really complicated, but it's not.

Think about it slowly, the curly braces first arg; is the selection of columns to pull only, the remainder arguments are a series of conditions.

When you think about it, the Filter method is actually more like a classic SQL clause, perhaps more so than using the QUERY function (See Example 3 to hear why I think Filter is better than QUERY). This example is a bit like:

"select, July, Feb, Mar from mydata where business_unit = 'Advertising' and account like '%Expense%'"        

You can see this filter in cell A2 in the Example1 tab of my trix.

Example 2 - Ok - but so What?

Now I have got the raw data for three months over time, I could do some additional summarisation. e.g. What if I just want the sum (or average) for each year of these conditions?

I could do this by ....... separating my formula into three filters, one for each column of data and wrap each one in a sum or average

OR

I could simply add 1 extra filter condition, which was the year. I could then use column headings for the years and row headings for the months I am interested in, like so:

the Filter method is actually more like a classic SQL clause

So what is different in this example?

  • In Example1, a single filter method was used to give many results. In this example, there is one filter producing each result, yet performance is still really fast, no lags loading the page
  • My column headings are dynamic, i.e. I will automatically get a new column heading when the source data has a new year, because in cell b2 I used the following formula, to basically get the unique years and sort the results and then print them horizontally:

transpose(sort(unique('Financials Raw Data'!D2:D)))        

  • The keyword "Expense" is not hardcoded in the filter, it's actually referenced in the formula. So, for example I could copy this whole block of data and formulas underneath and repeat for a different keyword, such as "Payroll" -

regexmatch('Financials Raw Data'!$A:$A,$A1)        

  • Each filter has an extra condition to only match rows in the year for the heading name:

'Financials Raw Data'!$D:$D=$B1        

Example 3 - This still seems messy

You're probably right. However, if you have done enough formulas in Excel, Sheets or a.n.other, you will know that this will happen if you always refer to columns using coordinates or cell references. So, let's use names:

So now our formulas are much neater, easier to read and self documenting. You can't do this with QUERY !!

This has improved two of my formulas:

  • It's easy to see which data my column heading refers to. I used a filter this time to exclude the source column heading "Year" and only get the actual years

transpose(sort(unique(filter(Year,isnumber(Year)))))        

  • The filter is now much easier to read:

sum(filter(Jul,BusinessUnit="Advertising",regexmatch(Account,$A1),Year=$B1))        

What about bad data?

Yes, sometimes we may have bad data or even no data - so how do we stop things falling over with a #N/A error?

The answer is the simple iferror function, which quietly catches any error, e.g. iferror(sum(filter(Jul,BusinessUnit="Advertising",regexmatch(Account,$A9),Year=$B9)))        

Common Errors

Filter is great, but you are going to abuse it, I do sometimes and so you will typically get one of the following errors (See the Common Errors tab)

  • #N/A - This is either because you specified conditions that returns no errors (wrap your filter in IFERROR() to avoid this
  • or ... one or more of your arguments has less rows than all the others. Filter can only compare conditions that have the same amount of rows.
  • #REF - This typically means that more than one row is returned by your filter, but there are no empty cells for filter to write the results.

Example 4 - Putting all this together

In the last example, I sum all the data for every year that matches specific conditions, and then count the matching rows based on a ranged numeric value. this tells us for example, that Advertising Sales was between 200 and 200M in 11 years, but then in one year it was between 400 & 500M.

  • The sum was performed using this formula:

sum(filter({Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec},Year=B$2,BusinessUnit=$A3,Account="sales"))        

  • However the count was performed using counta:

counta(iferror(filter($B3:$M3,$B3:$M3>B$9,$B3:$M3<C$9)))        

Since a filter may return a result, I wrapped it in an error first, so the counta method doesn't crash. I used counta rather than count, since it's more accurate in this usecase.

Anyway, I hope you enjoyed my ramblings. While I work for Google, I am not in anyway responsible or connected in anyway to Sheets, I just love using the product !


-- Suf



要查看或添加评论,请登录

Sufyaan Kazi的更多文章

  • Autism, they said ...

    Autism, they said ...

    Two important moments happened to me this year, 2024. 1) On the 23 May, I got on stage in front of approx 400 of our…

    23 条评论
  • Six Years working at Google

    Six Years working at Google

    This article and its contents are my own, they do not represent the views of my current, prior (or future) employer (s)…

    16 条评论
  • The lonely shirt(s) in the wardrobe, an ode to 2020

    The lonely shirt(s) in the wardrobe, an ode to 2020

    The day started as usual, well the 'new' normal during lockdown. I "commuted" from the breakfast table (which was now…

    11 条评论
  • Concurrency is not just a Computer Science problem

    Concurrency is not just a Computer Science problem

    This article and its contents are my own, they do not represent the views of my current, prior (or future) employer (s)…

    3 条评论
  • So, I joined Google ….

    So, I joined Google ….

    This article and its contents are my own, they do not represent the views of my current, prior (or future) employer (s)…

    4 条评论
  • Openshift vs Cloud Foundry, pt2 ... K8S, Ansible and BOSH

    Openshift vs Cloud Foundry, pt2 ... K8S, Ansible and BOSH

    This article and it's contents are my own, they do not represent the views of my current, prior (or future) employer…

    9 条评论
  • So long and thanks for all the fish .....

    So long and thanks for all the fish .....

    This article and it's contents are my own, they do not represent the views of my current, prior (or future) employer…

    5 条评论
  • You're hired .......

    You're hired .......

    This article and it's contents are my own, they do not represent the views of my current, prior (or future) employer…

  • Cloud Foundry vs OpenShift

    Cloud Foundry vs OpenShift

    This article and it's contents are my own, they do not represent the views of my current, prior (or future) employer…

    18 条评论
  • Fasting in Ramadan

    Fasting in Ramadan

    This article and it's contents are my own, they do not represent the views of my current, prior (or future) employer…

社区洞察

其他会员也浏览了