Excel LAMBDA Spotlight: UNPIVOT
CREDITS
In this article I'll look at a LAMBDA function called UNPIVOT. This one was written by?Erik Oehm, creator of the amazing Excel Robot.
This article is a continuation of last week's LAMBDA spotlight, which covered the CROSSJOIN function. UNPIVOT uses CROSSJOIN, so if you want a full understanding of both, you can read the CROSSJOIN article first.
OVERVIEW
In Erik's words, UNPIVOT, "given a table range with headers and array of header names, unpivots the specified columns in place, optionally removing any blank entries."
In the made-up example below, I've used UNPIVOT in cell I2 to select a cross-tabulated table of monthly sales by product and category. I've the whole table to the first argument (B2:G12), and the headers of the columns I want to unpivot to the second argument (D2:G2).
Both arguments can also be a Table, a named range or an array, whether produced by another function, or a typed array.
Hopefully you can instantly see the utility of UNPIVOT. Let's unpack how it works.
FUNCTION DEFINITION
/
Name: Unpivot Table (UNPIVOT)
Description: Given a table range with headers and array of header names, unpivots the
? ?specified columns in place, optionally removing any blank entries.
Written By: Erik Oehm / Excel Robot (@ExcelRobot)
Category: Array
*/
UNPIVOT = LAMBDA(table, columns_to_unpivot, [attribute_name], [value_name], [remove_blanks],
? ? LET(
? ? ? ? _AttributeLabel, IF(ISOMITTED(attribute_name), "Attribute", attribute_name),
? ? ? ? _ValueLabel, IF(ISOMITTED(value_name), "Value", value_name),
? ? ? ? _FirstColumnToUnpivot, MATCH(INDEX(columns_to_unpivot, , 1), INDEX(table, 1, ), 0),
? ? ? ? _UnpivotColumnCount, COLUMNS(columns_to_unpivot),
? ? ? ? _ColumnNumbers, SEQUENCE(1, COLUMNS(table)),
? ? ? ? _IncludeColumns, (_ColumnNumbers >= _FirstColumnToUnpivot) *
? ? ? ? ? ? (_ColumnNumbers < _FirstColumnToUnpivot + _UnpivotColumnCount),
? ? ? ? _UnpivotColumns, FILTER(_ColumnNumbers, _IncludeColumns),
? ? ? ? _OtherColumns, FILTER(_ColumnNumbers, NOT(_IncludeColumns)),
? ? ? ? _FullOuterJoin, CROSSJOIN(
? ? ? ? ? ? CHOOSECOLS(table, _OtherColumns),
? ? ? ? ? ? VSTACK(_AttributeLabel, TRANSPOSE(columns_to_unpivot)),
? ? ? ? ? ? TRUE
? ? ? ? ),
? ? ? ? _WithValues, HSTACK(
? ? ? ? ? ? _FullOuterJoin,
? ? ? ? ? ? VSTACK(_ValueLabel, TOCOL(DROP(CHOOSECOLS(table, _UnpivotColumns), 1)))
? ? ? ? ),
? ? ? ? _RemoveBlanks, IF(
? ? ? ? ? ? OR(ISOMITTED(remove_blanks), remove_blanks),
? ? ? ? ? ? FILTER(_WithValues, INDEX(_WithValues, , COLUMNS(_WithValues)) <> ""),
? ? ? ? ? ? IF(_WithValues = "", "", _WithValues)
? ? ? ? ),
? ? ? ? _ColumnOrder, LET(
? ? ? ? ? ? n, COLUMNS(_RemoveBlanks),
? ? ? ? ? ? s, SEQUENCE(1, n),
? ? ? ? ? ? IFS(
? ? ? ? ? ? ? ? s < _FirstColumnToUnpivot,
? ? ? ? ? ? ? ? s,
? ? ? ? ? ? ? ? s < _FirstColumnToUnpivot + 2,
? ? ? ? ? ? ? ? s + n - _FirstColumnToUnpivot - 1,
? ? ? ? ? ? ? ? TRUE,
? ? ? ? ? ? ? ? s - 2
? ? ? ? ? ? )
? ? ? ? ),
? ? ? ? _ReorderColumns, CHOOSECOLS(_RemoveBlanks, _ColumnOrder),
? ? ? ? _ReorderColumns
? ? )
);*
UNPIVOT takes five parameters:
BREAKDOWN
The function starts by defining LET variables:
? ? LET(
? ? ? ? _AttributeLabel, IF(ISOMITTED(attribute_name), "Attribute", attribute_name),
? ? ? ? _ValueLabel, IF(ISOMITTED(value_name), "Value", value_name),
_AttributeLabel and _ValueLabel use the ISOMITTED function to assign a default value in case either attribute_name or value_name are omitted.
This is such a common formula, I've added a feedback request to Microsoft to add an IFOMITTED function, which would shorten the definition of _AttributeLabel to this:
_AttributeLabel, IFOMITTED(attribute_name, "Attribute"),
Please vote for the idea here.
Next:
_FirstColumnToUnpivot, MATCH(INDEX(columns_to_unpivot, , 1), INDEX(table, 1, ), 0)
_UnpivotColumnCount, COLUMNS(columns_to_unpivot),
_ColumnNumbers, SEQUENCE(1, COLUMNS(table)),
_IncludeColumns, (_ColumnNumbers >= _FirstColumnToUnpivot) *
(_ColumnNumbers < _FirstColumnToUnpivot + _UnpivotColumnCount),
_FirstColumnToUnpivot takes the first value in the columns_to_unpivot argument and searches for it in the first row of table, then returns the position of the column in the table. This is used to separate what can be considered the 'grouping' columns (which aren't unpivoted) and the 'value' columns (which are).
_UnpivotColumnCount is the count of the columns to unpivot. Saving this simple function call like this is an indication it will be used in several places later on.
Similarly, _ColumnNumbers creates a row of integers representing the relative column index of each column in the table. In the example above, this is {1,2,3,4,5,6}.
_IncludeColumns is an interesting calculation. To illustrate this calculation, consider the example where I only want to unpivot February and March:
You can see that January and April have not been unpivoted. In this example:
_ColumnNumbers = {1,2,3,4,5,6}
_FirstColumnToUnpivot = 4
_UnpivotColumnCount = 2
_ColumnNumbers >= _FirstColumnToUnpivot
= { 1>=4, 2>=4, 3>=4,4>=4,5>=4,6>=4}
= {FALSE,FALSE,FALSE,TRUE,TRUE,TRUE}
_ColumnNumbers < _FirstColumnToUnpivot + _UnpivotColumnCount
= {1,2,3,4,5,6} < 4 + 2
= { 1<6, 2<6, 3<6, 4<6, 5<6, 6<6 }
= {TRUE,TRUE,TRUE,TRUE,TRUE,FALSE}
(_ColumnNumbers >= _FirstColumnToUnpivot) *
(_ColumnNumbers < _FirstColumnToUnpivot + _UnpivotColumnCount)
= {FALSE,FALSE,FALSE,TRUE,TRUE, TRUE} *
{ TRUE, TRUE, TRUE,TRUE,TRUE,FALSE}
= {FALSE,FALSE,FALSE,TRUE,TRUE,FALSE}
= {0,0,0,1,1,0}
I know it seems long-winded, but I hope it makes things crystal clear. The calculation is creating a boolean array where columns to be unpivoted are TRUE, and columns which won't be unpivoted are FALSE.
There is a limitation of this approach. This assumes that the columns to unpivot are contiguous in the table. If I try to unpivot only January and March, the function unpivots January and February:
I'd hesitate to call this a bug, as I've recall speaking to Erik about this earlier this year. Hence, it's a known limitation. Realistically, cross-tabulated data in Excel usually involves contiguous values in a domain across columns. It's not likely that I would only want to unpivot January and March, for example.
Nevertheless, Excel is famous for offering many ways to achieve the same goal. One other way we might consider for _IncludeColumns is:
? ? ? ? _IncludeColumns, BYCOL(
? ? ? ? ? ? TAKE(table, 1),
? ? ? ? ? ? LAMBDA(c, NOT(ISERROR(XMATCH(c, columns_to_unpivot))))
? ? ? ? ),
That is to say: iterate through each cell in the first row of the table (i.e. the header row). For each cell value, try to match it with the list of columns_to_unpivot . If there's a match, NOT(ISERROR( returns TRUE and FALSE if there isn't a match.
领英推荐
In fact, this "check for an item in a list" function is pretty common, and we can create a curried "list.CONTAINS" LAMBDA which can be used as shown.
list.CONTAINS = LAMBDA(array,
? ? LAMBDA(item,
? ? ? ? NOT(ISERROR(XMATCH(item, array)))
? ? )
);
_IncludeColumns, BYCOL(
TAKE(table, 1),
list.CONTAINS(columns_to_unpivot)
),
That's a variant of a function in my list module, which you're welcome to explore.
Getting back to UNPIVOT, next we have:
_UnpivotColumns, FILTER(_ColumnNumbers, _IncludeColumns)
_OtherColumns, FILTER(_ColumnNumbers, NOT(_IncludeColumns)),,
These two variables are just the column indices of the columns to unpivot and those of the other columns.
So, in the original example unpivoting all four months,
_UnpivotColumns = FILTER(_ColumnNumbers, _IncludeColumns)
= FILTER({1,2,3,4,5,6}, {0,0,1,1,1,1})
= {3,4,5,6}
_OtherColumns = FILTER(_ColumnNumbers, NOT(_IncludeColumns))
= FILTER({1,2,3,4,5,6}, {1,1,0,0,0,0})
= {1,2}
Next is where we use the CROSSJOIN:
? ? ? ? _FullOuterJoin, CROSSJOIN(
? ? ? ? ? ? CHOOSECOLS(table, _OtherColumns),
? ? ? ? ? ? VSTACK(_AttributeLabel, TRANSPOSE(columns_to_unpivot)),
? ? ? ? ? ? TRUE
? ? ? ? ),
I won't go into details (please read the article linked above if you'd like to understand how it works), but put simply: cross join the columns that won't be unpivoted with the vertical stack of the attribute label and the transpose of the columns to unpivot array (i.e. {"Attribute";"January";"February";"March";"April} ). The result of _FullOuterJoin looks like this:
Next we add the values to the cross-joined data:
? ? ? ? _WithValues, HSTACK(
? ? ? ? ? ? _FullOuterJoin,
? ? ? ? ? ? VSTACK(_ValueLabel, TOCOL(DROP(CHOOSECOLS(table, _UnpivotColumns), 1)))
? ? ? ? ),
Take the result from _FullOuterJoin (shown above) and to its right, drop the first row from the columns to be unpivoted, convert the result into a single column, then finally stack the value label on the top. The result:
If the function call has supplied TRUE to the, or omitted the, remove_blanks argument, then some filtering happens:
? ? ? ? _RemoveBlanks, IF(
? ? ? ? ? ? OR(ISOMITTED(remove_blanks), remove_blanks),
? ? ? ? ? ? FILTER(_WithValues, INDEX(_WithValues, , COLUMNS(_WithValues)) <> ""),
? ? ? ? ? ? IF(_WithValues = "", "", _WithValues)
? ? ? ? ),
In English: if either the remove_blanks argument is omitted, or remove_blanks is TRUE, then filter the _WithValues result (shown above), for those rows in the right-most column which are blank. Otherwise, just set _RemoveBlanks to _WithValues.
We're getting close to the end now.
? ? ? ? _ColumnOrder, LET(
? ? ? ? ? ? n, COLUMNS(_RemoveBlanks),
? ? ? ? ? ? s, SEQUENCE(1, n),
? ? ? ? ? ? IFS(
? ? ? ? ? ? ? ? s < _FirstColumnToUnpivot,
? ? ? ? ? ? ? ? s,
? ? ? ? ? ? ? ? s < _FirstColumnToUnpivot + 2,
? ? ? ? ? ? ? ? s + n - _FirstColumnToUnpivot - 1,
? ? ? ? ? ? ? ? TRUE,
? ? ? ? ? ? ? ? s - 2
? ? ? ? ? ? )
? ? ? ? ),
? ? ? ? _ReorderColumns, CHOOSECOLS(_RemoveBlanks, _ColumnOrder),
? ? ? ? _ReorderColumns
? ? )
);
_ColumnOrder is creating an array of integers which will be used to produce the output array. Consider the example of unpivoting only January and February:
The non-unpivoted columns are first, followed by Attribute and Value. In terms of the original table, the column order is 1,2,5,6,3,4
The important thing to remember about using IFS with an array is that it will test each element of the array against a condition and pass them separately through each condition until it returns TRUE for that element.
So, because the first element of s is 1, it returns TRUE from condition1 and is not passed to condition2. Similarly, because element 3 of s is 3, it returns FALSE from condition1 and is then passed for testing against condition2, from which it returns TRUE.
The net result can be visualized with this image:
I hope this makes sense. I think this is the trickiest part of this function to explain.
After calculating the column orders, we use them to reorder the columns:
? ? ? ? _ReorderColumns, CHOOSECOLS(_RemoveBlanks, _ColumnOrder),
? ? ? ? _ReorderColumns
? ? )
);
And the reordered columns are returned as the result of UNPIVOT.
That's it for now. I hope you enjoyed this article.
All credit for the function described in this article and in the CROSSJOIN article goes to?Erik Oehm.
If this article has been interesting, please leave a reaction, a comment, or preferably both, and share the article with your network.
If you have a suggestion for a function which you'd like put under the spotlight, please message me.
Thanks for reading!
--
6 个月when column header more than two or more, formula error
Technical Fellow at NAFEMS
1 年Impressive! Another proof that "He who thinks life is simple, hasn't understood the problem. My formula for unpivoting was UPλ(item, mask) = LET( ? ? itemArr, IF(mask, item, NA()), ? ? field, TOCOL(itemArr, 3, TRUE), ? ? field ) I call the module 3 times, once for the values and once for each of the categories and attributes which are broadcast over the mask. That left me everything to do in the calling Lambda, but it did allow me to filter by category, attribute or value fields as required. For example = LET( ??values,??DROP(table,,2), ??category, TAKE(table,,2), ??recordId, SEQUENCE(ROWS(table)), ??month,??DROP(table[#Headers],,2), ??mask,???(values > 50)*(TAKE(category,,1)="CategoryA"), ??UPValues, UPλ(values, mask), ??recdIdx,?UPλ(recordId, mask), ??mnthAttr, UPλ(month, mask), ??HSTACK(CHOOSEROWS(category, recdIdx), mnthAttr, UPValues) ?) returned the values from CategoryA that exceeded the 50.00 threshold. Not so much a utility as a knife and fork job!
Ph.D. in Economics
1 年That's great, Owen! Super useful! I'm really looking forward to a PIVOT's spotlight.
Accounting Consultant | Excel ?? | SQL ?? | Python ?? |
1 年A lot easier in Power Query ??
Data-driven by nature
1 年UNPIVOT strikes again... :) https://stackoverflow.com/a/76679211/1571950