Pivot ANYTHING in Snowflake, Without the SQL PIVOT Function
The SQL PIVOT function has very limited functionality. It is only useful for numeric data, with very explicit, "hard coding" requirements. We frequently require more flexibility in pivoting data.
The desired flexibility can be accomplished with a combination of LISTAGG and SPLIT_PART SQL functions. This approach allows more robust pivoting functionality; it can be used to pivot any type of data as well as complex expressions. The examples may also be feasible in other RDBMS, as most have both LISTAGG and SPLIT_PART functions.
LISTAGG is an "aggregation" function, requiring GROUP BY column(s), while allowing complex column expressions for the pivoted values.
The basic code is similar to PIVOT:
SELECT
<group_by_columns>, -- Typically Primary Key Columns
LISTAGG(<column_expr>,'<seperator>')
[WITHIN GROUP ( <orderby_clause> )] COLUMN_LIST
FROM
<source_object>
[WHERE]
GROUP BY
<group_by_columns>
We will look at three different types of pivoting.
Example One
In this example we know all of the desired pivoted column names and effectively hard code them. Following is the management structure of a consulting firm, consisting of the Account Mgr, the Project Mgr and the Chief Architect.
First, create the table:
CREATE TABLE
IF NOT EXISTS PROJECT_MGMT
(
PROJECT_NAME VARCHAR,
PROJECT_ROLE VARCHAR,
PERSON VARCHAR
);
Populate the table:
INSERT INTO "project_mgmt" ("PROJECT_NAME", "PROJECT_ROLE", "PERSON") VALUES
('Organic Foods', 'Project Mgr', 'Bug Bunny'),
('Ring a Ding Phones', 'Project Mgr', 'Ernestine Smith'),
('Fast Cars Dealership', 'Project Mgr', 'Wilber Jones'),
('Organic Foods', 'Account Mgr', 'Daffy Duck'),
('Ring a Ding Phones', 'Account Mgr', 'Marjorie Jackson'),
('Fast Cars Dealership', 'Account Mgr', 'TBD'),
('Organic Foods', 'Chief Architect', 'Dilbert Clark'),
('Ring a Ding Phones', 'Chief Architect', 'Nora Eclaire'),
('Fast Cars Dealership', 'Chief Architect', 'Linda Brock');
The elements in the LISTAGG are ordered by the desired column name. The SPLIT_PART function is used to pick the desired column from the ordered LISTAGG elements to create the desired pivoted column(s).
SELECT
<group_by_columns>
SPLIT_PART(COLUMN_LIST,',',1) PIVOTED_COLUMN_1,
SPLIT_PART(COLUMN_LIST,',',2) PIVOTED_COLUMN_2 ,
SPLIT_PART(COLUMN_LIST,',',3) PIVOTED_COLUMN_3,
SPLIT_PART(COLUMN_LIST,',',4) PIVOTED_COLUMN_4
FROM
( SELECT
<group_by_columns>, -- Typically Primary Key Columns
LISTAGG(<column_expr>,'<seperator>')
[WITHIN GROUP (ORDER BY <orderby_clause>)] COLUMN_LIST
FROM
<source_object>
[WHERE]
GROUP BY
<group_by_columns>
);
The ORDER BY is critical to this type of pivot, as it orders the elements in the LISTAGG by the desired column names.
For the project management pivot query:
SELECT
PROJECT_NAME,
SPLIT_PART(PROJ_MGMT,',',1) "Account Mgr",
SPLIT_PART(PROJ_MGMT,',',2) "Chief Architect",
SPLIT_PART(PROJ_MGMT,',',3) "Project Mgr"
FROM
( SELECT
"PROJECT_NAME",
LISTAGG(PERSON,',') WITHIN GROUP (ORDER BY PROJECT_ROLE)
AS PROJ_MGMT
FROM
( SELECT
PROJECT_NAME,
PROJECT_ROLE,
PERSON
FROM
"PROJECT_MGMT"
WHERE
PROJECT_ROLE IN ('Account Mgr',
'Chief Architect',
'Project Mgr') )
GROUP BY
PROJECT_NAME);
Executing the query:
PROJECT_NAME Account Mgr Chief Architect Project Mgr
-------------------- ---------------- --------------- ---------------
Ring a Ding Phones Marjorie Jackson Nora Eclaire Ernestine Smith
Fast Cars Dealership TBD Linda Brock Wilber Jones
Organic Foods Daffy Duck Dilbert Clark Bug Bunny
Example Two
In this example, we demonstrate handling a fixed possible number of columns with the same type value. We handle the case of less than the maximum number of records existing to fill desired columns.
We demonstrate this with a loan application that allows a maximum of four co-signers. Only records of actual co-signers exist. For less than 4 co-signers, we want NULL in the column. If SPLIT_PART specifies a part greater than the length of the LISTAGG, it returns the empty string, not NULL; the query converts the empty string to NULL.
Create the table:
领英推荐
CREATE TABLE
IF NOT EXISTS LOAN_APPLICATION
(
BORROWER VARCHAR,
CO_SIGNER VARCHAR
);
Populate the table:
INSERT INTO "LOAN_APPLICATION" ("BORROWER", "CO_SIGNER") VALUES
('Albert Smith', 'Marjorie Smith'),
('Albert Smith', 'Bob Smith'),
('Albert Smith', 'Rudy Gonzalez'),
('Albert Smith', 'Scrooge McDuck'),
('Elmer Fudd', 'Bugs Bunny'),
('Elmer Fudd', 'Daffy Duck'),
('Elmer Fudd', 'Susan Fudd'),
('Indira Jones', 'Aya Jones'),
('Indira Jones', 'George Jones');
The queryordering by co-signer name:
SELECT
BORROWER,
IFF(SPLIT_PART(CO_SIGNERS,',',1)<>'',SPLIT_PART(CO_SIGNERS,',',1),NULL) "Co-signer 1",
IFF(SPLIT_PART(CO_SIGNERS,',',2)<>'',SPLIT_PART(CO_SIGNERS,',',2),NULL) "Co-signer 2",
IFF(SPLIT_PART(CO_SIGNERS,',',3)<>'',SPLIT_PART(CO_SIGNERS,',',3),NULL) "Co-signer 3",
IFF(SPLIT_PART(CO_SIGNERS,',',4)<>'',SPLIT_PART(CO_SIGNERS,',',4),NULL) "Co-signer 4"
FROM
( SELECT
BORROWER,
LISTAGG(CO_SIGNER,',') WITHIN GROUP (ORDER BY CO_SIGNER)
AS CO_SIGNERS
FROM
( SELECT
BORROWER,
LOAN_APPLICATION.CO_SIGNER
FROM
PUBLIC.LOAN_APPLICATION )
GROUP BY
Execute the query:
BORROWER Co-signer 1 Co-signer 2 Co-signer 3 Co-signer 4
------------ ----------- -------------- ------------- --------------
Albert Smith Bob Smith Marjorie Smith Rudy Gonzalez Scrooge McDuck
Elmer Fudd Bugs Bunny Daffy Duck Susan Fudd (null)
Indira Jones Aya Jones George Jones (null) (null)
Example Three
Finally, the "ultimate" pivot! LISTAGG accepts column expressions, not just columns ,as arguments. We rotate the desired data into JSON VARIANTs. We can now write queries SELECTing the desired pivot columns and their values. We do not need to predefine the pivot column names. If the desired pivot column doesn't exist, the pivoted column has NULL as the value. We can now expand the number of roles for our projects at will!
This type of query can be packaged as a CTE, VIEW or sub-query.
SELECT
PARSE_JSON(PROJ_ROLES) PARSED
FROM
( SELECT
CONCAT('{', CONCAT('"PROJECT_NAME','":"',PROJECT_NAME,'",', LISTAGG(CONCAT('"'
,PROJECT_ROLE, '":','"',PERSON,'"'),',') ),'}')PROJ_ROLES
FROM
( SELECT
PROJECT_NAME,
PROJECT_ROLE,
PERSON
FROM
"PROJECT_MGMT")
GROUP BY
PROJECT_NAME));
This form does not require SPLIT_PART. SELECT is used instead. The fields in the example are cast to '::string' to strip the double quotes from the JSON. Other data types will required an appropriates CAST/::.
Remember that results are indeterminate if duplicate fields exist in the object!
SELECT
PARSED:PROJECT_NAME::string "Project",
PARSED:"Project Mgr"::string "Project Mgr",
PARSED:"Chief Architect"::string "Chief Architect",
PARSED:"DBA"::string "DBA"
FROM
( SELECT
PARSE_JSON(PROJ_ROLES) PARSED
FROM
( SELECT
CONCAT('{', CONCAT('"PROJECT_NAME','":"',PROJECT_NAME,'",', LISTAGG(CONCAT('"'
,PROJECT_ROLE, '":','"',PERSON,'"'),',') ),'}')PROJ_ROLES
FROM
( SELECT
PROJECT_NAME,
PROJECT_ROLE,
PERSON
FROM
"PROJECT_MGMT")
GROUP BY
PROJECT_NAME));
Execute the query:
Project Project Mgr Chief Architect DBA
-------------------- --------------- --------------- ------
Organic Foods Bug Bunny Dilbert Clark (null)
Fast Cars Dealership Wilber Jones Linda Brock (null)
Ring a Ding Phones Ernestine Smith Nora Eclaire (null)
Add a newly hired DBA to the Organic Foods project; recall that no DBAs were previously present!
INSERT INTO
PROJECT_MGMT VALUES
( 'Organic Foods',
'DBA',
'Clark Kent');A
Execute the previous SELECT:
Project Project Mgr Chief Architect DBA
-------------------- --------------- --------------- ----------
Organic Foods Bug Bunny Dilbert Clark Clark Kent
Fast Cars Dealership Wilber Jones Linda Brock (null)
Ring a Ding Phones Ernestine Smith Nora Eclaire (null)
We now have the ultimately flexible pivot query pattern! Constructing the JSON with appropriate naming enables pivoting multiple columns, potentially the entire table!
Thanks to fellow Snowflake Data Super Hero for his review and improvements!
Copyright ? 2022, Jeffrey Jacobs & Associate, LLC
Senior Data Systems Manager at Sequoia Consulting Group
3 个月Thank you for writing this article. This is very helpful!
Principal Clinical Data Science Programmer @ ICON Strategic Solutions
8 个月Very Nice. Thanks for sharing Jeffrey / Simeon
CEO | Quema | Building scalable and secure IT infrastructures and allocating dedicated IT engineers from our team
2 年Jeffrey, thanks for sharing!
CTO/Founder of AltaSQL.io
2 年Updated with improved and corrected SQL queries, thanks to fellow Snowflake Data Super Hero Simeon Pilgrim!