Pivot ANYTHING in Snowflake, Without the SQL PIVOT Function

Pivot ANYTHING in Snowflake, Without the SQL PIVOT Function

The SQL PIVOT function has very limited functionality. It is only useful for numeric data, with very explicit, "hard coding" requirements. We frequently require more flexibility in pivoting data.

The desired flexibility can be accomplished with a combination of LISTAGG and SPLIT_PART SQL functions. This approach allows more robust pivoting functionality; it can be used to pivot any type of data as well as complex expressions. The examples may also be feasible in other RDBMS, as most have both LISTAGG and SPLIT_PART functions.

LISTAGG is an "aggregation" function, requiring GROUP BY column(s), while allowing complex column expressions for the pivoted values.

The basic code is similar to PIVOT:

SELECT
        <group_by_columns>, -- Typically Primary Key Columns
        LISTAGG(<column_expr>,'<seperator>') 
        [WITHIN GROUP ( <orderby_clause> )] COLUMN_LIST
        FROM
            <source_object>
        [WHERE] 
        GROUP BY
            <group_by_columns>        

We will look at three different types of pivoting.

Example One

In this example we know all of the desired pivoted column names and effectively hard code them. Following is the management structure of a consulting firm, consisting of the Account Mgr, the Project Mgr and the Chief Architect.

First, create the table:

CREATE TABLE
    IF NOT EXISTS PROJECT_MGMT
    (
        PROJECT_NAME VARCHAR,
        PROJECT_ROLE VARCHAR,
        PERSON       VARCHAR
    );        

Populate the table:

INSERT INTO "project_mgmt" ("PROJECT_NAME", "PROJECT_ROLE", "PERSON") VALUES 
('Organic Foods', 'Project Mgr', 'Bug Bunny'),
('Ring a Ding Phones', 'Project Mgr', 'Ernestine Smith'),
('Fast Cars Dealership', 'Project Mgr', 'Wilber Jones'),
('Organic Foods', 'Account Mgr', 'Daffy Duck'),
('Ring a Ding Phones', 'Account Mgr', 'Marjorie Jackson'),
('Fast Cars Dealership', 'Account Mgr', 'TBD'),
('Organic Foods', 'Chief Architect', 'Dilbert Clark'),
('Ring a Ding Phones', 'Chief Architect', 'Nora Eclaire'),
('Fast Cars Dealership', 'Chief Architect', 'Linda Brock');        

The elements in the LISTAGG are ordered by the desired column name. The SPLIT_PART function is used to pick the desired column from the ordered LISTAGG elements to create the desired pivoted column(s).

SELECT
    <group_by_columns>
    SPLIT_PART(COLUMN_LIST,',',1) PIVOTED_COLUMN_1,
    SPLIT_PART(COLUMN_LIST,',',2)     PIVOTED_COLUMN_2 ,
    SPLIT_PART(COLUMN_LIST,',',3)     PIVOTED_COLUMN_3,
    SPLIT_PART(COLUMN_LIST,',',4)     PIVOTED_COLUMN_4
FROM
    (   SELECT
            <group_by_columns>, -- Typically Primary Key Columns
            LISTAGG(<column_expr>,'<seperator>')
            [WITHIN GROUP (ORDER BY <orderby_clause>)] COLUMN_LIST
        FROM
            <source_object>
        [WHERE] 
        GROUP BY
            <group_by_columns>
    );        

The ORDER BY is critical to this type of pivot, as it orders the elements in the LISTAGG by the desired column names.


For the project management pivot query:

SELECT
    PROJECT_NAME,
    SPLIT_PART(PROJ_MGMT,',',1) "Account Mgr",
    SPLIT_PART(PROJ_MGMT,',',2) "Chief Architect",
    SPLIT_PART(PROJ_MGMT,',',3) "Project Mgr"
FROM
    (   SELECT
            "PROJECT_NAME",
            LISTAGG(PERSON,',') WITHIN GROUP (ORDER BY PROJECT_ROLE)  
               AS PROJ_MGMT
        FROM
            (   SELECT
                    PROJECT_NAME,
                    PROJECT_ROLE,
                    PERSON
                FROM
                    "PROJECT_MGMT"
                WHERE
                    PROJECT_ROLE IN ('Account Mgr',
                                     'Chief Architect',
                                     'Project Mgr') )
        GROUP BY
            PROJECT_NAME);        

Executing the query:

PROJECT_NAME         Account Mgr      Chief Architect Project Mgr     
-------------------- ---------------- --------------- --------------- 
Ring a Ding Phones   Marjorie Jackson Nora Eclaire    Ernestine Smith 
Fast Cars Dealership TBD              Linda Brock     Wilber Jones    
Organic Foods        Daffy Duck       Dilbert Clark   Bug Bunny               

Example Two

In this example, we demonstrate handling a fixed possible number of columns with the same type value. We handle the case of less than the maximum number of records existing to fill desired columns.

We demonstrate this with a loan application that allows a maximum of four co-signers. Only records of actual co-signers exist. For less than 4 co-signers, we want NULL in the column. If SPLIT_PART specifies a part greater than the length of the LISTAGG, it returns the empty string, not NULL; the query converts the empty string to NULL.

Create the table:

CREATE TABLE
    IF NOT EXISTS LOAN_APPLICATION
    (
        BORROWER  VARCHAR,
        CO_SIGNER VARCHAR
    );        

Populate the table:

INSERT INTO "LOAN_APPLICATION" ("BORROWER", "CO_SIGNER") VALUES 
('Albert Smith', 'Marjorie Smith'),
('Albert Smith', 'Bob Smith'),
('Albert Smith', 'Rudy Gonzalez'),
('Albert Smith', 'Scrooge McDuck'),
('Elmer Fudd', 'Bugs Bunny'),
('Elmer Fudd', 'Daffy Duck'),
('Elmer Fudd', 'Susan Fudd'),
('Indira Jones', 'Aya Jones'),
('Indira Jones', 'George Jones');        

The queryordering by co-signer name:

SELECT
    BORROWER,
    IFF(SPLIT_PART(CO_SIGNERS,',',1)<>'',SPLIT_PART(CO_SIGNERS,',',1),NULL) "Co-signer 1",
    IFF(SPLIT_PART(CO_SIGNERS,',',2)<>'',SPLIT_PART(CO_SIGNERS,',',2),NULL) "Co-signer 2",
    IFF(SPLIT_PART(CO_SIGNERS,',',3)<>'',SPLIT_PART(CO_SIGNERS,',',3),NULL) "Co-signer 3",
    IFF(SPLIT_PART(CO_SIGNERS,',',4)<>'',SPLIT_PART(CO_SIGNERS,',',4),NULL) "Co-signer 4"
FROM
    (   SELECT
            BORROWER,
            LISTAGG(CO_SIGNER,',') WITHIN GROUP (ORDER BY CO_SIGNER)
            AS CO_SIGNERS
        FROM
            (   SELECT
                    BORROWER,
                    LOAN_APPLICATION.CO_SIGNER
                FROM
                    PUBLIC.LOAN_APPLICATION )
        GROUP BY        

Execute the query:

BORROWER     Co-signer 1 Co-signer 2    Co-signer 3   Co-signer 4    
------------ ----------- -------------- ------------- -------------- 
Albert Smith Bob Smith   Marjorie Smith Rudy Gonzalez Scrooge McDuck 
Elmer Fudd   Bugs Bunny  Daffy Duck     Susan Fudd    (null)         
Indira Jones Aya Jones   George Jones   (null)        (null)                 

Example Three

Finally, the "ultimate" pivot! LISTAGG accepts column expressions, not just columns ,as arguments. We rotate the desired data into JSON VARIANTs. We can now write queries SELECTing the desired pivot columns and their values. We do not need to predefine the pivot column names. If the desired pivot column doesn't exist, the pivoted column has NULL as the value. We can now expand the number of roles for our projects at will!

This type of query can be packaged as a CTE, VIEW or sub-query.

 SELECT
            PARSE_JSON(PROJ_ROLES) PARSED
        FROM
            (   SELECT
                    CONCAT('{', CONCAT('"PROJECT_NAME','":"',PROJECT_NAME,'",', LISTAGG(CONCAT('"'
                    ,PROJECT_ROLE, '":','"',PERSON,'"'),',') ),'}')PROJ_ROLES
                FROM
                    (   SELECT
                            PROJECT_NAME,
                            PROJECT_ROLE,
                            PERSON
                        FROM
                            "PROJECT_MGMT")
                GROUP BY
                    PROJECT_NAME));        

This form does not require SPLIT_PART. SELECT is used instead. The fields in the example are cast to '::string' to strip the double quotes from the JSON. Other data types will required an appropriates CAST/::.

Remember that results are indeterminate if duplicate fields exist in the object!

SELECT
    PARSED:PROJECT_NAME::string "Project",
    PARSED:"Project Mgr"::string "Project Mgr",
    PARSED:"Chief Architect"::string "Chief Architect",
    PARSED:"DBA"::string "DBA"
FROM
    (   SELECT
            PARSE_JSON(PROJ_ROLES) PARSED
        FROM
            (   SELECT
                    CONCAT('{', CONCAT('"PROJECT_NAME','":"',PROJECT_NAME,'",', LISTAGG(CONCAT('"'
                    ,PROJECT_ROLE, '":','"',PERSON,'"'),',') ),'}')PROJ_ROLES
                FROM
                    (   SELECT
                            PROJECT_NAME,
                            PROJECT_ROLE,
                            PERSON
                        FROM
                            "PROJECT_MGMT")
                GROUP BY
                    PROJECT_NAME));        

Execute the query:

Project              Project Mgr     Chief Architect DBA    
-------------------- --------------- --------------- ------ 
Organic Foods        Bug Bunny       Dilbert Clark   (null) 
Fast Cars Dealership Wilber Jones    Linda Brock     (null) 
Ring a Ding Phones   Ernestine Smith Nora Eclaire    (null)         

Add a newly hired DBA to the Organic Foods project; recall that no DBAs were previously present!

INSERT INTO 
   PROJECT_MGMT VALUES 
    (   'Organic Foods', 
        'DBA', 
        'Clark Kent');A        

Execute the previous SELECT:

Project              Project Mgr     Chief Architect DBA        
-------------------- --------------- --------------- ---------- 
Organic Foods        Bug Bunny       Dilbert Clark   Clark Kent 
Fast Cars Dealership Wilber Jones    Linda Brock     (null)     
Ring a Ding Phones   Ernestine Smith Nora Eclaire    (null)             

We now have the ultimately flexible pivot query pattern! Constructing the JSON with appropriate naming enables pivoting multiple columns, potentially the entire table!

Thanks to fellow Snowflake Data Super Hero for his review and improvements!

Copyright ? 2022, Jeffrey Jacobs & Associate, LLC

Joseph Radwanski

Senior Data Systems Manager at Sequoia Consulting Group

3 个月

Thank you for writing this article. This is very helpful!

回复
Peter Cook

Principal Clinical Data Science Programmer @ ICON Strategic Solutions

8 个月

Very Nice. Thanks for sharing Jeffrey / Simeon

回复
Dmytro Chaurov

CEO | Quema | Building scalable and secure IT infrastructures and allocating dedicated IT engineers from our team

2 年

Jeffrey, thanks for sharing!

回复
Jeffrey Jacobs

CTO/Founder of AltaSQL.io

2 年

Updated with improved and corrected SQL queries, thanks to fellow Snowflake Data Super Hero Simeon Pilgrim!

回复

要查看或添加评论,请登录

Jeffrey Jacobs的更多文章

社区洞察

其他会员也浏览了