登录查看更多内容

Pivot ANYTHING in Snowflake, Without the SQL PIVOT Function

Jeffrey Jacobs

CTO/Founder of AltaSQL.io

发布日期: 2022年5月16日

The SQL PIVOT function has very limited functionality. It is only useful for numeric data, with very explicit, "hard coding" requirements. We frequently require more flexibility in pivoting data.

The desired flexibility can be accomplished with a combination of LISTAGG and SPLIT_PART SQL functions. This approach allows more robust pivoting functionality; it can be used to pivot any type of data as well as complex expressions. The examples may also be feasible in other RDBMS, as most have both LISTAGG and SPLIT_PART functions.

LISTAGG is an "aggregation" function, requiring GROUP BY column(s), while allowing complex column expressions for the pivoted values.

The basic code is similar to PIVOT:

SELECT
        <group_by_columns>, -- Typically Primary Key Columns
        LISTAGG(<column_expr>,'<seperator>') 
        [WITHIN GROUP ( <orderby_clause> )] COLUMN_LIST
        FROM
            <source_object>
        [WHERE] 
        GROUP BY
            <group_by_columns>

We will look at three different types of pivoting.

Example One

In this example we know all of the desired pivoted column names and effectively hard code them. Following is the management structure of a consulting firm, consisting of the Account Mgr, the Project Mgr and the Chief Architect.

First, create the table:

CREATE TABLE
    IF NOT EXISTS PROJECT_MGMT
    (
        PROJECT_NAME VARCHAR,
        PROJECT_ROLE VARCHAR,
        PERSON       VARCHAR
    );

Populate the table:

INSERT INTO "project_mgmt" ("PROJECT_NAME", "PROJECT_ROLE", "PERSON") VALUES 
('Organic Foods', 'Project Mgr', 'Bug Bunny'),
('Ring a Ding Phones', 'Project Mgr', 'Ernestine Smith'),
('Fast Cars Dealership', 'Project Mgr', 'Wilber Jones'),
('Organic Foods', 'Account Mgr', 'Daffy Duck'),
('Ring a Ding Phones', 'Account Mgr', 'Marjorie Jackson'),
('Fast Cars Dealership', 'Account Mgr', 'TBD'),
('Organic Foods', 'Chief Architect', 'Dilbert Clark'),
('Ring a Ding Phones', 'Chief Architect', 'Nora Eclaire'),
('Fast Cars Dealership', 'Chief Architect', 'Linda Brock');

The elements in the LISTAGG are ordered by the desired column name. The SPLIT_PART function is used to pick the desired column from the ordered LISTAGG elements to create the desired pivoted column(s).

SELECT
    <group_by_columns>
    SPLIT_PART(COLUMN_LIST,',',1) PIVOTED_COLUMN_1,
    SPLIT_PART(COLUMN_LIST,',',2)     PIVOTED_COLUMN_2 ,
    SPLIT_PART(COLUMN_LIST,',',3)     PIVOTED_COLUMN_3,
    SPLIT_PART(COLUMN_LIST,',',4)     PIVOTED_COLUMN_4
FROM
    (   SELECT
            <group_by_columns>, -- Typically Primary Key Columns
            LISTAGG(<column_expr>,'<seperator>')
            [WITHIN GROUP (ORDER BY <orderby_clause>)] COLUMN_LIST
        FROM
            <source_object>
        [WHERE] 
        GROUP BY
            <group_by_columns>
    );

The ORDER BY is critical to this type of pivot, as it orders the elements in the LISTAGG by the desired column names.

For the project management pivot query:

SELECT
    PROJECT_NAME,
    SPLIT_PART(PROJ_MGMT,',',1) "Account Mgr",
    SPLIT_PART(PROJ_MGMT,',',2) "Chief Architect",
    SPLIT_PART(PROJ_MGMT,',',3) "Project Mgr"
FROM
    (   SELECT
            "PROJECT_NAME",
            LISTAGG(PERSON,',') WITHIN GROUP (ORDER BY PROJECT_ROLE)  
               AS PROJ_MGMT
        FROM
            (   SELECT
                    PROJECT_NAME,
                    PROJECT_ROLE,
                    PERSON
                FROM
                    "PROJECT_MGMT"
                WHERE
                    PROJECT_ROLE IN ('Account Mgr',
                                     'Chief Architect',
                                     'Project Mgr') )
        GROUP BY
            PROJECT_NAME);

Executing the query:

PROJECT_NAME         Account Mgr      Chief Architect Project Mgr     
-------------------- ---------------- --------------- --------------- 
Ring a Ding Phones   Marjorie Jackson Nora Eclaire    Ernestine Smith 
Fast Cars Dealership TBD              Linda Brock     Wilber Jones    
Organic Foods        Daffy Duck       Dilbert Clark   Bug Bunny

Example Two

In this example, we demonstrate handling a fixed possible number of columns with the same type value. We handle the case of less than the maximum number of records existing to fill desired columns.

We demonstrate this with a loan application that allows a maximum of four co-signers. Only records of actual co-signers exist. For less than 4 co-signers, we want NULL in the column. If SPLIT_PART specifies a part greater than the length of the LISTAGG, it returns the empty string, not NULL; the query converts the empty string to NULL.

Create the table:

领英推荐

5 Time-Saving Hacks for Data Analysts Using SQL

Quantum Analytics NG 2 个月前

Advanced SQL Skills for the Modern Analyst

Walter Shields 1 个月前

Counting Distinct Values in SQL: Tips and Examples

StrataScratch 5 个月前

CREATE TABLE
    IF NOT EXISTS LOAN_APPLICATION
    (
        BORROWER  VARCHAR,
        CO_SIGNER VARCHAR
    );

Populate the table:

INSERT INTO "LOAN_APPLICATION" ("BORROWER", "CO_SIGNER") VALUES 
('Albert Smith', 'Marjorie Smith'),
('Albert Smith', 'Bob Smith'),
('Albert Smith', 'Rudy Gonzalez'),
('Albert Smith', 'Scrooge McDuck'),
('Elmer Fudd', 'Bugs Bunny'),
('Elmer Fudd', 'Daffy Duck'),
('Elmer Fudd', 'Susan Fudd'),
('Indira Jones', 'Aya Jones'),
('Indira Jones', 'George Jones');

The queryordering by co-signer name:

SELECT
    BORROWER,
    IFF(SPLIT_PART(CO_SIGNERS,',',1)<>'',SPLIT_PART(CO_SIGNERS,',',1),NULL) "Co-signer 1",
    IFF(SPLIT_PART(CO_SIGNERS,',',2)<>'',SPLIT_PART(CO_SIGNERS,',',2),NULL) "Co-signer 2",
    IFF(SPLIT_PART(CO_SIGNERS,',',3)<>'',SPLIT_PART(CO_SIGNERS,',',3),NULL) "Co-signer 3",
    IFF(SPLIT_PART(CO_SIGNERS,',',4)<>'',SPLIT_PART(CO_SIGNERS,',',4),NULL) "Co-signer 4"
FROM
    (   SELECT
            BORROWER,
            LISTAGG(CO_SIGNER,',') WITHIN GROUP (ORDER BY CO_SIGNER)
            AS CO_SIGNERS
        FROM
            (   SELECT
                    BORROWER,
                    LOAN_APPLICATION.CO_SIGNER
                FROM
                    PUBLIC.LOAN_APPLICATION )
        GROUP BY

Execute the query:

BORROWER     Co-signer 1 Co-signer 2    Co-signer 3   Co-signer 4    
------------ ----------- -------------- ------------- -------------- 
Albert Smith Bob Smith   Marjorie Smith Rudy Gonzalez Scrooge McDuck 
Elmer Fudd   Bugs Bunny  Daffy Duck     Susan Fudd    (null)         
Indira Jones Aya Jones   George Jones   (null)        (null)

Example Three

Finally, the "ultimate" pivot! LISTAGG accepts column expressions, not just columns ,as arguments. We rotate the desired data into JSON VARIANTs. We can now write queries SELECTing the desired pivot columns and their values. We do not need to predefine the pivot column names. If the desired pivot column doesn't exist, the pivoted column has NULL as the value. We can now expand the number of roles for our projects at will!

This type of query can be packaged as a CTE, VIEW or sub-query.

 SELECT
            PARSE_JSON(PROJ_ROLES) PARSED
        FROM
            (   SELECT
                    CONCAT('{', CONCAT('"PROJECT_NAME','":"',PROJECT_NAME,'",', LISTAGG(CONCAT('"'
                    ,PROJECT_ROLE, '":','"',PERSON,'"'),',') ),'}')PROJ_ROLES
                FROM
                    (   SELECT
                            PROJECT_NAME,
                            PROJECT_ROLE,
                            PERSON
                        FROM
                            "PROJECT_MGMT")
                GROUP BY
                    PROJECT_NAME));

This form does not require SPLIT_PART. SELECT is used instead. The fields in the example are cast to '::string' to strip the double quotes from the JSON. Other data types will required an appropriates CAST/::.

Remember that results are indeterminate if duplicate fields exist in the object!

SELECT
    PARSED:PROJECT_NAME::string "Project",
    PARSED:"Project Mgr"::string "Project Mgr",
    PARSED:"Chief Architect"::string "Chief Architect",
    PARSED:"DBA"::string "DBA"
FROM
    (   SELECT
            PARSE_JSON(PROJ_ROLES) PARSED
        FROM
            (   SELECT
                    CONCAT('{', CONCAT('"PROJECT_NAME','":"',PROJECT_NAME,'",', LISTAGG(CONCAT('"'
                    ,PROJECT_ROLE, '":','"',PERSON,'"'),',') ),'}')PROJ_ROLES
                FROM
                    (   SELECT
                            PROJECT_NAME,
                            PROJECT_ROLE,
                            PERSON
                        FROM
                            "PROJECT_MGMT")
                GROUP BY
                    PROJECT_NAME));

Execute the query:

Project              Project Mgr     Chief Architect DBA    
-------------------- --------------- --------------- ------ 
Organic Foods        Bug Bunny       Dilbert Clark   (null) 
Fast Cars Dealership Wilber Jones    Linda Brock     (null) 
Ring a Ding Phones   Ernestine Smith Nora Eclaire    (null)

Add a newly hired DBA to the Organic Foods project; recall that no DBAs were previously present!

INSERT INTO 
   PROJECT_MGMT VALUES 
    (   'Organic Foods', 
        'DBA', 
        'Clark Kent');A

Execute the previous SELECT:

Project              Project Mgr     Chief Architect DBA        
-------------------- --------------- --------------- ---------- 
Organic Foods        Bug Bunny       Dilbert Clark   Clark Kent 
Fast Cars Dealership Wilber Jones    Linda Brock     (null)     
Ring a Ding Phones   Ernestine Smith Nora Eclaire    (null)

We now have the ultimately flexible pivot query pattern! Constructing the JSON with appropriate naming enables pivoting multiple columns, potentially the entire table!

Thanks to fellow Snowflake Data Super Hero for his review and improvements!

Joseph Radwanski

Senior Data Systems Manager at Sequoia Consulting Group

3 个月

Thank you for writing this article. This is very helpful!

Peter Cook

Principal Clinical Data Science Programmer @ ICON Strategic Solutions

8 个月

Very Nice. Thanks for sharing Jeffrey / Simeon

Dmytro Chaurov

CEO | Quema | Building scalable and secure IT infrastructures and allocating dedicated IT engineers from our team

2 年

Jeffrey, thanks for sharing!

Jeffrey Jacobs

CTO/Founder of AltaSQL.io

2 年

Updated with improved and corrected SQL queries, thanks to fellow Snowflake Data Super Hero Simeon Pilgrim!

查看更多评论

要查看或添加评论，请登录

Jeffrey Jacobs的更多文章

AltaSQL Bulk Updating Expressions Across All View Definitions

2025年2月23日

AltaSQL Bulk Updating Expressions Across All View Definitions

Bulk Updating Expressions Across All View Definitions Using AltaSQL AltaSQL allows users to efficiently apply changes…
Generating 50+ SQL Statements in Under 10 Minutes; No Hand Written SQL!

2025年2月13日

Generating 50+ SQL Statements in Under 10 Minutes; No Hand Written SQL!

The AltaSQL SELECT Discover Demo uses the Chinook Music database whose columns are in CamelCase. This is the same demo…
Snowflake DBA-101; Deploying Standardized, Fully Functional Databases

2022年3月20日

Snowflake DBA-101; Deploying Standardized, Fully Functional Databases

This article provides both a guide and script for creating and deploying standardized, fully functional, Snowflake…

1 条评论
Snowflake and ELVT vs [ELT|ETL] - Case Study Part 2, Real Time Availability for Single Row INSERTs

2022年1月23日

Snowflake and ELVT vs [ELT|ETL] - Case Study Part 2, Real Time Availability for Single Row INSERTs

In my previous article, Snowflake and ELVT vs [ELT|ETL] – Case Study Part 1, The “No Data Model” Data Architecture, I…
Snowflake and Duo Mobile, How to Lose Login Ability and MFA Across Multiple Accounts:ERROR: "USER IS NOT ENROLLED IN DUO SECURITY. CONTACT YOUR LOCAL

2022年1月8日

Snowflake and Duo Mobile, How to Lose Login Ability and MFA Across Multiple Accounts:ERROR: "USER IS NOT ENROLLED IN DUO SECURITY. CONTACT YOUR LOCAL

In working with multiple Snowflake accounts, I found a serious issue that I have not seen addressed elsewhere. NOTE:…

7 条评论
Snowflake vs Databricks: TPCS-DS Benchmark Wars – Who Cares?

2021年11月27日

Snowflake vs Databricks: TPCS-DS Benchmark Wars – Who Cares?

This article discusses the relevance of the recent TPC-DS “results” to potential customers. Let’s start with…

2 条评论
Snowflake and ELVT vs [ELT|ETL] – Case Study Part 1, The “No Data Model” Data Architecture

2021年11月15日

Snowflake and ELVT vs [ELT|ETL] – Case Study Part 1, The “No Data Model” Data Architecture

This article is a case study of a real-world implementation. The client and purpose of the application are not relevant…

11 条评论
Snowflake and ELVT vs [ELT,ETL], Part 2, The ELVT Reference Architecture

2021年5月16日

Snowflake and ELVT vs [ELT,ETL], Part 2, The ELVT Reference Architecture

This is the promised follow up to my last article, Snowflake and EVLT vs [ELT, ETL], Part 1, discussing the advantages…

1 条评论
Snowflake and ELVT vs [ELT, ETL], Part 1

2021年4月16日

Snowflake and ELVT vs [ELT, ETL], Part 1

Over several generations of RDBMS technologies, I have learned that common practices, knowledge, and attitudes become…

23 条评论
Snowflake Micro-partition vs Legacy Macro-partition Pruning

2021年4月4日

Snowflake Micro-partition vs Legacy Macro-partition Pruning

I have been in the data business through several RDBM generations and have seen many attempts at comparing performance…

2 条评论

See all articles

Pivot ANYTHING in Snowflake, Without the SQL PIVOT Function

Jeffrey Jacobs

CTO/Founder of AltaSQL.io

Example One

Example Two

领英推荐

Example Three

Jeffrey Jacobs的更多文章

社区洞察

其他会员也浏览了

SQL for Data Projects: Key Commands You Need

Unlocking the Power of Semi-Structured Data in Snowflake

Beyond the Basics: Advanced SQL Techniques for Data Analysts

SQL Beyond the Basics: Advanced SQL Techniques

What Are the Steps to Cast INT in SQL for Type Conversion?

Understanding COUNT(*) vs COUNT(1) in SQL: When and How to Use Them

5 Essential SQL Techniques Every Data Analyst Should Master

Breaking Down SQL: Understanding the Difference Between GROUP BY and PARTITION BY

Dynamic SQL for Data Analysts: Tools, Techniques, and Tips

Databricks SQL Series: Advanced Analytics in Databricks SQL — Using Window Functions — Part 3

Example One

Example Two

领英推荐

Example Three

Jeffrey Jacobs的更多文章

AltaSQL Bulk Updating Expressions Across All View Definitions

Generating 50+ SQL Statements in Under 10 Minutes; No Hand Written SQL!

Snowflake DBA-101; Deploying Standardized, Fully Functional Databases

Snowflake and ELVT vs [ELT|ETL] - Case Study Part 2, Real Time Availability for Single Row INSERTs

Snowflake and Duo Mobile, How to Lose Login Ability and MFA Across Multiple Accounts:ERROR: "USER IS NOT ENROLLED IN DUO SECURITY. CONTACT YOUR LOCAL

Snowflake vs Databricks: TPCS-DS Benchmark Wars – Who Cares?

Snowflake and ELVT vs [ELT|ETL] – Case Study Part 1, The “No Data Model” Data Architecture

Snowflake and ELVT vs [ELT,ETL], Part 2, The ELVT Reference Architecture

Snowflake and ELVT vs [ELT, ETL], Part 1

Snowflake Micro-partition vs Legacy Macro-partition Pruning

社区洞察

其他会员也浏览了

SQL for Data Projects: Key Commands You Need

Unlocking the Power of Semi-Structured Data in Snowflake

Beyond the Basics: Advanced SQL Techniques for Data Analysts

SQL Beyond the Basics: Advanced SQL Techniques

What Are the Steps to Cast INT in SQL for Type Conversion?

Understanding COUNT(*) vs COUNT(1) in SQL: When and How to Use Them

5 Essential SQL Techniques Every Data Analyst Should Master

Breaking Down SQL: Understanding the Difference Between GROUP BY and PARTITION BY

Dynamic SQL for Data Analysts: Tools, Techniques, and Tips

Databricks SQL Series: Advanced Analytics in Databricks SQL — Using Window Functions — Part 3