Using Multiple Common Table Expressions In One SQL Query In PostgreSQL

Using Multiple Common Table Expressions In One SQL Query In PostgreSQL

Today I learned that you can have multiple CTEs within a single PostgreSQL query. And, that these CTEs can reference each other. So freaking cool!

To demonstrate, I'm going to build-up a SQL query using common table expressions. First, we'll start with a list of fake email addresses. This is something that I often have to do, copy-pasting them from a CSV (Comma Separated Values) file. The following CTE just gets the email addresses into a consumable format:

WITH 
    emails (email) AS (
    VALUES
        ('[email protected]'),
        ('[email protected]'),
        ('[email protected]'),
        ('[email protected]'),
        ('[email protected]'),
        ('[email protected]'),
        ('[email protected]'),
        ('[email protected]')
)

SELECT * FROM emails        

This creates a derived table, emails, with a single column, email. Common table expressions can be referenced by the main SQL statement. You can see the result when I select it:

they can also be referenced by other CTEs in the same query. And that's exactly what we'll do next—create another CTE that builds upon the prior CTE and extracts the domain from each row:

WITH. 
       -- ... truncated SQL query ...
      deconstructed (email, domain) AS (
		SELECT
			e.email,
			SUBSTRING(e.email FROM POSITION('@' IN e.email) + 1) AS domain
		FROM
			emails e -- CTE reference!

)

SELECT * FROM deconstructed        

As you can see, this CTE is querying from the previous CTE, emails. Then, it's using the SUBSTRING() function to parse the domain out of each email, creating yet another derived table / CTE.

Result:

Next, we'll use this new CTE to create another CTE which groups the emails by domain and records the COUNT() statistics:

WITH
	-- ... truncated SQL query ...
	stats ( `domain`, `emailCount` ) AS (

		SELECT
			d.domain,
			COUNT( * ) AS emailCount
		FROM
			deconstructed d -- CTE reference!
		GROUP BY
			d.domain

	)

SELECT * FROM stats        

As you can see, this CTE is querying from the previous CTE, deconstructed.

Result:


Next, we'll create another CTE which again references the previous CTE, deconstructed. But, this time, instead of getting the count, we'll collect the email addresses—per domain—into a JSON aggregation.

Note: I could have combined the following CTE with the previous CTE and used both the COUNT(*) and the JSON_ARRAYAGG() in the same query; but, splitting them up allowed me to explore the space a bit more.

WITH 
-- ... truncated SQL query ...
     aggregated (domain, collection) AS (
        SELECT
            d.domain,
            json_agg(d.email) AS collection
        FROM
            deconstructed d -- CTE reference!
        GROUP BY
            d.domain
    )

SELECT * FROM aggregated        

So far, we've only created CTEs that reference other CTEs. But, these CTEs merely setup utility tables, they don't actually return data to the client

Result:

Now it's time to write the actual SQL that returns actual data. This SQL will combine multiple CTEs from above:

WITH
	-- ... truncated SQL query ...
SELECT
	s.domain,
	s.emailCount,
	a.collection
FROM
	stats s -- CTE reference!
INNER JOIN
	aggregated a -- CTE reference!
ON
	a.domain = s.domain
ORDER BY
	s.emailCount DESC,
	s.domain ASC
;        

As you can see, we're taking two of the CTEs, stats and aggregated, and we're JOINing them together.


It's just CTEs all the way down! We've looked at the individual parts in isolation. Now, here's the entire SQL query in one query:

WITH 
   -- First, we'll start with a common table expression (CTE) for the email addresses.
    emails (email) AS (
    VALUES
        ('[email protected]'),
        ('[email protected]'),
        ('[email protected]'),
        ('[email protected]'),
        ('[email protected]'),
        ('[email protected]'),
        ('[email protected]'),
        ('[email protected]')
), 
   -- Second, we'll create a common table expression (CTE) that extracts the domain from
   -- the email address.
   deconstructed (email, domain) AS (
		SELECT
			e.email,
			SUBSTRING(e.email FROM POSITION('@' IN e.email) + 1) AS domain
		FROM
			emails e -- CTE reference!

), 
   -- Third, we'll create a common table expression (CTE) that provides some stats for
   -- how many emails belong to each domain.
   stats (domain, emailCount) AS (

		SELECT
			d.domain,
			COUNT( * ) AS emailCount
		FROM
			deconstructed d -- CTE reference!
		GROUP BY
			d.domain

), 
   -- Fourth, we'll create a common table expression (CTE) that groups the emails by
   -- domain and provides the collection of emails as a JSON payload.
   aggregated (domain, collection) AS (
        SELECT
            d.domain,
            json_agg(d.email) AS collection
        FROM
            deconstructed d -- CTE reference!
        GROUP BY
            d.domain
)


SELECT
	s.domain,
	s.emailCount,
	a.collection
FROM
	stats s -- CTE reference!
INNER JOIN
	aggregated a -- CTE reference!
ON
	a.domain = s.domain
ORDER BY
	s.emailCount DESC,
	s.domain ASC
;        

Result:

When it comes to writing basic business applications, I don't need these sassy SQL mechanics. But, when it comes to reporting and other types of one-off data introspection tasks, features like this, iteratively enhanced common table expressions, and JSON aggregations are just amazing! SQL is a truly wonderful language.


You Might Also Enjoy Some of My Other Posts

Tùng Ph?m

??Software Engineer, AI and Algorithms Enthusiast.

9 个月

n?u mà ph?i dùng ?i dùng l?i cte thì nên xài View hay Materialized View vì mình có th? t?o thêm index ph? giúp cho vi?c join nhanh h?n.

回复

要查看或添加评论,请登录

Steve Loc的更多文章

社区洞察

其他会员也浏览了