Using Multiple Common Table Expressions In One SQL Query In PostgreSQL
Today I learned that you can have multiple CTEs within a single PostgreSQL query. And, that these CTEs can reference each other. So freaking cool!
To demonstrate, I'm going to build-up a SQL query using common table expressions. First, we'll start with a list of fake email addresses. This is something that I often have to do, copy-pasting them from a CSV (Comma Separated Values) file. The following CTE just gets the email addresses into a consumable format:
WITH
emails (email) AS (
VALUES
('[email protected]'),
('[email protected]'),
('[email protected]'),
('[email protected]'),
('[email protected]'),
('[email protected]'),
('[email protected]'),
('[email protected]')
)
SELECT * FROM emails
This creates a derived table, emails, with a single column, email. Common table expressions can be referenced by the main SQL statement. You can see the result when I select it:
they can also be referenced by other CTEs in the same query. And that's exactly what we'll do next—create another CTE that builds upon the prior CTE and extracts the domain from each row:
WITH.
-- ... truncated SQL query ...
deconstructed (email, domain) AS (
SELECT
e.email,
SUBSTRING(e.email FROM POSITION('@' IN e.email) + 1) AS domain
FROM
emails e -- CTE reference!
)
SELECT * FROM deconstructed
As you can see, this CTE is querying from the previous CTE, emails. Then, it's using the SUBSTRING() function to parse the domain out of each email, creating yet another derived table / CTE.
Result:
Next, we'll use this new CTE to create another CTE which groups the emails by domain and records the COUNT() statistics:
WITH
-- ... truncated SQL query ...
stats ( `domain`, `emailCount` ) AS (
SELECT
d.domain,
COUNT( * ) AS emailCount
FROM
deconstructed d -- CTE reference!
GROUP BY
d.domain
)
SELECT * FROM stats
As you can see, this CTE is querying from the previous CTE, deconstructed.
Result:
Next, we'll create another CTE which again references the previous CTE, deconstructed. But, this time, instead of getting the count, we'll collect the email addresses—per domain—into a JSON aggregation.
领英推荐
Note: I could have combined the following CTE with the previous CTE and used both the COUNT(*) and the JSON_ARRAYAGG() in the same query; but, splitting them up allowed me to explore the space a bit more.
WITH
-- ... truncated SQL query ...
aggregated (domain, collection) AS (
SELECT
d.domain,
json_agg(d.email) AS collection
FROM
deconstructed d -- CTE reference!
GROUP BY
d.domain
)
SELECT * FROM aggregated
So far, we've only created CTEs that reference other CTEs. But, these CTEs merely setup utility tables, they don't actually return data to the client
Result:
Now it's time to write the actual SQL that returns actual data. This SQL will combine multiple CTEs from above:
WITH
-- ... truncated SQL query ...
SELECT
s.domain,
s.emailCount,
a.collection
FROM
stats s -- CTE reference!
INNER JOIN
aggregated a -- CTE reference!
ON
a.domain = s.domain
ORDER BY
s.emailCount DESC,
s.domain ASC
;
As you can see, we're taking two of the CTEs, stats and aggregated, and we're JOINing them together.
It's just CTEs all the way down! We've looked at the individual parts in isolation. Now, here's the entire SQL query in one query:
WITH
-- First, we'll start with a common table expression (CTE) for the email addresses.
emails (email) AS (
VALUES
('[email protected]'),
('[email protected]'),
('[email protected]'),
('[email protected]'),
('[email protected]'),
('[email protected]'),
('[email protected]'),
('[email protected]')
),
-- Second, we'll create a common table expression (CTE) that extracts the domain from
-- the email address.
deconstructed (email, domain) AS (
SELECT
e.email,
SUBSTRING(e.email FROM POSITION('@' IN e.email) + 1) AS domain
FROM
emails e -- CTE reference!
),
-- Third, we'll create a common table expression (CTE) that provides some stats for
-- how many emails belong to each domain.
stats (domain, emailCount) AS (
SELECT
d.domain,
COUNT( * ) AS emailCount
FROM
deconstructed d -- CTE reference!
GROUP BY
d.domain
),
-- Fourth, we'll create a common table expression (CTE) that groups the emails by
-- domain and provides the collection of emails as a JSON payload.
aggregated (domain, collection) AS (
SELECT
d.domain,
json_agg(d.email) AS collection
FROM
deconstructed d -- CTE reference!
GROUP BY
d.domain
)
SELECT
s.domain,
s.emailCount,
a.collection
FROM
stats s -- CTE reference!
INNER JOIN
aggregated a -- CTE reference!
ON
a.domain = s.domain
ORDER BY
s.emailCount DESC,
s.domain ASC
;
Result:
When it comes to writing basic business applications, I don't need these sassy SQL mechanics. But, when it comes to reporting and other types of one-off data introspection tasks, features like this, iteratively enhanced common table expressions, and JSON aggregations are just amazing! SQL is a truly wonderful language.
??Software Engineer, AI and Algorithms Enthusiast.
9 个月n?u mà ph?i dùng ?i dùng l?i cte thì nên xài View hay Materialized View vì mình có th? t?o thêm index ph? giúp cho vi?c join nhanh h?n.