登录查看更多内容

Learn SQL the Effective Way Part 4 - CTEs and Subqueries

Will Sutton

Iron Viz Champion 2022 ? Tableau Visionary and Consultant

发布日期: 2022年9月22日

Last time we learnt about the importance of writing clean code with some tools to make it easier for us. Now that we're getting more advanced with our queries, we'll learn about Common Table Expressions (CTEs) and subqueries.?

CTEs and Subqueries allow you to write additional queries and use the results in another query. Sounds confusing? Here's an example.

Say I want to find out which F1 drivers set the top 10 fastest laps at the Monaco Grand Prix in 2016 and 2017, how should we do this???Follow along with these queries on data.world

No CTE / Subquery method

Normally we would write queries for 2016 and 2017, save these as tables and join them together to see the result. In doing so we've had to execute 3 queries and create 2 new tables in our database which we may not have permission to do.

CREATE TABLE mgp_2017 as (
  SELECT 
    DRV.forename || ' ' || DRV.surname as fullname, 
    MAX(RES.fastestlapspeed :: decimal) as fastest_lap 
  FROM 
    results AS RES 
    INNER JOIN races AS RAC ON RAC.raceid = RES.raceid 
    INNER JOIN drivers AS DRV ON DRV.driverid = RES.driverid 
  WHERE 
    RES.fastestlapspeed :: decimal IS NOT NULL 
    AND RAC.name = 'Monaco Grand Prix' 
    AND YEAR(RAC.date) = 2017
  GROUP BY 
    DRV.forename || ' ' || DRV.surname 
  ORDER BY 
    MAX(RES.fastestlapspeed :: decimal) DESC 
  LIMIT 
    10
);


CREATE TABLE mgp_2016 as (
  SELECT 
    DRV.forename || ' ' || DRV.surname as fullname, 
    MAX(RES.fastestlapspeed :: decimal) as fastest_lap 
  FROM 
    results AS RES 
    INNER JOIN races AS RAC ON RAC.raceid = RES.raceid 
    INNER JOIN drivers AS DRV ON DRV.driverid = RES.driverid 
  WHERE 
    RES.fastestlapspeed :: decimal IS NOT NULL 
    AND RAC.name = 'Monaco Grand Prix' 
    AND YEAR(RAC.date) = 2017
  GROUP BY 
    DRV.forename || ' ' || DRV.surname 
  ORDER BY 
    MAX(RES.fastestlapspeed :: decimal) DESC 
  LIMIT 
    10
);


SELECT 
  mgp_2017.fullname, 
  mgp_2016.fastest_lap as fastest_lap_2016, 
  mgp_2017.fastest_lap as fastest_lap_2017 
FROM 
  mgp_2017 
  INNER JOIN mgp_2016 ON mgp_2017.fullname = mgp_2016.fullname;

CTE Method

Here we're using CTEs like the tables earlier but not writing them to the database. In doing so we only execute one query, which can be faster than executing several smaller queries in PostgreSQL.


WITH mgp_2017 AS (
  SELECT 
    DRV.forename || ' ' || DRV.surname as fullname, 
    MAX(RES.fastestlapspeed :: decimal) as fastest_lap 
  FROM 
    results AS RES 
    INNER JOIN races AS RAC ON RAC.raceid = RES.raceid 
    INNER JOIN drivers AS DRV ON DRV.driverid = RES.driverid 
  WHERE 
    RES.fastestlapspeed :: decimal IS NOT NULL 
    AND RAC.name = 'Monaco Grand Prix' 
    AND YEAR(RAC.date) = 2017 
  GROUP BY 
    DRV.forename || ' ' || DRV.surname 
  ORDER BY 
    MAX(RES.fastestlapspeed :: decimal) DESC 
  LIMIT 
    10
)
,mgp_2016 AS (
  SELECT 
    DRV.forename || ' ' || DRV.surname as fullname, 
    MAX(RES.fastestlapspeed :: decimal) as fastest_lap 
  FROM 
    results AS RES 
    INNER JOIN races AS RAC ON RAC.raceid = RES.raceid 
    INNER JOIN drivers AS DRV ON DRV.driverid = RES.driverid 
  WHERE 
    RES.fastestlapspeed :: decimal IS NOT NULL 
    AND RAC.name = 'Monaco Grand Prix' 
    AND YEAR(RAC.date) = 2016
  GROUP BY 
    DRV.forename || ' ' || DRV.surname 
  ORDER BY 
    MAX(RES.fastestlapspeed :: decimal) DESC 
  LIMIT 
    10
)

SELECT 
  mgp_2017.fullname, 
  mgp_2016.fastest_lap as fastest_lap_2016, 
  mgp_2017.fastest_lap as fastest_lap_2017 
FROM 
  mgp_2017 
  INNER JOIN mgp_2016 ON mgp_2017.fullname = mgp_2016.fullname

To declare our first CTE I should be of the form:


WITH {CTE Name} AS (
  {Query}
)

for additional CTEs the "WITH" is not required and is replaced with a comma, e.g. ", CTE2 AS (..."

Subquery Method

With subqueries, we return the same query result as the CTE but noticeably different code structure.


SELECT 
  mgp_2017.fullname, 
  mgp_2016.fastest_lap_2016,
  mgp_2017.fastest_lap_2017 
FROM (
    SELECT 
      DRV.forename || ' ' || DRV.surname as fullname, 
      MAX(RES.fastestlapspeed :: decimal) as fastest_lap_2017 
    FROM 
      results AS RES 
      INNER JOIN races AS RAC ON RAC.raceid = RES.raceid 
      INNER JOIN drivers AS DRV ON DRV.driverid = RES.driverid 
    WHERE 
      RES.fastestlapspeed :: decimal IS NOT NULL 
      AND RAC.name = 'Monaco Grand Prix' 
      AND YEAR(RAC.date) = 2017 
    GROUP BY 
      DRV.forename || ' ' || DRV.surname 
    ORDER BY 
      MAX(RES.fastestlapspeed :: decimal) DESC 
    LIMIT 
      10
  ) AS mgp_2017
  INNER JOIN (
    SELECT 
      DRV.forename || ' ' || DRV.surname as fullname, 
      MAX(RES.fastestlapspeed :: decimal) as fastest_lap_2016 
    FROM 
      results AS RES 
      INNER JOIN races AS RAC ON RAC.raceid = RES.raceid 
      INNER JOIN drivers AS DRV ON DRV.driverid = RES.driverid 
    WHERE 
      RES.fastestlapspeed :: decimal IS NOT NULL 
      AND RAC.name = 'Monaco Grand Prix' 
      AND YEAR(RAC.date) = 2016 
    GROUP BY 
      DRV.forename || ' ' || DRV.surname 
    ORDER BY 
      MAX(RES.fastestlapspeed :: decimal) DESC 
    LIMIT 
      10
  ) AS mgp_2016 ON mgp_2016.fullname = mgp_2017.fullname

You can see our subqueries take the place of tables we SELECT FROM or JOIN to, i.e.

领英推荐

Spring Savings is BACK!!

Maven Analytics 9 个月前

Excel LAMBDA Spotlight: Bisected Map with BMAPλ

Owen Price 1 年前

Data Manipulation in SQL

Pintu Kumar Kushwaha 1 年前


FROM {Query} AS alias
LEFT|INNER JOIN {Query} AS alias2 ON alias.id = alias2.id

but there's more, one of the advantages of subqueries is that don't just act as tables, you can use a subquery as a column or part of the WHERE or HAVING clauses, provided they return one value.

Subquery in a WHERE clause

In the example below we're showing all F1 drivers in the 2017 Monaco GP that beat the fastest lap time in 2016 by using a subquery in the where clause.


SELECT?
? DRV.forename || ' ' || DRV.surname as fullname,?
? MAX(RES.fastestlapspeed :: decimal) as fastest_lap_2017?
FROM?
? results AS RES?
? INNER JOIN races AS RAC ON RAC.raceid = RES.raceid?
? INNER JOIN drivers AS DRV ON DRV.driverid = RES.driverid?
WHERE?
? RES.fastestlapspeed :: decimal IS NOT NULL?
? AND RAC.name = 'Monaco Grand Prix'?
? AND YEAR(RAC.date) = 2017?
? AND RES.fastestlapspeed :: decimal > (
? ? SELECT?
? ? ? MAX(fastestlapspeed :: decimal)?
? ? FROM?
? ? ? results AS RES?
? ? ? INNER JOIN races AS RAC ON RAC.raceid = RES.raceid?
? ? WHERE?
? ? ? YEAR(RAC.date) = 2016?
? ? ? AND RAC.name = 'Monaco Grand Prix'
? )?
GROUP BY?
? DRV.forename || ' ' || DRV.surname?
ORDER BY?
? MAX(RES.fastestlapspeed :: decimal) DESC

Which is better? CTE or Subquery

Both techniques can achieve the same results with comparable query times, in different situations, one may perform better than the other.

I generally prefer CTEs because:

1. Easier to read and debug

If your query isn't returning the correct data it's very easy to check what the CTE is doing

We can see what the CTE returns with SELECT * FROM CTE,
and check the query without the CTE by commenting out "--" the CTE JOIN line,

2. More reusable

When we declare and name a CTE we can reuse it in multiple joins, be that in other CTEs or our final SELECT statement.

3. CTEs can be recursive

Recursive CTEs are self-referencing and are very useful for creating a list of values, e.g. 1-10, or dates 2022-01-01 to 2022-12-01. Recursive CTEs are not supported on data.world but running the following query on Snowflake will produce a column of numbers 1-10.


WITH numbers AS (
? SELECT?
? ? 1 as n?
? UNION ALL?
? SELECT?
? ? n + 1 as n?
? FROM?
? ? numbers?
? WHERE?
? ? n + 1 <= 10
)?
SELECT?
? n?
FROM?
? numbers

But I will still use subqueries for:

As a column for a quick total, e.g. the sum of a column or the highest value
As a dynamic filter in a WHERE or HAVING clause, i.e in an earlier example we showed all F1 drivers in the 2017 Monaco GP that beat the fastest lap time in 2016
For speed, as a small one-off query when I'm just exploring the data quickly before rewriting for readability purposes

要查看或添加评论，请登录

Will Sutton的更多文章

An AI Toolkit for Tableau

2024年11月8日

An AI Toolkit for Tableau

As AI becomes increasingly integrated into our daily tools, combining it with Tableau opens up new possibilities. This…

1 条评论
3 Iron Viz Tips from 3 Iron Viz Champions

2024年10月24日

3 Iron Viz Tips from 3 Iron Viz Champions

Iron Viz, the world’s largest data visualization competition is back for another year. Personally speaking, it's a…
The Summer of SQL

2024年9月18日

The Summer of SQL

Bridging the gap from SQL tutorials to real-world applications. I've seen first-hand the benefit of building personal…

2 条评论
Top Tips for Using Einstein Copilot in Tableau

2024年8月30日

Top Tips for Using Einstein Copilot in Tableau

Einstein Copilot is Tableau's new Generative AI power chart-building tool and is now generally available, as of 18…

1 条评论
Make learning more engaging, applicable and accessible with AI

2024年7月26日

Make learning more engaging, applicable and accessible with AI

Generative AI is changing learning by making it more accessible and personalised. Tools like FoondaMate, a chatbot on…
Ready, Set, Pulse. A Tableau Admin's Guide to Enabling Tableau Pulse

2024年2月2日

Ready, Set, Pulse. A Tableau Admin's Guide to Enabling Tableau Pulse

Pulse for Tableau Admins Tableau Pulse is the latest addition from Salesforce, designed to enhance the data analysis…

4 条评论
2023 in Review: The 10 Posts That Reshaped My Understanding of Data and AI

2023年12月22日

2023 in Review: The 10 Posts That Reshaped My Understanding of Data and AI

Here's what my LinkedIn followers engaged with the most across the year. The key themes are all around AI, enablement…

2 条评论
Analyst to Storyteller: How To Craft Memorable Presentations with Storyteller Tactics

2023年11月20日

Analyst to Storyteller: How To Craft Memorable Presentations with Storyteller Tactics

Have you ever felt your technical skills overshadowed your presentation skills? I sure have. One of the hard lessons I…
Train ChatGPT To Be Your Dataviz Co-Pilot

2023年9月9日

Train ChatGPT To Be Your Dataviz Co-Pilot

ChatGPT recently renamed their Code Interpreter as "Advanced Data Analysis" but can it be a data analyst? I've been…
What Does Midjourney Say About Your Data Visualisations?

2023年8月31日

What Does Midjourney Say About Your Data Visualisations?

Midjourney, the generative art tool, is a treasure trove for visual delight and captured our minds for what is possible…

4 条评论

See all articles

Learn SQL the Effective Way Part 4 - CTEs and Subqueries

Will Sutton

Iron Viz Champion 2022 ? Tableau Visionary and Consultant

No CTE / Subquery method

CTE Method

Subquery Method

领英推荐

Subquery in a WHERE clause

Which is better? CTE or Subquery

1. Easier to read and debug

2. More reusable

3. CTEs can be recursive

But I will still use subqueries for:

Will Sutton的更多文章

社区洞察

其他会员也浏览了

UrlCacheEntry Functions ?rne?i

What's Up Wednesday (7-September-2022): Scavenger Hunt Results, Hub Updates, Server API v4, FME Conference Follow-Ups, Transformer Friday, and more...

SQL Day 31/50

What's Up Wednesday

PowerBI Left-Anti Join using DAX(Dynamic)

Write Access Query to Word Bookmark

Merge Query #Power BI

Generics in dart

Base64 encoded Images and the new Power Bi button slicer

Strange Equation Behavior with a Knack Database App

No CTE / Subquery method

CTE Method

Subquery Method

领英推荐

Subquery in a WHERE clause

Which is better? CTE or Subquery

1. Easier to read and debug

2. More reusable

3. CTEs can be recursive

But I will still use subqueries for:

Will Sutton的更多文章

An AI Toolkit for Tableau

3 Iron Viz Tips from 3 Iron Viz Champions

The Summer of SQL

Top Tips for Using Einstein Copilot in Tableau

Make learning more engaging, applicable and accessible with AI

Ready, Set, Pulse. A Tableau Admin's Guide to Enabling Tableau Pulse

2023 in Review: The 10 Posts That Reshaped My Understanding of Data and AI

Analyst to Storyteller: How To Craft Memorable Presentations with Storyteller Tactics

Train ChatGPT To Be Your Dataviz Co-Pilot

What Does Midjourney Say About Your Data Visualisations?

社区洞察

其他会员也浏览了

UrlCacheEntry Functions ?rne?i

What's Up Wednesday (7-September-2022): Scavenger Hunt Results, Hub Updates, Server API v4, FME Conference Follow-Ups, Transformer Friday, and more...

SQL Day 31/50

What's Up Wednesday

PowerBI Left-Anti Join using DAX(Dynamic)

Write Access Query to Word Bookmark

Merge Query #Power BI

Generics in dart

Base64 encoded Images and the new Power Bi button slicer

Strange Equation Behavior with a Knack Database App