Learn SQL the Effective Way Part 4 -  CTEs and Subqueries

Learn SQL the Effective Way Part 4 - CTEs and Subqueries

Last time we learnt about the importance of writing clean code with some tools to make it easier for us. Now that we're getting more advanced with our queries, we'll learn about Common Table Expressions (CTEs) and subqueries.?

CTEs and Subqueries allow you to write additional queries and use the results in another query. Sounds confusing? Here's an example.

Say I want to find out which F1 drivers set the top 10 fastest laps at the Monaco Grand Prix in 2016 and 2017, how should we do this???Follow along with these queries on data.world


No CTE / Subquery method

Normally we would write queries for 2016 and 2017, save these as tables and join them together to see the result. In doing so we've had to execute 3 queries and create 2 new tables in our database which we may not have permission to do.

CREATE TABLE mgp_2017 as (
  SELECT 
    DRV.forename || ' ' || DRV.surname as fullname, 
    MAX(RES.fastestlapspeed :: decimal) as fastest_lap 
  FROM 
    results AS RES 
    INNER JOIN races AS RAC ON RAC.raceid = RES.raceid 
    INNER JOIN drivers AS DRV ON DRV.driverid = RES.driverid 
  WHERE 
    RES.fastestlapspeed :: decimal IS NOT NULL 
    AND RAC.name = 'Monaco Grand Prix' 
    AND YEAR(RAC.date) = 2017
  GROUP BY 
    DRV.forename || ' ' || DRV.surname 
  ORDER BY 
    MAX(RES.fastestlapspeed :: decimal) DESC 
  LIMIT 
    10
);


CREATE TABLE mgp_2016 as (
  SELECT 
    DRV.forename || ' ' || DRV.surname as fullname, 
    MAX(RES.fastestlapspeed :: decimal) as fastest_lap 
  FROM 
    results AS RES 
    INNER JOIN races AS RAC ON RAC.raceid = RES.raceid 
    INNER JOIN drivers AS DRV ON DRV.driverid = RES.driverid 
  WHERE 
    RES.fastestlapspeed :: decimal IS NOT NULL 
    AND RAC.name = 'Monaco Grand Prix' 
    AND YEAR(RAC.date) = 2017
  GROUP BY 
    DRV.forename || ' ' || DRV.surname 
  ORDER BY 
    MAX(RES.fastestlapspeed :: decimal) DESC 
  LIMIT 
    10
);


SELECT 
  mgp_2017.fullname, 
  mgp_2016.fastest_lap as fastest_lap_2016, 
  mgp_2017.fastest_lap as fastest_lap_2017 
FROM 
  mgp_2017 
  INNER JOIN mgp_2016 ON mgp_2017.fullname = mgp_2016.fullname;
        


CTE Method

Here we're using CTEs like the tables earlier but not writing them to the database. In doing so we only execute one query, which can be faster than executing several smaller queries in PostgreSQL.


WITH mgp_2017 AS (
  SELECT 
    DRV.forename || ' ' || DRV.surname as fullname, 
    MAX(RES.fastestlapspeed :: decimal) as fastest_lap 
  FROM 
    results AS RES 
    INNER JOIN races AS RAC ON RAC.raceid = RES.raceid 
    INNER JOIN drivers AS DRV ON DRV.driverid = RES.driverid 
  WHERE 
    RES.fastestlapspeed :: decimal IS NOT NULL 
    AND RAC.name = 'Monaco Grand Prix' 
    AND YEAR(RAC.date) = 2017 
  GROUP BY 
    DRV.forename || ' ' || DRV.surname 
  ORDER BY 
    MAX(RES.fastestlapspeed :: decimal) DESC 
  LIMIT 
    10
)
,mgp_2016 AS (
  SELECT 
    DRV.forename || ' ' || DRV.surname as fullname, 
    MAX(RES.fastestlapspeed :: decimal) as fastest_lap 
  FROM 
    results AS RES 
    INNER JOIN races AS RAC ON RAC.raceid = RES.raceid 
    INNER JOIN drivers AS DRV ON DRV.driverid = RES.driverid 
  WHERE 
    RES.fastestlapspeed :: decimal IS NOT NULL 
    AND RAC.name = 'Monaco Grand Prix' 
    AND YEAR(RAC.date) = 2016
  GROUP BY 
    DRV.forename || ' ' || DRV.surname 
  ORDER BY 
    MAX(RES.fastestlapspeed :: decimal) DESC 
  LIMIT 
    10
)

SELECT 
  mgp_2017.fullname, 
  mgp_2016.fastest_lap as fastest_lap_2016, 
  mgp_2017.fastest_lap as fastest_lap_2017 
FROM 
  mgp_2017 
  INNER JOIN mgp_2016 ON mgp_2017.fullname = mgp_2016.fullname
        

To declare our first CTE I should be of the form:


WITH {CTE Name} AS (
  {Query}
)          

for additional CTEs the "WITH" is not required and is replaced with a comma, e.g. ", CTE2 AS (..."


Subquery Method

With subqueries, we return the same query result as the CTE but noticeably different code structure.


SELECT 
  mgp_2017.fullname, 
  mgp_2016.fastest_lap_2016,
  mgp_2017.fastest_lap_2017 
FROM (
    SELECT 
      DRV.forename || ' ' || DRV.surname as fullname, 
      MAX(RES.fastestlapspeed :: decimal) as fastest_lap_2017 
    FROM 
      results AS RES 
      INNER JOIN races AS RAC ON RAC.raceid = RES.raceid 
      INNER JOIN drivers AS DRV ON DRV.driverid = RES.driverid 
    WHERE 
      RES.fastestlapspeed :: decimal IS NOT NULL 
      AND RAC.name = 'Monaco Grand Prix' 
      AND YEAR(RAC.date) = 2017 
    GROUP BY 
      DRV.forename || ' ' || DRV.surname 
    ORDER BY 
      MAX(RES.fastestlapspeed :: decimal) DESC 
    LIMIT 
      10
  ) AS mgp_2017
  INNER JOIN (
    SELECT 
      DRV.forename || ' ' || DRV.surname as fullname, 
      MAX(RES.fastestlapspeed :: decimal) as fastest_lap_2016 
    FROM 
      results AS RES 
      INNER JOIN races AS RAC ON RAC.raceid = RES.raceid 
      INNER JOIN drivers AS DRV ON DRV.driverid = RES.driverid 
    WHERE 
      RES.fastestlapspeed :: decimal IS NOT NULL 
      AND RAC.name = 'Monaco Grand Prix' 
      AND YEAR(RAC.date) = 2016 
    GROUP BY 
      DRV.forename || ' ' || DRV.surname 
    ORDER BY 
      MAX(RES.fastestlapspeed :: decimal) DESC 
    LIMIT 
      10
  ) AS mgp_2016 ON mgp_2016.fullname = mgp_2017.fullname
         

You can see our subqueries take the place of tables we SELECT FROM or JOIN to, i.e.


FROM {Query} AS alias
LEFT|INNER JOIN {Query} AS alias2 ON alias.id = alias2.id
        

but there's more, one of the advantages of subqueries is that don't just act as tables, you can use a subquery as a column or part of the WHERE or HAVING clauses, provided they return one value.

Subquery in a WHERE clause

In the example below we're showing all F1 drivers in the 2017 Monaco GP that beat the fastest lap time in 2016 by using a subquery in the where clause.


SELECT?
? DRV.forename || ' ' || DRV.surname as fullname,?
? MAX(RES.fastestlapspeed :: decimal) as fastest_lap_2017?
FROM?
? results AS RES?
? INNER JOIN races AS RAC ON RAC.raceid = RES.raceid?
? INNER JOIN drivers AS DRV ON DRV.driverid = RES.driverid?
WHERE?
? RES.fastestlapspeed :: decimal IS NOT NULL?
? AND RAC.name = 'Monaco Grand Prix'?
? AND YEAR(RAC.date) = 2017?
? AND RES.fastestlapspeed :: decimal > (
? ? SELECT?
? ? ? MAX(fastestlapspeed :: decimal)?
? ? FROM?
? ? ? results AS RES?
? ? ? INNER JOIN races AS RAC ON RAC.raceid = RES.raceid?
? ? WHERE?
? ? ? YEAR(RAC.date) = 2016?
? ? ? AND RAC.name = 'Monaco Grand Prix'
? )?
GROUP BY?
? DRV.forename || ' ' || DRV.surname?
ORDER BY?
? MAX(RES.fastestlapspeed :: decimal) DESC
        

Which is better? CTE or Subquery

Both techniques can achieve the same results with comparable query times, in different situations, one may perform better than the other.

I generally prefer CTEs because:

1. Easier to read and debug

If your query isn't returning the correct data it's very easy to check what the CTE is doing

  • We can see what the CTE returns with SELECT * FROM CTE,
  • and check the query without the CTE by commenting out "--" the CTE JOIN line,

2. More reusable

When we declare and name a CTE we can reuse it in multiple joins, be that in other CTEs or our final SELECT statement.

3. CTEs can be recursive

Recursive CTEs are self-referencing and are very useful for creating a list of values, e.g. 1-10, or dates 2022-01-01 to 2022-12-01. Recursive CTEs are not supported on data.world but running the following query on Snowflake will produce a column of numbers 1-10.


WITH numbers AS (
? SELECT?
? ? 1 as n?
? UNION ALL?
? SELECT?
? ? n + 1 as n?
? FROM?
? ? numbers?
? WHERE?
? ? n + 1 <= 10
)?
SELECT?
? n?
FROM?
? numbers
        

But I will still use subqueries for:

  1. As a column for a quick total, e.g. the sum of a column or the highest value
  2. As a dynamic filter in a WHERE or HAVING clause, i.e in an earlier example we showed all F1 drivers in the 2017 Monaco GP that beat the fastest lap time in 2016
  3. For speed, as a small one-off query when I'm just exploring the data quickly before rewriting for readability purposes

要查看或添加评论,请登录

Will Sutton的更多文章

社区洞察

其他会员也浏览了