SQL 101: Working with CTE
A Common Table Expression, often referred to as a CTE, is a temporary named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It allows you to define a subquery that can be referenced multiple times within the same query, enhancing readability and reusability.
Here’s an example that demonstrates the basic syntax of a CTE:
WITH cte_name (column1, column2, ...) AS (
-- CTE query definition
SELECT column1, column2, ...
FROM table
WHERE conditions
)
-- Main query using the CTE
SELECT *
FROM cte_name;
Now let’s explore a few scenarios where CTEs can be beneficial:
Syntax:
WITH RECURSIVE cte_name (column1, column2, ...) AS (
-- Anchor member
SELECT column1, column2, ...
FROM table
WHERE conditions
UNION ALL
-- Recursive member
SELECT column1, column2, ...
FROM table
WHERE conditions
AND column1 IN (SELECT column1 FROM cte_name)
)
SELECT *
FROM cte_name;
Question: Consider a table called “Categories” with the following structure:
The parent_category_id represents the ID of the parent category for each category. It is NULL for top-level categories.
Write an SQL query using a Recursive CTE to retrieve the full path of each category, starting from the top-level category and including all parent categories. The result should include the category name and the full path.
Solution:
WITH RecursiveCTE AS (
SELECT category_id, category_name, parent_category_id, CAST(category_name AS varchar(MAX)) AS full_path
FROM Categories
WHERE parent_category_id IS NULL
UNION ALL
SELECT c.category_id, c.category_name, c.parent_category_id, CONCAT(rc.full_path, ' > ', c.category_name)
FROM Categories c
INNER JOIN RecursiveCTE rc ON c.parent_category_id = rc.category_id
)
SELECT category_name, full_path
FROM RecursiveCTE;
Explanation:
-In this solution, we use a Recursive CTE to retrieve the full path of each category. The initial part of the CTE selects the top-level categories by filtering the ones with a NULL parent_category_id.
-The recursive part of the CTE joins the Categories table with the RecursiveCTE using the parent_category_id to find the parent categories for each category. The full_path column is constructed by concatenating the parent category’s full path with the current category name.
-The RecursiveCTE has four columns: category_id, category_name, parent_category_id, and full_path. The full_path column holds the complete path of each category.
-Finally, the main query selects the category_name and full_path columns from the RecursiveCTE to display the category names and their corresponding full paths.
This example demonstrates how a Recursive CTE can be used to traverse hierarchical data and build a complete path for each category, enabling powerful querying and analysis of hierarchical structures.
2. Multi-level CTEs: You can also use multiple CTEs within a single query, allowing you to break down complex logic into more manageable parts.
Syntax:
WITH cte1 AS (
SELECT column1, column2, ...
FROM table1
WHERE conditions
),
cte2 AS (
SELECT column1, column2, ...
FROM table2
WHERE conditions
)
SELECT *
FROM cte1
JOIN cte2 ON cte1.column = cte2.column;
Question: Consider a table called “Employees” with the following structure:
Write an SQL query using multi-level CTEs to retrieve the hierarchical structure of employees, including their managers and managers’ managers. The result should include the employee name, manager name, and grandmanager name. Assume that the top-level managers have a NULL manager_id.
Solution:
WITH RecursiveCTE AS (
SELECT employee_id, employee_name, manager_id, CAST(employee_name AS varchar(MAX)) AS hierarchy
FROM Employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.employee_id, e.employee_name, e.manager_id, CONCAT(rc.hierarchy, ' > ', e.employee_name)
FROM Employees e
INNER JOIN RecursiveCTE rc ON e.manager_id = rc.employee_id
), ManagerCTE AS (
SELECT employee_id, employee_name, manager_id
FROM RecursiveCTE
), GrandmanagerCTE AS (
SELECT m1.employee_id, m1.employee_name, m2.employee_name AS manager_name
FROM ManagerCTE m1
LEFT JOIN ManagerCTE m2 ON m1.manager_id = m2.employee_id
)
SELECT m.employee_name, gm.manager_name, gm.employee_name AS grandmanager_name
FROM ManagerCTE m
LEFT JOIN GrandmanagerCTE gm ON m.manager_id = gm.employee_id;
Explanation:
-The first CTE, RecursiveCTE, is a recursive CTE that retrieves the employee hierarchy. It starts with the top-level managers (where manager_id is NULL) and recursively joins the Employees table to fetch the subordinates for each manager.
-The second CTE, ManagerCTE, selects the employee_id, employee_name, and manager_id columns from the RecursiveCTE. It represents the direct managers of each employee.
-The third CTE, GrandmanagerCTE, joins the ManagerCTE with itself to fetch the grandmanagers (managers’ managers). It selects the employee_id and employee_name from the ManagerCTE as well as the manager_name from a self-join on the ManagerCTE.
-Finally, the main query selects the employee_name from the ManagerCTE and joins it with the GrandmanagerCTE on the manager_id to retrieve the manager name and grandmanager name.
Using multi-level CTEs allows us to break down the hierarchical query into separate logical parts, enhancing readability and maintainability.
3. CTEs for Data Transformation: CTEs can be useful for data transformation operations, such as aggregations, filtering, or joining multiple tables.
Syntax:
WITH cte1 AS (
SELECT column1, SUM(column2) AS total
FROM table1
GROUP BY column1
),
cte2 AS (
SELECT column1, AVG(column3) AS average
FROM table2
GROUP BY column1
)
SELECT cte1.column1, cte1.total, cte2.average
FROM cte1
JOIN cte2 ON cte1.column1 = cte2.column1;
Question: Consider a table called “Orders” with the following structure:
领英推荐
Write an SQL query using CTEs to transform the data by calculating the average order amount for each customer and retrieving the customer details along with their average order amount. The result should include the customer_id, customer_name, and average_order_amount.
Solution:
WITH CustomerOrders AS (
SELECT customer_id, AVG(total_amount) AS average_order_amount
FROM Orders
GROUP BY customer_id
), CustomerDetails AS (
SELECT customer_id, customer_name
FROM Customers
)
SELECT c.customer_id, cd.customer_name, co.average_order_amount
FROM CustomerOrders co
JOIN CustomerDetails cd ON co.customer_id = cd.customer_id;
Explanation:
In this solution, we use two CTEs for data transformation.
-The first CTE, CustomerOrders, calculates the average order amount for each customer by grouping the orders by customer_id and using the AVG aggregation function on the total_amount column.
-The second CTE, CustomerDetails, selects the customer_id and customer_name from the Customers table.
-Finally, the main query joins the CustomerOrders CTE with the CustomerDetails CTE on the customer_id column to retrieve the customer details (customer_id and customer_name) along with their average order amount.
-By using CTEs, we can break down the transformation steps into separate logical parts, making the query more readable and maintainable.
Few more examples to get?going
Question 1: Consider a table called “Orders” with the following structure:
Write an SQL query using a CTE to calculate the total sales amount for each month in a given year, along with the corresponding month and year. The result should be sorted by the year and month in ascending order.
Solution:
WITH monthly_sales AS (
SELECT
EXTRACT(YEAR FROM order_date) AS sales_year,
EXTRACT(MONTH FROM order_date) AS sales_month,
SUM(total_amount) AS total_sales
FROM Orders
WHERE EXTRACT(YEAR FROM order_date) = 2023 -- Specify the desired year here
GROUP BY sales_year, sales_month
)
SELECT sales_year, sales_month, total_sales
FROM monthly_sales
ORDER BY sales_year, sales_month;
Question 2: Consider two tables, “Employees” and “Salaries,” with the following structure:
Employees table:
Salaries table:
Write an SQL query using a CTE to retrieve the employee names, their department names, and the average salary for each department. The result should be sorted by the department name in ascending order.
Solution:
WITH avg_salaries AS (
SELECT
e.employee_name,
e.department_id,
AVG(s.salary_amount) AS avg_salary
FROM Employees e
JOIN Salaries s ON e.employee_id = s.employee_id
GROUP BY e.employee_name, e.department_id
),
departments AS (
SELECT
department_id,
department_name
FROM Departments -- Assuming there is a table called "Departments" with department_id and department_name columns
)
SELECT a.employee_name, d.department_name, a.avg_salary
FROM avg_salaries a
JOIN departments d ON a.department_id = d.department_id
ORDER BY d.department_name;
Question 3: Consider a table called “Students” with the following structure:
Write an SQL query using a CTE to calculate the average enrollment year for each major, excluding majors with less than 5 students. The result should be sorted by the average enrollment year in descending order.
Solution:
WITH major_avg_enrollment AS (
SELECT
major,
AVG(enrollment_year) AS avg_enrollment_year,
COUNT(student_id) AS student_count
FROM Students
GROUP BY major
HAVING COUNT(student_id) >= 5
)
SELECT major, avg_enrollment_year
FROM major_avg_enrollment
ORDER BY avg_enrollment_year DESC;
Summary
In summary, Common Table Expressions (CTEs) are a powerful feature in SQL that provide a way to break down complex queries into smaller, more manageable parts. CTEs offer several benefits, including:
1. Readability: CTEs enhance query readability by allowing queries to be organized into logical sections. Each CTE can focus on a specific task, making the code easier to understand and maintain.
2. Modularity: CTEs enable a step-by-step approach to query building. Each CTE can be designed to perform a specific operation, such as filtering, aggregating, or transforming data, resulting in cleaner and more modular SQL code.
3. Recursive Operations: Recursive CTEs are particularly useful for working with hierarchical data structures. They enable efficient traversal of hierarchical relationships, such as organizational charts or nested categories, without the need for complex joins or procedural logic.
4. Code Reusability: CTEs act as temporary named result sets that can be referenced multiple times within a query. This promotes code reusability by allowing complex subqueries or intermediate result sets to be defined once and utilized in multiple parts of the main query, reducing redundancy and improving performance.
By leveraging CTEs, developers can enhance the readability, modularity, and reusability of their SQL code, leading to more maintainable and efficient queries. Whether it’s for recursive operations, data transformation, or breaking down complex logic, CTEs provide a valuable tool for working with complex SQL queries.
Also, Read