SQL Order of Operations - Extended

SQL Order of Operations - Extended

Ever wondered, ?? why you're not able to use the window functions in the WHERE clause, or why you're not able to use the column aliases in the WHERE, GROUP BY, and HAVING clauses?

The order of operations in the SQL query execution can help us understand.

Let's go through the order of operations while solving an SQL interview question.

You have an orders table that contains the order_id, customer_id, and the order_status ("canceled", "delivered") of a particular order. You have another table customers that contains customer_id and customer_name.

To-Do: You have to get the top 3 customers with the most number of canceled orders and also the top 3 customers with the most number of delivered orders. You have to return the status (canceled/delivered), customer_name, and rank (by number of canceled/delivered orders).

Solution: ??The solution is made according to explain the order of operations. There is a better solution too which I'm going to leave up to you to think of.

SELECT 'canceled' AS status, c.customer_name,
    ROW_NUMBER() OVER(ORDER BY COUNT(*) DESC) AS rank
FROM orders o
JOIN customers c USING(customer_id)
WHERE order_status='canceled'
GROUP BY c.customer_name

UNION ALL

SELECT 'passed' AS status, c.customer_name,
    ROW_NUMBER() OVER(ORDER BY COUNT(*) DESC) AS rank
FROM orders o
JOIN customers c USING(customer_id)
WHERE order_status='delivered'
GROUP BY c.customer_name

ORDER BY rank
FETCH NEXT 6 ROWS ONLY        

Now, let's understand the solution while explaining the order of operations.

FROM, JOIN

The first operation is to get and prepare the required data i.e., tables using the FROM and JOIN clause.

In view of our question, the orders table will be inner joined with the customers table in both statements.

WHERE

After preparing the required data, we have to filter the data, if needed. In our case, we will filter out the data based on the order_status in both the statements before and after the UNION ALL clause.

GROUP BY

Following filtering, Aggregation is applied to the filtered data using the GROUP BY clause. The required aggregation is performed on the filtered data.

In the question, we are applying the aggregation by the customer_name and calculating the total number of orders using the COUNT(*).

HAVING

After we have conducted the aggregation by the required field, the aggregated data can be filtered using the HAVING clause. Just like we do filtering using WHERE in the prepared data, we do filtering using HAVING in the aggregated data.

OVER (e.g. Window Functions)

The window functions are operated after the aggregation has been done.

You may have encountered the error while learning window functions that the window functions can't be used in the WHERE clause. Since the WHERE clause is executed much before the execution of the window functions, how you are going to use the window functions that have not even been calculated yet?

In our question, we have used the ROW_NUMBER() function to calculate the rank of the customers by the total number of canceled/delivered orders.

SELECT

The required columns are picked up from the filtered aggregated data. The column aliasing is also performed here only. So, we can't use the column aliases in the steps which are executed before SELECT.

We have selected three columns in our question, which are status, customer_name, and rank.

DISTINCT

DISTINCT is executed after selecting the required columns.

UNION, INTERSECT, EXCEPT (Set Operations)

After selecting the required columns the records are combined/reduced based on the set operations provided.

In our question, the customers associated with the canceled orders are combined with the customers associated with the delivered orders using the UNION ALL clause.

ORDER BY

The records obtained after performing the set operations are then sorted accordingly. The sorting is always performed after the set operations.

In our question, after set operations, we have customers associated with the canceled orders as well as those associated with delivered orders along with their respective ranks. We are sorting the customers by their rank in increasing order.

OFFSET

The OFFSET clause is executed after the sorting is completed. OFFSET simply shifted the starting of output to the said row number. With this, you can skip a certain number of records from the top and let's say return the records from the 6th row skipping the first 5 rows.

LIMIT, TOP, FETCH FIRST/NEXT ROWS ONLY

These above clauses are used to get only the required number of rows.

In our question, we have used FETCH NEXT 6 ROWS ONLY. This will give us only the first 6 rows of the sorted data, which will contain the top 3 ranks from both the categories (canceled and delivered).


That's all about the order of operations. If you have any questions or suggestions, kindly ping me on LinkedIn to discuss. I will be happy to help you in case of any query or learn in case of any suggestion.


I'm going to end this article with a question to ponder about:

How the query be executed if I had used the ORDER BY clause in the first query also? Refer to the below query for a better understanding of the question.

SELECT 'canceled' AS status, c.customer_name,
    ROW_NUMBER() OVER(ORDER BY COUNT(*) DESC) AS rank
FROM orders o
JOIN customers c USING(customer_id)
WHERE order_status='canceled'
GROUP BY c.customer_name
ORDER BY c.customer_name

UNION ALL

SELECT 'passed' AS status, c.customer_name,
    ROW_NUMBER() OVER(ORDER BY COUNT(*) DESC) AS rank
FROM orders o
JOIN customers c USING(customer_id)
WHERE order_status='delivered'
GROUP BY c.customer_name
ORDER BY rank

FETCH NEXT 6 ROWS ONLY        

References:

Vlad Mihalcea - SQL Operation oOrder

Unsplash - Markus Spiske




Muskan Kashyap

Google Certified Business Intelligence & Data Analyst | Python, SQL, R, Tableau, Power BI, Excel | Mathematician & Researcher

1 年

This is the question I had in my mind when I just started working with SQL. Thank you for sharing this!

要查看或添加评论,请登录

社区洞察

其他会员也浏览了