SQL’s Order of Execution
Order of Execution in SQL

SQL’s Order of Execution

In the world of data, SQL (Structured Query Language) is a widely used tool. It’s a language that is crucial for extracting data, so it’s important to understand how it works if you want to work with data. One important concept to master in SQL is its Order of Execution, which determines how SQL processes and executes queries. Let’s explore and demystify this concept together using simpler terms.

SQL commands might appear linear, but they don’t run from top to bottom. Instead, they follow a specific order of execution that could seem counter-intuitive at first. Let’s explore this.

Here is the standard order of execution:

-- SQL: Order of Execution                            @lasha-dolenjashvili
--------------------------
1. FROM & JOINS
2. WHERE
3. GROUP BY
4. Aggregation functions (SUM(), COUNT(), etc.)
5. HAVING
6. SELECT
7. DISTINCT
8. ORDER BY        

This order is crucial to understanding how SQL queries work.

Deep Dive

Let’s break down each step with a real-world example. Imagine we own a bookstore and we have a database of our book sales.

  1. FROM / JOIN: Select the table(s) where the data is coming from. Here we might select our “sales” table.
  2. WHERE: Filters rows that don’t meet certain criteria. If we’re only interested in sales of science fiction books, our WHERE clause might filter out all other genres.
  3. GROUP BY: Groups selected rows by the values of certain columns. If we want to know total sales by author, we would group by the “author” column.
  4. HAVING: Filters groups that don’t meet certain criteria. If we’re only interested in authors who’ve sold more than 100 books, the HAVING clause will handle this.
  5. SELECT: Specifies columns to be included in the result. Despite being written first, it’s executed after the clauses above. That’s why we can’t use aliases created in SELECT in the previous clauses.
  6. DISTINCT: Removes duplicate rows in the result set.
  7. ORDER BY: Sorts the result set by one or more columns. For instance, we could sort our authors by total sales, either in ascending or descending order.

Case Study

-- Table: employees
+-----+--------+--------+
| id  | name   | dept   |
+-----+--------+--------+
| 1   | Alice  | Sales  |
| 2   | Bob    | HR     |
| 3   | Charlie| Sales  |
| 4   | David  | IT     |
| 5   | Eve    | IT     |
+-----+--------+--------+

-- Table: sales
+----------+------------+
| employee_id | amount  |
+-------------+---------+
|     1       |   100   |
|     1       |   200   |
|     3       |   300   |
|     3       |   400   |
|     5       |   250   |
+-------------+---------+        

Let's analyze the following query.

SELECT name
     , SUM(amount) AS total_sales
  FROM employees
       INNER JOIN sales 
          ON employees.id = sales.employee_id
 WHERE dept = 'Sales'
 GROUP BY name
HAVING total_sales > 250
 ORDER BY total_sales DESC;        

Order of Execution

  • FROM + JOIN: The query first identifies the tables and joins them.

-- Intermediate Result
+---------+--------+-------------+--------+
| name    | dept   | employee_id | amount |
+---------+--------+-------------+--------+
| Alice   | Sales  |     1       |  100   |
| Alice   | Sales  |     1       |  200   |
| Charlie | Sales  |     3       |  300   |
| Charlie | Sales  |     3       |  400   |
| Eve     | IT     |     5       |  250   |
+---------+--------+-------------+--------+        

  • WHERE: Filters out the rows not in the 'Sales' department.

-- Intermediate Result
+---------+--------+-------------+--------+
| name    | dept   | employee_id | amount |
+---------+--------+-------------+--------+
| Alice   | Sales  |      1      |  100   |
| Alice   | Sales  |      1      |  200   |
| Charlie | Sales  |      3      |  300   |
| Charlie | Sales  |      3      |  400   |
+---------+--------+-------------+--------+        

  • GROUP BY + Aggregations Groups by the name column and calculates the total sales for each name.

-- Intermediate Result
+---------+--------------+
| name    | total_sales  |
+---------+--------------+
| Alice   |    300       |
| Charlie |    700       |
+---------+--------------+        

  • HAVING: Filters out rows with total_sales less than or equal to 250.

Intermediate Result (same in this case, since both have sales > 250)
+---------+-------------+
| name    | total_sales |
+---------+-------------+
| Alice   |    300      |
| Charlie |    700      |
+---------+-------------+        

  • SELECT: This is where the actual column selection happens. All previous steps worked on all columns specified in the FROM and JOIN clauses.
  • DISTINCT: No DISTINCT keyword in this query, so no action here.
  • ORDER BY: Sorts the results by total_sales in descending order.

-- Final Result
+---------+--------------+
| name    | total_sales  |
+---------+--------------+
| Charlie |    700       |
| Alice   |    300       |
+---------+--------------+        

Remember, SQL’s order of execution isn’t just trivia — it’s fundamental to writing effective, correct queries.

Thanks for sharing Lasha Dolenjashvili. I don’t get to use SQL very often but I found it extremely useful while working with PySpark.

要查看或添加评论,请登录

Lasha Dolenjashvili的更多文章

社区洞察

其他会员也浏览了