SQL Friday newsletter #5
Magnus Ahlkvist
SQL Server Specialist @ Transmokopter SQL AB | Microsoft Data Platform MVP
The year ended in a very busy way and not many newsletters came out. I'll try to be more consistent about it in 2024. That'll be my new year resolution.
Also, editing and publishing videos from SQL Friday - I'm 15 episodes behind. But that's actually happening right now - videos are being uploaded as I write. Subscribe to https://youtube.com/c/transmokopter to be notified when they become public.
Call for Speakers for SQL Friday season 8
SQL Friday season 8 starts middle of February and Call for Speakers is open. It doesn't matter if you're an experienced speaker or never presented before - as long as you plan to talk about SQL Server, you're most welcome to submit an abstract. If it's your first time presenting, we will even hook you up with an experienced speaker mentor (if you wish of course, it's an offer, not a demand). https://sessionize.com/sql-friday-season-8-february-june-2024 - follow this link and submit your session.
Finding new topics is hard, so let's go back to basics
It's not always easy to find a new topic for SQL-tips, videos or presentations. So today, I'm going back to basics. How is a SQL query executed.
When you write SQL SELECT queries (no matter which dialect, but my examples use T-SQL), you start with the SELECT clause followed by a list of columns or expressions. Then you go on with the FROM clause, JOIN clauses, WHERE clause, GROUP BY, HAVING and ORDER BY. But that's how the query is syntactically written. When SQL Server (or any other RDBMS) executes the query, the order is different.
I hope you will find this useful. I'm planning a video series on SQL basics as well. Maybe if you're a SQL Friday reader, you're already into SQL querying. But not everyone is. I think SQL such a powerful language and I want to try and help more people become curious about SQL. This video series, whenever it happens to be recorded will be available on the YouTube channel mentioned above.
Anyway - order of execution in a SELECT query, here we go.
1. FROM
The first clause to be evaluated in a SQL query is the FROM clause. Transact-SQL is a bit special, because you can write a SELECT query without a FROM clause. In other SQL dialects you can't. But let's assume you're querying one or multiple tables. Then the FROM clause is the first thing that is evaluated. To the FROM clause belongs JOIN and APPLY.
2. WHERE
Predicates in the WHERE clause happen after FROM. This is probably the first pit new SQL developers fall into. Since the SELECT clause is not yet evaluated, you can not use expression names here. If you for example have "Quantity * ItemPrice AS LineTotal", you can't use the alias "LineTotal" in your WHERE clause. It's logic, because the expression alias is not yet evaluated. SQL Server doesn't know about the alias yet when the WHERE clause is evaluated.
One caveat about the WHERE clause. You can't filter by the result of window functions or grouped aggregate functions in the WHERE clause. The later can be handled by the HAVING clause. But filtering by the result of window functions isn't possible in the having clause either. To work around that, I usually wrap my query inside a Common Table Expression (CTE) and then select from the CTE, filtering by the result of window functions calculated inside the CTE. SQL Server will optimize the query such that the predicate is evaluated together with the inner CTE query, but the SQL standard doesn't allow WHERE predicates directly on results of window functions. And Microsoft tries their best to keep Transact-SQL standardised.
领英推荐
3. GROUP BY
Grouping with the GROUP BY clause happens after the WHERE clause. This is logic in a way. The filter predicates on the underlying data set is evaluated before grouping happens. Otherwise, grouping would have to evaluate a whole lot more data than necessary, and the results would be wrong in every case when filtering happens on other columns than the grouping columns. If you plan to put a column into the SELECT list, that column have to be a part of the GROUP BY clause. This also makes sense. Grouping means you loose details, because grouping effectively gives you unique groups based on the expressions you group by, so adding more than these to the SELECT list is an impossible task.
4. HAVING
Having is used for filtering after the grouping has happened. Typically you would use HAVING to filter based on the results of aggregation (sum, min, max etc). You can filter based on the columns you group by, but bare in mind that if you for example group by color and then filter out all colors that aren't "red", grouping happens before and the database engine will have to do a lot of unnecessary work. Filtering by color, I would probably do that with a WHERE predicate instead. Chances are SQL Server will be smart enough to figure this out and perform the filtering before grouping anyway, but logically, HAVING happens after grouping and the technical implementation in the RDBMS of your choice may very well have implemented it technically the same way the SQL standard defines the logical order of execution.
5. SELECT
SELECT list is evaluated as the fifth logical execution step. Which means this is when aliases and expressions become known to the query engine. Before this, you will need to use full column names and expressions.
Stop to think for a bit now. The SELECT-list comes first in a SELECT query. But it's evaluated as the fifth step. If you ever saw a LINQ query, it's syntactically closer to the logical order of execution than a SQL query.
As with all other clauses, SELECT clause is evaluated in an all-at-once fashion. Meaning you can't use alias names defined as part of the SELECT list. An example:
"SELECT Quantity * ItemPrice AS LineTotal, LineTotal * Discount as LineDiscount
FROM ... "
The above is invalid, because when evaluating the expression aliased LineDiscount, the LineTotal is an unknown expression. Correct would be "Quantity * ItemPrice * Discount AS LineDiscount".
6. ORDER BY
Ordering happens after the select list is evaluated. This means you're allowed to ORDER BY alias names. Going back to the example with "Quantity * ItemPrice AS LineTotal", you can use the alias name LineTotal in the ORDER BY clause, because when the ORDER BY clause is evaluated, LineTotal exists. Before this, it doesn't. In the ORDER BY clause, you can use column ordinals instead of names or aliases. "ORDER BY 1" means sort the result by the first column in the SELECT list. I strongly advice you to not use column ordinals for your ordering. It looks messy and it's very easy to make mistakes. What if you change the order of the columns in the SELECT list. Will you always remember to change the column ordinal in the ORDER BY clause accordingly?
7. OFFSET, TOP, LIMIT
In the SQL Standard, OFFSET/FETCH is used to limit the number of rows returned by a query. But different SQL dialects have their own way of doing this. In Transact-SQL we can use TOP to limit the number of results. And TOP has the WITH TIES option too, which OFFSET FETCH doesn't have. However, OFFSET/FETCH has the SKIP option, as a way to implement pagination in a SQL query. I think both have their use cases, but when I don't need to use WITH TIES, I try to use OFFSET/FETCH instead of TOP, simply because OFFSET/FETCH is a part of the standardised SQL language, and TOP isn't.
Helping Business Leaders Drive Growth with Data & Tech | Interim CTO | SQL Server | Power BI | Microsoft Fabric | MCT & MVP | Author & Speaker | Founder @gethynellis.com | CIO @NICSWellbeing
1 年Season 8! Wow. I think I was speaker 4 or 5 in season 1. A great event well worth tuning in.
I make music from the data??Data Mozart ??| MVP Data Platform | O'Reilly Author | Pluralsight Author | MCT
1 年Oh, it's been already season 8! This is amazing, thanks for all you do for the community
Data Platform MVP in Microsoft Fabric Technology area. Co-organizer of the Dutch Fabric User Group and founder of the DataWeekender conference.
1 年Sounds good, might have to submit to the next season of SQL Friday myself.