SQL Friday newsletter #5

SQL Friday newsletter #5

The year ended in a very busy way and not many newsletters came out. I'll try to be more consistent about it in 2024. That'll be my new year resolution.

Also, editing and publishing videos from SQL Friday - I'm 15 episodes behind. But that's actually happening right now - videos are being uploaded as I write. Subscribe to https://youtube.com/c/transmokopter to be notified when they become public.

Call for Speakers for SQL Friday season 8

SQL Friday season 8 starts middle of February and Call for Speakers is open. It doesn't matter if you're an experienced speaker or never presented before - as long as you plan to talk about SQL Server, you're most welcome to submit an abstract. If it's your first time presenting, we will even hook you up with an experienced speaker mentor (if you wish of course, it's an offer, not a demand). https://sessionize.com/sql-friday-season-8-february-june-2024 - follow this link and submit your session.

Finding new topics is hard, so let's go back to basics

It's not always easy to find a new topic for SQL-tips, videos or presentations. So today, I'm going back to basics. How is a SQL query executed.

When you write SQL SELECT queries (no matter which dialect, but my examples use T-SQL), you start with the SELECT clause followed by a list of columns or expressions. Then you go on with the FROM clause, JOIN clauses, WHERE clause, GROUP BY, HAVING and ORDER BY. But that's how the query is syntactically written. When SQL Server (or any other RDBMS) executes the query, the order is different.

I hope you will find this useful. I'm planning a video series on SQL basics as well. Maybe if you're a SQL Friday reader, you're already into SQL querying. But not everyone is. I think SQL such a powerful language and I want to try and help more people become curious about SQL. This video series, whenever it happens to be recorded will be available on the YouTube channel mentioned above.

Anyway - order of execution in a SELECT query, here we go.

1. FROM

The first clause to be evaluated in a SQL query is the FROM clause. Transact-SQL is a bit special, because you can write a SELECT query without a FROM clause. In other SQL dialects you can't. But let's assume you're querying one or multiple tables. Then the FROM clause is the first thing that is evaluated. To the FROM clause belongs JOIN and APPLY.

2. WHERE

Predicates in the WHERE clause happen after FROM. This is probably the first pit new SQL developers fall into. Since the SELECT clause is not yet evaluated, you can not use expression names here. If you for example have "Quantity * ItemPrice AS LineTotal", you can't use the alias "LineTotal" in your WHERE clause. It's logic, because the expression alias is not yet evaluated. SQL Server doesn't know about the alias yet when the WHERE clause is evaluated.

One caveat about the WHERE clause. You can't filter by the result of window functions or grouped aggregate functions in the WHERE clause. The later can be handled by the HAVING clause. But filtering by the result of window functions isn't possible in the having clause either. To work around that, I usually wrap my query inside a Common Table Expression (CTE) and then select from the CTE, filtering by the result of window functions calculated inside the CTE. SQL Server will optimize the query such that the predicate is evaluated together with the inner CTE query, but the SQL standard doesn't allow WHERE predicates directly on results of window functions. And Microsoft tries their best to keep Transact-SQL standardised.

3. GROUP BY

Grouping with the GROUP BY clause happens after the WHERE clause. This is logic in a way. The filter predicates on the underlying data set is evaluated before grouping happens. Otherwise, grouping would have to evaluate a whole lot more data than necessary, and the results would be wrong in every case when filtering happens on other columns than the grouping columns. If you plan to put a column into the SELECT list, that column have to be a part of the GROUP BY clause. This also makes sense. Grouping means you loose details, because grouping effectively gives you unique groups based on the expressions you group by, so adding more than these to the SELECT list is an impossible task.

4. HAVING

Having is used for filtering after the grouping has happened. Typically you would use HAVING to filter based on the results of aggregation (sum, min, max etc). You can filter based on the columns you group by, but bare in mind that if you for example group by color and then filter out all colors that aren't "red", grouping happens before and the database engine will have to do a lot of unnecessary work. Filtering by color, I would probably do that with a WHERE predicate instead. Chances are SQL Server will be smart enough to figure this out and perform the filtering before grouping anyway, but logically, HAVING happens after grouping and the technical implementation in the RDBMS of your choice may very well have implemented it technically the same way the SQL standard defines the logical order of execution.

5. SELECT

SELECT list is evaluated as the fifth logical execution step. Which means this is when aliases and expressions become known to the query engine. Before this, you will need to use full column names and expressions.

Stop to think for a bit now. The SELECT-list comes first in a SELECT query. But it's evaluated as the fifth step. If you ever saw a LINQ query, it's syntactically closer to the logical order of execution than a SQL query.

As with all other clauses, SELECT clause is evaluated in an all-at-once fashion. Meaning you can't use alias names defined as part of the SELECT list. An example:

"SELECT Quantity * ItemPrice AS LineTotal, LineTotal * Discount as LineDiscount

FROM ... "

The above is invalid, because when evaluating the expression aliased LineDiscount, the LineTotal is an unknown expression. Correct would be "Quantity * ItemPrice * Discount AS LineDiscount".

6. ORDER BY

Ordering happens after the select list is evaluated. This means you're allowed to ORDER BY alias names. Going back to the example with "Quantity * ItemPrice AS LineTotal", you can use the alias name LineTotal in the ORDER BY clause, because when the ORDER BY clause is evaluated, LineTotal exists. Before this, it doesn't. In the ORDER BY clause, you can use column ordinals instead of names or aliases. "ORDER BY 1" means sort the result by the first column in the SELECT list. I strongly advice you to not use column ordinals for your ordering. It looks messy and it's very easy to make mistakes. What if you change the order of the columns in the SELECT list. Will you always remember to change the column ordinal in the ORDER BY clause accordingly?

7. OFFSET, TOP, LIMIT

In the SQL Standard, OFFSET/FETCH is used to limit the number of rows returned by a query. But different SQL dialects have their own way of doing this. In Transact-SQL we can use TOP to limit the number of results. And TOP has the WITH TIES option too, which OFFSET FETCH doesn't have. However, OFFSET/FETCH has the SKIP option, as a way to implement pagination in a SQL query. I think both have their use cases, but when I don't need to use WITH TIES, I try to use OFFSET/FETCH instead of TOP, simply because OFFSET/FETCH is a part of the standardised SQL language, and TOP isn't.

Gethyn Ellis

Helping Business Leaders Drive Growth with Data & Tech | Interim CTO | SQL Server | Power BI | Microsoft Fabric | MCT & MVP | Author & Speaker | Founder @gethynellis.com | CIO @NICSWellbeing

1 年

Season 8! Wow. I think I was speaker 4 or 5 in season 1. A great event well worth tuning in.

Nikola Ilic

I make music from the data??Data Mozart ??| MVP Data Platform | O'Reilly Author | Pluralsight Author | MCT

1 年

Oh, it's been already season 8! This is amazing, thanks for all you do for the community

Kevin Chant

Data Platform MVP in Microsoft Fabric Technology area. Co-organizer of the Dutch Fabric User Group and founder of the DataWeekender conference.

1 年

Sounds good, might have to submit to the next season of SQL Friday myself.

要查看或添加评论,请登录

Magnus Ahlkvist的更多文章

  • SQL Friday Newsletter #11 - New year, new speakers

    SQL Friday Newsletter #11 - New year, new speakers

    Hi there! Autumn season of SQL Friday got really short, because I had to deal with personal issues. That's how life…

  • SQL Friday newsletter #10 - SQL Friday is back!

    SQL Friday newsletter #10 - SQL Friday is back!

    Finally, a new season of SQL Friday can start. We start this Friday (October 18) with a fairly new speaker from the…

  • SQL Friday newsletter - Call for Speakers edition

    SQL Friday newsletter - Call for Speakers edition

    First of all, I would like to thank all the speakers who has made SQL Friday what it is over the years. We have had 145…

  • SQL Friday Newsletter #9

    SQL Friday Newsletter #9

    SQL Friday season 8 is almost over - only two episodes left before we take a short summer break. Tomorrow, Elena…

  • SQL Friday Newsletter #8

    SQL Friday Newsletter #8

    Next week, Paresh Motiwala comes to SQL Friday to present 20 mistakes he has made as a DBA, and how he "survived" them.…

    1 条评论
  • SQL Friday newsletter #7

    SQL Friday newsletter #7

    This Friday, Erland is coming back to SQL Friday. The topic is "Don't let your permissions be hijacked!".

  • SQL Friday newsletter #6

    SQL Friday newsletter #6

    From last Friday, SQL Friday is now livestreaming to YouTube. This means videos from SQL Friday are published to…

    2 条评论
  • SQL Friday newsletter #4

    SQL Friday newsletter #4

    This week's newsletter won't have a SQL-tip. Work- and life-schedule simply didn't allow for it.

  • SQL Friday newsletter #3

    SQL Friday newsletter #3

    Hi! Thanks for reading! Klaus Aschenbrenner was supposed to present on SQL Friday on November 10 but he got ill. Good…

    2 条评论
  • SQL Friday newsletter #2

    SQL Friday newsletter #2

    Hi. Thanks for reading! Today, my SQL-tip of the week is quite a bit longer than last time.

    3 条评论

社区洞察

其他会员也浏览了