登录查看更多内容

SQL Queries Quickstart: Part II

Jose Canciani

Principal Engineer at Avature | @josecanciani at ??

发布日期: 2024年4月30日

Welcome back. This is part II of my SQL Queries Quickstart Guide. If this is your first time here, please read the first part which includes the Index as well as the conventions used during the series.

?? Read Part I here

By this time, you should be familiar in writing a statement to query a single table. Now we will embark in multi-table queries, but in order to do so, we first need to know about a special type of constraint.

Foreign Keys

One of the objectives of a Relational database is to easily connect data in different tables. This was a difficult task to do with the early file-based databases, but SQL makes it easy.

Tables are connected between each other using Foreign Keys (aka FKs). This is a special type of index that references rows in a different table (the referenced table). The basic idea is that you create a column to contain the identifier -primary key value- of a referenced-table's row.

If that was confusing, let's view it with an example in Educa.

Let's add a third table, enrollments, that will connect a user with a course. This table will have two FKs pointing to the other two tables:

The enrollments table, with it's two foreign keys to the courses and users tables.

Remember this article will not focus on DDL statements (like this CREATE TABLE here), but it's important to show it so we can understand how a foreign key is defined. Watch for the last two lines how we set two local columns to reference the primary keys of the other two tables.

Foreign keys are considered constraints too. Why? Because if we now try to add a row with an invalid user or course ID number, the database will reject it. This is one of the most compelling features of a relational databases: it works as the end line of defense to keep your data consistent, without having to rely in custom application code to do so.

Multi table queries

Ok, now that we have our three table setup, lets run a query to view all students and the courses they have enrolled in:

???? Play on Fiddle

The query starts on enrollments and then joins its rows with the rows on the other two tables. Let's review what's new here.

Aliases

They are added with the AS keyword, although in some cases it's optional to do so. Aliases have different purposes:

Changing a column label, specially useful when using functions like the CONCAT in the example. This will result in a more beautiful table output.
Easily work with duplicated column names, or even joining the same table more than once.
Better naming for tables. Look at how the users table becomes a student.

Connecting tables with JOINs

The JOIN keyword allows to connect a row from one table (left table) to the rows of another (join or right table).

In our example, both joins represent a 1:1 relationship. This means that for each row in the enrollments table, we match only one row in each of the right tables.

Even though there are a lot of different JOINs specified by the ANSI SQL standard, in practice I would argue there are only two important ones: the INNER JOIN and the LEFT OUTER JOIN. I recommend sticking with this two, as the others typically can be rewritten using these two anyways (or are almost never needed at all).

INNER JOIN

This is the most common join type, and the one we used in our examples (the INNER keyword is optional). It selects rows that have matching values in both tables.

Here's a visual representation (source: w3schools):

An inner join will select rows in which both tables have a match.

In our example every enrollment needs both a student and a course.

领英推荐

Transact SQL

TopNotch Technology 2 年前

Mastering Subqueries in SQL: An In-depth Guide with…

??Database Design SQL??Development MySQL ??Data Analyst ??Business Intelligence 12 个月前

Comprehensive Guide To SQL (Structured Query Language)

Ijaz Khan 5 个月前

Notice that Foreign Keys may have NULL values. For example, we may want to start a draft enrollment process without assigning a student, introducing the concept of a vacant seat for the class in our application. In this situation, how can we get all enrollments, even those without a student assigned?

LEFT OUTER JOIN

The LEFT OUTER JOIN (the OUTER keyword is optional) allows us to get all records from the left table, and the matching on the right table. Here's another visual representation:

A left join will includes rows from the left table, even without a matching row in the right one.

???? Play in Fiddle

In this Fiddle playground I've removed the NOT NULL constraint from the userId column. Notice how all rows without a matching student record will just return a NULL value:

LEFT JOIN, notice the NULL values on the fullName column.

Anti Join

There's a final join pattern that is really useful, called the Anti Join. Imagine a situation where you want all rows from the left table that don't have a matching row in the right table. Here it is visually:

This can be obtained by doing a LEFT JOIN and then requiring the right column to be NULL. Let's see an example in Educa. We want to send information about courses to new users. In other words, we want to find to all users without an enrollment:

???? Play on Fiddle

Look at that, all our users have an enrollment, except for the admin account. Educa business is thriving!

Notice the tables used in this query: the Foreign Key is defined in the right table. That's why we need the Anti Join pattern. There's no way to get this information starting from the enrollments table. Actually, that's not entirely true, you can use one of the less common RIGHT JOIN types. In my experience, I've never had to use a RIGHT join, I could always get away rewriting a query using LEFT joins.

Tip: what happens if we change the condition to "WHERE enrollment.userId IS NOT NULL"? Technically, we would be transforming a LEFT JOIN into an INNER JOIN. This happens any time we add a filter (other than IS NULL) in the WHERE section over a column of a LEFT JOINed table. Why? Because any NULL value in a comparison will force the filter to return NULL. And NULL = NULL is actually a FALSE statement! The filter returns NULL, which in turn evaluates to FALSE. This is a very common -bad- assumption new developers make.

Too many rows and Cartesian Product

Joining can be tricky. If you forget to add a join predicate, then you'll get all the combinations between each row of the left table with the rows in the right table (a "cartesian" product).

Let's explain this with an example. We want to find all Educa students with an enrollment. If we just do a JOIN, here's what we get:

Students with an enrollment, why are they duplicated?

There are duplicate rows! What's happening?

The reason is that there are more than one enrollment for some students. It's tricky to see it, after all, we haven't added enrollment columns to the query, right? But that's the way a JOIN works.

Even worse, if you add a third table, also duplicating rows, you'll get the combination of all the rows, producing even more duplicate rows in the result set.

DISTINCT

One way to solve this, is to use the DISTINCT keyword before specifying the columns to query. This will remove the duplicates, but it may not be too good for performance: the engine will first retrieve all the rows, and then perform the deduplication. This could be time consuming on big datasets, and it can mess with your pagination (but more on these on the Performance article, Part IV, coming later).

How can we do this filtering? We'll have to use subqueries. But this will be explained in Part III. Look for it in the replies, or stay tuned to my Articles section on ??.

#sql #select #queries #database #dba #developer #DataAnalysis #tutorial #DatabaseDevelopment #SQLServer #MySQL #PostgreSQL #Oracle #SQLite

要查看或添加评论，请登录

Jose Canciani的更多文章

Privacy Unlocked: Apple vs UK Government

2025年2月26日

Privacy Unlocked: Apple vs UK Government

Apple recently dropped Advanced Data Protection (ADP) in the UK, and it’s a move that hits hard for user privacy. This…
Beyond Mocking: Towards Robust and Realistic Testing

2024年11月14日

Beyond Mocking: Towards Robust and Realistic Testing

Mocking, a popular technique in unit testing, often fails to achieve its intended goals. Instead of helping to provide…
Is Simplicity making a comeback? Pieter Levels on Lex Fridman’s Podcast

2024年8月27日

Is Simplicity making a comeback? Pieter Levels on Lex Fridman’s Podcast

In a recent conversation that has caught the attention of the tech community, Pieter Levels, known on ?? as @levelsio…
SQL Queries Quickstart: Part III

2024年5月8日

SQL Queries Quickstart: Part III

Welcome back, again! This is part III of my SQL Queries Quickstart Guide. If this is your first time here, please…
SQL Queries Quickstart: Part I

2024年4月25日

SQL Queries Quickstart: Part I

Welcome to my SQL Queries Quickstart! Designed for complete newbies or developers with some SQL knowledge, this article…

4 条评论
Remote Coding with VSCode & Tunnels

2024年4月16日

Remote Coding with VSCode & Tunnels

I code in #VSCode in a M1 Mac. Unfortunately the main app I develop does not run on ARM.
Programming with Early Returns

2024年4月10日

Programming with Early Returns

Writing code can be messy. But there are things you can do to make it more enjoyable.
Avature en ECI 2016: Reports reports reports

2016年7月7日

Avature en ECI 2016: Reports reports reports

Volvemos a estar presentes en ECI este a?o! Estaré hablando de como evolucionó nuestro sistema de reportes, insertados…

See all articles

SQL Queries Quickstart: Part II

Jose Canciani

Principal Engineer at Avature | @josecanciani at ??

Foreign Keys

Multi table queries

Aliases

Connecting tables with JOINs

INNER JOIN

领英推荐

LEFT OUTER JOIN

Anti Join

Too many rows and Cartesian Product

DISTINCT

Jose Canciani的更多文章

社区洞察

其他会员也浏览了

Learn SQL Comprehensive Guide for Beginners in 2024

A not-so-good idea: Pipe Syntax In SQL

7 Best SQL Assignment Help Websites

Introduction to T-SQL: Career Opportunities & Applications

Reading Source File Members Using SQL

Exploring SQL: Definition, Benefits, and Practical Applications

Best Practices for Writing Impressive SQL Code

Top 9 Best Practices When Writing SQL

SQL Clause (Select, Where, Order by, Limit) | SQL-MySQL | Belayet Hossain

Comprehensive Guide to SQL

Foreign Keys

Multi table queries

Aliases

Connecting tables with JOINs

INNER JOIN

领英推荐

LEFT OUTER JOIN

Anti Join

Too many rows and Cartesian Product

DISTINCT

Jose Canciani的更多文章

Privacy Unlocked: Apple vs UK Government

Beyond Mocking: Towards Robust and Realistic Testing

Is Simplicity making a comeback? Pieter Levels on Lex Fridman’s Podcast