SQL Crash Course — Lesson 7 — JOINs
Diogo Henrique de Bem
Data Engineer | Spark | AWS | CDK | SQL | Data Modeling | Data Architecture | Python | Data Pipelines | Cloud | ETL | Big Data | Analytics | Business Intelligence
Hey folks… ready for one more lesson?
Today we are going to talk about the JOIN statement. JOIN is used whenever you want to “join” two tables together using a common value/column.
When working with data coming from relational databases you will notice that using JOINs, for most analyses, is really necessary. Therefore, I encourage you to try to understand the concepts I explain in this lesson and play around a bit on the →SQLFiddle sandbox.←
For this lesson, let’s add one more table to our Suits Database — a Lawsuits table.
We have basically 4 different types of JOINs
The syntax for the JOIN follows a specific pattern :
SELECT ... FROM TABLE_A JOIN TABLE_B ON TABLE_A.ID=TABLE_B.EXT_ID
Whatever comes after the ON indicates how both tables connect — it will get clearer once you see the examples below….
To start, let’s write a simple query joining these 2 tables:
SELECT * FROM suits_users INNER JOIN lawsuits
ON suits_users.user_id=lawsuits.owner_id
Did you notice that even though we have 8 rows on both tables, we just got 7 rows?
That’s because we performed an INNER JOIN, which means the engine will only return rows that perfectly match the condition we have after the ON. Thus, the ID has to exist on both tables.
Notice that the lawsuit_id=8 is not returned. That’s because the user_id of that lawsuit does not exist in the suits_user table. The same goes for users that don’t have any lawsuit — they don’t show up.
I used “select *” in this example, which will return all columns from both tables. If that’s not desired, you can specify the individual columns you’d like to see (as good practice, always specify the name of the table before the field using “table+ dot+field”. That will make your query a lot easier to read) :
SELECT suits_users.user_name,lawsuits.lawsuit_name
FROM suits_users INNER JOIN lawsuits
ON suits_users.user_id=lawsuits.owner_id
2. LEFT JOIN
Left join will bring all the rows that exist on the “left” table of the join, independently if it has a pair in the right table or not. In the example below, suits_users is the LEFT table, since it’s on the LEFT side of the JOIN word.
领英推荐
SELECT * FROM suits_users LEFT JOIN lawsuits
ON suits_users.user_id=lawsuits.owner_id
Notice that by doing this we added 4 new rows. Those are rows that exist on the “left table” and don’t have a pair on the “right table”. That’s why all the columns coming from the “right table” show null values.
3. RIGHT JOIN
The exact opposite happens if we do a right join:
SELECT * FROM suits_users RIGHT JOIN lawsuits
ON suits_users.user_id=lawsuits.owner_id
In this case, we get all the results that exist on the right table, even if it doesn’t have a pair on the table. In this case, the data that should be coming from the LEFT table are shown as ‘null’
NOTE: RIGHT JOIN and LEFT JOIN can return the same thing if you change the order of the tables:
SELECT * FROM suits_users RIGHT JOIN lawsuits
ON suits_users.user_id=lawsuits.owner_id
is the same as
SELECT * FROM lawsuits LEFT JOIN suits_users
ON suits_users.user_id=lawsuits.owner_id
4. FULL JOIN (NOT AVAILABLE IN MYSQL)
FULL JOIN is like a combination of all three above (left, right and inner join) and will return every value that exists on ANY of the tables.
SELECT * FROM lawsuits FULL JOIN suits_users
ON suits_users.user_id=lawsuits.owner_id
5. WHICH JOIN TO USE?
The type of JOIN you will use depends on what you want to analyze.
In this example, we could state the following:
That’s all folks..
Data Engineer | Spark | AWS | CDK | SQL | Data Modeling | Data Architecture | Python | Data Pipelines | Cloud | ETL | Big Data | Analytics | Business Intelligence
1 年Raul Pontello