Null-Duplicate=Joins
Pavan kumar Cheemala
ETL Developer | Informatica PowerCenter | IICS | Data Integration | SQL & Oracle Specialist | UNIX | Teradata | GitHub | ServiceNow|Rally|WinScp|Putty|Concourse|Harbour|Harness|Outlook|MS Excel For Data Analysis.
SQL joins are a powerful way to combine data from multiple tables. However, what happens when you have null and duplicate data in one of the tables? In this blog post, we will explore how to use SQL joins with null data and duplicate data.
What is Null Data?
Null data is a special value that represents an unknown value. Null data is not the same as zero, and it is not the same as an empty string. Null data simply means that the value is unknown.
How to use SQL joins with Null Data?
When you join two tables that have null data, the null values will not be matched. This means that any rows that contain null values in the join condition will not be included in the result set.
How does duplicate Data affect SQL joins?
When you join two tables that have Duplicate data, The matching rows will be repeated a number of times the data on the other table is duplicated. will see this with the example below.
The Scenario:
let's say we have two tables: table1 and table2. We want to join table1 and table2 On their respective columns id1 and id2.
As you can see we have the duplicate and null values in both the table1 and table2
We will see different types of join in this example.
INNER JOIN:
The INNER JOIN combines rows from both tables where the join condition is satisfied. It returns only the matching rows.
SELECT *
FROM table1 t1 INNER JOIN table2 t2
ON t1.id1 = t2.id2;
Output:
Explanation of the above output:
As you can see from the above picture that each 1's in the table1 get matched with 1's of the table2 . So in total, we have 6 outputs. (3 pink arrows + 3 purple arrows). You see that the null values have not got matched tho there are present in both tables because the null can't be compared with the null value.
LEFT JOIN:
The left join returns all rows from the left table (the table specified before the JOIN keyword) and the matching rows from the right table. If no match is found, NULL values are returned for the columns of the right table.
SELECT *
FROM table1 t1 LEFT JOIN table2 t2
ON t1.id1 = t2.id2;
Output:
Explanation of the above output:
As you can see from the above picture that each 1's in the table1 get matched with 1's of the table2 and remaining rows of the left table. So in total, we have 10 outputs. (3 pink arrows + 3 purple arrows+4 box of table1).
领英推荐
??
This may seem confusing for programmers, used to comparing NULL values, But in a database NULL != NULL.
RIGHT JOIN:
The right join returns all rows from the right table (the table specified after the JOIN keyword) and the matching rows from the left table. If no match is found, NULL values are returned for the columns of the left table.
SELECT *
FROM table1 t1 RIGHT JOIN table2 t2
ON t1.id1 = t2.id2;
Output:
Explanation of the above output:
As you can see from the above picture that each 1's in the table1 get matched with 1's of the table2 and remaining rows of the right table. So in total, we have 9 outputs. (3 pink arrows + 3 purple arrows+3 box of table2).
Full OUTER JOIN:
The Full Outer Join, also known as Full Join, combines the result sets of both the Left Join and Right Join. It returns all rows from both tables and includes NULL values for non-matching rows. Full Outer Join is helpful when you need to retrieve all records from both tables, regardless of matches.
SELECT *
FROM table1 t1 FULL JOIN table2 t2
ON t1.id1 = t2.id2;
Output:
Explanation of the above output:
As you can see from the above picture that each 1's in the table1 get matched with 1's of the table2 and remaining rows of table1 and table2 . So in total, we have 9 outputs. (3 pink arrows + 3 purple arrows + 4 box of table1 + 3 box of table2 ).
??
In real-time, it is not ideal to have null or duplicate values on joining columns of a table. This is because null values can make it difficult to join tables, and duplicate values can skew the results of a join.
Handling Null Values
Handling Duplicate Values
Conclusion:
Here are some key takeaways from the blog post:
Data Analyst ?? | B.Tech ?? in (ECE) | Proficient in Python???, , R Basics, SQL??, and Advanced Excel | Skilled in Data Visualization with Tableau and Power BI??[DAX] |Completed PwC Power BI Job Simulation|
4 个月Very helpful