SQL’s EXISTS and NOT EXISTS: A Comprehensive Guide
Lasha Dolenjashvili
Data Solutions Architect @ Bank of Georgia | IIBA? Certified Business Analyst | Open to Freelance, Remote, or Relocation Opportunities
Let's learn about two powerful SQL constructs: EXISTS and NOT EXISTS.
What are EXISTS and NOT EXISTS?
The EXISTS clause is used to test for the existence of any record in a subquery. If the subquery returns at least one record, the EXISTS condition is true; if the subquery returns no records, the EXISTS condition is false.
The NOT EXISTS clause is the exact opposite. It checks if a subquery returns any records. If the subquery returns no records, the NOT EXISTS condition is true; if the subquery returns records, the NOT EXISTS condition is false.
Syntax
The syntax for EXISTS and NOT EXISTS is:
-- EXISTS
SELECT column_name(s)
FROM table_name
WHERE EXISTS (subquery);
-- NOT EXISTS
SELECT column_name(Ss)
FROM table_name
WHERE NOT EXISTS (subquery);
EXISTS and NOT EXISTS in Action
Suppose we have two tables: Customers and Orders. We want to find customers who have placed at least one order. We could use the EXISTS operator as follows:
SELECT customer_id, customer_name
FROM Customers
WHERE EXISTS (
SELECT 1 FROM Orders
WHERE Customers.customer_id = Orders.customer_id
);
On the other hand, if we want to find customers who have not placed any orders, we could use the NOT EXISTS operator:
SELECT customer_id, customer_name
FROM Customers
WHERE NOT EXISTS (
SELECT 1 FROM Orders
WHERE Customers.customer_id = Orders.customer_id
);
Use Cases
Identifying active users: Using EXISTS, you can find users who have logged in or interacted with your website.
SELECT user_id FROM Users
WHERE EXISTS (
SELECT 1 FROM UserActivity
WHERE Users.user_id = UserActivity.user_id
);
Identifying inactive users: Using NOT EXISTS, you can identify users who have never logged in or interacted with your website.
SELECT user_id FROM Users
WHERE NOT EXISTS (
SELECT 1 FROM UserActivity
WHERE Users.user_id = UserActivity.user_id
);
Finding products that have been sold: With EXISTS, you can find products that have sales records.
SELECT product_id FROM Products
WHERE EXISTS (
SELECT 1 FROM Sales
WHERE Products.product_id = Sales.product_id
);
Finding products with no sales: Using NOT EXISTS, you can identify products that have never been sold.
领英推荐
SELECT product_id FROM Products
WHERE NOT EXISTS (
SELECT 1 FROM Sales
WHERE Products.product_id = Sales.product_id
);
NOT EXISTS vs. NOT IN
Similar to EXISTS and IN, NOT EXISTS and NOT IN are often comparable in functionality. However, NOT IN can yield unexpected results when NULL values are involved. NOT IN will return an empty set if any NULL values are part of the subquery result, regardless of other values. NOT EXISTS doesn’t get confused by NULL values and behaves as expected.
In SQL, a NULL value essentially means “unknown.” Remember, it’s not equivalent to zero or an empty string; it’s a special marker used to indicate that a data value doesn’t exist in the database.
When we use the NOT IN operator in SQL, it returns TRUE if a value is not in the list of values provided or returned by a subquery. However, when NULLs come into play, things get a bit more complicated.
Consider the following query:
SELECT customer_id FROM Customers
WHERE customer_id NOT IN (SELECT customer_id FROM Orders);
In this query, if the subquery (SELECT customer_id FROM Orders) returns a list that includes some NULLs, then the NOT IN operation gets tricky.
Why? Because when SQL compares a value to NULL using a comparison operator like =, <> or NOT IN, the result is not TRUE or FALSE but rather NULL (unknown).
Let’s say we have a customer_id of "123". SQL is essentially checking if "123" is NOT IN the list (which includes NULLs). However, when it compares "123" to a NULL in the list, SQL cannot definitively say that "123" is NOT IN the list, because NULL is an unknown value. So, the result is not TRUE, and the record for customer "123" would not be included in the final result set. This happens regardless of whether "123" appears in the list or not. In the presence of NULL, NOT IN essentially returns no results.
On the other hand, when using NOT EXISTS, the situation is different. The NOT EXISTS operator returns TRUE if the subquery returns no rows, and it doesn’t get tripped up by NULL values. It only checks for the existence of rows in the subquery result, without comparing specific values, which makes it a safer choice when NULL values may be involved.
The above behavior of the NOT IN operator in SQL can vary across different database systems and SQL dialects. However, it is generally a common behavior. In most SQL dialects, including popular ones like Oracle, MySQL, and SQL Server, the presence of NULL values in the list used with NOT IN can cause this behavior. The result is often no records returned, even if the value being compared exists in the list but is also accompanied by NULLs.
EXISTS/NOT EXISTS vs LEFT JOIN
When you want to find rows in one table that do or do not have matches in another table, you might consider using a LEFT JOIN or EXISTS/NOT EXISTS.
A LEFT JOIN followed by a "WHERE … IS NULL" clause can be used to find rows from the left table that have no matching rows in the right table. However, this approach can be less efficient than using NOT EXISTS, particularly on large datasets. This is because a LEFT JOIN produces a result set that combines all records from both tables, which can be costly in terms of memory and processing time. In contrast, a NOT EXISTS query stops processing as soon as it finds a matching record.
Performance Considerations and Best Practices
?? Thanks a lot for reading the article and remember to ???????? ?????????????????? ??????????????!
GCP Data Engineer at Tech Mahindra | GCP | Bigquery | Composer | Cloud Functions | PubSub | DataFlow | Python | PySpark | SQL | Docker | Terraform
5 个月Why where column in () , will work if we have null in sub query? As you explained, if we have a value 123 and if we want to check if that value exist in the sub query, This will happen will in clause also. 123=Null. In this case also,it should throw error. Why “in” works with null values in sub query?