Filling Null Values in SQL with Three Approaches
Prashant Verma
Experienced Data Analyst | SQL Expert (HackerRank Gold) | Power BI Specialist | Delivering Data-Driven Insights & Workflow Optimization | 10+ Years Enhancing Decision-Making Across Industries
In SQL, dealing with null values is a common challenge. Suppose you have a table with categories and brand names where some rows have null values in the category column. The task is to fill these nulls with the most recent non-null value from the column. Here's how we can approach this problem using different SQL techniques.
Before I start, let me mention my mentor: Ankit Bansal who's video gave me inspiration to write this article.
His youtube channel: @ankitbansal6
Link to referenced video: https://youtu.be/Xh0EevUOWF0?si=kiXuT4Q5LxZTEswV
Input Table Example:
category brand_name
chocolates 5-star
NULL dairy milk
NULL perk
NULL eclair
Biscuits britannia
NULL good day
NULL boost
Desired Output:
category brand_name
chocolates 5-star
chocolates dairy milk
chocolates perk
chocolates eclair
Biscuits britannia
Biscuits good day
Biscuits boost
create and insert scripts:
create table brands
(
category varchar(20),
brand_name varchar(20)
);
insert into brands values
('chocolates','5-star')
,(null,'dairy milk')
,(null,'perk')
,(null,'eclair')
,('Biscuits','britannia')
,(null,'good day')
,(null,'boost');
Approach 1: Running Sum Based Grouping
This approach uses ROW_NUMBER() to create a sequence and a running sum to group records based on whether they have a non-null category.
--Approach 1:
WITH cte AS (
SELECT
*,
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn,
CASE
WHEN category IS NOT NULL THEN 1
ELSE 0
END AS grp
FROM brands
)
SELECT
CASE
WHEN SUM(grp) OVER(ORDER BY rn) = 1 THEN 'Chocolates'
ELSE 'Biscuits'
END AS category,
brand_name,
category AS old_category
FROM cte;
Explanation:
Approach 2: Defining Ranges
In this approach, we define a range for each category and fill null values based on these ranges.
WITH cte_running_num AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
FROM brands
),
range_table AS (
SELECT
DISTINCT category,
rn AS rn2,
COALESCE(LEAD(rn) OVER (ORDER BY rn) - 1, 9999) AS range_end
FROM cte_running_num
WHERE category IS NOT NULL
)
SELECT
CASE
WHEN ct.rn BETWEEN rt.rn2 AND rt.range_end THEN rt.category
END AS category,
ct.brand_name
FROM cte_running_num ct
LEFT JOIN range_table rt
ON ct.rn BETWEEN rt.rn2 AND rt.range_end;
Explanation:
领英推荐
Approach 3: Recursive CTE
This solution utilizes a recursive CTE to fill all the null values by repeating the last non-null value as we traverse the table row by row.
WITH cte_fill AS (
SELECT
category,
brand_name,
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
FROM brands
), recursive_cte AS (
SELECT
rn,
category,
brand_name
FROM cte_fill
WHERE category IS NOT NULL
UNION ALL
SELECT
c.rn,
COALESCE(c.category, r.category),
c.brand_name
FROM cte_fill c
JOIN recursive_cte r ON c.rn = r.rn + 1
)
SELECT category, brand_name
FROM recursive_cte
ORDER BY rn;
Explanation:
Summary
Each approach has its strengths:
These solutions offer robust ways to handle nulls in SQL, and each approach can be selected based on your data structure a
nd requirements.
General Tips for Remembering how to approach similar problem statements:
Mental Shortcut for each approaches:
Approach 1: Whenever you see a problem where nulls are grouped together and you need to fill them with the nearest value, think of "grouping and running total."
Approach 2: When you hear "fill nulls within a range," think of the LEAD() or LAG() functions and their ability to define ranges for replacement.
Approach 3: Whenever you encounter a problem where nulls should be filled sequentially, row by row, think "recursion."
Over time, practicing these approaches will make it easier to recognize when to use each one in real-world scenarios.
This article is designed for SQL enthusiasts looking to fill null values effectively in their datasets, and can serve as a valuable learning resource!
Let me know how you plan to use these solutions, and feel free to connect with me on LinkedIn. #SQL #DataAnalytics #DataScience #SQLTips