Filling Null Values in SQL with Three Approaches

Filling Null Values in SQL with Three Approaches

In SQL, dealing with null values is a common challenge. Suppose you have a table with categories and brand names where some rows have null values in the category column. The task is to fill these nulls with the most recent non-null value from the column. Here's how we can approach this problem using different SQL techniques.

Before I start, let me mention my mentor: Ankit Bansal who's video gave me inspiration to write this article.

His youtube channel: @ankitbansal6

Link to referenced video: https://youtu.be/Xh0EevUOWF0?si=kiXuT4Q5LxZTEswV

Input Table  Example:
category	    brand_name
chocolates   5-star
NULL	   dairy milk
NULL	   perk
NULL	   eclair
Biscuits	   britannia
NULL	   good day
NULL	   boost

Desired   Output:
category	        brand_name
chocolates	5-star
chocolates	dairy milk
chocolates	perk
chocolates      eclair
Biscuits	       britannia
Biscuits	       good day
Biscuits	       boost

create and insert scripts: 
create table brands 
(
category varchar(20),
brand_name varchar(20)
);
insert into brands values
('chocolates','5-star')
,(null,'dairy milk')
,(null,'perk')
,(null,'eclair')
,('Biscuits','britannia')
,(null,'good day')
,(null,'boost');        

Approach 1: Running Sum Based Grouping

This approach uses ROW_NUMBER() to create a sequence and a running sum to group records based on whether they have a non-null category.


Please copy the code from below
--Approach 1: 
WITH cte AS (
    SELECT 
        *, 
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn,
        CASE 
            WHEN category IS NOT NULL THEN 1 
            ELSE 0 
        END AS grp
    FROM brands
)
SELECT 
    CASE 
        WHEN SUM(grp) OVER(ORDER BY rn) = 1 THEN 'Chocolates'
        ELSE 'Biscuits' 
    END AS category,
    brand_name,
    category AS old_category
FROM cte;        

Explanation:

  • The ROW_NUMBER() function assigns a unique number to each row.
  • We use a CASE statement to create a grp column that marks the non-null category rows with 1 and the null ones with 0.
  • The SUM() window function is used to propagate the non-null category value to the nulls.


Approach 2: Defining Ranges

In this approach, we define a range for each category and fill null values based on these ranges.


Copy the code from code block below
WITH cte_running_num AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM brands
),
range_table AS (
    SELECT 
        DISTINCT category, 
        rn AS rn2, 
        COALESCE(LEAD(rn) OVER (ORDER BY rn) - 1, 9999) AS range_end 
    FROM cte_running_num 
    WHERE category IS NOT NULL
)
SELECT 
    CASE 
        WHEN ct.rn BETWEEN rt.rn2 AND rt.range_end THEN rt.category 
    END AS category, 
    ct.brand_name
FROM cte_running_num ct
LEFT JOIN range_table rt 
    ON ct.rn BETWEEN rt.rn2 AND rt.range_end;        

Explanation:

  • The ROW_NUMBER() function creates a sequence in the table.
  • In range_table, we use the LEAD() function to find the next non-null category and define the range in which nulls should be replaced.
  • A LEFT JOIN is performed to update the null values within their respective ranges.


Approach 3: Recursive CTE

This solution utilizes a recursive CTE to fill all the null values by repeating the last non-null value as we traverse the table row by row.


Check the code block below to copy the query.
WITH cte_fill AS (
    SELECT 
        category, 
        brand_name, 
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM brands
), recursive_cte AS (
    SELECT 
        rn, 
        category, 
        brand_name
    FROM cte_fill 
    WHERE category IS NOT NULL
    UNION ALL
    SELECT 
        c.rn, 
        COALESCE(c.category, r.category), 
        c.brand_name
    FROM cte_fill c
    JOIN recursive_cte r ON c.rn = r.rn + 1
)
SELECT category, brand_name
FROM recursive_cte
ORDER BY rn;        

Explanation:

  • The base part of the recursive CTE selects rows where category is not null.
  • The recursive part fills the nulls by carrying the last non-null category downwards through the rows.
  • This approach efficiently fills all null values with the correct category.


Summary

Each approach has its strengths:

  • Approach 1: Uses window functions and is relatively simple for smaller datasets.
  • Approach 2: Allows defining a range of rows to fill nulls and offers more control over the replacement process.
  • Approach 3: Leverages recursion, making it very flexible and scalable for larger datasets.

These solutions offer robust ways to handle nulls in SQL, and each approach can be selected based on your data structure a

nd requirements.

General Tips for Remembering how to approach similar problem statements:

  1. Pattern Recognition: Look for patterns in the problem statement—are the nulls grouped together, or do they span a range of rows? This helps you quickly identify if you're dealing with groups or ranges.
  2. Contextual Clues: When the problem specifies "filling null values" or "carry forward values," this should immediately trigger these three approaches in your mind.
  3. Cheat Sheet: Maintain a simple cheat sheet of SQL problems and the associated approaches. For null value filling:

Mental Shortcut for each approaches:

Approach 1: Whenever you see a problem where nulls are grouped together and you need to fill them with the nearest value, think of "grouping and running total."

Approach 2: When you hear "fill nulls within a range," think of the LEAD() or LAG() functions and their ability to define ranges for replacement.

Approach 3: Whenever you encounter a problem where nulls should be filled sequentially, row by row, think "recursion."

Over time, practicing these approaches will make it easier to recognize when to use each one in real-world scenarios.

This article is designed for SQL enthusiasts looking to fill null values effectively in their datasets, and can serve as a valuable learning resource!

Let me know how you plan to use these solutions, and feel free to connect with me on LinkedIn. #SQL #DataAnalytics #DataScience #SQLTips

要查看或添加评论,请登录

Prashant Verma的更多文章

社区洞察

其他会员也浏览了