NaN Wrangling: LOESS/LOWESS to the Rescue

Chonghua Yin

Head of Data Science | Climate Risk & Extreme Event Modeling | AI & Geospatial Analytics

Published: March 8, 2025

Have you ever tried interpolating geospatial data near coastlines, only to find your results ruined by NaN (Not a Number) values? Imagine working with sea surface temperature data—just when you need smooth, realistic values along the land-sea boundary, standard interpolation methods fail due to NaNs, leaving gaps or unrealistic artefacts in your dataset.

This is a common challenge in geospatial data processing, especially with land/sea masks. Interpolation methods like nearest-neighbor or bilinear interpolation often struggle with NaNs. So, how do we work around this? The answer lies in LOESS/LOWESS (Locally Estimated Scatterplot Smoothing).

LOESS (locally estimated scatterplot smoothing) regression combines aspects of weighted moving average smoothing with weighted linear or polynomial regression. LOESS is also called LOWESS, which stands for locally weighted scatterplot smoothing.

Why Traditional Interpolation Fails with NaNs

Before diving into LOESS/LOWESS, let's briefly examine why standard interpolation methods struggle with NaNs:

Nearest-Neighbor Interpolation: This method is simple and avoids NaN propagation, but the results tend to be blocky and fail to capture smooth variations in data.
Bilinear Interpolation: Works well for continuous data, but a NaN in any of the four surrounding grid cells propagates into the result, leaving missing or distorted values along the boundary.
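
To make the contrast concrete, here is a minimal sketch for a single grid cell with one land corner. The helper names (bilinear, nearest_valid) and the temperature values are made up for illustration, and the nearest-neighbour variant assumes that land cells are skipped when searching for the closest value.

```python
import numpy as np

def bilinear(cell, tx, ty):
    """Bilinear interpolation inside one grid cell.

    cell is a 2x2 array of corner values [[v00, v01], [v10, v11]];
    tx and ty are fractional positions in [0, 1] along columns and rows.
    """
    top = (1 - tx) * cell[0, 0] + tx * cell[0, 1]
    bottom = (1 - tx) * cell[1, 0] + tx * cell[1, 1]
    return (1 - ty) * top + ty * bottom

def nearest_valid(cell, tx, ty):
    """Return the value of the closest non-NaN corner (assumed behaviour)."""
    corners = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (row, col)
    valid = [(r, c) for r, c in corners if not np.isnan(cell[r, c])]
    r, c = min(valid, key=lambda rc: (rc[0] - ty) ** 2 + (rc[1] - tx) ** 2)
    return cell[r, c]

# One land corner (NaN) and three ocean corners (illustrative values, deg C).
cell = np.array([[np.nan, 18.0],
                 [17.0, 19.0]])

# The target sits far from the land corner, yet bilinear still returns NaN
# because any weight multiplied by NaN is NaN.
coastal_bilinear = bilinear(cell, tx=0.9, ty=0.9)
# Nearest-neighbour returns a valid value, but it is a blocky copy of one corner.
coastal_nearest = nearest_valid(cell, tx=0.9, ty=0.9)
```

Even a tiny weight on the land corner is enough to wipe out the bilinear estimate, which is why gaps tend to spread along the coastline.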

Since both methods have limitations, we need a more adaptive approach—this is where LOESS/LOWESS comes in.

How LOESS/LOWESS Handles NaNs Gracefully

LOESS/LOWESS offers a more sophisticated way to handle missing values by leveraging local regression. Here’s why it excels:

Local Adaptability

Unlike global regression models, LOESS/LOWESS builds local regression models around each target point. This makes it ideal for handling complex, non-linear data, especially at land/sea boundaries where sharp transitions occur.

Weighted Regression for Robustness

LOESS/LOWESS assigns higher weights to nearby data points, ensuring that the most relevant local data primarily influence interpolation. Cleveland's robust variant additionally down-weights points with large residuals in repeated fits, which makes it more resistant to outliers.
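
The two ideas above, a local model around each target point and distance-based weights, can be sketched in one dimension as follows. This is a simplified illustration, not a full LOWESS implementation: it uses the tricube weight function from Cleveland (1979) and a single weighted straight-line fit, without the robustness iterations. The function names and the frac default are assumptions chosen for the example.

```python
import numpy as np

def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, otherwise 0."""
    u = np.abs(u)
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)

def local_linear_fit(x, y, x0, frac=0.5):
    """Estimate y at x0 from a weighted straight-line fit to the nearest points."""
    k = max(3, int(np.ceil(frac * len(x))))        # neighbourhood size
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]                     # indices of the k nearest points
    h = max(dist[idx].max(), np.finfo(float).eps)  # local bandwidth
    w = tricube(dist[idx] / h)                     # nearer points get larger weights
    sw = np.sqrt(w)
    # Centre the design matrix on x0 so the intercept is the estimate at x0.
    A = np.column_stack([np.ones(idx.size), x[idx] - x0])
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
    return coef[0]

x = np.linspace(0.0, 10.0, 21)
y = 2.0 * x + 1.0                                  # exactly linear test data
estimate = local_linear_fit(x, y, x0=3.3)
```

On exactly linear data the local fit reproduces the line, so estimate matches 2 * 3.3 + 1 up to rounding. On curved data, frac controls how much of the curve each local line sees.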

Smart Boundary Handling

Rather than interpolating blindly across a boundary, LOESS/LOWESS estimates values near it with a locally weighted regression fitted to the nearby data points.

Bonus: It allows user-defined parameters for fine-grained control over interpolation behaviour.

Built-in NaN Handling

LOESS/LOWESS can be modified to ignore NaNs and use only valid neighbouring data points. This prevents NaNs from disrupting interpolation while maintaining accuracy.
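
One way such a modification might look is sketched below: for every NaN cell, a local plane is fitted to the valid cells inside a square window, and NaN neighbours are simply left out of the regression. The function name loess_fill and the parameters radius and min_points are hypothetical, and this is not necessarily how the demonstration results in this article were produced.

```python
import numpy as np

def loess_fill(grid, radius=3, min_points=4):
    """Fill NaN cells with a local plane fitted to valid neighbours only."""
    filled = grid.copy()
    ny, nx = grid.shape
    for i, j in zip(*np.where(np.isnan(grid))):
        # Window around the target cell, clipped at the grid edges.
        i0, i1 = max(i - radius, 0), min(i + radius + 1, ny)
        j0, j1 = max(j - radius, 0), min(j + radius + 1, nx)
        window = grid[i0:i1, j0:j1]
        rows, cols = np.mgrid[i0:i1, j0:j1]
        valid = ~np.isnan(window)              # NaNs never enter the fit
        if valid.sum() < min_points:
            continue                           # too little data: keep the gap
        dy = (rows[valid] - i).astype(float)
        dx = (cols[valid] - j).astype(float)
        # Tricube weights, scaled so the window corners keep a small positive weight.
        u = np.hypot(dx, dy) / (radius * np.sqrt(2.0) + 1e-9)
        sw = np.sqrt((1.0 - u**3) ** 3)
        # Local plane z = a + b*dx + c*dy; a is the value at the target cell.
        A = np.column_stack([np.ones(dx.size), dx, dy])
        coef, *_ = np.linalg.lstsq(A * sw[:, None], window[valid] * sw, rcond=None)
        filled[i, j] = coef[0]
    return filled

# Synthetic field with a smooth gradient and a block of missing cells.
rows, cols = np.mgrid[0:8, 0:8]
truth = 0.5 * cols + 2.0 * rows + 10.0
sst = truth.copy()
sst[3:5, 3:5] = np.nan
filled = loess_fill(sst, radius=3)
```

Cells with too few valid neighbours are left as NaN rather than extrapolated, which is a deliberate choice here: a larger radius fills more gaps at the cost of smoothing over a wider area.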

Where Can You Use LOESS/LOWESS?

LOESS/LOWESS has wide-ranging applications in geospatial and climate science, including:

Filling missing values in sea surface temperature maps for more realistic oceanographic analysis.
Processing remote sensing images, where data gaps occur due to sensor limitations.
Climate modelling, where handling missing values properly can improve model accuracy.

LOESS/LOWESS is highly effective for 2D spatial interpolation, but its computational cost grows with the size of the dataset. For large datasets, frameworks like Dask can parallelize the computation: applying LOESS/LOWESS layer by layer lets Dask process the layers independently, which makes the approach viable even for larger, high-dimensional datasets. The image below shows demonstration results of LOESS interpolation.
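
As a rough sketch of the layer-by-layer idea, the snippet below chunks a (time, y, x) stack so that each Dask block holds one 2D layer and maps a fill function over the blocks. It assumes Dask is installed. The fill_layer function is only a hypothetical placeholder that fills gaps with the layer mean; in practice it would be replaced by a LOESS-based fill.

```python
import numpy as np
import dask.array as da

def fill_layer(layer):
    """Placeholder per-layer fill (assumption): NaNs become the layer mean."""
    out = layer.copy()
    mask = np.isnan(out)
    if mask.any() and not mask.all():
        out[mask] = np.nanmean(layer)
    return out

def fill_block(block):
    """Apply fill_layer to every 2D layer in a (time, y, x) block."""
    out = block.copy()
    for k in range(out.shape[0]):
        out[k] = fill_layer(out[k])
    return out

# Synthetic stack of six layers with the same masked region in each.
rng = np.random.default_rng(0)
stack = rng.normal(15.0, 2.0, size=(6, 20, 30))
stack[:, 5:8, 10:14] = np.nan

# One layer per chunk, so Dask can schedule the layers independently.
lazy = da.from_array(stack, chunks=(1, 20, 30))
filled_stack = lazy.map_blocks(fill_block, dtype=stack.dtype).compute()
```

Because each layer is filled independently, the chunking along the first axis is what gives Dask something to parallelize. Chunking along y or x instead would need overlapping halos so that windows near chunk edges still see their neighbours.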

Summary and Discussions

NaNs don’t have to be the enemy of interpolation. With LOESS/LOWESS, you get a powerful and flexible method that adapts to local data variations, handles NaNs effectively, and improves interpolation quality.

Next time you encounter NaNs in geospatial data, will you reach for LOESS/LOWESS? Let me know how you handle missing data in your projects!

References

NIST (2012) LOESS (aka LOWESS)

https://www.itl.nist.gov/div898/handbook/pmd/section1/pmd144.htm

Cleveland, W.S. (1979) Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, Vol. 74, pp. 829-836.

https://sites.stat.washington.edu/courses/stat527/s13/readings/Cleveland_JASA_1979.pdf

NIST (2012) Example of LOESS computations

https://www.itl.nist.gov/div898/handbook/pmd/section1/dep/dep144.htm

Ryan Abernathey

Scientist and Startup Founder

2 days ago

What Python package do you recommend for LOESS?
