NaN Wrangling: LOESS/LOWESS to the Rescue
Chonghua Yin
Head of Data Science | Climate Risk & Extreme Event Modeling | AI & Geospatial Analytics
Have you ever tried interpolating geospatial data near coastlines, only to find your results ruined by NaN (Not a Number) values? Imagine working with sea surface temperature data—just when you need smooth, realistic values along the land-sea boundary, standard interpolation methods fail due to NaNs, leaving gaps or unrealistic artefacts in your dataset.
This is a common challenge in geospatial data processing, especially with land/sea masks. Interpolation methods like nearest-neighbor or bilinear interpolation often struggle with NaNs. So, how do we work around this? The answer lies in LOESS/LOWESS, a family of local regression smoothers.
LOESS (locally estimated scatterplot smoothing) regression combines aspects of weighted moving average smoothing with weighted linear or polynomial regression. LOESS is also called LOWESS, which stands for locally weighted scatterplot smoothing.
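To make the definition concrete, here is a minimal one-dimensional sketch in Python with NumPy. It assumes a tricube weight function and a local straight-line fit, which are common textbook choices; the names `tricube` and `loess_1d` and the `frac` parameter are illustrative and not taken from any particular library.

```python
import numpy as np

def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    u = np.clip(np.abs(u), 0.0, 1.0)
    return (1.0 - u**3) ** 3

def loess_1d(x, y, x_eval, frac=0.5):
    """Local linear LOESS: fit a weighted straight line around each target."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_eval = np.asarray(x_eval, dtype=float)
    # Number of neighbours used in each local fit (assumed rule: frac * n).
    k = min(x.size, max(3, int(np.ceil(frac * x.size))))
    out = np.empty(x_eval.size)
    for i, x0 in enumerate(x_eval):
        d = np.abs(x - x0)
        idx = np.argsort(d)[:k]                        # k nearest neighbours
        h = max(d[idx][-1], np.finfo(float).eps)       # local bandwidth
        w = tricube(d[idx] / h)                        # nearby points weigh more
        A = np.column_stack([np.ones(k), x[idx] - x0]) # local line, centred at x0
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
        out[i] = beta[0]                               # intercept = fit at x0
    return out

# Synthetic example: smooth a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 80)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)
smooth = loess_1d(x, y, x, frac=0.3)
```

The weighting step is the "weighted moving average" part of the definition, and the least-squares line is the "weighted regression" part. A smaller `frac` follows the data more closely, while a larger one smooths more.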
Why Traditional Interpolation Fails with NaNs
Before diving into LOESS/LOWESS, let's briefly examine why standard interpolation methods struggle with NaNs:
Since both nearest-neighbor and bilinear interpolation have limitations, we need a more adaptive approach, and this is where LOESS/LOWESS comes in.
How LOESS/LOWESS Handles NaNs Gracefully
LOESS/LOWESS offers a more sophisticated way to handle missing values by leveraging local regression. Here’s why it excels:
Unlike global regression models, LOESS/LOWESS builds local regression models around each target point. This makes it ideal for handling complex, non-linear data, especially at land/sea boundaries where sharp transitions occur.
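LOESS/LOWESS assigns higher weights to nearby data points, so the most relevant local data dominate each estimate. Distance weighting alone does not guard against outliers; that resistance comes from the robustness iterations described by Cleveland (1979), which down-weight points with large residuals and refit.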
Rather than interpolating blindly across a boundary, LOESS/LOWESS applies locally weighted regression to estimate values near it.
Bonus: It allows user-defined parameters for fine-grained control over interpolation behaviour.
LOESS/LOWESS can be modified to ignore NaNs and use only valid neighbouring data points. This prevents NaNs from disrupting interpolation while maintaining accuracy.
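The points above can be sketched for a 2D grid as follows. This is a minimal, brute-force illustration and not the author's implementation: it assumes a local plane fit, tricube weights, and a fixed number `k` of nearest valid cells, and the function name `loess_fill_2d` and the synthetic `sst` field are hypothetical.

```python
import numpy as np

def loess_fill_2d(field, k=12):
    """Fill NaN cells with a local linear (plane) fit to the k nearest valid cells."""
    field = np.asarray(field, dtype=float)
    valid = np.isfinite(field)
    if valid.sum() < 3:
        raise ValueError("need at least 3 valid cells to fit a local plane")
    yy, xx = np.indices(field.shape)
    pts = np.column_stack([yy[valid], xx[valid]]).astype(float)  # valid coordinates
    vals = field[valid]
    k = min(k, vals.size)
    out = field.copy()
    for r, c in zip(*np.nonzero(~valid)):
        off = pts - np.array([r, c], dtype=float)  # offsets from the target cell
        d = np.hypot(off[:, 0], off[:, 1])
        idx = np.argpartition(d, k - 1)[:k]        # only valid cells enter the fit
        # Assumed bandwidth: slightly beyond the farthest neighbour,
        # so that neighbour keeps a small non-zero weight.
        h = 1.1 * d[idx].max()
        w = (1.0 - (d[idx] / h) ** 3) ** 3         # tricube weights
        A = np.column_stack([np.ones(k), off[idx]])
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(A * sw[:, None], vals[idx] * sw, rcond=None)
        out[r, c] = beta[0]                        # plane evaluated at the target
    return out

# Synthetic example: a smooth temperature-like field with a masked "land" block.
yy, xx = np.mgrid[0:10, 0:10]
truth = 15.0 + 0.5 * yy + 0.2 * xx
sst = truth.copy()
sst[3:6, 0:4] = np.nan
filled = loess_fill_2d(sst)
```

Because NaN cells are excluded before the neighbour search, they never enter a regression. The parameter `k` plays the role of the user-defined span: larger values give smoother fills, smaller values follow local structure more closely.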
Where Can You Use LOESS/LOWESS?
LOESS/LOWESS has wide-ranging applications in geospatial and climate science, including:
However, while LOESS/LOWESS is highly effective for 2D spatial interpolation, its computational cost grows quickly with dataset size because a separate local regression is fitted at every target point. For large datasets, frameworks like Dask can parallelize the computation: applying LOESS/LOWESS layer by layer lets the layers be processed independently, which keeps the method viable even for large, high-dimensional datasets. The image below shows demonstration results of LOESS interpolation.
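The layer-by-layer idea can be sketched with `dask.array` as below. This is an assumption-laden illustration rather than the author's setup: `fill_layer` is a hypothetical, compact NaN-aware local linear fill, the data cube is synthetic, and chunking one layer per block is just one reasonable choice.

```python
import numpy as np
import dask.array as da

def fill_layer(layer, k=12):
    """Hypothetical helper: NaN-aware local linear fill of one 2D layer."""
    valid = np.isfinite(layer)
    out = layer.copy()
    if valid.all() or valid.sum() < 3:
        return out                                  # nothing to fill, or too little data
    yy, xx = np.indices(layer.shape)
    pts = np.column_stack([yy[valid], xx[valid]]).astype(float)
    vals = layer[valid]
    k = min(k, vals.size)
    for r, c in zip(*np.nonzero(~valid)):
        off = pts - np.array([r, c], dtype=float)
        d = np.hypot(off[:, 0], off[:, 1])
        idx = np.argpartition(d, k - 1)[:k]         # k nearest valid cells
        sw = np.sqrt((1.0 - (d[idx] / (1.1 * d[idx].max())) ** 3) ** 3)
        A = np.column_stack([np.ones(k), off[idx]])
        beta, *_ = np.linalg.lstsq(A * sw[:, None], vals[idx] * sw, rcond=None)
        out[r, c] = beta[0]
    return out

def fill_block(block):
    """Apply fill_layer to every 2D layer of a (n_layers, ny, nx) block."""
    out = np.empty_like(block)
    for i in range(block.shape[0]):
        out[i] = fill_layer(block[i])
    return out

# Synthetic stack of layers sharing the same masked "land" region.
nt, ny, nx = 4, 12, 12
yy, xx = np.mgrid[0:ny, 0:nx]
expected = np.stack([t + 0.1 * yy + 0.2 * xx for t in range(nt)]).astype(float)
cube = expected.copy()
cube[:, 4:8, 0:3] = np.nan

stack = da.from_array(cube, chunks=(1, ny, nx))     # one chunk per layer
filled = stack.map_blocks(fill_block, dtype=cube.dtype).compute()
```

Each chunk holds exactly one layer, so Dask can schedule the layers independently. Since the per-layer loop here is plain Python, a process-based or distributed scheduler is likely needed to see a real speed-up; the default threaded scheduler may not help much.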
Summary and Discussion
NaNs don’t have to be the enemy of interpolation. With LOESS/LOWESS, you get a powerful and flexible method that adapts to local data variations, handles NaNs effectively, and improves interpolation quality.
Next time you encounter NaNs in geospatial data, will you reach for LOESS/LOWESS? Let me know how you handle missing data in your projects!
References
NIST (2012) LOESS (aka LOWESS)
Cleveland, W.S. (1979) Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, Vol. 74, pp. 829-836.
NIST (2012) Example of LOESS computations