Efficient Geospatial Nearest Neighbor Search with KDTree and xarray

Chonghua Yin

Head of Data Science | Climate Risk & Extreme Event Modeling | AI & Geospatial Analytics

Published: March 1, 2025

When working with large-scale geospatial data, efficient nearest neighbor search is crucial. This article shows how to combine SciPy's KDTree with xarray to find the grid points nearest to a set of query locations quickly and accurately, including query points that have no neighbor within a given search radius.

Problem Background

In fields like GIS, climate science, and remote sensing, we often need to find the data point nearest to a given coordinate (e.g., a longitude/latitude pair). A linear scan over every grid cell becomes slow with large datasets and many query points. xarray provides powerful multi-dimensional data handling, but its built-in sel(method='nearest') has limitations for geospatial data: it matches each coordinate independently against a 1-D index, so it cannot be used with 2-D (curvilinear) lon/lat coordinates, and it compares raw coordinate values rather than actual geographic distances (for example, it does not know that longitudes 359° and -1° are the same meridian).
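A small illustration of the value-versus-distance problem, using a synthetic global grid (the grid, the zero-filled data, and the query point are made up for this sketch). Longitudes stored as 0 to 359 are a common convention, while query points are often given in the -180 to 180 range:

```python
import numpy as np
import xarray as xr

# Synthetic global 1-degree grid with longitudes stored as 0..359.
lon = np.arange(0.0, 360.0, 1.0)
lat = np.arange(-89.5, 90.0, 1.0)
da = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
)

# Query a point just west of the prime meridian, written in the -180..180 convention.
picked = da.sel(lon=-0.9, lat=0.5, method="nearest")
print(float(picked.lon))  # 0.0, the closest *value* in the 1-D lon index

# Geographically, -0.9 E is the same meridian as 359.1 E, so lon=359.0 is only
# 0.1 degrees away; sel() cannot see that because it compares raw numbers.
```

Converting the query to the grid's 0 to 360 convention helps in this case, but a seam remains: a query at 359.9 would still snap to 359.0 rather than to the geographically closer 0.0.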

Solution: Combining KDTree and xarray

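KDTree is a space-partitioning data structure for fast nearest neighbor lookups: it is built once, after which each query typically visits only a small fraction of the points instead of scanning all of them. By combining KDTree with xarray, we can search 2-D as well as 1-D coordinate grids and overcome the limitations of sel(method='nearest'). Note that KDTree measures straight-line (Euclidean) distance in whatever coordinates it is given, so how closely the results match true geographic distance depends on the coordinates used to build the tree.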
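One way to make KDTree's Euclidean distance track geographic distance, swapped in here as an alternative to a map projection and not taken from the original article, is to place every lon/lat point on the unit sphere as a 3-D vector. The straight-line (chord) distance between two such vectors grows monotonically with great-circle distance, so the nearest neighbor by chord is also the nearest on a spherical Earth, and a search radius in kilometres can be converted into a chord length. The helper names and the 6371 km spherical-Earth radius are assumptions of this sketch:

```python
import numpy as np
from scipy.spatial import KDTree

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with its mean radius


def lonlat_to_xyz(lon_deg, lat_deg):
    """Hypothetical helper: lon/lat in degrees -> (n, 3) points on the unit sphere."""
    lon = np.deg2rad(np.asarray(lon_deg, dtype=float).ravel())
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float).ravel())
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))


def km_to_chord(dist_km):
    """Hypothetical helper: great-circle distance (km) -> chord length on the unit sphere."""
    return 2.0 * np.sin(np.asarray(dist_km, dtype=float) / (2.0 * EARTH_RADIUS_KM))


def chord_to_km(chord):
    """Hypothetical helper: chord length -> great-circle km; inf ("no neighbor") stays inf."""
    chord = np.asarray(chord, dtype=float)
    km = 2.0 * EARTH_RADIUS_KM * np.arcsin(np.clip(chord / 2.0, 0.0, 1.0))
    return np.where(np.isfinite(chord), km, np.inf)


# Three grid points on the equator, longitudes stored in the 0..360 convention.
tree = KDTree(lonlat_to_xyz([359.0, 0.0, 10.0], [0.0, 0.0, 0.0]))

# Query at -0.9 degrees (-180..180 convention) with a 50 km search radius.
dist, idx = tree.query(lonlat_to_xyz([-0.9], [0.0]), k=1,
                       distance_upper_bound=float(km_to_chord(50.0)))
print(idx[0], round(float(chord_to_km(dist)[0]), 1))  # 0 11.1
```

The query at -0.9° longitude matches the grid point stored as 359.0°, about 11.1 km away, which is the case that sel(method='nearest') gets wrong. Because longitude wrap-around disappears once points are 3-D vectors, no normalization of longitude conventions is needed.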

Implementation Steps

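Build KDTree: Construct a KDTree from the longitude and latitude coordinates of your xarray dataset, first expanding 1-D coordinates into a full 2-D grid of points. KDTree uses plain Euclidean distance, and in raw degrees that distance is distorted away from the equator, so for large geographic extents consider converting the coordinates to a projected coordinate system first.
Vectorized Queries: Stack the longitudes and latitudes of all query points into a single (n, 2) NumPy array so that every point is searched in one call.
Search with KDTree: Call tree.query(query_points, k=1, distance_upper_bound=radius), where radius is the search radius in the same units as the tree's coordinates (degrees, if the tree was built on raw lon/lat). k=1 returns one nearest neighbor per query point; if several points are equally close, only one of them is returned. Neighbors farther than radius are discarded.
Process Search Results: Because the tree was built on the flattened grid, the returned indices point into that flattened array. Use numpy.unravel_index to convert them back to grid indices. When no neighbor lies within the radius, SciPy returns the index tree.data.shape[0] (the number of points in the tree) together with an infinite distance, so check for this value before unraveling.
Map Indices Back to the xarray Dataset: Use xarray.isel to select the corresponding data, keyed by the grid's dimension names (e.g. lat/lon for a rectilinear grid or y/x for a curvilinear one) rather than by the coordinate names.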
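The five steps can be traced on a toy dataset. The sketch below assumes a hypothetical 3 × 4 curvilinear-style grid (2-D lon/lat on dimensions y and x, with an invented variable t2m) and, for readability, searches in plain degrees:

```python
import numpy as np
import xarray as xr
from scipy.spatial import KDTree

# Hypothetical grid: 2-D lon/lat on dims ("y", "x"), so .sel(method="nearest")
# cannot be used on them directly.
lon2d, lat2d = np.meshgrid(np.arange(0.0, 4.0), np.arange(10.0, 13.0))  # shape (3, 4)
ds = xr.Dataset(
    {"t2m": (("y", "x"), np.arange(12.0).reshape(3, 4))},
    coords={"lon": (("y", "x"), lon2d), "lat": (("y", "x"), lat2d)},
)

# Step 1: build the tree on the flattened grid (raw lon/lat degrees here).
tree = KDTree(np.column_stack((ds["lon"].values.ravel(), ds["lat"].values.ravel())))

# Step 2: all query points in one (n_points, 2) array of (lon, lat).
query = np.array([[1.1, 11.2], [2.9, 10.1], [50.0, 50.0]])

# Step 3: one vectorized query; the radius is in degrees, like the tree.
dist, flat_idx = tree.query(query, k=1, distance_upper_bound=0.5)

# Step 4: misses come back as index tree.n (== tree.data.shape[0]) and distance inf.
found = flat_idx < tree.n
iy, ix = np.unravel_index(flat_idx[found], ds["lon"].shape)

# Step 5: pointwise selection by the grid's *dimension* names, not "lon"/"lat".
hits = ds.isel(y=xr.DataArray(iy, dims="points"), x=xr.DataArray(ix, dims="points"))
print(found)               # [ True  True False]
print(hits["t2m"].values)  # [5. 3.]
```

Passing xarray.DataArray indexers that share the "points" dimension makes isel pick one grid cell per query instead of forming an outer product of rows and columns.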

Code Example

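```python
import numpy as np
import xarray as xr
from scipy.spatial import KDTree

def find_unique_nearest_neighbors(ds, query_lons, query_lats, radius):
    """
    Find the nearest grid cell for each query point.

    Returns a list in query order; an entry is None when no grid cell lies
    within `radius` (same units as lon/lat, i.e. degrees here).
    Handles 1-D (rectilinear) and 2-D (curvilinear) lon/lat coordinates.
    """
    lons, lats = ds['lon'].values, ds['lat'].values
    if ds['lon'].ndim == 1:
        # Rectilinear grid: expand 1-D coordinates into a (lat, lon) grid
        lons, lats = np.meshgrid(lons, lats)
        dims = (ds['lat'].dims[0], ds['lon'].dims[0])
    else:
        dims = ds['lon'].dims  # curvilinear grid, e.g. ('y', 'x')
    points = np.column_stack((lons.ravel(), lats.ravel()))
    tree = KDTree(points)
    query_points = np.column_stack((query_lons, query_lats))
    distances, indices = tree.query(query_points, k=1, distance_upper_bound=radius)
    results = []
    for index in indices.ravel():
        if index == tree.data.shape[0]:
            # SciPy's sentinel for "no neighbor within distance_upper_bound"
            results.append(None)
        else:
            # Flat index -> grid indices, selected by dimension name
            grid_idx = np.unravel_index(index, lons.shape)
            results.append(ds.isel(dict(zip(dims, grid_idx))))
    return results
```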

Summary

By integrating KDTree with xarray, we can efficiently perform geospatial nearest neighbor searches while overcoming the limitations of xarray's built-in methods. This approach offers a robust solution for various geospatial data analysis tasks.

However, some questions need further exploration. One is how to extract all valid points from the data quickly, combine them with the invalid points (those with no neighbor within the radius), and return the results in the same order as the input points, rather than selecting one point at a time in a loop. Another is how to extend this approach to higher-dimensional data while keeping it scalable and adaptable to more complex geospatial analyses.
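For the first of these questions, one possible approach (a sketch, not the author's solution) is to replace the per-point loop with a single vectorized isel: misses are temporarily pointed at grid cell 0 so every query can be selected in one call, then masked back to NaN with where(). The function name nearest_points, the "points" dimension, and the distance coordinate are hypothetical choices, and distances are plain Euclidean in degrees:

```python
import numpy as np
import xarray as xr
from scipy.spatial import KDTree


def nearest_points(ds, query_lons, query_lats, radius):
    """Hypothetical helper: nearest grid cell for every query, in input order.

    Returns a Dataset with a new "points" dimension; queries with no grid cell
    within `radius` (same units as lon/lat, i.e. degrees) are filled with NaN.
    """
    lons, lats = ds["lon"].values, ds["lat"].values
    if ds["lon"].ndim == 1:
        # Rectilinear grid: expand 1-D lon/lat into 2-D arrays shaped (lat, lon).
        lons, lats = np.meshgrid(lons, lats)
        dims = (ds["lat"].dims[0], ds["lon"].dims[0])
    else:
        dims = ds["lon"].dims  # curvilinear: 2-D lon/lat share dims, e.g. (y, x)

    tree = KDTree(np.column_stack((lons.ravel(), lats.ravel())))
    query = np.column_stack((np.asarray(query_lons, dtype=float),
                             np.asarray(query_lats, dtype=float)))
    dist, flat_idx = tree.query(query, k=1, distance_upper_bound=radius)

    found = flat_idx < tree.n
    # Send misses to cell 0 so one vectorized isel() handles every query;
    # those rows are masked out again with where() below.
    grid_idx = np.unravel_index(np.where(found, flat_idx, 0), lons.shape)
    indexers = {d: xr.DataArray(i, dims="points") for d, i in zip(dims, grid_idx)}
    # Turn the selected lon/lat into data variables so where() masks them too.
    out = ds.isel(indexers).reset_coords(["lon", "lat"])
    out = out.where(xr.DataArray(found, dims="points"))
    return out.assign_coords(distance=("points", dist))


# Usage on a small rectilinear grid; the first query is deliberately out of range.
ds = xr.Dataset(
    {"t2m": (("lat", "lon"), np.arange(12.0).reshape(3, 4))},
    coords={"lat": [10.0, 11.0, 12.0], "lon": [0.0, 1.0, 2.0, 3.0]},
)
res = nearest_points(ds, [50.0, 1.1, 2.9], [50.0, 11.2, 10.1], radius=0.5)
print(res["t2m"].values)  # [nan  5.  3.]
```

Because the output keeps one entry per query along "points", results line up with the input order, and the grid lon/lat of a miss is reported as NaN instead of silently showing the coordinates of cell 0.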

References

https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.html

https://xarray.pydata.org/en/stable/
