Simplifying Geospatial Analytics with the New Sedona STAC Reader

Feng Zhang, PhD

Principal Software Engineer @ Wherobots

Published: March 5, 2025

Integrating extensive satellite imagery and geospatial datasets into analytics platforms has traditionally been a complex and time-consuming effort. As the primary developer and contributor to the SpatioTemporal Asset Catalog (STAC) Reader feature, I’m excited to announce its introduction in the upcoming release of Apache Sedona 1.7.1. This feature simplifies geospatial analytics by enabling seamless ingestion and analysis of multi-dimensional datasets.

For example, urban planners assessing a city’s resilience to climate-induced flooding can use the STAC Reader to integrate and analyze Sentinel-2 satellite imagery efficiently, identifying vulnerable areas and informing mitigation strategies. The following Scala snippet loads a Sentinel-2 collection:

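// Assumes `spark` is an active SparkSession with Sedona available
// Load STAC items into a Sedona DataFrame
val df = spark.read
  .format("stac")
  .option("itemsLimitMax", "100")                        // cap on the total number of items loaded
  .option("itemsLoadProcessReportThreshold", "2000000")  // threshold for reporting load progress
  .option("itemsLimitPerRequest", "100")                 // items fetched per request
  .load("https://earth-search.aws.element84.com/v1/collections/sentinel-2-pre-c1-l2a")

// Show schema and data
df.printSchema()
df.show()

// Register a temporary view so the SQL examples below can query it as STAC_TABLE
df.createOrReplaceTempView("STAC_TABLE")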

Despite the wealth of information contained within STAC datasets, professionals often encounter several obstacles when integrating them into geospatial analytics workflows:

Scalability: Geospatial datasets, especially high-resolution imagery like Sentinel-2, are large and computationally demanding. Processing such data can lead to performance bottlenecks, extended processing times, and increased costs without adequate infrastructure and optimized workflows.
Analytical Workflow Integration: Incorporating geospatial data into existing analytical workflows can be challenging due to its specialized nature and potential lack of institutional knowledge. This skills gap can impede the adoption of geospatial technologies and the integration of STAC datasets into existing analytical frameworks.

Recognizing these challenges, the Apache Sedona community will introduce the STAC Reader in version 1.7.1 to facilitate smoother integration of multi-dimensional datasets into geospatial analytics and queries. By supporting the STAC specification, Sedona enables users to seamlessly access and analyze diverse geospatial assets, such as Sentinel-2 satellite imagery, without extensive data preprocessing. This integration aligns with Sedona’s mission to provide efficient and scalable solutions for spatial data processing, empowering users to derive valuable insights from complex datasets with greater ease.

-- In this example, the temporal filter is pushed down to the underlying
-- STAC source, so only records within the specified date range are
-- retrieved.
SELECT id, datetime as dt, geometry, bbox
FROM STAC_TABLE
WHERE datetime BETWEEN '2020-01-01' AND '2020-12-13'
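
To make the idea of temporal pushdown more concrete, here is a minimal sketch of how a date-range predicate like the one above could be expressed as the `datetime` interval parameter defined by the STAC API. It is an illustration only, not Sedona's actual implementation: `TemporalPushdownSketch` and its helpers are hypothetical names, and treating the SQL date literals as UTC midnight is an assumption.

```scala
import java.time.{LocalDate, ZoneOffset}

// Hypothetical helper, not part of Sedona: illustrates how a BETWEEN
// predicate on `datetime` could map to a STAC API query parameter.
object TemporalPushdownSketch {

  // Convert a date literal such as "2020-01-01" to an RFC 3339 timestamp.
  // Assumption: the literal is interpreted as midnight UTC.
  def toRfc3339(date: String): String =
    LocalDate.parse(date).atStartOfDay(ZoneOffset.UTC).toInstant.toString

  // Build the closed "start/end" interval form accepted by the
  // STAC API `datetime` parameter.
  def datetimeParam(start: String, end: String): String =
    s"datetime=${toRfc3339(start)}/${toRfc3339(end)}"

  def main(args: Array[String]): Unit =
    // Same bounds as the SQL example above.
    println(datetimeParam("2020-01-01", "2020-12-13"))
}
```

With a parameter like this, the date range is applied by the catalog itself, so items outside the range never need to be transferred to Spark.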


-- In this example, the spatial filter is applied to retrieve records whose 
-- geometries are contained within the specified polygon, optimizing data 
-- retrieval by leveraging spatial indexing.
SELECT id, geometry
FROM STAC_TABLE
WHERE ST_Contains(ST_GeomFromText('POLYGON((17 10, 18 10, 18 11, 17 11, 17 10))'), geometry)
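
One common way to narrow a spatial predicate at the source is to send the polygon's bounding box as the STAC API `bbox` parameter and leave the exact containment test to the query engine. The sketch below shows that envelope computation for the polygon used above. Whether Sedona does exactly this is not stated here; `SpatialPushdownSketch`, `BBox`, and `envelope` are hypothetical names introduced for illustration.

```scala
// Hypothetical helper, not part of Sedona: derives a coarse bounding-box
// filter from the vertices of a polygon ring.
object SpatialPushdownSketch {

  final case class BBox(minX: Double, minY: Double, maxX: Double, maxY: Double) {
    // STAC API `bbox` order: west, south, east, north.
    def toParam: String = s"bbox=$minX,$minY,$maxX,$maxY"
  }

  // Compute the envelope of a ring given as (x, y) vertex pairs.
  def envelope(ring: Seq[(Double, Double)]): BBox = {
    require(ring.nonEmpty, "ring must contain at least one vertex")
    val xs = ring.map(_._1)
    val ys = ring.map(_._2)
    BBox(xs.min, ys.min, xs.max, ys.max)
  }

  def main(args: Array[String]): Unit = {
    // Vertices of POLYGON((17 10, 18 10, 18 11, 17 11, 17 10)).
    val ring = Seq((17.0, 10.0), (18.0, 10.0), (18.0, 11.0), (17.0, 11.0), (17.0, 10.0))
    println(envelope(ring).toParam)
  }
}
```

A bounding box only selects candidates that intersect the envelope, so a precise predicate such as `ST_Contains` would still need to be evaluated on the returned rows.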

The launch of the STAC Reader in Apache Sedona 1.7.1 marks a significant advancement for professionals working with geospatial data. This new feature tackles common challenges related to the integration of STAC datasets, simplifying data ingestion and improving analytical workflows. As a result, users can fully utilize the potential of geospatial data, leading to more informed decision-making and encouraging innovation across various industries.
