Iceberg REST Catalog Overview #8 - Scan Plan Retrieval and Cancellation

In the previous post in this series, we explored how to submit a table scan request using Apache Iceberg’s REST Catalog API. Once a scan has been submitted, clients need a way to retrieve the planning results, or to cancel the scan if it is no longer needed.

This blog covers:

  1. Fetching the results of a submitted scan plan
  2. Handling different response statuses
  3. Cancelling a scan plan to free up resources

Fetching the Result of a Scan Plan (GET /plan/{plan-id})

After a scan request is submitted with POST /plan, the server may not return results immediately. Instead, it can respond with a plan-id, which the client then uses to fetch the results once planning is complete.

Example Request to Fetch Scan Results

GET /v1/warehouse/namespaces/sales/tables/orders/plan/scan-12345 HTTP/1.1
Host: iceberg.catalog.com
Authorization: Bearer <your-access-token>
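The same request can be prepared in Python with only the standard library. This is a minimal sketch: the base URL, the warehouse prefix, and the helper names plan_url and build_fetch_request are illustrative assumptions, not part of any Iceberg client library. The request is built but not sent, so you can inspect it before passing it to urllib.request.urlopen.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://iceberg.catalog.com"  # hypothetical catalog endpoint


def plan_url(prefix: str, namespace: str, table: str, plan_id: str) -> str:
    """Build the URL of .../tables/{table}/plan/{plan-id} (illustrative helper)."""
    # Percent-encode each path segment so unusual names cannot break the path.
    parts = [quote(p, safe="") for p in (prefix, namespace, table, plan_id)]
    return "{}/v1/{}/namespaces/{}/tables/{}/plan/{}".format(BASE_URL, *parts)


def build_fetch_request(
    prefix: str, namespace: str, table: str, plan_id: str, token: str
) -> Request:
    """Prepare (but do not send) the GET request for a scan plan's result."""
    return Request(
        plan_url(prefix, namespace, table, plan_id),
        method="GET",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


req = build_fetch_request("warehouse", "sales", "orders", "scan-12345", "my-token")
print(req.get_method(), req.full_url)
# GET https://iceberg.catalog.com/v1/warehouse/namespaces/sales/tables/orders/plan/scan-12345
```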

Understanding the Response Statuses

The status field in the response takes one of four values:

1. Completed (Scan Results Available)

If planning has finished, the response includes the plan-tasks and file-scan-tasks needed to execute the scan. A simplified example:

{
  "status": "completed",
  "plan-tasks": [
    {
      "file": "s3://data-lake/sales/orders.parquet",
      "start": 0,
      "length": 5242880
    }
  ]
}

Action: Proceed to execute the scan using the returned tasks.

2. Submitted (Scan Still in Progress)

If planning is still in progress, the response has the status “submitted”.

{
  "status": "submitted",
  "plan-id": "scan-12345"
}

Action: Wait, then poll the same endpoint again.

3. Failed (Error Occurred)

If planning fails, the response has the status “failed” and includes an error.

{
  "status": "failed",
  "error": "Table not found"
}

Action: Check the error message and troubleshoot accordingly.

4. Cancelled (Scan No Longer Valid)

If the scan has been cancelled, the response has the status “cancelled”.

{
  "status": "cancelled",
  "message": "The plan-id is no longer valid."
}

Action: Discard the plan-id and do not retry.
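The four outcomes above map naturally onto a small dispatch function. This sketch assumes the response body has already been parsed into a Python dict with a status field, as in the examples; the function name next_action and its returned labels are hypothetical.

```python
from typing import Any, Dict


def next_action(body: Dict[str, Any]) -> str:
    """Map a fetched scan-plan response to the client's next step (illustrative)."""
    status = body.get("status")
    if status == "completed":
        # Planning is done: execute the scan with the returned tasks.
        return "execute"
    if status == "submitted":
        # Planning is still running: poll the same plan-id again later.
        return "retry"
    if status == "failed":
        # Surface the server's error so it can be troubleshot.
        raise RuntimeError(f"scan planning failed: {body.get('error')}")
    if status == "cancelled":
        # The plan-id is no longer valid: discard it and do not retry.
        return "discard"
    raise ValueError(f"unexpected plan status: {status!r}")


print(next_action({"status": "submitted", "plan-id": "scan-12345"}))  # retry
```

Raising on "failed" keeps the error from being silently treated as an empty result; a caller that prefers to inspect the error body could return it instead.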

Cancelling a Scan Plan (DELETE /plan/{plan-id})

If a scan is no longer needed, clients should explicitly cancel it to release server resources.

Example Request to Cancel a Scan Plan

DELETE /v1/warehouse/namespaces/sales/tables/orders/plan/scan-12345 HTTP/1.1
Host: iceberg.catalog.com
Authorization: Bearer <your-access-token>

If the cancellation is successful, the server returns HTTP 204 No Content.
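A minimal Python sketch of the cancellation call, again using only the standard library. The base URL, the warehouse prefix, and the helper names build_cancel_request and cancel_plan are assumptions for illustration; the check simply treats HTTP 204 as success, as described above.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://iceberg.catalog.com"  # hypothetical catalog endpoint


def build_cancel_request(
    prefix: str, namespace: str, table: str, plan_id: str, token: str
) -> Request:
    """Prepare the DELETE request that cancels a scan plan (illustrative helper)."""
    segments = [quote(s, safe="") for s in (prefix, namespace, table, plan_id)]
    url = "{}/v1/{}/namespaces/{}/tables/{}/plan/{}".format(BASE_URL, *segments)
    return Request(url, method="DELETE", headers={"Authorization": f"Bearer {token}"})


def cancel_plan(prefix: str, namespace: str, table: str, plan_id: str, token: str) -> bool:
    """Send the DELETE and report whether the server answered 204 No Content."""
    req = build_cancel_request(prefix, namespace, table, plan_id, token)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses (e.g. a 404
    # if the plan-id has already expired), so only 2xx codes reach this check.
    with urlopen(req) as resp:
        return resp.status == 204
```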

When Should You Cancel a Scan?

  • The scan is still in “submitted” status, and the results are no longer needed.
  • The scan was initiated, but no plan tasks were fetched.
  • The client is shutting down or switching to a different query plan.

No Need to Cancel If:

  • The scan has already been completed, and the results have been retrieved.
  • The scan has failed or has already been cancelled automatically by the server.
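The checklists above can be condensed into a single decision function. This is a hypothetical helper, not part of the REST API: it assumes the client tracks the plan's last known status, whether it still wants the results, and whether it has already retrieved the plan tasks.

```python
TERMINAL = {"failed", "cancelled"}


def should_cancel(status: str, results_needed: bool, tasks_fetched: bool) -> bool:
    """Decide whether to send DELETE .../plan/{plan-id} (hypothetical helper).

    status: the plan's last known status ("submitted", "completed", ...).
    results_needed: whether the client still wants this scan's results.
    tasks_fetched: whether the client has already retrieved the plan tasks.
    """
    if status in TERMINAL:
        # Failed or already-cancelled plans leave nothing to cancel.
        return False
    if status == "completed" and tasks_fetched:
        # Results were retrieved; the plan has served its purpose.
        return False
    # Still submitted, or completed but never consumed: cancel only if unwanted.
    return not results_needed
```

A client that is shutting down would call this with results_needed=False for every plan it still holds.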

Best Practices for Managing Scan Plans

  • Poll efficiently → Use exponential backoff when fetching scan results to reduce unnecessary requests.
  • Cancel unused scans → Avoid wasting server resources by explicitly cancelling scans you no longer need.
  • Handle errors gracefully → Be prepared for 404 errors if the plan-id expires or the table is deleted.
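The first two practices combine into one polling loop. This is a minimal sketch under stated assumptions: fetch and cancel are caller-supplied functions (hypothetical here) that perform the GET and DELETE requests and return or discard the parsed JSON body, and the delay, cap, and timeout values are arbitrary defaults rather than recommendations from the Iceberg spec.

```python
import time
from typing import Any, Callable, Dict

Body = Dict[str, Any]


def wait_for_plan(
    fetch: Callable[[], Body],
    cancel: Callable[[], None],
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Body:
    """Poll a scan plan with capped exponential backoff; cancel it on timeout.

    fetch performs GET .../plan/{plan-id}; cancel performs DELETE on the same
    path. The timeout counts only time spent sleeping between polls.
    """
    delay, waited = initial_delay, 0.0
    while True:
        body = fetch()
        status = body.get("status")
        if status == "completed":
            return body
        if status in ("failed", "cancelled"):
            # Terminal states: nothing to cancel, just report them.
            raise RuntimeError(f"scan plan ended with status {status!r}: {body}")
        if status != "submitted":
            raise ValueError(f"unexpected plan status: {status!r}")
        if waited + delay > timeout:
            # We are giving up on the results, so release server resources.
            cancel()
            raise TimeoutError("scan planning did not finish in time; plan cancelled")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # double the wait, up to the cap


# Usage with fake responses instead of real HTTP calls:
responses = iter([{"status": "submitted"}, {"status": "submitted"},
                  {"status": "completed", "plan-tasks": []}])
delays = []
result = wait_for_plan(lambda: next(responses), cancel=lambda: None, sleep=delays.append)
print(result["status"], delays)  # completed [0.5, 1.0]
```

Injecting sleep makes the backoff schedule easy to test without real waiting; in production the default time.sleep is used.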

Conclusion

Fetching and cancelling scan plans gives Iceberg users greater control over query execution and resource utilization. By following these best practices, teams can keep scan planning efficient, reduce latency, and manage resources better in large-scale data lakehouse environments.

More articles by Alex Merced