Spatial Stats of Raster Upon Polygons - Zonal Operations
Chonghua Yin
Head of Data Science | Climate Risk & Extreme Event Modeling | AI & Geospatial Analytics
Map algebra, alternatively referred to as cartographic modeling, is a straightforward yet powerful algebraic approach that employs an array of tools, operators, and functions for conducting geographic analysis with raster data. It manipulates geographic data through mathematical functions to derive specific outcomes or new information.
There are four basic function types used in map algebra:
- Local operations: examine raster data cell by cell, comparing the value in each cell of one layer with the value in the corresponding cell of other layers, as in addition or subtraction.
- Focal operations: compare the value in each cell with the values in its neighboring cells. Convolutions, kernels, and moving windows are examples of focal operations.
- Global operations: generate results that apply to the entire layer, such as adding a scaling factor to all cells.
- Zonal operations: compute results within specified zones defined by raster or vector data, for example determining the total precipitation for a watershed.
Local and global operations are quite straightforward and easy to implement, and we presented focal operations before. In this tutorial, we will focus on the most interesting map algebra function type, zonal operations, which is also one of the most common tasks for developers of spatial analysis applications. For example, we can apply such functions to assess the risk of buildings being impacted by floods, wildfires, or other natural hazards within small-scale areas defined by polygons (Figure 1). The building data could be obtained from Google’s Open Buildings or Microsoft’s worldwide Building Footprints derived from satellite imagery.
The conventional method clips the raster data with each asset polygon and then performs statistical analysis on each clip individually (e.g., using rasterstats, rioxarray, salem, xarray, geopandas, etc.). This method is intuitive and easy to understand, but it becomes highly inefficient when there are many polygons (e.g., hundreds of thousands or more). As a result, some people have turned to a parallel approach, where asset polygons are divided into several groups and extraction and statistics are performed on each group separately (e.g., with Dask-GeoPandas). This does increase processing speed, but the workflow remains iterative in essence, albeit grouped, so the speed improvement is still relatively limited when dealing with many polygons.
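As a quick illustration of the clipping approach, here is a minimal sketch using rasterstats together with geopandas. The file names flood_depth.tif and buildings.geojson are placeholders, not data from this article.

```python
# Minimal sketch of the per-polygon clipping approach with rasterstats.
# "buildings.geojson" and "flood_depth.tif" are placeholder inputs.
import geopandas as gpd
import pandas as pd
from rasterstats import zonal_stats

buildings = gpd.read_file("buildings.geojson")

# zonal_stats clips the raster with each polygon in turn and computes the
# requested statistics; this per-polygon behaviour is what becomes slow
# once hundreds of thousands of polygons are involved.
stats = zonal_stats(
    buildings.geometry,
    "flood_depth.tif",
    stats=["min", "max", "mean", "count"],
)

# Attach the per-building statistics back to the GeoDataFrame.
flood_stats = pd.DataFrame(stats, index=buildings.index)
buildings = buildings.join(flood_stats.add_prefix("flood_"))
```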
We can contemplate a reverse approach as well. Why not rasterize the asset polygons directly into raster data that closely mirrors the target raster (e.g., the flood raster in Figure 1), with the same spatial domain and resolution? By pursuing this strategy, we can use a distinct field within the asset polygons as a key to pinpoint the location of each polygon. This effectively generates a mask matrix for extracting data from the target raster and conducting the subsequent statistical analyses (e.g., Figure 2). Tools that can rasterize polygons include rasterio, regionmask, geocube, GDAL, etc.
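A rough sketch of this rasterize-then-mask idea, assuming the same placeholder inputs as above, might use rasterio.features.rasterize to burn each polygon's index onto the flood raster's grid and then aggregate with a vectorised pandas groupby:

```python
# Sketch of the reverse approach: burn the polygons onto the flood raster's
# grid, then aggregate by zone id instead of looping over polygons.
import geopandas as gpd
import pandas as pd
import rasterio
from rasterio import features

buildings = gpd.read_file("buildings.geojson")

with rasterio.open("flood_depth.tif") as src:
    flood = src.read(1)
    transform = src.transform
    nodata = src.nodata
    out_shape = (src.height, src.width)

# Burn each polygon's (index + 1) as its zone id; 0 stays as background.
zones = features.rasterize(
    ((geom, i + 1) for i, geom in enumerate(buildings.geometry)),
    out_shape=out_shape,
    transform=transform,
    fill=0,
    dtype="int32",
)

# Keep only cells that fall inside a polygon and hold valid data.
valid = zones > 0
if nodata is not None:
    valid &= flood != nodata

# One vectorised groupby replaces the per-polygon loop.
zonal = (
    pd.DataFrame({"zone": zones[valid], "value": flood[valid]})
    .groupby("zone")["value"]
    .agg(["min", "max", "mean", "count"])
)
```

One caveat with this design: polygons smaller than a raster cell may not cover any cell centre and therefore receive no cells at all; rasterize's all_touched=True option is one way to handle that, at the cost of over-counting boundary cells.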
Indeed, there are other choices available. We can employ specialized spatial analysis software for this kind of task. For instance, ArcGIS Pro provides the ZonalStatisticsAsTable function, and QGIS offers QgsZonalStatistics. It's important to note that this approach requires installing these software packages to access their functionality, which might not align with your preferences if you want to rely on lightweight scripts only. If you read the technical documentation for ArcGIS Pro’s ZonalStatisticsAsTable, you'll notice that it starts by rasterizing the polygons to generate masking arrays, essentially sharing the same concept as the second method.
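For completeness, a hedged sketch of the ArcGIS Pro route via arcpy is shown below. It assumes a licensed Spatial Analyst extension and the same placeholder inputs, with building_id standing in for whatever unique zone field your data carries.

```python
# Sketch of zonal statistics through arcpy's Spatial Analyst extension.
# Input paths and the zone field name are placeholders.
import arcpy
from arcpy.sa import ZonalStatisticsAsTable

arcpy.CheckOutExtension("Spatial")

ZonalStatisticsAsTable(
    in_zone_data="buildings.shp",        # asset polygons acting as zones
    zone_field="building_id",            # assumed unique identifier field
    in_value_raster="flood_depth.tif",
    out_table="flood_zonal_stats.dbf",
    ignore_nodata="DATA",                # skip NoData cells
    statistics_type="ALL",
)
```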
Of course, each approach has its strengths and weaknesses. If you have a relatively small amount of polygonal data, you may opt for the first method. However, the second method may be more suitable for dealing with many polygons. The first two methods provide greater flexibility and versatility. On the other hand, if your analysis workflow is closely tied to a specific software, such as ArcGIS or QGIS, then it's advisable to consider the third approach. This ensures compatibility with your software and can streamline your analysis process within that particular environment.