登录查看更多内容

Reclassify DEM

Chonghua Yin

Head of Data Science | Climate Risk & Extreme Event Modeling | AI & Geospatial Analytics

发布日期: 2019年12月18日

A very useful technique to work with raster data (such as DEM) is changing their values or grouping them into categories, which is the so-called reclassification. In another word, reclassification is the process of reassigning a value, a range of values, or a list of values in a raster to new output values. There are several reasons to classify rasters:

set specific values to NoData to exclude them from analysis;
change values in response to new information or classification schemes;
replace one set of values with an associated set;
assign values of preference, priority, sensitivity, or similar criteria to a raster;
others.

There are two categories of classifying methods (Joel Lawhead, 2019). The first one is fairly simple, which is based only on some value distribution algorithm derived from the histogram of raster data or user-defined groups. This kind of simplest methods is called unsupervised classifications, where no additional input is given other than the image itself. The second category is quite complicated as it may involve training datasets and even computer learning and artificial intelligence. Generally, this kind of methods involves some sort of training data to guide the computer. Therefore it is called supervised classifications. For example, identifying crop types from satellite images is a kind of typical application of supervised classification.

Here we will only touch the unsupervised classification method, where values for each cell are compared with the range limits in the lookup table, considering the specified comparison criteria. Whenever a value falls into a given range, the class value specified for this range will be used in the output layer. The following image presents an example of how to reclassify the original values from base raster by ranges to new reclassified values:

Example of reclassification by value ranges (Image source: ESRI.)

Raster Classification Steps

From the above demo image, we know that the raster processing workflow could be broken into several steps as follows:

Step 1: Data import / cleanup Load and “clean” the data. This may include cropping, dealing with NA values, etc.
Step 2: Data exploration Understand the range and distribution of values in your data. This may involve plotting histograms scatter plots, etc.
Step 3: More data processing & analysis This may include the final data processing steps that you determined based upon the data exploration phase.
Step 4: Final data analysis The final steps of your analysis - often performed using information gathered in the early data processing / exploration stages of your workflow.
Step 5: Presentation Refining your results into a final plot or set of plots that are cleaned up, labeled, etc.

Keep in mind that these steps are not compulsory, which depends on your data. You always can figure out your own workflow. For example, sometimes the Step 2 and 3 could be not necessary if you already know the specified classification criteria in advance.

Python implement

Although many geospatial software such as ArcMap and QGIS provides raster reclassification functions, in fact, using Python to reclassify raster is also quite convenient. And sometimes, you could figure out more customized classification criteria to fit your purpose than those software.

The Step 1, 2, 3 and 5 have presented in previous tutorials.

We always apply xarray (backended with rasterio) to read and clean data.
We use scipy.stats and seaborn to explore data and check data distributions (bins). Or we could use the numpy.histogram() to compute the histogram of a set of data, easily.
We use matplotlib (+ cartopy or folium) to visualize and present final results.

Look, you already mastered most of the steps and only Step 4 is left. Let's have a look how to convert cell values to categories. Firstly, let's presume that the DEM data is stored in a 2D numpy.array with rows and columns (actually they are).

The first method is that you can use an iteration procedure to pick each cell along with rows and column to compare with each category bounds and assign a new value to that cell. This is the most straightforward method and quite easy to implement. However, it is always not a good idea to use loops in Python.

Luckily, we have the second choice that numpy has a function called digitize( ) that could simplify our classifying task. numpy.digitize( ) is similar to how histogram works, because it categorizes datapoints into bins. However, unlike histogram, it doesn’t aggregate/count the number of values within each bin. Instead, digitize will replace each datapoint with an integer corresponding to which bin it belongs to (i.e., category). You can use this to determine which datapoints fall within certain ranges.

I strongly suggest you to have a look at this tutorial, which presented a procedure step by step to create a classified canopy height model (CHM). Moreover, many material is extracted from the tutorial and therefore all credit also should go to it.

However, it is not my style to operate on numpy.array since we already applied xarray to read and clean data. It should be better to work directly on xarray.dataarray, instead of convert it to numpy.array (unless no other choises). We have the third method :). We can use the python package of mapclassify to reclassify rasters, which could be done by xarray.apply_ufunc(mapclassify.method).

mapclassify is an open-source python library for Choropleth map classification and includes quite a few unsupervised classification methods as follows:

Box_Plot
EqualInterval
FisherJenks
FisherJenksSampled
HeadTailBreaks
JenksCaspall
JenksCaspallForced
JenksCaspallSampled
MaxP
MaximumBreaks
NaturalBreaks
Quantiles
Percentiles
StdMean
UserDefined.

In this notebook, we will present a simple reclassification example based on mapclassify. After this, you will find it is not that difficult to reproduce the tutorial of reclassifying CHM mentioned above based on mapclassify. I will give you the most part of code, but leave the key part to you to fill.

Show time

%matplotlib inline

import numpy as np
import xarray as xr
import mapclassify as mc
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

Read and preprocess DEM data

Removing with nodata values

infile = "data/FloodFill/terrain.asc"

da_dem = xr.open_rasterio(infile).drop('band')[0].rename({'x':'longitude', 'y':'latitude'})
nodata = da_dem.nodatavals[0]
da_dem = da_dem.where(da_dem>nodata, np.nan)

Reclassify DEM using Quantiles

Map Raster Values to New Values

da_dem_stack = da_dem.stack(point=['latitude', 'longitude'])
da_dem_class = xr.apply_ufunc(mc.Quantiles.make(rolling=True), da_dem_stack).unstack('point')

Comparison visualization

fig, ax = plt.subplots(ncols=2, figsize=(15, 6))

da_dem.plot(ax=ax[0], vmax=150, vmin=0, add_colorbar=False)
ax[0].axis('off')
_= ax[0].set_title('Original', fontsize=16)

da_dem_class.plot(ax=ax[1],  add_colorbar=False )
ax[1].axis('off')
_= ax[1].set_title('After Reclassification with Quantiles', fontsize=16)

Bonus

Steps to reproduce the tutorial of reclassifying CHM mentioned above

Read and clean data
Create the CHM (See Sup2: DEM, DTM, DSM and Canopy Height Model)
Self-define category bounds (bins)
Classify the data
Visualization

Most of these steps are quite easy to achieve. But you may meet a little bit difficult at xarray.apply_ufunc(mapclassify.method, da_chm). If you can figure out an elegant solution, instead of using hard-code, I can confirm that you actually move a big step towards applying xarray to analyze spatial data.

The final reclassification image I reproduced looks almost the same to the original tutorial.

Read and clean data

dtm_path = 'data/colorado-flood/spatial/boulder-leehill-rd/pre-flood/lidar/pre_DTM.tif'
dsm_path = 'data/colorado-flood/spatial/boulder-leehill-rd/pre-flood/lidar/pre_DSM.tif'

da_dtm = xr.open_rasterio(dtm_path).drop('band')[0].rename({'x':'longitude', 'y':'latitude'})
nodata = da_dtm.nodatavals[0]
da_dtm = da_dtm.where(da_dtm>nodata, np.nan)

da_dsm = xr.open_rasterio(dsm_path).drop('band')[0].rename({'x':'longitude', 'y':'latitude'})
nodata = da_dsm.nodatavals[0]
da_dsm = da_dsm.where(da_dsm>nodata, np.nan)

Create CHM

lidar_chm_im = da_dsm - da_dtm

Define bins and classify chm

I leave the key part for you to fill.

# Define bins that you want, and then classify the data
class_bins = [lidar_chm_im.min(), 2, 7, 12, np.inf]

#clsfunc = partial(mc.UserDefined, bins=class_bins)
lidar_chm_im_stack = lidar_chm_im.stack(point=['latitude', 'longitude'])

# filling this part

# post-process
lidar_chm_class_ma = lidar_chm_im_class.where(lidar_chm_im_class != 0, np.nan)

Visualization

# Plot data using nicer colors
colors = ['linen', 'lightgreen', 'darkgreen', 'maroon']

cmap = ListedColormap(colors)
norm = BoundaryNorm(class_bins, len(colors))

# Create a list of labels to use for your legend
height_class_labels = ["Short trees", "Less short trees", "Medium trees", "Tall trees"]

# A path is an object drawn by matplotlib. In this case a patch is a box draw on your legend
# Below you create a unique path or box with a unique color - one for each of the labels above
legend_patches = [Patch(color=icolor, label=label)
                  for icolor, label in zip(colors, height_class_labels)]

fig, ax = plt.subplots(figsize=(15, 7))
lidar_chm_class_ma.plot(ax=ax, cmap=cmap, add_colorbar=False)

ax.legend(handles=legend_patches,
          facecolor="white",
          edgecolor="white",
          bbox_to_anchor=(1.15, 1))  # Place legend to the RIGHT of the map
ax.set_axis_off()
_= ax.set_title("Classified Canopy Height", fontsize=16)

References

Joel Lawhead (2019). Learning Geospatial Analysis with Python - Third Edition.

Travis E, Oliphant. A guide to NumPy, USA: Trelgol Publishing, (2006).

Stéfan van der Walt, S. Chris Colbert and Ga?l Varoquaux. The NumPy Array: A Structure for Efficient Numerical Computation, Computing in Science & Engineering, 13, 22-30 (2011), DOI:10.1109/MCSE.2011.37

Fernando Pérez and Brian E. Granger. IPython: A System for Interactive Scientific Computing, Computing in Science & Engineering, 9, 21-29 (2007), DOI:10.1109/MCSE.2007.53

John D. Hunter. Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering, 9, 90-95 (2007), DOI:10.1109/MCSE.2007.55

https://rasterio.readthedocs.io/en/stable/

https://xarray.pydata.org/en/stable/

https://github.com/pysal/mapclassify

https://www.earthdatascience.org/courses/earth-analytics/lidar-raster-data-r/classify-raster/

要查看或添加评论，请登录

Chonghua Yin的更多文章

SPEI: A Smarter Way to Measure Drought

2025年3月8日

SPEI: A Smarter Way to Measure Drought

When we think about drought, we often focus on rainfall—how much (or little) precipitation a place receives. But is…
NaN Wrangling: LOESS/LOWESS to the Rescue

2025年3月8日

NaN Wrangling: LOESS/LOWESS to the Rescue

Have you ever tried interpolating geospatial data near coastlines, only to find your results ruined by NaN (Not a…

2 条评论
Efficient Geospatial Nearest Neighbor Search with KDTree and xarray

2025年3月1日

Efficient Geospatial Nearest Neighbor Search with KDTree and xarray

When working with large-scale geospatial data, efficient nearest neighbor search is crucial. This article explores how…

1 条评论
Unlocking Data's Potential: Four Types of Analytics

2025年2月21日

Unlocking Data's Potential: Four Types of Analytics

In today's data-driven world, businesses that can harness the power of analytics gain a significant competitive edge…
Analytics: Team Driven

2025年2月19日

Analytics: Team Driven

A data analytics team’s strength doesn’t come from a single exceptional individual but from the collective impact of…
Secret to Product Longevity: Simplicity, Support, and Feedback

2025年2月15日

Secret to Product Longevity: Simplicity, Support, and Feedback

In today's rapidly evolving tech landscape, products constantly emerge and transform. Yet, some stand the test of time,…
Flying High: A Simple Metaphor for Business

2025年2月11日

Flying High: A Simple Metaphor for Business

I recently discussed the relationship between marketing and sales with a friend. During our conversation, he used a…
Project Life Cycle vs. Product Life Cycle: Embracing Agile Product Thinking

2025年2月6日

Project Life Cycle vs. Product Life Cycle: Embracing Agile Product Thinking

In business management, grasping project and product life cycle disparities is paramount. Although both concepts entail…
Separating Data APIs and Business Logic with an API Gateway

2025年1月23日

Separating Data APIs and Business Logic with an API Gateway

Today, I conversed with a friend about separating data APIs from business logic. Coincidentally, my friend is a wine…
Direct Access to NetCDF Files in TAR Archives

2024年8月30日

Direct Access to NetCDF Files in TAR Archives

Recently, I need to validate the performance of wind data from CONUS404 against observational data at a specific site…

See all articles

Reclassify DEM

Chonghua Yin

Head of Data Science | Climate Risk & Extreme Event Modeling | AI & Geospatial Analytics

Raster Classification Steps

Python implement

Show time

Read and preprocess DEM data

Reclassify DEM using Quantiles

Comparison visualization

Bonus

Read and clean data

Create CHM

Define bins and classify chm

Visualization

References

Chonghua Yin的更多文章

社区洞察

其他会员也浏览了

Zyrix DataZen! ZDZ for Data Analysis in 2025 - Analytics Insight:

Handling imbalanced data with SMOTE

Exploratory Data Analysis (EDA)

Roles and Responsibilities of Data Scientists

Data Science Notes _ Part 1

The Essential Guide to Data Cleaning and Preprocessing with Pandas

Preliminary Data Analysis with Automated EDA: A CRISP ML(Q) Approach

Navigating a Career in Data Analytics: Seizing Future Trends

Quick Review: Model Development and Business Metric Evaluation using Sci-Kit Learn

Advanced Polling Analysis System

Raster Classification Steps

Python implement

Show time

Read and preprocess DEM data

Reclassify DEM using Quantiles

Comparison visualization

Bonus

Read and clean data

Create CHM

Define bins and classify chm

Visualization

References

Chonghua Yin的更多文章

SPEI: A Smarter Way to Measure Drought

NaN Wrangling: LOESS/LOWESS to the Rescue

Efficient Geospatial Nearest Neighbor Search with KDTree and xarray

Unlocking Data's Potential: Four Types of Analytics

Analytics: Team Driven

Secret to Product Longevity: Simplicity, Support, and Feedback

Flying High: A Simple Metaphor for Business

Project Life Cycle vs. Product Life Cycle: Embracing Agile Product Thinking

Separating Data APIs and Business Logic with an API Gateway

Direct Access to NetCDF Files in TAR Archives

社区洞察

其他会员也浏览了

Zyrix DataZen! ZDZ for Data Analysis in 2025 - Analytics Insight:

Handling imbalanced data with SMOTE

Exploratory Data Analysis (EDA)

Roles and Responsibilities of Data Scientists

Data Science Notes _ Part 1

The Essential Guide to Data Cleaning and Preprocessing with Pandas

Preliminary Data Analysis with Automated EDA: A CRISP ML(Q) Approach

Navigating a Career in Data Analytics: Seizing Future Trends

Quick Review: Model Development and Business Metric Evaluation using Sci-Kit Learn

Advanced Polling Analysis System