USA Urban Tree Cover with Nearmap: Technical Back Story
Dr. Michael Bewley
Mapping the evolution of cities with petabyte scale deep learning on geospatial imagery.
It's nearly a year and a half since I posted the Australian National Tree Cover Analysis Back Story. That's a long time between stories - what have we been up to? In the interim, we've been busy building and rolling out a slew of new products that are built on top of Nearmap AI Gen 5 (packing in around 80 semantic segmentation layers into our model, and massively increasing our training set size).
I also squeezed in the time to do a decade-long longitudinal study of tree canopy in the city of Adelaide in a four part blog series (including tracking the story of several suburbs over a decade with some fun visualisations, a qualitative comparison with LiDAR and a quantitative comparison with both LiDAR and human expert labels).
I wanted to take things to the next level, and expand both geographically, and in terms of scale. For that, I took things stateside. Where Australia's population is around 26 million, the United States of America is home to over 331 million residents, according to the freshly released 2020 census data. That's a 12-13x scale-up. The USA also involved some new challenges, such as a different census data set, and a far higher prevalence of deciduous trees (which we often capture deliberately in a "leaf off" state).
Methodology
I wanted to be as consistent as I could with our work on Australia. As a quick summary, this meant:
Scale
The most obvious difference between the two studies was scale. To make the problem of analysing such a large amount of data practical, I moved on from "Nellie", the old faithful workstation, with a nostalgic sigh, and switched across to SageMaker Studio. While it would have been technically possible to re-engineer the analysis code to work on Nellie (computing in parallel for longer, and chunking to avoid memory limits), there was something seductively simple about dialling up a 96 core, 768 GB RAM monster of a machine at the click of a button for the more intensive parts of the analysis, and falling back to something smaller and cheaper for working on final visualisations and statistics.
USA 2020 Census
At the time of performing the analysis, the USA census had just been updated with 2020 statistics. Because the US census is conducted once a decade (rather than every 5 years, as in Australia), it was important not to fall back on population data more than 10 years old.
Census Blocks (US) differ slightly from Mesh Blocks (AU), but are in principle the same - the smallest statistical area unit. Census Blocks are deliberately focussed on visible boundaries, like roads, rivers, etc and often represent one "city block". They also have a larger population (typically 600-3,000 people).
There was no available categorisation of census block type (such as with "residential" mesh blocks), so a zero census population in a census block was used as a proxy for a non-residential label.
"Suburb" analysis and "Greater Sydney" type areas don't have direct analogies in US data. Instead, we aggregated the Census Blocks up to the 2020 "Places" data, which are larger in population than Australian suburbs, and a subset of them represent the metropolitan areas of US state capitals.
An improved version of the nearmap-ai-user-guides python package was used to pull the data, which should not have any methodological changes, but is able to deal better with Census Blocks (which are frequently multipolygons or contain holes, unlike Mesh Blocks).
Dealing with Seasonality
Seasonality was a much greater factor in the USA than Australia. Deciduous trees form a larger part of the canopy in the USA, and part of the Nearmap capture program is explicitly focussed on capturing the "leaf off" scenario, in order to provide maximum visibility of the cities that lie beneath. It is important to note that Nearmap AI does capture trees without leaves quite effectively - however, I didn't want systematic bias introduced between the cities, as the predicted areas are a little smaller (it can be hard to spot the thin, leafless edges of bare twigs at a tree's perimeter). Leaf-off occurs at different times each year, and in different locations. To avoid the need to model geographically varying seasonality explicitly, I used the simpler approach of pulling all available surveys within a 24 month period (1st July 2020 to 30th June 2022), and choosing the survey date for each census block that had maximum tree cover. The number of surveys in the 24 month window typically varied between two and six, depending on the regularity of our capture program.
As an aside, if I were to repeat this study with Gen 5 Nearmap AI data, I would be able to easily create a much more nuanced approach. There is a new "leaf off" vegetation layer, which can be used to explicitly identify and ignore areas on dates impacted by seasonality.
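The "maximum tree cover within the window" rule can be sketched in a few lines of plain Python. This is only an illustration, not the code used in the study: the `Survey` record, its field names and the example values are all assumptions made for the sketch.

```python
from collections import namedtuple

# Hypothetical record: one row per (census block, survey date).
Survey = namedtuple("Survey", ["block_id", "survey_date", "tree_cover_pct"])


def pick_peak_survey(surveys, start="2020-07-01", end="2022-06-30"):
    """Return {block_id: Survey}, keeping the in-window survey with maximum tree cover."""
    best = {}
    for s in surveys:
        # ISO formatted date strings sort chronologically, so string comparison is safe.
        if not (start <= s.survey_date <= end):
            continue
        current = best.get(s.block_id)
        if current is None or s.tree_cover_pct > current.tree_cover_pct:
            best[s.block_id] = s
    return best


# Made-up example values, purely for illustration.
surveys = [
    Survey("block_a", "2020-11-15", 18.0),  # leaf-off capture
    Survey("block_a", "2021-06-20", 27.5),  # leaf-on capture
    Survey("block_a", "2019-08-01", 30.0),  # outside the 24 month window
    Survey("block_b", "2021-02-03", 9.0),   # only one capture available
]
peak = pick_peak_survey(surveys)
```

Taking the maximum per block sidesteps any explicit model of when leaf-off happens in each location, at the cost of mixing survey dates between neighbouring blocks.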
Steps
Results
National Statistics and Coverage
The Census Block summary included 4.65 million census blocks, and a population of 279.8 million; that's 83.6% of the nation's population as per the 2020 census.
This data set included 110.7 million building polygons (over 10 thousand square miles of buildings), and 279 thousand square miles of tree canopy.
The median census block tree cover (for residential blocks with population > 0) was 21%.
Finally, it turned out that 52% of the USA population covered by this study were living in a "leafy" census block (defined in the same way as the Australian national study, requiring at least 20% tree cover).
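As a rough sketch of how the two headline statistics above could be computed, assuming each census block is reduced to a hypothetical `(population, tree_cover_pct)` pair: blocks with zero population are dropped (the non-residential proxy), the median is taken over the remaining blocks, and the "leafy" share is weighted by population. The function name and the example numbers are illustrative only.

```python
from statistics import median

LEAFY_THRESHOLD = 20.0  # percent tree cover, per the "leafy" definition in the text


def summarise(blocks):
    """blocks: iterable of (population, tree_cover_pct) tuples (hypothetical layout)."""
    # Zero census population is used as a proxy for a non-residential block.
    residential = [(pop, cover) for pop, cover in blocks if pop > 0]
    total_pop = sum(pop for pop, _ in residential)
    leafy_pop = sum(pop for pop, cover in residential if cover >= LEAFY_THRESHOLD)
    return {
        "median_cover": median(cover for _, cover in residential),
        "leafy_pop_share": leafy_pop / total_pop,
    }


# Made-up example values, purely for illustration.
stats = summarise([(100, 10.0), (300, 25.0), (0, 80.0), (100, 20.0)])
```

Note that the median is per block (every residential block counts once), whereas the leafy share is per person, so the two numbers answer slightly different questions.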
In this map, yellow shows all census blocks included in the analysis, and red shows towns and cities from the Places data set.
Here, the 4.65 million census blocks are shown shaded in greens (darker for higher tree coverage).
City and Town Analysis
The Places data set includes a wide range of cities and towns - from very large, to small. The criteria for giving a valid result on a "Place" was that we needed coverage of at least 90% of the Census Blocks within the Place, and the population needed to be at least 1,000 residents.
46 of these Places represent recognised boundaries of state capitals. 377 of the Places had a population greater than 100,000, 978 Places a population of at least 50,000, 4,379 at least 10,000, and 11,684 at least 1,000.
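The validity criteria for a Place amount to a simple filter. The sketch below assumes that "coverage" is measured as the fraction of a Place's census blocks for which we have data (it could equally be measured by area); the function and parameter names are hypothetical.

```python
def place_is_valid(blocks_in_place, blocks_covered, population,
                   min_coverage=0.90, min_population=1000):
    """Apply the two thresholds from the text to a single Place."""
    if blocks_in_place == 0:
        return False  # nothing to measure coverage against
    coverage = blocks_covered / blocks_in_place
    return coverage >= min_coverage and population >= min_population


# Made-up example values, purely for illustration.
ok = place_is_valid(blocks_in_place=200, blocks_covered=190, population=5400)
too_sparse = place_is_valid(blocks_in_place=200, blocks_covered=150, population=5400)
too_small = place_is_valid(blocks_in_place=20, blocks_covered=20, population=800)
```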
Capital city ranking in the Leafiest Capital Cities blog post was performed by intersecting the census blocks with the Places data set. As a result, a perimeter of census blocks surrounding each Place is also included. This has the effect of placing a city in the context of its immediate surroundings, and has an influence on the results (whether the city is surrounded by national park, or other urban areas). An image of the highest ranking capital city - Charleston, West Virginia - is displayed below. Blue outlines show census block boundaries with non-zero population that were included in the analysis, and the orange outline shows the official boundary of the city from the Places data set.
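To see why an intersection test pulls in a ring of surrounding blocks, here is a deliberately simplified sketch that stands in axis-aligned rectangles for the real geometries (actual census blocks and Place boundaries are arbitrary polygons, and a GIS library would be used in practice). The `Box` type and all coordinates are assumptions for the illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle standing in for a real polygon."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a, b):
    """True if the interiors of the two rectangles overlap."""
    return a.xmin < b.xmax and b.xmin < a.xmax and a.ymin < b.ymax and b.ymin < a.ymax


def within(a, b):
    """True if rectangle a lies entirely inside rectangle b."""
    return a.xmin >= b.xmin and a.ymin >= b.ymin and a.xmax <= b.xmax and a.ymax <= b.ymax


place = Box(0, 0, 10, 10)
blocks = {
    "inside": Box(2, 2, 4, 4),
    "straddles_edge": Box(9, 9, 12, 12),
    "outside": Box(20, 20, 22, 22),
}

selected_by_intersection = sorted(k for k, b in blocks.items() if intersects(b, place))
selected_by_containment = sorted(k for k, b in blocks.items() if within(b, place))
```

The intersection rule keeps the block that straddles the boundary, which is what produces the perimeter of extra blocks around each Place; a stricter containment rule would drop it.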
The raster AI Layers are essentially the raw output of the deep learning model - in the image below, we show roof tops in orange, tree canopy in green, and overlap between the two in red. Charleston, West Virginia and Little Rock, Arkansas are two of the leafiest state capitals in the USA. However, Charleston's trees are mostly around the perimeter of the city, whereas Little Rock's are mixed much more evenly amongst the suburban houses.
When aggregated at the census block level (shaded darker for higher tree cover), this difference in distribution becomes more apparent:
Conclusion
Nearmap imagery and AI form a unique data set on which to perform earth observation on detailed urban environments, at massive scale. Longitudinal studies, inter-city, and even international comparisons are all possible to do with consistent methodology and a high degree of accuracy. Where this analysis looks solely at two of our eighty or so layers (trees and buildings), this work is equally possible with the rest - but we'll leave that analysis to another day!