Why can't Google Maps find grass?
The end of the pixel king
Google Maps excels at providing detailed information on restaurants, live updates on public transport, and more. Yet it struggles to identify something as simple as grass, or an entire forest, which appears as a gray blob:
Of course Google Maps could create those maps of land use. But then what about finding tulips? Or finding corals? And how do we keep them updated as the map updates? The point here is not about any one particular semantic; the point is being able to find any semantic a human could instantly and easily see.
This is not picking on Google Maps; our entire geospatial toolset suffers from similar limitations, which create a significant gap in our understanding and use of Earth data.
>We’ve increased the amount of Earth data much faster than the amount of useful information... and with more barriers to entry.
Imagine if we could easily locate trees, grass, deforestation, muddy roads, coral bleaching, floods, and more. The implications for climate change, sustainability, biodiversity, and environmental monitoring are profound. Despite the influx of data from satellites, planes, and drones, the tools to process this data have not advanced at the same pace, leaving a growing gap between data availability and useful information extraction.
The Promise of AI for Earth
“How many trees have been cut this year?”, “Have the crops changed over the years due to climate change?”, … We know we have the data to answer these questions, but the answers are locked in an ever larger pile of data that remains slow, expensive and complex to work with. It's like getting access to the Library of Alexandria, only to find out it's all written in a dead language.
AI for Earth can bridge the gap. By leveraging advanced AI models, we can create extremely small summaries of images, called embeddings, that retain most of the information while being significantly smaller in size. For instance, our Clay model can reduce an image to an embedding of just 768 numbers, making it possible to process and analyze vast amounts of data quickly and efficiently. In our example, AI is the librarian that has read, classified and summarized all the books, and is ready to hand you the right book, and the right page, every time, instantly.
Let's go down this rabbit hole.
>Reader's note: This is a long, deep overview of embeddings in the context of AI for Earth, the work we do at our nonprofit Clay. I try to keep the headings and the first paragraphs of each section as easy, fast, and nontechnical as possible, but I also go down into the most nuanced details and conceptual tools. I hope to give increasingly technical information to help readers approach this new, amazing, and unexplored world of AI for Earth.
Images Are Deceptively Simple
Before diving deeper into embeddings, I want to make the point that looking at images carries a strong bias: the power of human eyes and brains is often taken for granted, which puts computer vision at a significant handicap.
We do not realize it, but when we look at an image we leverage millions of years of evolution, plus our own years of education. We lean on that, and in less than a second we can capture millions of possible semantics. We don't really understand how. This book is the closest attempt to understand it.
It also took millions of dollars to develop, fly and operate digital cameras on planes (or satellites) and capture the signals that our eyes can use, as if we were on the spot where the camera took the picture.
Look at these images:
These are random locations across California. Any person can easily and immediately understand what's in these images: pools in suburban settings, agricultural fields, a lake, ... Depending on your location, background, and experience, you might even identify the types of trees, or the crops planted, and a million other things. The roots of geospatial and remote sensing are in understanding images like these: geospatial is the "what is where".
Imagine now that your job is to count all the trees in California. It's easy enough to count or estimate the trees in a single image.
If you want to count trees in the whole of California, you then have to do that "easy count" 20 million times; that's roughly the number of image chips it takes to cover California with tiles like the ones above. Obviously, we use computers, computer vision, and geospatial tools to do this. The best current approach without AI is to build a bespoke machine-learning tree detector based on circular shapes and colors, segment each image, and count the trees. And we've done this for decades, explicitly encoding exactly what we want.
To really understand a remote sensing image, you also need to understand the physics of what you are looking at (e.g. the way sediments appear tells you about river flows), what you are looking through (the effect of the atmosphere) and what you are looking with (the instrument, the optics, the sensor, ...). This is especially important with data from advanced, less common but very powerful sources like SAR, hyperspectral imagery, pan-sharpened products, NDVI, ... Amazingly, our brains do -or can learn to- understand most of that intuitively or intellectually. It takes incredible effort to rebuild that capacity with computers, since we need to start from scratch. And we need to do that so we can scale this process up.
But it's even worse. Imagine that after you finish, you also need to categorize types of crops. Or count swimming pools. Or trace roads. Or find lakes. Again, the human brain is deceptively quick to switch tasks, but impossible to scale efficiently. Computer vision, on the other hand, in most cases requires redoing the whole process from scratch with the same images. We might share some common tools, but computer vision is largely a "picture to outputs" pipeline.
> While images are innately easy to understand, computer vision needs entire dedicated pipelines for each output
This is not only an obvious duplication of effort; it also wastes time and resources, and is more prone to errors.
This is precisely where embeddings offer a faster, cheaper, less redundant way to create outputs across pipelines.
Image Encoding
Now let's look at those images like a computer. Images are made of pixels, each pixel is made of three bands (red, green, and blue), and each band is digitally encoded using 8 bits, which allows for numbers from 0 to 255. In our case each image is 256x256 (width and height) x3 (red, green, blue); that's roughly 200K numbers, and each value is 1 byte, or 8 bits (0 to 255). So this image comes to ~200KB (200K numbers, each anywhere from 0 to 255). There's a lot you can do with that many numbers. What if I told you we can reduce any image to ~1KB (a thousand numbers), yet retain most of the information? Here's how one of our images looks if we force it down to less than 4KB (1KB was impossible):
There really isn't much one can do with that image.
Yet the answer to finding grass, or pretty much anything in that image, is a new ability to create extreme summaries of the image using AI. So efficient that they are, in the case of Clay, just 768 numbers. These summaries are called "embeddings", and they are the new kings of geospatial.
Here's the catch: embeddings are not an image, but a list of numbers. An embedding literally looks like this:
-0.13493, 0.03277, 0.12231, 0.02537, 0.09245, 0.12471, 0.07522, 0.21141, -0.02121, -0.05623,...
Despite being ~0.4% of the size, embeddings let AI tools retrieve results very similar to those computed from the full images. Here are some examples from tests we've run with Clay: over 90% agreement with the biomass estimates obtained from full images. Over 90% agreement on land cover maps. More than 90% of aquaculture locations detected.
But there's even more: it takes hundreds, thousands, even tens of thousands of times less time to recreate these outputs from embeddings than from the full images.
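To make that compression concrete, here's a quick back-of-the-envelope sketch in Python. The chip size and embedding length match the numbers above; storing the embedding as float32 is my assumption:

```python
# Rough size comparison between a raw RGB chip and a Clay-style embedding.
height, width, bands = 256, 256, 3      # one image chip, 1 byte per band value
raw_values = height * width * bands     # 196,608 numbers
raw_bytes = raw_values                  # 8 bits each -> ~200 KB

embedding_dim = 768                     # numbers in one embedding
embedding_bytes = embedding_dim * 4     # assuming float32 storage -> ~3 KB

print(f"raw chip:  {raw_values:,} values, ~{raw_bytes / 1024:.0f} KB")
print(f"embedding: {embedding_dim} values, ~{embedding_bytes / 1024:.0f} KB")
print(f"ratio of values kept: {embedding_dim / raw_values:.2%}")   # ~0.39%
```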
At Clay we released not only an AI model to do this, but, to our knowledge, both the largest training run and the only model that is global, instrument agnostic and fully open: Clay Model v1. And we also just released a demo for anyone to click around and do semantic search.
Embeddings are too promising not to learn about them.
One way to think of embeddings is that they are highly abstracted semantic summaries of the images, in mathematical terms. Images are pixels, and only the interpretation of those values and patterns defines semantics. Embeddings encode those abstractions directly, as numbers. This means they already embed much of the computation one would otherwise need to detect things. Embeddings make retrieving and computing semantics extremely fast, partly because part of the computation is already baked in.
They are also extremely new, especially in geospatial, so we don't yet really understand them well, or how to work with them, or how to create the best ones. But it's already patently clear that they encode highly abstracted semantics at a very small fraction of the size.
That's why I'm writing this article: to help myself understand embeddings better, and to bring others along to figure out if and how to use them.
Where do embeddings come from?
Embeddings are not the point of an AI for Earth model, but their utility and modular nature have drawn a lot of attention to them as separate assets. Embeddings are the highly abstracted summaries of the input data, the narrow neck of the model that it uses to perform the task it is asked to do. These models tend to have a "U" shape, with a wide input and output and a narrow middle point that serves as a choke point, forcing the model to learn by abstracting at least the most useful semantics.
The AI model takes an image of some size (width and height, say 256x256 pixels) with several bands (say red, green, and blue). Each band typically uses 8 bits, so a number from 0 (black) to 255 (white). In total, in our case, we have 256x256x3 ≈ 200K dimensions. At the end of the encoding process we will have figured out how to summarize the entire image into just 768 dimensions. That's quite impressive! Roughly 0.4% of the size, yet it contains most of the information. It's even more impressive when we have 13 bands, as on Sentinel-2 satellite images, where the ratio drops to around 0.01%.
The image gets split into chunks (patches), of size 8x8 in the case of Clay. These chunks are the units of the embeddings: we actually create an embedding for each chunk and then average them all to create the final embedding for the whole image. Why the average? There is no hard rule, and we can certainly improve on this aspect; it seems very crude to me to just average them all.
But why does averaging work at all? Because a Transformer-based model learns to embed in each chunk not only the semantics within it, but also the context of all the chunks around it and their relative positions (this is called "self-attention", and the idea proved so powerful that the paper introducing it is called "Attention Is All You Need"). This means that the embedding of a patch also includes semantics from outside itself. That makes it really powerful for some applications, but also confusing when you only care about what's within a specific patch.
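Here is a minimal sketch of that patch-averaging step. The patch size (8x8) and embedding length (768) follow the text; the random "encoder" is a stand-in for the real Clay model, which would also mix in context from neighboring patches via self-attention:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_patches(chip, patch=8, dim=768):
    """Stand-in encoder: one embedding vector per 8x8 patch.
    A real Transformer would fill these with contextualized semantics;
    here they are just random numbers to show the shapes."""
    h, w, _ = chip.shape
    n_patches = (h // patch) * (w // patch)
    return rng.normal(size=(n_patches, dim))

chip = rng.integers(0, 256, size=(256, 256, 3))   # a fake 256x256 RGB chip
patch_embeddings = encode_patches(chip)           # (1024, 768): one vector per patch
chip_embedding = patch_embeddings.mean(axis=0)    # average -> one 768-d vector per chip
print(patch_embeddings.shape, chip_embedding.shape)
```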
After the embedding, there is usually a decoder that mirrors the encoder and brings the embedding back up to a reconstruction of the same input image. The difference between that reconstruction and the input is literally the "loss" to minimize. The model looks at which changes help the loss go down, and slowly updates all its millions of parameters to make the loss as small as possible.
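The mechanics of that loop fit in a few lines of PyTorch. This is only a toy sketch: the real model is a much larger Transformer, while here a tiny linear encoder and decoder, and fake data, just show how the reconstruction loss drives the updates:

```python
import torch
from torch import nn

# Tiny toy sizes so the sketch runs instantly; real chips are 256x256x3.
pixels, dim = 64 * 64 * 3, 768
encoder = nn.Sequential(nn.Flatten(), nn.Linear(pixels, dim))   # wide input -> narrow neck
decoder = nn.Linear(dim, pixels)                                # narrow neck -> wide output
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)

images = torch.rand(8, 3, 64, 64)       # a fake batch of chips, values in [0, 1]

embeddings = encoder(images)            # (8, 768), the "narrow neck"
reconstruction = decoder(embeddings).view_as(images)
loss = nn.functional.mse_loss(reconstruction, images)   # reconstruction error = the loss

loss.backward()                         # which parameter changes push the loss down?
optimizer.step()                        # nudge all the parameters in that direction
optimizer.zero_grad()
print(float(loss))
```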
Once that pretraining task is finished, you can also replace the decoder with another architecture whose output is, for example, the amount of biomass in the image (a regression problem), or the land cover class (a segmentation problem), … Because you already have an encoder (or the embeddings), these decoders are much lighter, faster and more flexible than traditional methods, where each output requires building a whole pipeline starting from the input image.
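With embeddings precomputed, the downstream "decoder" can be as light as a linear model. A sketch with scikit-learn, where the embeddings and biomass labels are random stand-ins (with real data this is where the 90%-agreement results above come from; with stand-ins the score is meaningless):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5_000, 768))    # precomputed chip embeddings (stand-ins)
biomass = rng.gamma(2.0, 10.0, size=5_000)    # fake per-chip biomass labels

X_train, X_test, y_train, y_test = train_test_split(embeddings, biomass, random_state=0)

head = Ridge(alpha=1.0).fit(X_train, y_train)   # the whole "decoder" is one linear model
print("R^2 on held-out chips:", head.score(X_test, y_test))
```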
How are Earth Semantics learned?
I believe Earth semantics are learned into embeddings through 3 main mechanisms:
The way the model learns is also affected by other factors, for example how many images we use before allowing the model to update the way it makes its guesses (backpropagation of the model weights, with stochastic gradient descent) to achieve high scores on our task. If we update the model with every single image, learning will be very noisy and bumpy, trying to learn from every error, even those from very rare cases. If we update the model only after averaging the errors of too many examples, learning will be very smooth, but we will miss many opportunities to pay attention to rarer but still important examples.
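A toy illustration of that trade-off, comparing an update after every single image with an update after averaging the loss over a batch of 64. The tiny linear model and synthetic data are placeholders; only the batch size changes between the two runs:

```python
import torch
from torch import nn

def train(batch_size, steps=200, n=1024, dim=16, lr=0.05, seed=0):
    """Fit a tiny linear model with SGD; only batch_size differs between runs."""
    g = torch.Generator().manual_seed(seed)
    X = torch.randn(n, dim, generator=g)
    y = X @ torch.randn(dim, 1, generator=g) + 0.1 * torch.randn(n, 1, generator=g)
    model = nn.Linear(dim, 1)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        idx = torch.randint(0, n, (batch_size,), generator=g)
        loss = nn.functional.mse_loss(model(X[idx]), y[idx])  # averaged over the batch
        loss.backward()
        opt.step()
        opt.zero_grad()
    return float(nn.functional.mse_loss(model(X), y))

print("update per image (noisy):  ", train(batch_size=1))
print("update per 64-image batch: ", train(batch_size=64))
```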
Embeddings, embeddings, embeddings
I've often said lately that bringing AI to geospatial, "geoAI" or "AI for Earth", is the end of pixels. Of course we will always use pixels; images are really powerful for telling stories. But with AI comes the power of embeddings, with advantages so obvious that we have to at least seriously consider them.
But how can we try to understand embeddings? It's impossible to imagine a vector of 768 dimensions. So let's take a bunch of them and see if, by looking at how groups of embeddings behave, we can get some intuition. For one of our tests at Clay we created embeddings for the entire state of California at ~1m resolution, in little tiles of size 256x256 (the ones at the top of this post). That's 20,851,044 tiles. Let's see what we can learn from having ~20M embeddings.
If we plot the length (norm) of each vector we see that the vast majority of them are of the same length, around 3.45.
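This check is easy to reproduce once the embeddings are loaded as an array. A sketch (a small random stand-in replaces the real ~20M x 768 California array):

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for the real (20M, 768) array of California embeddings.
embeddings = np.random.normal(scale=0.125, size=(100_000, 768))

norms = np.linalg.norm(embeddings, axis=1)   # length of each 768-d vector
print("median norm:", np.median(norms))

plt.hist(norms, bins=200)
plt.xlabel("embedding norm")
plt.ylabel("number of chips")
plt.show()
```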
Quite literally most embeddings have the same length, even though they have 768 dimensions to play with. Let's see what the extremes look like.
These are particularly short embeddings:
Note: The really short ones (norm < 0.31) are actually chips with errors, where most of the image is black (edges of the source raster). That's also the source of the little bump around 0.33.
There's no common pattern here.
And the longest ones:
The longest vectors are clearly mostly water. Since water doesn't seem particularly hard to describe, it seems safe to assume that the length of the vector is not that relevant, especially when the vast majority of vectors have the same length. It turns out this is on purpose: normalizing all vectors to (roughly) the same length has many computational advantages. E.g. computers struggle when they need to divide by very small numbers.
The implication is that when working with embeddings, angles between vectors are much more relevant than "straight" distances like the Euclidean distance. These "flat" distances might be useful when working very locally, but since the embedding space has such a well defined overall shape, using them at global scales tells you more about the shape of the space than about the semantics. In other words, you don't measure the distance between Boston and Madrid by tunneling through the Earth; you measure it along the surface.
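In code, that means comparing embeddings by their angle (cosine similarity) rather than by Euclidean distance. A sketch with stand-in vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    """Angle-based similarity: 1 = same direction, 0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
water, forest = rng.normal(scale=0.125, size=(2, 768))   # stand-in embeddings

print("cosine similarity: ", cosine_similarity(water, forest))
print("euclidean distance:", float(np.linalg.norm(water - forest)))
# On a (hyper)sphere of radius ~3.45, the Euclidean distance mostly reflects
# the geometry of the sphere; the angle reflects the semantics.
```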
We can imagine the embedding space roughly as a hollow sphere (in 768 dimensions, not just 3) with a radius ~3.45 and a populated crust of less than 1/10 of the radius (~0.3). On that crust, somehow, we have all the semantics, like clusters of dots making strange patterns.
On that "crust" all our embeddings are grouped by similarity, and those groups are themselves arranged by similarity. It seems easy to imagine a cluster of, for example, only water, and another group of only land, and a stream of dots in between with more and more coast… but one can also imagine islands, or pools, … where would we put them? I think this is where having 768 dimensions really helps the model find as many directions as it needs to build separate links across semantics. But it's not an easy intuition. Are these dots all packed together? Are they spread across the entire crust? I imagine that distributing the dots across the entire space allows for even more ways to express relationships of similarity (even in 2D there are infinite directions from which to approach a dot). So, for example, is the whole sphere surface populated? One way to check is to reduce the dimensions to two, with PCA (retain the variability of the data with the fewest dimensions), t-SNE (keep distances between points while reducing dimensions), or UMAP (similar to t-SNE but tries to keep global structure more intact).
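For reference, here is a sketch of those three reductions on a sample of embeddings. scikit-learn covers PCA and t-SNE, and UMAP comes from the separate umap-learn package; the random data is a stand-in, and this is not necessarily the exact tooling we used:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap  # pip install umap-learn

rng = np.random.default_rng(0)
sample = rng.normal(scale=0.125, size=(5_000, 768))   # stand-in sample of the ~20M embeddings

xy_pca = PCA(n_components=3).fit_transform(sample)    # keep 3: plot 2, use the 3rd as color
xy_tsne = TSNE(n_components=2, init="pca").fit_transform(sample)
xy_umap = umap.UMAP(n_components=2).fit_transform(sample)

print(xy_pca.shape, xy_tsne.shape, xy_umap.shape)
```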
The way I read this PCA (and the histogram on the right) is that the first two components, PCA 1 and PCA 2, are clearly bounded for most of the space. This seems to align with the hollow sphere hypothesis. We also see that the left part of the blob is more opaque (the blue histogram peaks around "-1"), hence more dense, which would mean that the crust of that hollow sphere is denser in some places. We also have a weak component with high PCA 2 ... maybe something besides the sphere that doesn't follow the same pattern. In fact it seems to belong to a different distribution, since PCA 3 (the color) aligns with it more than with the blob that makes up most of PCA 1-PCA 2.
This t-SNE graph also aligns with the PCA analysis. Because t-SNE works by trying to keep the distances between points, it also tells us that the embedding space is extremely rich in concepts, with many small clusters that sometimes group into clusters of clusters, and with a few very strong, compact clusters, which I assume will be water tiles, or ice. The more distributed but still compact clusters are probably urban or similar semantics that are much more differentiated than the rest of the land.
UMAP is like t-SNE but gives more importance to retaining large scale structure. A bundle of bounded dots seems to confirm the hollow sphere, plus a separate distribution with rich internal clusters of semantics. There are also some isolated and very separated semantics, which again I would imagine are blank images, water, snow or similar near-clones of empty semantics.
Dimensionality reduction seems to be a good way to get an intuitive understanding. Let's pull the images from, for example, that stream at the top of the PCA.
We defined a region on the PCA scatter, calculated the corresponding bounds on t-SNE and UMAP, and pulled 9 random example images from within it. This cluster clearly seems to be water, and it's clearly differentiated in all three dimensionality reductions. That makes sense; water is very distinct. It also makes sense that in all three reductions there is a stream towards the rest of the main pack: these would be coasts, lakes, and other images with some water.
Let’s take another example and pick the two dots on the middle right of the t-SNE:
These are clearly agricultural plots, with roads. There are two dots, which I checked, and they seem to correspond to images of agricultural plots with and without roads/paths. We also see that what forms a tight cluster on t-SNE corresponds to a wide region on UMAP and PCA. This makes sense. Imagine organizing books by year, or genre, or author, or cover color, … a tight group of books in one classification might be widely distributed in another.
Let's look for an expected semantic, like urban. I took an image of an urban location and plotted its 8 closest neighbors.
This semantic seems to be located in the middle of the pack. It might make sense that there are many ways an urban image can vary. In fact all 9 examples are uncannily similar, with diagonal roads, big and small buildings, … Remember that there are 768 dimensions in the embedding, so somewhere in there you might find trees, or red parked cars, or zebra crossings, …
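Finding those 8 closest neighbors is a standard nearest-neighbor search on cosine distance. A sketch with scikit-learn and a stand-in corpus (at ~20M vectors you would use an approximate index instead, but the idea is the same):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
embeddings = rng.normal(scale=0.125, size=(100_000, 768))   # stand-in corpus
query = embeddings[42]                                      # the "urban" chip

index = NearestNeighbors(n_neighbors=9, metric="cosine").fit(embeddings)
distances, indices = index.kneighbors(query.reshape(1, -1))
print(indices[0][1:])   # the 8 closest chips, skipping the query itself
```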
We can plot the same location, but zoom in on the scatter plot:
Again we see that some dimensionality reductions make the clustering much less obvious, here PCA or t-SNE, but on UMAP we can clearly see a dense cluster, which might represent this type of urban setting. Because there are 768 dimensions, each with many possible values, there are lots of ways to articulate similarities across semantics. Dimensionality reduction is a crude, forced attempt to make a continuous semantic space fit a much more limited range of options.
But some semantics reduce much better than others, probably because of their limited variability, like solar panels:
Solar panels are clearly isolated on UMAP, but not on PCA or t-SNE. I wonder what semantics sit close to solar panels in the other reductions. I'm also glad that on UMAP this concept is clearly isolated. Let's zoom in:
At close range, solar panels are indeed alone on UMAP, and also somewhat differentiated on t-SNE. We can only imagine how all this looks in the full 768 dimensions, but this is a great example of how these semantics are computable. We clearly need only one, or a few, examples to define a bounded concept (especially in UMAP) that finds solar panels. We could tag this cluster of dots as "solar panels". We can even easily count how many there are, and therefore find and count all solar panels in California.
This is the dream we talk about: a method to index abstract concepts. If Google Maps had these tagged embeddings, it would be extremely easy to find solar panels, or any other cluster. We are then only limited by our capacity to find and label these clusters. Or, if we don't want to label them, we can just give a few examples and find similar ones.
Naming the semantic clusters
So far we've gone from images to embeddings of semantics. We still cannot search for the word "grass", even though we now know the semantics of "grass" are encoded in the embeddings. In practice, embeddings are written in mathematics, not human language. This is not a problem, since there are several approaches we can take to bridge those embeddings to text. One we've tried, with success in some cases, is literally forcing them to align: for each image of Earth we pull the information from a normal map (we use OpenStreetMap), things like "road here", "house there", "lake here". We then take the embedding of the image and create a text encoder that starts with a random encoding of the text description. We ask the text encoder to learn to tweak the text embedding until it matches the embedding of the image. Thus we can go from image to text and vice versa. We can even take an image, make its image embedding, find the closest text embedding that describes it, and then find other images whose descriptions yield nearby embeddings. A bit of a roundabout, but essentially a similarity search based on map descriptions.
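Conceptually, the alignment looks like this: a small, trainable text encoder is nudged until its output matches the (frozen) image embedding of the same location. The hash-based bag-of-words text encoder and the MSE loss below are placeholders for illustration, not our actual pipeline:

```python
import torch
from torch import nn

VOCAB, DIM = 4096, 768

def bag_of_words(text):
    """Crude stand-in tokenizer: hash words into a fixed-size count vector."""
    vec = torch.zeros(VOCAB)
    for word in text.lower().split():
        vec[hash(word) % VOCAB] += 1.0
    return vec

text_encoder = nn.Sequential(nn.Linear(VOCAB, 1024), nn.ReLU(), nn.Linear(1024, DIM))
optimizer = torch.optim.AdamW(text_encoder.parameters(), lr=1e-4)

# One training pair: the frozen image embedding of a chip, and the OSM-style
# description of the same chip.
image_embedding = torch.randn(DIM)    # would come from the frozen image encoder
description = "residential road, three houses, small lake"

for _ in range(100):
    text_embedding = text_encoder(bag_of_words(description))
    loss = nn.functional.mse_loss(text_embedding, image_embedding)  # pull them together
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# After training on many such pairs, text queries and image chips share one
# space, so "grass" can be embedded as text and matched against image embeddings.
```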
Computable semantics
How do we find similar images? As we've seen above, 768 dimensions are far more than we can build an intuition of clusters for, let alone the relationships between clusters. It is therefore hard to even conceptualize how to operate with them. Is the average embedding of water and desert a beach? It seemed so, and when I check it is, but why? Why is the midpoint of those semantics the expected one? I don't know. I suspect that in that case it is not a new concept but rather both concepts being present in the image, just like the midpoint of tree and parking lot is a parking lot with trees. But why is it not something else completely random? It's like going from an extremely poor village to a very rich one and expecting to see the suburbs in between. It seems too good, and unpredictable.
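The "is the average of water and desert a beach?" check is just vector arithmetic followed by a similarity search. A sketch with stand-in vectors (renormalizing the midpoint back to the ~3.45 crust is my choice):

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(scale=0.125, size=(100_000, 768))   # stand-in corpus
water, desert = embeddings[10], embeddings[20]              # two chips we have identified

midpoint = (water + desert) / 2
midpoint *= 3.45 / np.linalg.norm(midpoint)   # push it back onto the ~3.45 "crust"

# Cosine similarity of the midpoint against every chip; the top hits are the
# chips to inspect (beaches? chips containing both water and sand?).
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
scores = normed @ (midpoint / np.linalg.norm(midpoint))
print(np.argsort(scores)[-9:][::-1])
```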
What's even crazier is that these operations work with highly abstracted concepts. We can retrieve land cover classes, find floods, or estimate biomass within the image... We are so early in understanding Earth embeddings.
I believe part of the challenge in understanding Earth semantics is that they inherit known properties of other types of embeddings (like text embeddings), but they are also unique in other ways. One of them is what I call polysemy versus semantic colocation:
Polysemy vs semantic colocation
One of the biggest differences between text embeddings and Earth embeddings is how we deal with embeddings that must contain several meanings at once.
In text, the word "bank" could mean the place where you put your money, the side of a river, or a group of fish. A word with many meanings (polysemy) is really common. In this case, the embedding of the word "bank" needs to encode all those meanings. The models we are talking about here, Transformers, are literally trained to look at the context of the word ("self-attention"). This means that while the word itself needs to encode different meanings, the context in each case defines which one applies. This makes the task simpler. My intuition is that the embedding can store the different meanings in different parts of the embedding vector, so when the model performs self-attention it just zeros out the irrelevant parts. But if you take the word embedding on its own, you get all those meanings, so if you operate with embeddings directly, without the model, you'll need to deal with all of them. It's hard to imagine the many dimensions of an embedding, but my intuition is that it's easy to encode meanings along different axes. E.g. if our embeddings had only 3 dimensions, we could dedicate one to each meaning. That makes it easy to encode similarities to all related words in all directions. The bottom line is that this problem is well known, and there are many ways to deal with it.
But with Earth data we have a different problem, and I think we have not yet figured out how to solve it. Let's take this example:
All four images contain the semantic "house" in different contexts (desert in California, crops in France, bare soil in Mongolia and water in the Maldives). We can split images, and in fact the model does, but we will never have a unit that contains just the concept "house" and carries the core of the concept (with or without multiple meanings). With Earth data we have both absolute anchors in the actual pixels of a location, and relative anchors in the information around them. We never have isolated "words", or tokens; it is always patterns and their surroundings. From that, the model must learn the concepts of "house", "water", "crops", ... The semantics of Earth images are more deeply rooted in both pixels and context than the semantics of text, where words can live isolated in the abstract; in fact we split text by them (or by sub-words, tokens). Moreover, I believe this colocation has very small variability. That is, most things tend to be close to only a few other things: houses and roads, not houses and corals. This makes learning to reconstruct Earth locations with embeddings easy, but isolating Earth semantics more difficult.
Consider the case where we want to find "houses". If we pick one image with a house and see which other images are closest, we might also get ones with "desert" in them. If I include all the examples except the Mongolian yurt in the bottom right, we might average out the surroundings of the house, but we will also reinforce the idea that houses are only square, and we'll miss the circular yurt. In essence, I believe it is hard to define semantics precisely in Earth data, and it is fundamentally different from text.
One approach we follow is to search with both positive and negative examples (see the sketch after the note below). If we take an image of a house surrounded by grass as a positive example, and then give a negative example of an image with only grass, we get much closer to the concept of a house, without reinforcing the specific houses in the examples.
> A negative example in embeddings means staying as far away as possible from that point, just like a positive example means staying as close as possible. We must be careful to remember that a negative example is not an example of the opposite concept (if such a thing exists). Embeddings of opposite concepts are not necessarily in opposite locations in the embedding space. E.g. a person might consider water and desert opposites, but in the embedding space they might actually be close to each other. It is also worth noting that embeddings cannot encode negative concepts (e.g. "not a house"). Embeddings are abstractions of the data, and therefore cannot encode the specific ways data might be missing.
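A minimal sketch of that positive/negative search: the query vector moves toward the positives and away from the negatives. The 0.5 weight on the negatives is an arbitrary illustrative choice, and the data is a stand-in:

```python
import numpy as np

def query_vector(positives, negatives, neg_weight=0.5):
    """Combine examples into one search direction: toward the positives,
    away from the negatives (neg_weight is a tunable choice)."""
    q = np.mean(positives, axis=0)
    if len(negatives):
        q = q - neg_weight * np.mean(negatives, axis=0)
    return q / np.linalg.norm(q)

rng = np.random.default_rng(0)
embeddings = rng.normal(scale=0.125, size=(100_000, 768))   # stand-in corpus
house_with_grass = embeddings[[3, 14]]                      # positive examples
only_grass = embeddings[[15]]                               # negative example

q = query_vector(house_with_grass, only_grass)
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
print(np.argsort(normed @ q)[-9:][::-1])   # chips closest to "house", hopefully
```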
Because of this, a while ago I tried to increase the quality of our similarity search with what I called "semantic pruning": use the few available examples of a semantic to find out which dimensions of the embedding matter most for that semantic, and drop the rest. In theory, similarity searches over fewer dimensions would be faster and cleaner. It's quite simple to do: I took the few examples I had and fit a Random Forest classifier (this method picks random dimensions and random thresholds to divide the data into ever smaller buckets, and the answer is the set of random choices that yields the most accurate buckets with respect to the labels). This method also tells you which dimensions are most important ("feature importance": which splits move the data most accurately towards the labels). Since Random Forest is very fast, we can filter out dimensions after every example given, and repeat the process. Long story short, it yielded no improvement in overall speed or quality.
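For completeness, here is what that experiment looks like as code: fit a Random Forest on the few labeled examples, rank dimensions by feature importance, and keep only the top ones for the search. The exact counts, thresholds and stand-in data are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
embeddings = rng.normal(scale=0.125, size=(100_000, 768))     # stand-in corpus
examples = np.vstack([embeddings[:20], embeddings[500:520]])  # a few labeled chips
labels = np.array([1] * 20 + [0] * 20)                        # 1 = the target semantic

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(examples, labels)
top_dims = np.argsort(forest.feature_importances_)[-64:]      # keep the 64 "best" dimensions

pruned = embeddings[:, top_dims]   # search in 64 dimensions instead of 768
print(pruned.shape)
# In our tests this gave no real improvement in speed or quality, but it is a
# cheap experiment to reproduce.
```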
We know Earth embeddings are extremely useful, and we also know we don’t yet know how to work with them well.
There is no road ahead, explorers needed
We have the tools, the data, and the demand to fundamentally disrupt and improve what we know about what is happening where on Earth, and how we know it. This approach can capture very nuanced semantics, extremely fast, cheaply and openly. But the process is still very new, poorly understood, not yet robust, and extremely different from the tried and tested tools we use today. As I hope this article shows, the promise and the first results are just too good not to explore further.
But there is no road ahead; everything is new, and we are building as we go. What we see is promising, but we also see clear gaps, like working with semantic colocation, or how to tweak the models to get the most out of the embeddings. And we know there are many unexplored directions.
To me it's clear that a future where we leverage these tools is a much better one, but it is also clear that it won't happen easily. For one, there are not many people with skills in both AI and geo to build, or even travel, this path. My main intention here is to shed some light on the challenges and opportunities of geoAI. The AI ecosystem is mostly focused on text applications, and somewhat on generating images. Earth AI is different, and we need different tools.
Moreover, we believe the basis of this technology, and of these services, should be a public asset, not a for-profit one. We do see tremendous potential for commercial, for-profit services at many points of this stack, and in the applications, but we believe the fastest path to grow the field, and to generate both profit and impact from a market here, is to create a common base that seeds it.
AI for Earth is like clay, able to be so many things, ready to be shaped. That's why we chose that name for our NGO…
Are you ready to make Clay?