Text Analytics and Location Intelligence in ArcGIS - The New Normal
1. Introduction to AI, Machine Learning, and Deep Learning in ArcGIS
Artificial Intelligence (AI) is the broad goal of building systems that reach a human level of intelligence. Machine Learning is a subset of Artificial Intelligence that works with large volumes of data, deriving rules and extracting patterns from them. Deep Learning is a kind of Machine Learning that uses a specific technique called deep neural networks and is typically applied to highly unstructured or high-dimensional data such as images, voice, and text.
ArcGIS comprises a wide range of products, and Artificial Intelligence capabilities span many of them. ArcGIS products that support Artificial Intelligence and Machine Learning capabilities include ArcGIS API for Python, ArcGIS Analytics for IoT (in R&D), ArcGIS Notebooks, ArcGIS Pro, ArcGIS Online, ArcGIS Enterprise, ArcGIS Hub, ArcGIS QuickCapture, and ArcGIS Pro for Intelligence. In addition to these platforms, ArcGIS supports integration with external Machine Learning and Deep Learning frameworks and services such as AWS, TensorFlow, IBM Watson, MXNet, scikit-learn, and others through the Python API.
Machine Learning is used to make predictions, cluster data, detect objects, extract features, and classify imagery.
2. What is Unstructured Data?
Unstructured data is data that has no recognized structure: it does not follow a specific data model and cannot be easily identified. Examples include text, videos, images, and voice. Most unstructured data is text, which comes in a variety of formats and storage mechanisms such as Word documents, emails, social media posts, PowerPoint files, PDFs, shared drives, and more. By a commonly cited estimate, about 80% of data is unstructured, and huge volumes of it are produced every day. The big questions are: how much spatial information are we missing out on? Can we capture this information in ArcGIS? If yes, how?
3. How to integrate Unstructured Data in ArcGIS?
ArcGIS has new Natural Language Processing (NLP) capabilities that can help you extract insights from unstructured text. NLP is not a stand-alone application; it works in integration with various other applications. It has many sub-fields, including Entity Extraction, Topic Modeling, Clustering and Retrieval, and Text Summarization. Unstructured text can be ambiguous for an NLP system. For example, if someone writes about a "bank", the system may not know whether this means a river bank or a financial institution. A human reader resolves this from context, background knowledge, and the way people talk, but an NLP system may not be able to.
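The "bank" ambiguity can be illustrated with a minimal sketch that guesses a word sense from the surrounding words. This is a toy word-overlap heuristic, not how ArcGIS or any production NLP model works; the `SENSES` table and the `guess_sense` function are hypothetical names invented for this illustration.

```python
import re

# Hypothetical hand-written "sense signatures": words that tend to appear
# near each meaning of "bank". A real NLP model learns such context from data.
SENSES = {
    "river bank": {"river", "water", "fishing", "shore", "flood", "stream"},
    "financial institution": {"money", "loan", "account", "deposit", "cash", "atm"},
}


def guess_sense(sentence: str) -> str:
    """Pick the sense whose signature shares the most words with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & signature) for sense, signature in SENSES.items()}
    best = max(scores, key=scores.get)
    # With no overlapping context words the sentence stays ambiguous.
    return best if scores[best] > 0 else "ambiguous"


print(guess_sense("We went fishing on the bank of the river."))
print(guess_sense("She opened an account at the bank to deposit cash."))
print(guess_sense("Meet me at the bank."))
```

The last sentence has no helpful context words, so the sketch reports it as ambiguous, which is the situation the paragraph above describes.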
Therefore, it is important to understand the type of unstructured data that we are dealing with and the different ways that we can integrate that into ArcGIS.
4. ArcGIS LocateXT
There are two main ways to integrate unstructured data into ArcGIS: 1) using native Esri capabilities and 2) using the NLP / Machine Learning driven way.
4.1. Using Native Esri
The native Esri capability answers three main questions: 1) What are you looking for? 2) What is the best tool? 3) How is it the best tool?
You may be looking for coordinates, custom locations, or user-defined keywords. Depending on what you are looking for, ArcGIS has these tools (ArcGIS Pro w/ LocateXT, ArcGIS Pro for Intelligence, ArcGIS Enterprise w/ LocateXT) that come as an extension to ArcGIS Pro. They work best when the data is reasonably understandable and identifiable and has repeatable patterns, and they do not require programming experience. These extensions use custom location lists to match and extract patterns such as place names, codes, and other terms, and they use pattern-matching regular expressions (regex) to search for coordinates in a variety of formats.
ArcGIS Pro has two geoprocessing tools, "Extract Locations from Document" and "Extract Locations from Text", which let you drag and drop documents and either create a new feature class or append to an existing one. ArcGIS Pro can also create custom attributes based on the contents or keywords within a document: users can tag locations, scrape or harvest portions of documents based on keywords, extract a number of characters, words, lines, or blank lines, and so on. It can also extract addresses from documents based on the combination of state and zip code.
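The two matching strategies described above, a regular expression for coordinates and a custom location list for place names, can be sketched in plain Python. This is only an illustration of the idea and not LocateXT's actual implementation: the pattern covers a single coordinate format (decimal-degree pairs), and `DD_PAIR`, `CUSTOM_LOCATIONS`, and `extract_locations` are hypothetical names with approximate example coordinates.

```python
import re

# Decimal-degree pairs such as "38.8977, -77.0365" (one of many possible formats).
DD_PAIR = re.compile(
    r"(?<![\d.])(-?\d{1,2}(?:\.\d+)?)\s*,\s*(-?\d{1,3}(?:\.\d+)?)(?!\d)"
)

# Hypothetical custom location list: place name -> (latitude, longitude).
CUSTOM_LOCATIONS = {
    "Redlands": (34.0556, -117.1825),
    "Nairobi": (-1.2921, 36.8219),
}


def extract_locations(text):
    """Return (matched_text, latitude, longitude) tuples found in the text."""
    found = []
    # 1) Pattern matching: look for coordinate pairs and keep only valid ranges.
    for match in DD_PAIR.finditer(text):
        lat, lon = float(match.group(1)), float(match.group(2))
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            found.append((match.group(0), lat, lon))
    # 2) Custom location list: look for known place names as whole words.
    for name, (lat, lon) in CUSTOM_LOCATIONS.items():
        if re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE):
            found.append((name, lat, lon))
    return found


sample = "The team met at 38.8977, -77.0365 before flying to Nairobi."
for hit in extract_locations(sample):
    print(hit)
```

Each tuple could then become one point in a feature class. Supporting further formats (degrees-minutes-seconds, MGRS, and so on) would mean adding more patterns.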
4.2. Using the NLP / Machine Learning driven way
When the data is totally unstructured (for example tweets, free text, or emails) and we want to extract entities such as events, dates, and people automatically, or to define relationships between them, we use Natural Language Processing (NLP). In a nutshell, NLP is used when the data is not well understood, when it does not contain identifiable or repeating patterns, and when integration is needed.
5. Entity Recognition with ArcGIS API for Python
Entity recognition means identifying entities of interest (for example diseases, criminal actions, fire events, accidents, a person of interest, or anything else) in unstructured text. We can train our own entity recognizer to extract this information from the text.
ArcGIS API for Python is a great platform for getting started with accessing, analyzing, and visualizing geospatial data. It is a powerful, modern, and easy-to-use Pythonic library for GIS visualization and analysis, spatial data management, and GIS system administration tasks, and it can be used both interactively and in scripts. It enables power users, system administrators, and developers to leverage the rich SciPy ecosystem to automate their workflows and perform repetitive tasks using scripts. It integrates well with Jupyter Notebook and enables academics, data scientists, GIS analysts, and visualization enthusiasts to share geo-enriched literate programs and reproducible research with others. The API's guide describes how to write Python scripts that incorporate capabilities such as mapping, query, analysis, geocoding, routing, portal administration, and more. Once you have installed the API, a great place to start developing is to browse the sample notebooks.
How does it work?
We start with unstructured data (thousands of reports in Word format) and a trained EntityRecognizer model. The reports are run through the model using the learn module of ArcGIS API for Python, which transforms the unstructured data into a structured format and, in the process, extracts the location information. The extracted addresses are then geocoded with the geocoding module of ArcGIS API for Python and converted into a feature class that can be overlaid on maps. This map layer can then be used by other users for further analysis.
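The flow described above (extracted entities, then geocoding, then a map-ready layer) can be sketched with the standard library alone. This sketch does not call ArcGIS API for Python: the `extracted` records stand in for the model's output, `FAKE_GEOCODER` is a lookup table standing in for a real geocoding service, and the file names, addresses, and coordinates are invented for illustration.

```python
import json

# Hypothetical output of an entity-extraction step: one record per report.
extracted = [
    {"report": "report_001.docx", "Address": "100 Main St", "Event": "theft"},
    {"report": "report_002.docx", "Address": "Unknown Rd", "Event": "vandalism"},
]

# Stand-in geocoder: a lookup table instead of a real geocoding service.
# Values are (longitude, latitude), the coordinate order GeoJSON uses.
FAKE_GEOCODER = {"100 Main St": (-89.3838, 43.0747)}


def to_feature_collection(records):
    """Turn geocodable records into a GeoJSON FeatureCollection of points."""
    features = []
    for record in records:
        location = FAKE_GEOCODER.get(record["Address"])
        if location is None:
            continue  # skip records whose address cannot be geocoded
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": list(location)},
            "properties": record,
        })
    return {"type": "FeatureCollection", "features": features}


collection = to_feature_collection(extracted)
print(json.dumps(collection, indent=2))
```

GeoJSON is used here only because it is a simple, widely supported way to represent point features. In the workflow described above, the result would be a feature class or layer in ArcGIS.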
Steps to train an EntityRecognizer
- Step 1: Labelling the data
- Step 2: Training the model using the labeled training data
- Step 3: Inferencing on unseen data (extracting entities from unseen data).
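To make Step 1 more concrete, here is a minimal sketch of what labelled data can look like: entities marked as character spans in the text, then converted to one tag per token using the common BIO convention (B = beginning of an entity, I = inside, O = outside). The exact label format that the EntityRecognizer expects is not shown here; the `spans_to_bio` helper and the label names are assumptions made for illustration.

```python
def spans_to_bio(text, spans):
    """Convert (start, end, label) character spans into one BIO tag per token."""
    tagged, position = [], 0
    for token in text.split():
        # Locate the token in the original text to get its character offsets.
        start = text.index(token, position)
        end = start + len(token)
        position = end
        tag = "O"  # default: token is outside every labelled entity
        for span_start, span_end, label in spans:
            if start >= span_start and end <= span_end:
                tag = ("B-" if start == span_start else "I-") + label
                break
        tagged.append((token, tag))
    return tagged


text = "Fire reported at 12 Oak Street on Monday"
# Hand-labelled entities as (start, end, label), with end exclusive.
spans = [(0, 4, "EVENT"), (17, 30, "ADDRESS"), (34, 40, "DATE")]

for token, tag in spans_to_bio(text, spans):
    print(f"{token:10s} {tag}")
```

The sketch splits on whitespace only, so a token with trailing punctuation would fall outside its span. Real labelling tools and tokenizers handle such cases more carefully.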
Combining NLP with spatial analysis is crucial for examining unstructured text from a location perspective. The NLP capability can be integrated with various other platforms through Python APIs/SDKs or by communicating over REST. NLP also integrates seamlessly with ArcPy, and we may choose to create Python toolboxes or script tools as well. The arcgis.learn documentation has a wealth of information for learning more about this.
More resources from Esri
Here are a few resources from Esri for more information on the following topics:
ArcGIS API for Python: Learn how to pull geospatial information from unstructured text with ArcGIS API for Python
arcgis.learn EntityRecognizer: Have a look at the documentation for the arcgis.learn EntityRecognizer
GeoAI LinkedIn Group: The new LinkedIn group that provides one place for discussions, resources, and news related to AI and Location Intelligence