Revolutionizing Data Management with AI
A Comprehensive Workflow for Enriching and Updating Listings Using Azure AI
I must have read and watched twenty-five articles detailing various scenarios for implementing OpenAI Assistants. They ALL described classroom use cases too simple to be of use to me, or performed tasks I had no interest in performing. And virtually every one assumed user interaction in the workflow. I had a different plan in mind: an OpenAI Assistant supporting automated data workflows. Not a user in sight.
This article discusses a real scenario my team deployed in a recent B2B Listing project. The goal was to leverage existing Azure components to enrich and maintain Listings for a B2B service. By automating the enrichment and consistency checks, the code would dramatically improve data accuracy, completeness, and freshness, adding both quantitative and qualitative value for our users and subscribers.
In the rapidly evolving landscape of AI-integrated data management, using advanced AI tools to automate and enhance data flow is essential. Essential for quality and essential to maintain a competitive edge. This article details a robust workflow built on Azure AI Search, Azure Storage blob containers, and Bing Entity Search, coordinated by a workflow manager working with a highly customized Azure OpenAI Assistant. The team enriched and updated Listings automatically, ensuring that expensive resources, like Azure AI Search reindex events and human data analysis, were spent only when the added value outweighed the expense. The Assistant workflow manager met, and in places exceeded, our objectives; look for the metrics in a future article. Data management costs were contained while data quality soared.
Step 1: Identify Candidates for Enrichment (Azure AI Search)
Azure AI Search (formerly Cognitive Search) queries were performed to find Listings with data holes (missing fields), inconsistencies (the latitude/longitude pointed to Asia while the address said San Diego), and those marked for enrichment by the customer owning the Listing.
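For readers who want to see the shape of that query, here is a minimal sketch using the Azure.Search.Documents SDK. The index name, field names, and filter expression are illustrative assumptions, not our production schema.

```csharp
// Minimal sketch: query Azure AI Search for Listings that are missing key fields
// or have been flagged for enrichment. Names are illustrative, not the real schema.
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

var searchClient = new SearchClient(
    new Uri("https://<your-search-service>.search.windows.net"),
    "listings-index",                                   // hypothetical index name
    new AzureKeyCredential("<query-key>"));

var options = new SearchOptions
{
    // OData filter: a data hole (no description) or an owner-requested enrichment flag.
    Filter = "description eq null or enrichmentRequested eq true",
    Size = 50
};
options.Select.Add("id");
options.Select.Add("name");

SearchResults<SearchDocument> results =
    await searchClient.SearchAsync<SearchDocument>("*", options);

await foreach (SearchResult<SearchDocument> hit in results.GetResultsAsync())
{
    Console.WriteLine($"Enrichment candidate: {hit.Document["id"]}");
}
```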
Step 2: Load Listing Data (Azure Blob Storage)
The Assistant Workflow loads Listing data from individual Listing blobs stored in Azure Storage Containers. These containers serve as the data source for Azure AI Search. Cheap storage and lightning-fast Listing retrieval.
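A minimal sketch of that load, assuming one JSON blob per Listing in a container named "listings"; the connection string, container, and blob naming convention are illustrative.

```csharp
// Minimal sketch: pull one Listing blob into memory for the enrichment pass.
// Connection string, container, and blob naming are illustrative assumptions.
using System.IO;
using System.Text.Json;
using Azure.Storage.Blobs;

string listingId = "listing-0001";                       // hypothetical id from Step 1
var containerClient = new BlobContainerClient("<storage-connection-string>", "listings");

BlobClient blobClient = containerClient.GetBlobClient($"{listingId}.json");
using Stream stream = await blobClient.OpenReadAsync();
JsonDocument listing = await JsonDocument.ParseAsync(stream);
```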
Step 3: Entity Lookup (Bing Entity Search)
Once the data is loaded, the Bing Entity Search API is used to look up entities corresponding to the Listings held in the Azure AI Search engine.
The listing is interrogated by code to extract indicative information that is precise enough to use as parameters for a Bing Entity Search. This could be the listing name and location, its Lat/Lon, CEO or other indicative data, depending on what is already available in the Listing.
Lesson Learned: Our initial code used the AI Assistant to identify indicative data. It quickly became obvious this was an unnecessary expense and a waste of time. Simple C# functions replaced the former AI prompt/response cycle, as in the sketch below. The phenomenon of "un-using AI" occurred more than once during this project.
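Here is a sketch of what that "un-used AI" replacement can look like: plain C# builds the indicative query terms and calls the Bing Entity Search API. The field names, subscription key, and fallback logic are assumptions for illustration; `listing` is the JsonDocument loaded in the Step 2 sketch.

```csharp
// Minimal sketch: plain C# builds indicative query terms and calls Bing Entity Search.
// Field names, key, and fallback logic are illustrative assumptions.
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;

static string BuildIndicativeQuery(JsonElement listing)
{
    // Prefer name + city; fall back to name + lat/lon when the address is missing.
    string name = listing.GetProperty("name").GetString() ?? "";
    return listing.TryGetProperty("city", out JsonElement city)
        ? $"{name} {city.GetString()}"
        : $"{name} {listing.GetProperty("latitude")} {listing.GetProperty("longitude")}";
}

using var http = new HttpClient();
http.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<bing-entity-search-key>");

string query = BuildIndicativeQuery(listing.RootElement);
JsonElement entities = await http.GetFromJsonAsync<JsonElement>(
    $"https://api.bing.microsoft.com/v7.0/entities?mkt=en-US&q={Uri.EscapeDataString(query)}");
```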
Step 4: Enrichment Opportunities (Azure OpenAI Assistant)
The Azure OpenAI Assistant analyzes the listing AND the retrieved Bing entities to identify areas where the Listing will benefit from enrichment. The Assistant responds with function calls to enrich the Listing with additional data.
Lesson Learned: Early iterations of the assistant identified a superset of data open for enrichment. Most of the fields and values marked for updates were unused by the end user. To limit complexity and reduce token usage (expense), we put governors in place: specific instruction language limiting the items of interest to the AI enrichment engine.
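The sketch below shows what such governors can look like, assuming the Azure.AI.OpenAI.Assistants beta SDK: a single whitelisted function tool plus instructions that keep the Assistant inside the whitelist. The function name, field list, and instruction wording are illustrative, not our production prompt.

```csharp
// Minimal sketch of the Step 4 "governors", assuming the Azure.AI.OpenAI.Assistants
// beta SDK. Function name, whitelist, and instructions are illustrative.
using Azure;
using Azure.AI.OpenAI.Assistants;

var client = new AssistantsClient(
    new Uri("https://<your-openai-resource>.openai.azure.com"),
    new AzureKeyCredential("<api-key>"));

var enrichListingField = new FunctionToolDefinition(
    "enrich_listing_field",                               // function name the Assistant may call
    "Set a single whitelisted Listing field to a value supported by Bing entity data.",
    BinaryData.FromObjectAsJson(new
    {
        type = "object",
        properties = new
        {
            field = new { type = "string", @enum = new[] { "description", "website", "phone", "latitude", "longitude" } },
            value = new { type = "string" },
            source = new { type = "string", description = "Which Bing entity supports this value." }
        },
        required = new[] { "field", "value" }
    }));

Assistant assistant = await client.CreateAssistantAsync(new AssistantCreationOptions("gpt-4o")
{
    Name = "listing-enricher",
    // Governor language: only the whitelisted fields above are in scope for enrichment.
    Instructions = "Compare the Listing JSON with the retrieved Bing entities and call " +
                   "enrich_listing_field only for whitelisted fields whose values are missing " +
                   "or demonstrably stale. Do not propose changes to any other field.",
    Tools = { enrichListingField }
});
```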
Step 5: Simple Enrichment (Azure Fn Workflow Manager)
A code layer runs the 1-to-n enrichment functions returned by the AI Assistant to update the Listing.
Lesson Learned: The data contours of our B2B product rely on Azure API Management/Azure Functions for all CRUD/access operations. The initial design performed interim enrichments via REST calls. Expensive and unnecessary. Multiple updates made multiple writes to the same blob, adding storage expense and burning bandwidth. The refactored design relied on a transitory, in-memory Listing, stored only when the AI Assistant certified it as enriched and consistent. That got us to a single update via a simple REST call, reducing bandwidth and storage costs. If the process is interrupted, we simply discard any in-flight changes and perform the enrichment during the next enrichment cycle.
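A minimal sketch of that dispatch loop follows, assuming the Assistant run has paused in a "requires_action" state and handed back function tool calls (name and JSON arguments). The variable names and the single enrich_listing_field case are illustrative.

```csharp
// Minimal sketch of the Step 5 dispatch loop. listingJson and requiredToolCalls are
// stand-ins for values produced earlier in the workflow (Step 2 blob load, Step 4 run).
using System.Text.Json.Nodes;

// The transitory, in-memory Listing: mutated locally, written to storage exactly once in Step 8.
JsonNode listing = JsonNode.Parse(listingJson)!;

foreach (var toolCall in requiredToolCalls)              // e.g. the run's required function tool calls
{
    JsonNode args = JsonNode.Parse(toolCall.Arguments)!;
    switch (toolCall.Name)
    {
        case "enrich_listing_field":
            // Apply the enrichment in memory only; no REST write per function call.
            listing[args["field"]!.GetValue<string>()] = args["value"]!.GetValue<string>();
            break;
        default:
            // Unknown function names are skipped (and logged) rather than guessed at.
            break;
    }
}
```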
Step 6: Ensuring Consistency (Azure OpenAI Assistant)
After enrichment, the updated Listings are passed back to the AI Assistant for a consistency check. The Assistant evaluates the enriched data, returning function calls to modify the Listing if any static data (items not addressed by the enrichment process) and enriched data have fallen out of sync.
Lesson Anticipated: This is one of those areas we got right from the start. We knew we could write code to check for consistency without involving the expense or wait time associated with an AI Assistant pass. We were less sure that we could anticipate future enrichments and the downstream data inconsistencies that updates to enriched fields might produce. AI was selected to discover discrepancies wherever they might reside in the JSON data. We were proven right multiple times as enrichment targets changed and the pool of static data related to enriched fields grew.
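A sketch of that consistency pass, reusing the client and assistant from the Step 4 sketch and the in-memory Listing from Step 5; the message wording and the modify_listing_field function name are illustrative.

```csharp
// Minimal sketch of the Step 6 consistency pass, assuming the Azure.AI.OpenAI.Assistants
// beta SDK and reusing "client", "assistant", and "listing" from the earlier sketches.
AssistantThread thread = await client.CreateThreadAsync();

await client.CreateMessageAsync(
    thread.Id,
    MessageRole.User,
    "The Listing below has just been enriched. Identify any static fields that are now " +
    "inconsistent with the enriched values (for example an address that no longer matches " +
    "latitude/longitude) and respond only with modify_listing_field calls.\n\n" + listing.ToJsonString());

ThreadRun run = await client.CreateRunAsync(thread.Id, new CreateRunOptions(assistant.Id));
// The workflow manager then polls the run and executes any returned modify_listing_field
// calls in memory, exactly as in the Step 5 dispatch loop.
```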
Step 7: Modifying Inconsistent Listings (Azure Fn Workflow Manager)
If inconsistencies are found, the Assistant returns the function calls necessary to modify the Listings. The modifications are executed to ensure data consistency. Discrepancy remediation is performed by the Assistant workflow management code through simple in-memory data functions.
Step 8: Updating Listing Core Data (Azure Blob Storage)
Once the enrichment and consistency checks are complete, and if the Listing was modified, it is saved to its Azure Storage container. Versioning tracks changes and gives us rollback capability. This is vital in case the enrichment process fails or produces false enrichments (devalues data).
Lesson Anticipated: If you are updating important data via AI integrations, data version control is essential. AIs are still in their infancy. Hallucinations, over-zealous responses, and GIGO (garbage in, garbage out) should be expected and accounted for. Always give yourself a way back to the before time, prior to the AI marking up your data.
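A minimal sketch of that single write-back, assuming blob versioning is enabled on the storage account so every overwrite leaves a retrievable prior version; names reuse the earlier illustrative sketches.

```csharp
// Minimal sketch of the Step 8 write-back, assuming blob versioning is enabled on the
// storage account so every overwrite leaves a retrievable prior version for rollback.
using Azure.Storage.Blobs;

BlobClient blobClient = new BlobContainerClient("<storage-connection-string>", "listings")
    .GetBlobClient($"{listingId}.json");

// One write for the whole enrichment pass (see the Step 5 lesson above). With versioning
// enabled, the previous version remains addressable if this enrichment must be rolled back.
await blobClient.UploadAsync(BinaryData.FromString(listing.ToJsonString()), overwrite: true);
```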
Step 9: Updating Indexes (Azure AI Search)
After the data is updated in place (Azure Blobs), the Assistant is asked a final question: "Do these changes rise to the level of an Azure AI Search index rebuild?" The assistant uses its own knowledge of Azure Cognitive Search (now AI Search) indexing to determine the result. If the differences between the original and enriched listings warrant an index refresh, an incremental index update is initiated by the workflow manager.
Lesson Learned: We held two or three brainstorming sessions to design refresh event criteria. The meetings produced rulesets either woefully incomplete or far too token intensive (expensive). Eventually we decided that gpt-4o knows a whole lot more about Azure AI Search indexing than we do, and we let it arbitrate with no instruction other than a description of the goal. So far, no complaints. Relevant search changes like latitude/longitude enrichments on geo-distance indexes trigger a refresh, while less impactful mods such as changing a Listing Description fall below the threshold.
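A sketch of the refresh path, assuming the workflow manager parses the Assistant's verdict into a boolean and then performs an incremental, document-level update rather than a full rebuild; the index name, key names, and document shape are illustrative assumptions.

```csharp
// Minimal sketch of the Step 9 refresh path. assistantSaysRefresh stands in for the
// parsed yes/no verdict; the index name and document shape are illustrative.
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

if (assistantSaysRefresh)
{
    var indexClient = new SearchClient(
        new Uri("https://<your-search-service>.search.windows.net"),
        "listings-index",
        new AzureKeyCredential("<admin-key>"));

    // Incremental update: merge only the changed Listing document rather than rebuilding the index.
    await indexClient.MergeOrUploadDocumentsAsync(new[]
    {
        new SearchDocument
        {
            ["id"] = listingId,
            ["latitude"] = 32.7157,      // illustrative enriched values (San Diego)
            ["longitude"] = -117.1611
        }
    });
}
```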
The Flow
So, deep breath, here is how it all works: An Azure AI Search query identifies candidates for enrichment, which are loaded from individual, versioned Listing blobs. Each Listing drives a Bing Entity Search for better or more current data. The AI Assistant manages the Listing update and consistency checks against Bing's most recent entity data. The updated Listing is saved to its blob container and, if necessary, the AI Search index is refreshed with the new data. Easy peasy.
Conclusion
This workflow demonstrates how Azure AI services can be integrated to create a powerful automated system for enriching and maintaining data. By automating enrichment and consistency checks, data managers ensure their information remains accurate and valuable, ultimately driving better business outcomes. Embracing such AI-driven processes is a crucial step towards a more efficient and intelligent data management strategy.
Technologies: Azure Functions, Azure API Management, Azure Storage (blobs), Azure AI Search (formerly Cognitive Search), Azure OpenAI, Azure OpenAI Assistant, Azure Maps
This is one in a series of articles referencing infrastructure and best practices relating to AI. The goal is not to solve every issue, though solutions are proposed, so much as to acknowledge the relative youth of the discipline and the need for increasing rigor in our standards, practices, and models as the technology matures.