TotalAgility 8 - Maps for Seamless Third-Party Data Extraction

Vijay Chaudhary

Lead Software Engineer

Published: March 24, 2024

TotalAgility (KTA) 8.0 introduces sample process maps for third-party data extraction, enabling users to leverage Azure and Google Document AI services for common document types. By integrating with these services, KTA enhances its data processing capabilities, allowing document fields to be updated directly and supporting complex JSON responses. In short, you can call the Azure and Google document AI services for the common document types (e.g. driver's licenses or receipts) that these sample processes support. The idea is to use the cloud providers' document AI models to extract data and use Tungsten's document processing capabilities to streamline document workflows and reduce reliance on the Transformation Server (Kofax's internal tool for image data extraction).

At a high level, this is what you would achieve.

In the last couple of articles, we saw how to build a custom document model and use Azure Document AI to set up extraction for organization-specific documents. This time we will build on that experience, call the DocAI idDocument model, and then parse the response in the TotalAgility ecosystem. Here are the high-level steps:

Step 1: Configure web services for TA-AzureDocAI integration

Calling the AzureAI with image document binary as input - MyAzureDocAI
Analysing response once processing is done - AzureFetchOperationResponse

See the previous article for more details on the endpoint URLs and on calling and analyzing the service from the Postman application.

Step 2: Workflow to make Azure Calls

Build a business rule or process to make the Azure calls (two calls) and get the response. Image/document should be an input variable.

Get Image File and Convert to Base64 - The first two steps in this example map get the Base64 string for the input image.
Stage for Azure DocAI processing - Next, call (POST method) the AzureDocAI “idDocument” model, passing the image data (base64imagedata) in the request body. Note that the operation ID must be read from the response header for the next API call.

Wait - Pause for a few seconds (better still, add logic that checks the status and, if the operation is still running, waits and retries after a few seconds).
Seek response from Azure - In the last step, read the data back from Azure using the operation ID and the AnalyzeResult method (GET method).
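For readers who want to see the two calls outside the process designer, here is a minimal Python sketch of the same flow. It rests on assumptions that are not in this article: the REST path, the `2023-07-31` API version, the `prebuilt-idDocument` model ID, the `base64Source` body property and the `Ocp-Apim-Subscription-Key` header reflect my understanding of the Azure AI Document Intelligence REST API, and the helper names `build_analyze_request` and `poll_for_result` are hypothetical. Check the values against your own Azure resource before relying on them.

```python
import base64
import json
import time
from typing import Callable

# Assumed values (not from the article); adjust to your own Azure resource.
API_VERSION = "2023-07-31"
MODEL_ID = "prebuilt-idDocument"


def build_analyze_request(endpoint: str, api_key: str, image_bytes: bytes) -> dict:
    """Describe the POST that submits a Base64-encoded image for analysis."""
    url = (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
        f"{MODEL_ID}:analyze?api-version={API_VERSION}"
    )
    body = {"base64Source": base64.b64encode(image_bytes).decode("ascii")}
    return {
        "method": "POST",
        "url": url,
        "headers": {
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
        "body": json.dumps(body),
    }


def poll_for_result(
    fetch: Callable[[], dict],
    max_attempts: int = 10,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the operation reports succeeded or failed."""
    for attempt in range(max_attempts):
        response = fetch()
        status = response.get("status")
        if status == "succeeded":
            return response
        if status == "failed":
            raise RuntimeError(f"Analysis failed: {response.get('error')}")
        # Still notStarted/running: wait before the next attempt.
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError("Analysis did not finish within the allowed attempts")
```

In a real run, `fetch` would issue the GET against the URL returned in the response header of the POST, using the same key header. Checking the status and retrying replaces the fixed wait, which is the improvement suggested in the Wait step.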

Step 3: Consuming Azure DocAI response

Once the JSON response from AzureAI is obtained, it can be parsed in various ways: you can write code to parse it and associate the values with the Kofax internal capture fields using CaptureDocumentService methods, or you can build your own flow to parse the response. Tungsten has done that heavy lifting in the new 8.0 version by providing a sample process map that shows how to build this in a completely codeless manner. In this article we will use the Kofax-provided process map.
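To make the "write code to parse it" option concrete, the sketch below flattens the response into a simple name-to-value dictionary. It assumes the response shape I expect from the Azure analyze result (`analyzeResult.documents[0].fields`, with `content` and `confidence` on each field); the function name `extract_fields` and the sample values are made up for illustration, and the step that writes the values into capture fields is deliberately left out because it depends on your TotalAgility setup.

```python
def extract_fields(response: dict) -> dict:
    """Flatten the first analyzed document into {field name: value/confidence}."""
    documents = response.get("analyzeResult", {}).get("documents", [])
    if not documents:
        return {}
    fields = documents[0].get("fields", {})
    return {
        name: {
            # "content" is the text as it appears on the document.
            "value": field.get("content"),
            "confidence": field.get("confidence"),
        }
        for name, field in fields.items()
    }


# Illustrative response fragment with invented values.
sample = {
    "status": "succeeded",
    "analyzeResult": {
        "documents": [
            {
                "docType": "idDocument.driverLicense",
                "fields": {
                    "FirstName": {"type": "string", "content": "JANE", "confidence": 0.98},
                    "DateOfBirth": {"type": "date", "content": "01/06/1990", "confidence": 0.95},
                },
            }
        ]
    },
}

flat = extract_fields(sample)
print(flat["FirstName"]["value"])
```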

Call Azure DocAI - The first step is to call the business rule/process developed in Step 2: you pass in the input image of the identity document and get the response back from Azure.

Note - This step was added manually to the sample process map provided by Kofax.

Extraction - The raw JSON obtained from Azure is assigned to the compatible data model object (internal to Kofax) in the extraction activity.
Transform extraction results - A series of nested subprocesses that achieve two things:

[1] Associate Azure returned doc field values to Kofax internal capture fields.

[2] Associate bounding boxes to capture fields to highlight data on image
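The sample map handles the bounding boxes without code, but the underlying arithmetic is simple. The sketch below assumes that Azure returns each field's region as a flat polygon of x/y pairs (`[x1, y1, x2, y2, ...]`), which is my understanding of the response format, and converts it to a left/top/width/height rectangle. The function name `polygon_to_box` and the output keys are hypothetical; the coordinate format TotalAgility expects for highlighting may differ.

```python
def polygon_to_box(polygon: list) -> dict:
    """Convert a flat [x1, y1, x2, y2, ...] polygon to an axis-aligned rectangle."""
    if len(polygon) < 2 or len(polygon) % 2 != 0:
        raise ValueError("polygon must contain an even number of coordinates")
    xs = polygon[0::2]  # every even index is an x coordinate
    ys = polygon[1::2]  # every odd index is a y coordinate
    left, top = min(xs), min(ys)
    return {
        "left": left,
        "top": top,
        "width": max(xs) - left,
        "height": max(ys) - top,
    }


# A slightly skewed quadrilateral around a field such as the date of birth.
box = polygon_to_box([10, 20, 110, 22, 109, 52, 9, 50])
print(box)
```

Taking the minimum and maximum of each axis gives the smallest upright rectangle that contains the polygon, which is usually enough for highlighting even when the scan is slightly rotated.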

Clear down raw data - Deletes the variable values, because the JSON obtained from Azure is very large.
Validate - A human in the loop verifies that the data was extracted correctly from the image.

With the above sample process maps, Tungsten has saved weeks of development effort. The codeless way they are built is easy for citizen developers to understand, which is highly appreciated.

Step 4: Testing steps after the above changes are made

Upload the sample document to the sample process (updated with the above AzureAI API call as its first step) using any of the available ingestion channels.
Once the job is created, observe the flow of the job.

Perform the validation step once the job gets there (notice that the date of birth is highlighted on the image; this is an important feature).

Note - The data comes from Azure, but all the processing steps are orchestrated in TotalAgility.

Step 5: Analyze internal variables such as the API response status, JSON response, and Base64 string

The Azure JSON response, available in the FinalResponse variable, contains the fields and data returned.
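If you copy these variable values out of a job for troubleshooting, a small script can sanity-check them. This is only a sketch: the function name `summarize_variables` is hypothetical, and the JSON keys (`status`, `analyzeResult.documents[0].fields`) are assumed from the Azure response format rather than taken from TotalAgility.

```python
import base64
import binascii
import json


def summarize_variables(status_code: int, final_response: str, base64_image: str) -> dict:
    """Report whether the three debug variables look healthy."""
    try:
        image_size = len(base64.b64decode(base64_image, validate=True))
    except (binascii.Error, ValueError):
        image_size = None  # the string is not valid Base64

    try:
        parsed = json.loads(final_response)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}

    documents = parsed.get("analyzeResult", {}).get("documents", [])
    field_names = sorted(documents[0].get("fields", {})) if documents else []
    return {
        "http_ok": 200 <= status_code < 300,
        "operation_status": parsed.get("status"),
        "image_bytes": image_size,
        "field_names": field_names,
    }
```

A missing `image_bytes` value points at the Base64 conversion step, while an empty `field_names` list with a succeeded status suggests that the model did not recognize the document.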

Summary

Third-party data extraction can play a pivotal role in modern document processing workflows, offering a gateway to external OCR services. TotalAgility extends its capabilities by seamlessly integrating with these external services, enabling the extraction of data to update document fields, thus facilitating the document's smooth progression through various process flows. This functionality not only allows for the direct updating of Capture fields but also supports the processing of more intricate JSON responses. Additionally, it provides the ability to access REST headers and handle diverse responses, allowing users to set document fields using data extraction results from external sources, reducing reliance on the Transformation Server. Sample maps illustrate how document fields can be updated with values extracted from responses obtained from leading third-party services such as Google Cloud Document AI and Azure AI Document Intelligence.

If this article has been helpful, I would appreciate your feedback and comments. Additionally, if there are any other Azure Cognitive Services or AI related topics that you would like me to cover and share information on, please do let me know.

Reference - Tungsten Total Agility official product documentation. (Available online)

Richa Sardana

6 months ago

Thanks for sharing this article, would you also be able to share as an what would you pass as header in web service reference for AzureDocAI

Deepti Nirwal

Software Developer @Symcor | Streamlining Business Processes

12 months ago

This is an excellent article Vijay!! Thanks for sharing.

Kristoffer J.

Chief Product Owner Intelligent Automation|| Document Automation

12 months ago

This is great Vijay! Can't wait to start building in 8.
