Document Processing - The White Hart of Straight Through Processing

Ahmed Zaidi

Published: September 8, 2022

I often talk with customers who have started Intelligent Document Processing (IDP) projects as part of their RPA practice, and a common thread has emerged: general discontent with the Straight Through Processing (STP) rate of their transactions. To be clear, straight through processing refers to transactions that do not require a human touch. I consistently hear from customers, "We are seeing too many documents end up in human validation."

Before I address the underlying issues and possible resolutions, let's outline the various components of a generic automated document processing use case.

The STP Challenge

A few factors determine whether a document can be processed without any human intervention. I outline them below, with some detail on common challenges.

1. OCR Quality: OCR engines convert the text on a scanned document to machine-readable text. The quality of the incoming document determines the quality of this conversion. Scanning distortions, stains and blotches on the paper, handwriting over printed text, and low-quality scans all affect how an OCR engine "reads" the text. Just like a human, an OCR engine can make mistakes, reading the letter "o" as a zero or the letter "B" as the number 8.

2. Document Classification: For a document to be processed, the system needs to understand what type of document it is so that the appropriate data extraction model can be used. This is done by evaluating either the text or the images on the document, using either traditional programmatic methods or a Machine Learning model that is "trained" to classify documents. Multi-page documents are always tougher to classify than single-page documents. It is almost always advisable to segregate documents by type at the intake stage, which makes classification much easier. For example, invoices should be received in a different mailbox from purchase orders.

3. Document Data Extraction: Almost all intelligent document processing platforms depend on a machine learning model that is "trained" on documents to extract data from them. Training involves a human identifying the data elements on sample documents; the model then uses this labeled data to extract data from new documents. The greater the variety of documents, the larger the training data set needs to be. Once the model is trained, it returns the extracted data for each document presented to it, along with an additional parameter referred to as a confidence score. This confidence score is really a "familiarity" score: it is the model's own estimate of how confident it is in the extracted value. A confidence threshold is set, and any value falling below the threshold must be verified by a human. Note: For a document to be processed straight through without human intervention, ALL fields on the document must be extracted with a confidence score above the threshold. This can be very challenging if a lot of fields are being extracted; even a single field with a low confidence score will result in a human needing to look at the document. One way to handle low-confidence extractions is to compare the extracted data to an independent source of data, if one is available. For example, compare the PO number and the vendor's name on an invoice against the corresponding PO in the accounting system; if they match, the low confidence score can be ignored. As humans validate more documents, the data captured from this validation is used to "re-train" the model, and extraction accuracy increases over time. How long this takes depends on the variation in the documents and the number of documents processed through the solution.

4. Post Processing Errors: Generally, these errors relate to data consistency between the extracted data and the system of record, e.g., a customer name not found or a product description not matching. These are usually resolved through data cleansing and mapping tables.

As you can see, quite a few variables need to line up for straight through processing to occur.
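The routing rule from item 3 can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how any particular IDP platform implements it: the `ExtractedField` type, the field names, the 0.80 threshold, and the `accounting` lookup are all hypothetical, and the cross-check is simplified to a per-field, case-insensitive match.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # model-reported score, assumed to be in [0, 1]


def fields_needing_review(fields, threshold, trusted_record=None):
    """Return the names of fields a human must verify.

    A field passes if its confidence meets the threshold, or if its value
    matches the same field in an independent trusted record.
    """
    trusted_record = trusted_record or {}
    flagged = []
    for f in fields:
        if f.confidence >= threshold:
            continue
        trusted = trusted_record.get(f.name)
        if trusted is not None and trusted.strip().lower() == f.value.strip().lower():
            continue  # low score, but confirmed by the independent source
        flagged.append(f.name)
    return flagged


def is_straight_through(fields, threshold, trusted_record=None):
    """A document goes straight through only if no field needs review."""
    return not fields_needing_review(fields, threshold, trusted_record)


# Hypothetical invoice: one field below the threshold
invoice = [
    ExtractedField("po_number", "PO-1001", 0.62),
    ExtractedField("vendor_name", "Acme Corp", 0.97),
    ExtractedField("total", "1250.00", 0.99),
]
# Hypothetical record from the accounting system
accounting = {"po_number": "PO-1001", "vendor_name": "Acme Corp"}

print(fields_needing_review(invoice, 0.80))            # ['po_number']
print(is_straight_through(invoice, 0.80, accounting))  # True
```

Without the accounting record, the single low-confidence PO number sends the whole invoice to a human; with it, the match overrides the low score and the invoice goes straight through.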

Why the Fuss?

Despite all the challenges I list above, almost all our document processing engagements share a single common metric:

Human processing time per transaction is almost always lower than in the previous manual process, with the reduction averaging between 50% and 80%.

So, in essence, business staff were spending less time processing these documents than they were previously. So why the fuss? In my opinion, it has a lot to do with the expectations set for these solutions. When an IDP platform is pitched with a 75-90% accuracy statistic, sales and technical engineers often fail to set the context, especially regarding the straight through processing percentage. Take a scenario of a document with 10 fields. Every single document processed in this scenario could have 9 out of 10 fields above the threshold (90% accuracy), but 1 required field that is consistently below the threshold would result in 0% straight through processing. Customers often equate IDP accuracy with the percentage of documents not requiring human validation.
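To make the 10-field scenario concrete, here is a small Python sketch that computes both numbers from the same batch. The field names, the confidence values, the batch of 100 documents, and the 0.80 threshold are all made up for illustration.

```python
def field_accuracy(docs, threshold):
    """Share of all extracted fields whose confidence meets the threshold."""
    scores = [s for doc in docs for s in doc.values()]
    return sum(s >= threshold for s in scores) / len(scores)


def stp_rate(docs, threshold):
    """Share of documents in which every field meets the threshold."""
    return sum(all(s >= threshold for s in doc.values()) for doc in docs) / len(docs)


# 100 hypothetical documents with 10 fields each; "field_10" always scores low
docs = [
    {**{f"field_{i}": 0.95 for i in range(1, 10)}, "field_10": 0.40}
    for _ in range(100)
]

print(field_accuracy(docs, 0.80))  # 0.9
print(stp_rate(docs, 0.80))        # 0.0
```

The same batch reports 90% field-level accuracy and 0% straight through processing. Even without one consistently bad field the gap is real: if each of the 10 fields cleared the threshold 98% of the time, and we assume the fields are independent, only about 0.98^10, or roughly 82%, of documents would go straight through.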

Business leaders hear "straight through processing" whenever "automation" of the process is promised. This leads to business planning that moves staff off document processing and assigns them to other tasks. When the business then sees a lot of human touch on these documents, it upsets this resource planning and results in general displeasure with the implementation. We have seen significant technical efforts made to increase the STP percentage, with mixed results. The reality is that this 50-80% efficiency gain would not be possible without the implementation of an intelligent document processing platform.

Technical implementation teams must advise the business of the Intelligent Document Processing paradigm and set the expectation that a significant percentage of documents may require human validation.

Know Your Why

Identifying the primary value proposition of the automation needs to be front and center when designing it. If the main goal is to reduce the end-to-end processing time (Average Handle Time) of a transaction, then straight through processing is much more relevant, and the solution design needs to incorporate this consideration. This will require thorough testing and evaluation of document types and formats. It may also require a rethink of the business process itself.

However, if the focus is on the overall reduction of manual labor and human time spent on the transaction, then the solution may be designed differently. Human time spent on the transaction can be reduced significantly by implementing data entry and validation rules in the automation logic after the data has been extracted. We find that 50-70% of human time is spent on data entry, validation, and other mechanical tasks such as uploading and downloading documents or emailing the status of the transaction. So separating the data validation from the other tasks, which can be performed by a bot, can result in significant time savings. Over time, this efficiency will increase as retraining of the ML models makes classification and data extraction more accurate.
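As a rough illustration of what validation rules in the automation logic might look like, here is a Python sketch. The three rules, the field names, and the date format are hypothetical examples of my own; real rules depend on the document type and the system of record.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation


def validate_invoice(data):
    """Return a list of rule failures; an empty list means the bot can proceed."""
    failures = []
    # Rule 1 (hypothetical): required fields must be present
    for key in ("invoice_number", "invoice_date", "total"):
        if not data.get(key):
            failures.append(f"missing {key}")
    # Rule 2 (hypothetical): the date must parse as YYYY-MM-DD
    if data.get("invoice_date"):
        try:
            datetime.strptime(data["invoice_date"], "%Y-%m-%d")
        except ValueError:
            failures.append("invoice_date is not in YYYY-MM-DD format")
    # Rule 3 (hypothetical): the total must equal the sum of the line amounts
    try:
        total = Decimal(data.get("total") or "0")
        line_sum = sum((Decimal(a) for a in data.get("line_amounts", [])), Decimal("0"))
        if data.get("line_amounts") and total != line_sum:
            failures.append(f"total {total} does not equal line sum {line_sum}")
    except InvalidOperation:
        failures.append("total or a line amount is not a number")
    return failures


good = {"invoice_number": "INV-7", "invoice_date": "2022-09-08",
        "total": "1250.00", "line_amounts": ["1000.00", "250.00"]}
bad = dict(good, total="1300.00")

print(validate_invoice(good))  # []
print(validate_invoice(bad))   # ['total 1300.00 does not equal line sum 1250.00']
```

The idea is that the bot handles the mechanical steps for documents that pass, and a human sees only the specific failures rather than re-keying or re-checking the whole document.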

Jim Neidhardt

Business Growth Guide, Architect of CEO Peer Groups, Connector of SMB growth-minded Business Owners, Presidents, and CEOs

11 months ago

Ahmed, thanks for sharing!

Ralph Aboujaoude Diaz

Global Head - Product and Operations Cybersecurity

2 years ago

Very insightful Ahmed Zaidi! Thanks for sharing

Rajeev M A

Enterprise Architect at Tata Consultancy Services Focused on Artificial Intelligence

2 years ago

Whether you are extracting information from a set of standard document templates or from free-form documents is an important question. Understanding the layout of free-form documents is a much harder problem for current ML.

Ranjeet Deshpande

Intelligent Automation and Generative Solutions - Communication, Media and Tech at Accenture

2 years ago

Very informative. Thanks Ahmed Zaidi

Vino Livan Nadar

3 x UiPath MVP | New York Chapter Lead | RPA Specialist | AI Enthusiast | Intelligent Automation Lead

2 years ago

Detailed practical insights of one of the overlooked aspects of IDP on ST processing. Indeed this context of the ST processing must be set while considering an IDP solution along with the ease and interactive level of the HiTL layer for doing the data labelling tasks.
