Document Processing - The White Hart of Straight Through Processing

I often talk with customers who have started Intelligent Document Processing (IDP) projects as part of their RPA practice, and a common thread is discontent with the Straight Through Processing (STP) rate of their transactions. To be clear, straight through processing refers to transactions that do not require a human touch. I consistently hear from customers, "We are seeing too many documents end up in human validation."

Before I address the underlying issues and possible resolutions, let's outline the various components of a generic automated document processing use case.

[Figure: components of a generic automated document processing use case]

The STP Challenge

A few factors determine whether a document can be processed without any human intervention. I outline them below, with some detail on the common challenges.

1. OCR Quality: OCR engines convert the text on a scanned document to machine-readable text. The quality of the incoming document determines the quality of this conversion. Scanning distortions, stains and blotches on the paper, handwriting over printed text, and low-quality scans all affect how an OCR engine “reads” the text. Just like a human, an OCR engine can make mistakes, reading the letter “o” as a zero or the letter “B” as the number 8.

2. Document Classification: Before a document can be processed, the system needs to know what type of document it is so that the appropriate data extraction model can be used. This is done by evaluating the text or the images on the document, either through traditional programmatic methods or with a Machine Learning model that is "trained" to classify documents. Multi-page documents are always tougher to classify than single-page documents. It is almost always advisable to segregate documents by type at the intake stage. For example, invoices should be received in a separate mailbox from purchase orders.

3. Document Data Extraction: Almost all intelligent document processing platforms depend on a machine learning model that is “trained” on documents to extract data from them. Training involves a human identifying the data elements on sample documents; the model then uses this labeled data to learn how to extract the same elements from new documents. The greater the variety of documents, the larger the training data set needs to be. Once the model is trained and a document is presented to it, it returns the extracted data along with an additional parameter, the confidence score. This score is really a “familiarity” score: the model’s own prediction of how likely each extracted value is to be correct. A confidence threshold is set, and any value falling below it must be verified by a human. Note: For a document to be processed straight through without human intervention, ALL fields on the document must be extracted with a confidence score above the threshold. This can be very challenging when many fields are being extracted, since even a single field with a low confidence score means a human needs to look at the document. One way to handle low-confidence extractions is to compare the extracted data to an independent source of data, if available. E.g., compare the PO number and the vendor’s name on an invoice to the PO record in the accounting system; if they match, the low confidence score can be ignored. As humans validate more documents, the data captured from this validation is used to "re-train" the model, and extraction accuracy increases over time. How long this takes depends on the variation in the documents and the number of documents processed through the solution.

4. Post Processing Errors: Generally, these errors relate to data consistency between the extracted data and the system of record, e.g., a customer name not found or a product description not matching. These are usually resolved through data cleansing and mapping tables.
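
As a rough illustration of the data cleansing and mapping table idea in item 4, here is a minimal Python sketch. The table contents, customer IDs, and function names are all hypothetical and do not come from any particular IDP platform; a real mapping table would live in a database or the system of record.

```python
import re
from typing import Optional

# Hypothetical mapping table: cleansed name variants -> customer ID
# in the system of record. Real tables would not be hard-coded.
CUSTOMER_MAP = {
    "acme corp": "CUST-001",
    "acme corporation": "CUST-001",
    "globex inc": "CUST-002",
}


def cleanse(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    name = re.sub(r"[^\w\s]", "", name.lower())
    return re.sub(r"\s+", " ", name).strip()


def resolve_customer(extracted_name: str) -> Optional[str]:
    """Return the customer ID, or None so the document can go to a human."""
    return CUSTOMER_MAP.get(cleanse(extracted_name))


print(resolve_customer("ACME Corp."))  # matches after cleansing
print(resolve_customer("Initech"))     # no mapping, needs human review
```

Each unmatched name a human resolves can be added to the table, so the same variant does not cause an exception the next time.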

As you can see, quite a few variables need to line up for straight through processing to occur.
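
The all-fields-above-threshold rule, together with the cross-check against an independent source, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Field` class, the field names, the 0.85 threshold, and the sample values are hypothetical, and real platforms expose confidence scores in their own formats.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Field:
    value: str
    confidence: float  # model's confidence score, 0.0 to 1.0


def fields_needing_review(
    fields: Dict[str, Field],
    threshold: float,
    reference: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return names of fields a human must validate; an empty list means STP."""
    reference = reference or {}
    review = []
    for name, field in fields.items():
        if field.confidence >= threshold:
            continue
        # Low confidence: accept only if an independent source confirms the value.
        if reference.get(name) == field.value:
            continue
        review.append(name)
    return review


# Hypothetical invoice extraction and accounting-system record.
invoice = {
    "po_number": Field("PO-1042", 0.62),
    "vendor_name": Field("Acme Corp", 0.97),
    "total": Field("1250.00", 0.91),
}
accounting_record = {"po_number": "PO-1042", "vendor_name": "Acme Corp"}

print(fields_needing_review(invoice, threshold=0.85))
# ['po_number']
print(fields_needing_review(invoice, threshold=0.85, reference=accounting_record))
# []
```

Without the reference data, the one low-confidence PO number sends the whole invoice to a human; with the matching accounting record, the same invoice goes straight through.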

Why the Fuss?

Despite all the challenges I list above, almost all our document processing engagements share a single common metric:

The human processing time per transaction is almost always lower than in the previous manual process, with the reduction averaging between 50% and 80%.


So, in essence, business staff are spending less time processing these documents than they were previously. So why the fuss? In my opinion, it has a lot to do with the expectations set for these solutions. When an IDP platform is pitched with a 75-90% accuracy statistic, sales and technical engineers often fail to set the context, especially regarding the straight through processing percentage. Take a scenario of a document with 10 fields. Every single document processed could have 9 out of 10 fields above the threshold (90% accuracy), but 1 required field that is consistently below the threshold would result in 0% straight through processing. Customers often equate IDP accuracy with the percentage of documents not requiring human validation.
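
The 10-field scenario is easy to reproduce. The Python sketch below uses a made-up batch of 100 documents, where each inner list records whether a field cleared the confidence threshold; the function names are illustrative only.

```python
from typing import List


def field_accuracy(batch: List[List[bool]]) -> float:
    """Share of all fields that cleared the confidence threshold."""
    total = sum(len(doc) for doc in batch)
    passed = sum(sum(doc) for doc in batch)
    return passed / total


def stp_rate(batch: List[List[bool]]) -> float:
    """Share of documents in which every field cleared the threshold."""
    return sum(all(doc) for doc in batch) / len(batch)


# 100 documents with 10 fields each; the same required field
# always falls below the threshold.
batch = [[True] * 9 + [False] for _ in range(100)]

print(field_accuracy(batch))  # 0.9
print(stp_rate(batch))        # 0.0
```

The same gap appears even when failures are spread out rather than concentrated in one field: if each of 10 fields independently cleared the threshold 98% of the time, only about 0.98^10, or roughly 82%, of documents would pass straight through.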

Business leaders hear “straight through processing” whenever “automation” of a process is promised. This leads to business planning that moves resources away from document processing and assigns them to other tasks. When the business then sees a lot of human touch on these documents, it disrupts this resource planning and results in general displeasure with the implementation. We have seen significant technical efforts made to increase the STP percentage, with mixed results. The reality is that the 50-80% efficiency gain would not be possible without the implementation of an intelligent document processing platform.

Technical implementation teams must educate the business on the Intelligent Document Processing paradigm and set the expectation that a significant percentage of documents may require human validation.

Know Your Why

Identifying the primary value proposition needs to be front and center when designing the automation. If the main goal is to reduce the end-to-end processing time (Average Handle Time) of a transaction, then straight through processing is much more relevant and the solution design needs to account for it. This will require thorough testing and evaluation of document types and formats. It may also require a rethink of the business process itself.

However, if the focus is on the overall reduction of manual labor and human time spent on the transaction, then the solution may be designed differently. Human time spent on the transaction can be reduced significantly by implementing data entry and validation rules in the automation logic after the data has been extracted. We find that 50-70% of human time is spent on data entry, validation, and other mechanical tasks such as uploading and downloading documents or emailing the status of the transaction. Separating the human data validation step from the other tasks, which can be performed by a bot, can therefore result in significant time savings. Over time, this efficiency will increase as retraining makes the ML models' classification and data extraction more accurate.

Jim Neidhardt

Business Growth Guide, Architect of CEO Peer Groups, Connector of SMB growth-minded Business Owners, Presidents, and CEOs

11 months ago

Ahmed, thanks for sharing!

Ralph Aboujaoude Diaz

Global Head - Product and Operations Cybersecurity

2 years ago

Very insightful Ahmed Zaidi! Thanks for sharing

Rajeev M A

Enterprise Architect at Tata Consultancy Services Focused on Artificial Intelligence

2 years ago

Whether you are extracting information from a set of standard document templates or from free-form documents is an important question. Understanding the layout of free-form documents is a much harder problem for current ML.

Ranjeet Deshpande

Intelligent Automation and Generative Solutions - Communication, Media and Tech at Accenture

2 years ago

Very informative. Thanks Ahmed Zaidi

Vino Livan Nadar

3 x UiPath MVP | New York Chapter Lead | RPA Specialist | AI Enthusiast | Intelligent Automation Lead

2 years ago

Detailed practical insights of one of the overlooked aspects of IDP on ST processing. Indeed this context of the ST processing must be set while considering an IDP solution along with the ease and interactive level of the HiTL layer for doing the data labelling tasks.
