A New Frontier in Document Processing

This article explores potential solutions for automating Intelligent Invoice Processing (IIP), a critical subset of Intelligent Document Processing (IDP). Traditional approaches based on OCR and rule-based extraction have limitations, especially with complex layouts and low-confidence extractions. This article highlights the visual capabilities of some large language models as a potential game-changer.

Intelligent Document Processing and its invoice processing branch

Intelligent document processing (IDP) uses software to capture, transform, and process data from documents, such as emails, invoices, or other text. IDP uses AI technologies like computer vision, Optical Character Recognition (OCR), Natural Language Processing (NLP), and machine/deep learning to analyse, categorise, transform, and export the extracted data in an end-to-end process. IDP solutions work with various formats, including structured, unstructured, and semi-structured.

Some of its advantages are:

  • Reduces manual data entry: IDP automates extracting data from documents, reducing or eliminating the need for manual data entry.
  • Reduces costs: IDP can reduce overhead costs by automating repetitive tasks and eliminating bottlenecks.
  • Increases efficiency: IDP can process documents much faster than humans, which can help businesses improve their overall efficiency.
  • Improves accuracy: IDP solutions can extract data from documents with high accuracy, reducing the risk of human error.

The challenges in the traditional approach

A customer challenged us to develop a system that uses traditional cloud provider approaches to IDP, such as AWS Comprehend and Textract, Azure Document Intelligence, or Google’s Document AI. However, we found some challenges, mainly:

  • Accuracy and Completeness: Without specific training, IDP solutions can struggle to achieve consistently high accuracy and completeness, particularly when dealing with a wide variety of invoice formats and complex table structures.
  • Low-Confidence Fields: Generic, out-of-the-box models often returned low confidence scores. Even when the information is extracted correctly, a low score still translates into a human review.
  • Manual Mapping and Customization: IDP solutions often load data into third-party systems. For invoices, this is typically an ERP, which can be used for automatic validation. You either train models to extract entities, or extract key/value pairs and map them to the fields the ERP system needs. Depending on the vendor, invoice layout, and language, you might have to map each one. Now imagine hundreds of suppliers and invoice formats.
  • Multilingual Challenges: IDP solutions can have difficulty processing invoices in languages other than English. This can limit their applicability in global business environments.
  • Table Extraction: Extracting data from tables, especially those with variable structures, can be challenging. In our case, tables were loosely structured, which compounded the problem across hundreds of invoice formats and vendors.
  • Custom training: Custom training is possible. However, with hundreds of different invoice layouts it becomes a heavy, lengthy process that amounts to a very large project, reducing the opportunity for savings.
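
To make the manual mapping burden above concrete, here is a minimal Python sketch. The vendor names, invoice labels, and ERP field names are all hypothetical; the point is only that every new vendor, layout, or language needs another hand-written entry.

```python
# Hypothetical per-vendor label maps: each vendor's key/value labels must be
# mapped by hand onto the field names the ERP expects.
VENDOR_LABEL_MAPS = {
    "vendor_a": {"Invoice No.": "invoice_number", "Total Due": "total_amount"},
    "vendor_b": {"Factura N.º": "invoice_number", "Importe total": "total_amount"},
}


def map_to_erp(vendor: str, key_values: dict[str, str]) -> dict[str, str]:
    """Translate raw key/value pairs into ERP fields; unmapped labels are dropped."""
    label_map = VENDOR_LABEL_MAPS[vendor]
    return {label_map[k]: v for k, v in key_values.items() if k in label_map}


mapped = map_to_erp(
    "vendor_b",
    {"Factura N.º": "F-001", "Importe total": "121,00", "Notas": "n/a"},
)
print(mapped)
```

With hundreds of suppliers, a table like `VENDOR_LABEL_MAPS` grows with every new format and has to be maintained by hand.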

A new approach

Given the limitations of the results from both the traditional and the more modern approaches, we developed an alternative solution.

The approach uses GenAI, mainly Large Language Models (LLMs) with vision capabilities. We ask the model to extract the information into a specific format that maps automatically to a predefined set of relevant fields, and we combine this with quality checks.

You might think that this addresses only part of the problem, given that you won’t get confidence scores or bounding boxes showing where the text is located. That’s why we merge the OCR results with similarity scoring and data quality methods to determine where human intervention is needed. The solution is shown in the diagram below:


Solution Diagram

The diagram shows the first step: we use an OCR tool to extract the text and its bounding boxes. We then run page classification on the text to remove irrelevant pages, such as legal terms and conditions, and attached pages, such as orders and delivery slips.
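
The article does not say how the page classification is implemented. As one possible illustration, the Python sketch below filters pages with a simple keyword count over the OCR text. The hint lists and the decision rule are assumptions made for this example, and a trained classifier could equally be used.

```python
# Assumed keyword hints; a real system would tune these or train a classifier.
RELEVANT_HINTS = ("invoice", "total", "vat", "amount due")
IRRELEVANT_HINTS = ("terms and conditions", "delivery note", "purchase order")


def is_invoice_page(page_text: str) -> bool:
    """Keep a page when it shows more invoice hints than attachment hints."""
    text = page_text.lower()
    relevant = sum(hint in text for hint in RELEVANT_HINTS)
    irrelevant = sum(hint in text for hint in IRRELEVANT_HINTS)
    return relevant > irrelevant


pages = [
    "INVOICE 2024-001 Total 121.00 VAT 21.00",
    "General terms and conditions of sale ...",
    "Delivery note for purchase order 778",
]
# Indices of the pages that continue to the extraction step.
kept = [i for i, page in enumerate(pages) if is_invoice_page(page)]
print(kept)
```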

The next step is to use the visual capabilities of LLMs, which are very good at identifying information on images. Keep in mind that a document can be considered an image, and the position of each word could be relevant to extraction.
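
As a rough sketch of what "extracting into a predefined format" can look like, the Python snippet below builds an instruction for a vision-capable model and normalises its JSON reply onto a fixed field list. The field names and prompt wording are hypothetical, and the model call itself is deliberately left out because it depends on the provider; `sample_reply` stands in for a model response.

```python
import json

# Hypothetical target fields; the real set depends on the downstream system.
FIELDS = ("invoice_number", "invoice_date", "vendor_name", "total_amount", "currency")


def build_prompt(fields=FIELDS) -> str:
    """Instruction sent alongside the page image to a vision-capable model."""
    keys = ", ".join(fields)
    return (
        "Extract the following fields from the invoice image and reply with "
        f"a single JSON object using exactly these keys: {keys}. "
        "Use null for any field that is not present."
    )


def parse_reply(reply: str, fields=FIELDS) -> dict:
    """Keep only the expected keys; keys missing from the reply become None."""
    data = json.loads(reply)
    return {name: data.get(name) for name in fields}


# Stand-in for a model response (assumed shape, not real output).
sample_reply = '{"invoice_number": "2024-001", "total_amount": "121.00", "extra": "x"}'
extracted = parse_reply(sample_reply)
print(extracted)
```

Pinning the output to a fixed key set means the same field list works for every vendor, instead of one mapping per layout.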

Once we have the information, we run a closed-source similarity algorithm that gives us confidence scores for the fields extracted by the LLM, along with their bounding boxes, which are critical for a later stage of the process and for any manual actions that might be required.
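
The actual similarity algorithm is closed-source, so the Python sketch below is only a stand-in that shows the general idea: compare each LLM-extracted value against windows of OCR words, and take the best match's similarity as a confidence score and its words' boxes as the location. It uses `difflib.SequenceMatcher` from the standard library; the function name, data layout, and sample coordinates are assumptions.

```python
from difflib import SequenceMatcher


def best_match(value, ocr_words):
    """Find the OCR word window most similar to an extracted value.

    ocr_words is a list of (text, (x0, y0, x1, y1)) in reading order.
    Returns (score, bbox), where bbox is the union of the matched boxes.
    """
    n = max(1, len(value.split()))
    best_score, best_box = 0.0, None
    for i in range(len(ocr_words) - n + 1):
        window = ocr_words[i:i + n]
        text = " ".join(word for word, _ in window)
        score = SequenceMatcher(None, value.lower(), text.lower()).ratio()
        if score > best_score:
            boxes = [box for _, box in window]
            best_box = (
                min(b[0] for b in boxes), min(b[1] for b in boxes),
                max(b[2] for b in boxes), max(b[3] for b in boxes),
            )
            best_score = score
    return best_score, best_box


ocr_words = [
    ("Invoice", (10, 10, 60, 20)), ("No.", (65, 10, 85, 20)),
    ("2024-001", (90, 10, 150, 20)),
    ("Total", (10, 40, 45, 50)), ("121.00", (90, 40, 130, 50)),
]
score, box = best_match("2024-001", ocr_words)
print(score, box)
```

A value the OCR text does not contain gets a low score, which is the signal that the field may need a human look.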

So if an invoice fails automatic processing, instead of the operator loading the invoice by hand, we present a user interface with the invoice pages and highlight the fields extracted in two colours, red or green, where red might require manual intervention. If needed, once the fields requiring attention are updated, the invoice can be returned to the automation process, saving time again on human interventions.
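
The red/green highlighting can be thought of as a threshold on each field's confidence score. The Python sketch below assumes a single cut-off of 0.85, which is an illustrative value and not one reported in this article.

```python
REVIEW_THRESHOLD = 0.85  # assumed cut-off; would be tuned on real data


def colour_fields(confidences: dict[str, float],
                  threshold: float = REVIEW_THRESHOLD) -> dict[str, str]:
    """Mark each field green (accepted) or red (may need manual review)."""
    return {
        name: "green" if score >= threshold else "red"
        for name, score in confidences.items()
    }


colours = colour_fields({"invoice_number": 0.99, "total_amount": 0.62})
needs_review = [name for name, colour in colours.items() if colour == "red"]
print(colours, needs_review)
```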

One more step follows the similarity scoring: we run data quality checks that depend on the third-party system we are integrating with. This step dictates whether the invoice follows the happy path or requires manual action.
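
The concrete checks depend on the target system, so the following Python sketch only illustrates the kind of rules that could decide the route: required fields, a date format, and amounts that add up. The field names, the date format, and the rules themselves are assumptions for this example.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

REQUIRED = ("invoice_number", "invoice_date", "net_amount", "tax_amount", "total_amount")


def quality_issues(invoice: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means clean."""
    issues = [f"missing {name}" for name in REQUIRED if not invoice.get(name)]
    if issues:
        return issues
    try:
        datetime.strptime(invoice["invoice_date"], "%Y-%m-%d")
    except ValueError:
        issues.append("invoice_date is not YYYY-MM-DD")
    try:
        net, tax, total = (
            Decimal(invoice[key])
            for key in ("net_amount", "tax_amount", "total_amount")
        )
        if net + tax != total:
            issues.append("net_amount + tax_amount != total_amount")
    except InvalidOperation:
        issues.append("amount is not a valid number")
    return issues


def route(invoice: dict) -> str:
    """Send clean invoices down the happy path, everything else to review."""
    return "manual_review" if quality_issues(invoice) else "happy_path"


good = {"invoice_number": "2024-001", "invoice_date": "2024-03-01",
        "net_amount": "100.00", "tax_amount": "21.00", "total_amount": "121.00"}
print(route(good))
```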

Results and Costs

In our case, traditional approaches provided 65% automation, while a more modern IDP approach can give 95% or more. However, the training and customisation process can be quite expensive if your company processes many vendors and formats.

We tested our approach using three datasets of real invoices: two datasets from two different vendors, where invoice formats are similar within each vendor, and another dataset with entirely different formats from tens of vendors.

The results were impressive: depending on the algorithm selected, we achieved 98% to 99%, or even 100%, automation for the two single-vendor datasets and 91% to 95% automation for the multi-vendor dataset. This represents a minimum 30% increase in the efficiency of the system.

Costs depend on how large the documents are. Using LLMs can double the price compared to services such as AWS Comprehend and Textract, Azure Document Intelligence, or Google’s Document AI. Still, in these specific datasets one document costs between $0.016 and $0.03, and costs can be reduced further by using smaller models, so you still get a considerable reduction in costs when compared with the more traditional approaches or even some modern approaches.
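
To put the per-document figures in perspective, the short Python calculation below scales them to a monthly volume. The per-document range comes from the datasets above; the volume of 10,000 documents is a hypothetical input chosen only for illustration.

```python
# Per-document cost range observed on the datasets described above (USD).
COST_PER_DOC_LOW = 0.016
COST_PER_DOC_HIGH = 0.03


def monthly_cost_range(documents_per_month: int) -> tuple[float, float]:
    """Lower and upper LLM processing cost for a given monthly volume."""
    return (
        round(documents_per_month * COST_PER_DOC_LOW, 2),
        round(documents_per_month * COST_PER_DOC_HIGH, 2),
    )


# 10,000 documents per month is an assumed volume, not a measured one.
low, high = monthly_cost_range(10_000)
print(f"${low:,.2f} to ${high:,.2f} per month")
```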

Summary

In the ever-evolving landscape of artificial intelligence, a new approach is reshaping how businesses handle document processing. Intelligent Invoice Processing (IIP), a crucial subset of Intelligent Document Processing (IDP), is experiencing a paradigm shift thanks to the visual capabilities of Large Language Models (LLMs). This solution promises to overcome the limitations of traditional methods and improve both efficiency and accuracy.

The solution uses GenAI and LLMs with vision capabilities to extract information into a predefined format, integrating OCR results with similarity scoring and data quality checks to identify areas needing human intervention. The process includes using OCR to extract text and bounding boxes, classifying pages to remove irrelevant content, leveraging LLMs’ visual capabilities for information extraction, applying similarity algorithms for confidence scoring, and highlighting fields needing manual intervention in a user interface. Results showed significant improvements in automation rates, reaching 99% or even 100% for some datasets, and an increase in efficiency of at least 30%. Although costs may double compared to traditional methods, the overall reduction in manual processing and increased accuracy make it a viable solution.
