A New Frontier in Document Processing
This article explores potential solutions for automating Intelligent Invoice Processing (IIP), a critical subset of Intelligent Document Processing (IDP). While traditional AI methods using OCR and rule-based extraction have limitations, especially with complex layouts and low-confidence extractions, this article highlights the visual capabilities of some large language models as a potential game-changer.
Intelligent Document Processing and its invoice-processing branch
Intelligent document processing (IDP) uses software to capture, transform, and process data from documents, such as emails, invoices, or other text. IDP uses AI technologies like computer vision, Optical Character Recognition (OCR), Natural Language Processing (NLP), and machine/deep learning to analyse, categorise, transform, and export the extracted data in an end-to-end process. IDP solutions work with various formats, including structured, unstructured, and semi-structured.
Some of its advantages are:
The challenges in the traditional approach
A customer challenged us to develop a system that uses traditional cloud provider approaches to IDP, such as AWS Comprehend and Textract, Azure Document Intelligence, or Google’s Document AI. However, we found some challenges, mainly:
A new approach
Given the challenges we found with the results of the traditional, and even the more modern, approaches, we’ve developed an alternative solution.
The approach uses GenAI, mainly Large Language Models (LLMs) with vision capabilities. We ask the model to extract the information into a specific format that is mapped automatically to a predefined set of relevant fields, and we combine this with quality checks.
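As a minimal sketch of what "a specific format mapped to a predefined set of fields" could look like, the snippet below builds an extraction prompt from a schema and parses a JSON reply back into that schema. The field names, the prompt wording, and the helpers `build_prompt` and `parse_response` are assumptions made for illustration, and the call to the vision model itself is left out because it depends on the provider.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class InvoiceFields:
    # Hypothetical target schema; the real field list depends on the integration.
    invoice_number: Optional[str] = None
    invoice_date: Optional[str] = None
    vendor_name: Optional[str] = None
    total_amount: Optional[str] = None


def build_prompt() -> str:
    """Build the instruction sent to the model alongside the page image."""
    names = ", ".join(f.name for f in fields(InvoiceFields))
    return (
        "Extract the following fields from the attached invoice image and "
        f"reply with a single JSON object using exactly these keys: {names}. "
        "Use null for any field that is not present."
    )


def parse_response(raw: str) -> InvoiceFields:
    """Map the model's JSON reply onto the predefined fields."""
    data = json.loads(raw)
    known = {f.name for f in fields(InvoiceFields)}
    # Drop unexpected keys and normalise every value to a trimmed string.
    cleaned = {
        key: (None if value is None else str(value).strip())
        for key, value in data.items()
        if key in known
    }
    return InvoiceFields(**cleaned)
```

Keeping the schema in one place means the prompt and the parser cannot drift apart when a field is added or removed.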
You might think that this addresses only part of the problem, given that you won’t get confidence scores or the bounding boxes that show where the text is located. That’s why we merge the results of OCR systems with intelligent methods for similarity scoring and data quality, so we know where human intervention is needed. A diagram of the solution is as follows:
The diagram shows a first step: we use an OCR tool to extract the text and its bounding boxes. We then run page classification on the text to remove irrelevant pages, such as legal terms and conditions, and attached pages, such as orders, delivery slips, and similar documents.
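The page-filtering step could be sketched as below. This is a simple keyword scorer used only as a stand-in, since the article does not say which classifier is used; the labels, the keyword lists, and the choice to keep unrecognised pages are all assumptions.

```python
from typing import Dict, List

# Hypothetical labels and keywords; a production system would likely use a
# trained classifier rather than substring counts.
PAGE_KEYWORDS: Dict[str, List[str]] = {
    "invoice": ["invoice", "amount due", "vat", "total"],
    "terms": ["terms and conditions", "liability", "governing law"],
    "delivery": ["delivery note", "delivery slip", "received by"],
    "order": ["purchase order", "order confirmation"],
}


def classify_page(text: str) -> str:
    """Return the label whose keywords occur most often in the OCR text."""
    lowered = text.lower()
    scores = {
        label: sum(lowered.count(keyword) for keyword in keywords)
        for label, keywords in PAGE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def keep_relevant_pages(pages: List[str]) -> List[int]:
    """Return the indices of pages worth sending to the extraction step.

    Unrecognised pages are kept, so that a page is only dropped when it
    clearly looks like terms, a delivery slip, or an order.
    """
    return [
        index
        for index, text in enumerate(pages)
        if classify_page(text) in {"invoice", "unknown"}
    ]
```

Dropping pages before the LLM call also reduces the number of images sent to the model, which matters because cost grows with document size.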
The next step is to use the visual capabilities of LLMs, which are very good at identifying information on images. Keep in mind that a document can be considered an image, and the position of each word could be relevant to extraction.
Once we have the information, we run a closed-source similarity algorithm that gives us confidence scores for the values extracted by the LLM, along with their bounding boxes, which are critical for another stage of the process and for any manual actions that might be required.
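The similarity algorithm itself is closed source, so the sketch below is not that algorithm. It only illustrates the general idea with a stand-in: each value returned by the LLM is compared against sliding windows of OCR words using `difflib.SequenceMatcher` from the Python standard library, and the best match supplies both a score and a merged bounding box. The `OcrWord` type, the `locate_value` helper, and the window size are assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


@dataclass
class OcrWord:
    text: str
    box: Box


def merge_boxes(boxes: List[Box]) -> Box:
    """Smallest box that encloses all the given boxes."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def locate_value(
    value: str, words: List[OcrWord], max_window: int = 6
) -> Tuple[float, Optional[Box]]:
    """Find the run of OCR words most similar to an LLM-extracted value.

    Returns (similarity in [0, 1], enclosing box), or (0.0, None) when
    nothing in the OCR output resembles the value.
    """
    target = value.strip().lower()
    best_score: float = 0.0
    best_box: Optional[Box] = None
    for start in range(len(words)):
        for size in range(1, max_window + 1):
            window = words[start:start + size]
            if len(window) < size:
                break  # ran past the end of the page
            candidate = " ".join(w.text for w in window).lower()
            score = SequenceMatcher(None, target, candidate).ratio()
            if score > best_score:
                best_score = score
                best_box = merge_boxes([w.box for w in window])
    return best_score, best_box
```

A value that the LLM reports but that cannot be found anywhere in the OCR text gets a low score, which is one way a possible hallucination could surface for review.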
So if an invoice fails automatic processing, instead of the operator loading the invoice by hand, we present a user interface with the invoice pages and highlight the fields extracted in two colours, red or green, where red might require manual intervention. If needed, once the fields requiring attention are updated, the invoice can be returned to the automation process, saving time again on human interventions.
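The red and green highlighting could be driven by something as simple as a threshold on each field's confidence score, as in the sketch below. The threshold value and the function names are assumptions; the article does not state how the colours are assigned.

```python
from typing import Dict, List

REVIEW_THRESHOLD = 0.85  # hypothetical cut-off, to be tuned per dataset


def colour_fields(
    confidences: Dict[str, float], threshold: float = REVIEW_THRESHOLD
) -> Dict[str, str]:
    """Mark each field green (trusted) or red (may need manual attention)."""
    return {
        name: ("green" if score >= threshold else "red")
        for name, score in confidences.items()
    }


def fields_needing_review(colours: Dict[str, str]) -> List[str]:
    """List the red fields so the interface can focus the operator on them."""
    return sorted(name for name, colour in colours.items() if colour == "red")
```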
There is still a step missing after the similarity score, where we run data quality checks that depend on the third-party system we are integrating with. This step dictates whether the invoice needs manual action or follows the happy path.
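Since the actual checks depend on the third-party system, the sketch below only shows the shape such a step might take. The rules (required fields, an ISO date, and net plus tax equal to total), the field names, and the route labels are all assumptions.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation
from typing import Dict, List

REQUIRED = ("invoice_number", "invoice_date", "net_amount", "tax_amount", "total_amount")


def run_quality_checks(invoice: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = [f"missing {name}" for name in REQUIRED if not invoice.get(name)]

    if invoice.get("invoice_date"):
        try:
            datetime.strptime(invoice["invoice_date"], "%Y-%m-%d")
        except ValueError:
            problems.append("invoice_date is not YYYY-MM-DD")

    amounts = [invoice.get(k) for k in ("net_amount", "tax_amount", "total_amount")]
    if all(amounts):
        try:
            net, tax, total = (Decimal(a) for a in amounts)
        except InvalidOperation:
            problems.append("amounts are not valid numbers")
        else:
            # Decimal avoids the rounding surprises of binary floats.
            if net + tax != total:
                problems.append("net + tax does not equal total")
    return problems


def route(invoice: Dict[str, str]) -> str:
    """Decide between manual action and the happy path."""
    return "manual_review" if run_quality_checks(invoice) else "happy_path"
```

Returning the list of problems, rather than a bare pass or fail, lets the review interface tell the operator why an invoice was held back.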
Results and Costs
In our case, while traditional approaches provided 65% automation, a more modern IDP approach can give 95% or more. However, the training and customisation process can be pretty expensive if your company processes many vendors and formats.
We’ve tested our approach using three different datasets of real invoices: two datasets from two different vendors, where the invoice formats are similar within each vendor, and another dataset with entirely different formats from tens of vendors.
Results were quite impressive since, depending on the algorithm selected, we could achieve between 98% and 99%, or even 100%, automation for the two single-vendor datasets and 91% to 95% automation for the multi-vendor dataset. This represents a minimum of 30% increase in the efficiency of the system.
In terms of costs, it depends on how big the documents are. Using LLMs, we could double the price compared with services such as AWS Comprehend and Textract, Azure Document Intelligence, or Google’s Document AI. Still, if we consider that in these specific datasets one document could cost between $0.016 and $0.03, and that costs can be reduced further by using smaller models, you still get a considerable reduction in costs when compared with the more traditional approaches or even some modern approaches.
Summary
In the ever-evolving landscape of artificial intelligence, a groundbreaking approach reshapes how businesses handle document processing. Intelligent Invoice Processing (IIP), a crucial subset of Intelligent Document Processing (IDP), is experiencing a paradigm shift thanks to the visual capabilities of Large Language Models (LLMs). This innovative solution promises to overcome the limitations of traditional methods and usher in a new era of efficiency and accuracy.
The solution uses GenAI and LLMs with Vision capabilities to extract information into a predefined format, integrating OCR results with similarity scoring and data quality checks to identify areas needing human intervention. The process includes using OCR to extract text and bounding boxes, classifying pages to remove irrelevant content, leveraging LLMs’ visual capabilities for information extraction, applying similarity algorithms for confidence scoring, and highlighting fields needing manual intervention in a user interface. Results showed significant improvements in automation rates, achieving up to 99% or even 100% for some datasets and a 30% increase in efficiency. Although costs may double compared to traditional methods, the overall reduction in manual processing and increased accuracy make it a viable solution.