Automating Invoice Classification with Chainlit, Langgraph, Gemini Flash, Tesseract, and EasyOCR

Samir Kerroumi

IT Engineering Manager@AXA, Senior software engineer and scientist

Published: October 9, 2024

In today’s data-driven business environment, processing a large volume of invoices efficiently is a crucial task. Businesses struggle with the time-consuming nature of manual invoice processing and with inconsistent accuracy across OCR tools. This article presents the first iteration of a solution for automating invoice classification by combining Optical Character Recognition (OCR) with Large Language Models (LLMs), implemented with Langgraph, deep learning-based OCR tools, and the Gemini Flash LLM.

Purpose of the Application

This Proof of Concept aims to simplify the management of PDF invoices by automating data extraction and classification. By utilizing multiple OCR engines and comparing the results through an LLM-based similarity assessment, this system can accurately classify invoices and generate detailed comparative reports. The end result improves workflow efficiency and ensures data accuracy, essential for businesses managing large numbers of invoices.

Problem Statement: Challenges in Invoice Processing

Manual Classification is Time-Consuming: Manually sorting and extracting data from invoices leads to inefficiency, especially when businesses are handling thousands of documents daily.
OCR Accuracy Varies Across Tools: Many OCR engines produce varying results based on the document’s format or quality, requiring manual validation.
Need for Reliable Similarity Assessment: Inconsistent data extraction makes it difficult to reliably classify documents based on textual similarities, which is important for detecting duplicates or errors.

Additionally, third-party software often faces limitations such as:

Struggles with handling diverse invoice formats, leading to errors.
Bugs that are poorly handled by service providers.
High costs related to licensing and infrastructure.

Solution Overview: Application Workflow

The system involves the following steps:

Document Ingestion: Users first upload PDF invoices to a predefined input folder, from which the system retrieves them.
Parallel OCR Processing and Text Extraction: The system runs EasyOCR and PyTesseract in parallel on each PDF. The process includes image preprocessing, text extraction, and post-processing.
Comparison and Similarity Calculation: Using Gemini Flash for textual comparison and a simple cosine similarity formula, the system determines how closely the two OCR outputs match.
Data Classification with the Gemini Flash Function-Calling Feature: The system generates a detailed markdown report for each invoice and displays classification summaries in the UI. Based on the similarity scores, it classifies the processed files (using LangChain function calling) and copies them to categorized folders (low, medium, or high similarity).
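
As a rough illustration of the parallel OCR step, here is a minimal sketch that runs several engines on the same page image at once. It uses a plain thread pool as a stand-in for the Langgraph orchestration, and the engine functions are hypothetical placeholders: real code would wrap the PyTesseract and EasyOCR calls instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# An OCR engine is modelled as any callable: image path in, extracted text out.
OcrEngine = Callable[[str], str]


def run_engines_in_parallel(image_path: str, engines: Dict[str, OcrEngine]) -> Dict[str, str]:
    """Run every engine on the same image concurrently; return the text per engine name."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(engine, image_path) for name, engine in engines.items()}
        # result() blocks until that engine finishes and re-raises any exception it raised
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for the two real engines (not the actual library calls).
def fake_tesseract(path: str) -> str:
    return f"tesseract text for {path}"


def fake_easyocr(path: str) -> str:
    return f"easyocr text for {path}"


if __name__ == "__main__":
    results = run_engines_in_parallel(
        "invoice_page_1.png",
        {"tesseract": fake_tesseract, "easyocr": fake_easyocr},
    )
    print(results)
```

Keeping the engines behind a common callable signature means the comparison step only ever sees a dictionary of engine name to extracted text, whichever OCR libraries are plugged in.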

OCR Nodes: Detailed Overview

The system uses two OCR tools:

PyTesseract: A Python wrapper for Tesseract, the open-source OCR engine long sponsored by Google, with support for multiple languages and customizable settings.
EasyOCR: A deep learning-based OCR tool that supports over 80 languages and is suitable for complex backgrounds.

Both tools are used in parallel to extract text, which is then compared using a language model for classification.

Before going through both OCR engines, each PDF is converted page by page into PNG images, which are then preprocessed in several steps:

conversion to greyscale
denoising
thresholding
contrast enhancement
resizing

This preprocessing step leaves considerable room for improvement.

Comparison Node: Report and Classification

The comparison node evaluates the OCR extracts based on key information like invoice number, emitter, and receiver. Using embeddings from the sentence-transformers/all-MiniLM-L6-v2 model, it computes cosine similarity to assess how closely the extracted texts match. The output includes a markdown report and similarity score for each invoice.
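
For reference, cosine similarity between two embedding vectors is their dot product divided by the product of their norms. The sketch below implements that formula in plain Python on vectors that are assumed to have been computed already (in the article's setup they would come from the all-MiniLM-L6-v2 model); the zero-vector handling is an assumption made for this example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return cos(a, b) = (a . b) / (|a| * |b|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an all-zero vector is treated as having no similarity
        return 0.0
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0 and orthogonal vectors score 0.0, which is the scale the low, medium, and high thresholds are applied to.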

File Classification and Similarity Assessment

Based on the similarity assessment, the LLM agent sorts each invoice and its markdown report into “low,” “medium,” or “high” similarity folders, using function-calling capabilities via the LangChain framework.

The following code configures the LLM agent with the file-system toolkit:

import os

from langchain.agents import initialize_agent, AgentType
from langchain_community.agent_toolkits import FileManagementToolkit
from langchain_google_genai import ChatGoogleGenerativeAI

from prompts.classification_prompt import classification_prompt


class ClassificationAgent:
    def __init__(self):
        # Initialize the Gemini Flash LLM; temperature=0 keeps tool calls as repeatable as possible
        self.llm = ChatGoogleGenerativeAI(
            model="gemini-1.5-flash-latest",
            temperature=0,
            max_tokens=None,
            timeout=None,
            max_retries=2,
            # other params...
        )
        # Restrict the file-system tools to the current working directory
        self.working_directory = os.path.abspath('./')
        self.toolkit = FileManagementToolkit(root_dir=self.working_directory)
        self.tools = self.toolkit.get_tools()
        # Extra keyword arguments such as handle_parsing_errors are forwarded to the agent executor
        self.agent = initialize_agent(
            self.tools,
            self.llm,
            agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
            verbose=True,
            handle_parsing_errors=True,
        )

The following prompt is passed to the agent:

def classification_prompt(working_directory, file_path, report_path, similarity):
    prompt = (
        f"Here are the instructions for organizing the files based on their similarity scores:\n\n"
        f"1. There is an original PDF file located at: '{file_path}'.\n"
        f"2. A report file has been generated and is available at: '{report_path}'.\n\n"
        f"Now, follow these rules based on the average similarity score ({similarity}):\n"
        f"- If the average similarity is less than 0.5, move the files to the folder: "
        f"'{working_directory}/data/output/low_similarity'.\n"
        f"- If the average similarity is between 0.5 and 0.8, move the files to the folder: "
        f"'{working_directory}/data/output/medium_similarity'.\n"
        f"- If the average similarity is greater than 0.8, move the files to the folder: "
        f"'{working_directory}/data/output/high_similarity'.\n\n"
        f"Make sure to copy  the original PDF and to cut and paste the report file to the correct folder based on this logic."
    )

    return prompt

This could have been done with a few lines of plain algorithmic code, but it serves as a simple demonstration of tool calling with an LLM.
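
For comparison, the plain algorithmic version might look like the sketch below. It reuses the thresholds and folder names from the prompt; the function names are hypothetical, and treating a score of exactly 0.8 as medium is an assumption, since the prompt leaves that boundary open.

```python
import shutil
from pathlib import Path


def similarity_bucket(similarity: float) -> str:
    """Map an average similarity score to one of the three output folder names."""
    if similarity < 0.5:
        return "low_similarity"
    if similarity <= 0.8:  # assumption: exactly 0.8 counts as medium
        return "medium_similarity"
    return "high_similarity"


def classify_files(working_directory: str, file_path: str,
                   report_path: str, similarity: float) -> Path:
    """Copy the original PDF and move the report into the matching output folder."""
    target = Path(working_directory) / "data" / "output" / similarity_bucket(similarity)
    target.mkdir(parents=True, exist_ok=True)
    # The original PDF is copied, so the input file stays in place
    shutil.copy2(file_path, target / Path(file_path).name)
    # The report is moved (cut and paste), as the prompt requests
    shutil.move(report_path, str(target / Path(report_path).name))
    return target
```

Unlike the agent, this version is deterministic and costs no LLM calls, which is the trade-off the sentence above alludes to.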

Constraints and Outputs:

In this example we process two small PDF files, one a clean invoice and the other a noisy one. Here is the final result in the UI:

The tool's other output is located in the /data/output folder (the processed folder persists the original images and their preprocessed versions used for OCR):

As you can see, Gemini Flash combined with tool calling on the file management toolkit has correctly classified the files into the dedicated output folders, according to the similarity computed between the two OCR extracts.

Future improvements include:

Implementing parallel processing to enhance performance.
Conducting finer-grained similarity checks based on invoice headers.
Improving image preprocessing techniques to boost OCR accuracy.
Introducing a LayoutLM model for better invoice layout understanding.

In summary, while the tool already offers robust capabilities, these improvements will significantly enhance its performance and accuracy. By refining the similarity checks, optimizing image preprocessing, and leveraging advanced models like LayoutLM, the system will be better equipped to handle a variety of invoice formats with increased efficiency and precision.

This tool is only a prototype, a heuristic way of exploring how AI can enhance some enterprise processes.

It is available on this GitHub repo
