Automating Invoice Classification with Chainlit, LangGraph, Gemini Flash, Tesseract, and EasyOCR
In today’s data-driven business environment, processing a large volume of invoices efficiently is a crucial task. Businesses struggle with the time-consuming nature of manual invoice processing and with the inconsistent accuracy of different OCR tools. This article presents the first iteration of a solution for automating invoice classification by combining Optical Character Recognition (OCR) with Large Language Models (LLMs), implemented with LangGraph, deep learning-based OCR tools, and the Gemini Flash LLM.
Purpose of the Application
This Proof of Concept aims to simplify the management of PDF invoices by automating data extraction and classification. By utilizing multiple OCR engines and comparing the results through an LLM-based similarity assessment, this system can accurately classify invoices and generate detailed comparative reports. The end result improves workflow efficiency and ensures data accuracy, essential for businesses managing large numbers of invoices.
Problem Statement: Challenges in Invoice Processing
Additionally, third-party software often faces limitations such as:
Solution Overview: Application Workflow
The system involves the following steps:
OCR Nodes: Detailed Overview
The system uses two OCR tools: Tesseract and EasyOCR.
Both tools are used in parallel to extract text, which is then compared using a language model for classification.
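As a rough illustration of running both engines side by side, here is a minimal sketch that uses a standard-library thread pool. This is an assumption made for illustration: the article does not show how the parallel step is actually wired up. The names run_ocr_engines, fake_tesseract, and fake_easyocr are hypothetical; the two stand-in functions only return fixed strings so the sketch runs without any OCR library installed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# An OCR engine is modeled as any callable: image path -> extracted text.
OcrEngine = Callable[[str], str]


def run_ocr_engines(image_path: str, engines: Dict[str, OcrEngine]) -> Dict[str, str]:
    """Run every engine on the same page image concurrently; return text per engine."""
    # max(1, ...) keeps the pool valid even if no engine is supplied.
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(engine, image_path) for name, engine in engines.items()}
        # result() blocks until that engine finishes and re-raises its exceptions.
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins (not the real libraries). In a real pipeline these
# would wrap calls such as pytesseract.image_to_string(...) or
# easyocr.Reader(["en"]).readtext(path, detail=0).
def fake_tesseract(image_path: str) -> str:
    return "Invoice No 2024-001"


def fake_easyocr(image_path: str) -> str:
    return "Invoice N0 2024-001"


extracts = run_ocr_engines(
    "page_1.png",
    {"tesseract": fake_tesseract, "easyocr": fake_easyocr},
)
```

Keeping each engine behind a plain callable means the comparison step only ever sees a dictionary of extracts keyed by engine name, whichever OCR backends are plugged in.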
Before going through both OCR engines, each PDF is converted page by page into PNG images, which are then preprocessed in several steps:
This preprocessing step still leaves plenty of room for improvement.
Comparison Node: Report and Classification
The comparison node evaluates the OCR extracts based on key information such as the invoice number, issuer, and recipient. Using embeddings from the sentence-transformers/all-MiniLM-L6-v2 model, it computes cosine similarity to assess how closely the extracted texts match. The output includes a markdown report and a similarity score for each invoice.
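To make the similarity measure concrete: for two embedding vectors a and b, cosine similarity is the dot product of a and b divided by the product of their norms. The sketch below implements this in plain Python on toy 3-dimensional vectors. In the real pipeline the vectors would be the 384-dimensional embeddings produced by all-MiniLM-L6-v2. The helper average_similarity is an assumption about how per-field scores might be combined into a single score; the article does not show that step.

```python
import math
from typing import Iterable, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


def average_similarity(pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Hypothetical aggregation: mean cosine similarity over several field pairs."""
    scores = [cosine_similarity(a, b) for a, b in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy vectors standing in for the embeddings of one field from each OCR engine.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Because cosine similarity ignores vector length, two extracts whose embeddings point in the same direction score close to 1 even when their magnitudes differ.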
File Classification and Similarity Assessment
Based on the similarity assessment, the LLM agent sorts each invoice and its markdown report into a “low,” “medium,” or “high” similarity folder, using function calling through the LangChain framework.
The following code configures the LLM agent with the file-system toolkit:
import os

from langchain.agents import initialize_agent, AgentType
from langchain_community.agent_toolkits import FileManagementToolkit
from langchain_google_genai import ChatGoogleGenerativeAI
from prompts.classification_prompt import classification_prompt


class ClassificationAgent:
    def __init__(self):
        # Initialize the Gemini Flash LLM
        self.llm = ChatGoogleGenerativeAI(
            model="gemini-1.5-flash-latest",
            temperature=0,
            max_tokens=None,
            timeout=None,
            max_retries=2,
            # other params...
        )
        # Define the file-system tools, rooted at the current working directory
        self.working_directory = os.path.abspath('./')
        self.toolkit = FileManagementToolkit(root_dir=self.working_directory)
        self.tools = self.toolkit.get_tools()
        # Extra keyword arguments are forwarded to the AgentExecutor,
        # so handle_parsing_errors is passed directly.
        self.agent = initialize_agent(
            self.tools,
            self.llm,
            agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
            verbose=True,
            handle_parsing_errors=True,
        )
The following prompt is then passed to the agent:
def classification_prompt(working_directory, file_path, report_path, similarity):
    prompt = (
        f"Here are the instructions for organizing the files based on their similarity scores:\n\n"
        f"1. There is an original PDF file located at: '{file_path}'.\n"
        f"2. A report file has been generated and is available at: '{report_path}'.\n\n"
        f"Now, follow these rules based on the average similarity score ({similarity}):\n"
        f"- If the average similarity is less than 0.5, move the files to the folder: "
        f"'{working_directory}/data/output/low_similarity'.\n"
        f"- If the average similarity is between 0.5 and 0.8, move the files to the folder: "
        f"'{working_directory}/data/output/medium_similarity'.\n"
        f"- If the average similarity is greater than 0.8, move the files to the folder: "
        f"'{working_directory}/data/output/high_similarity'.\n\n"
        f"Make sure to copy the original PDF and to cut and paste the report file to the correct folder based on this logic."
    )
    return prompt
The same result could have been achieved with a few lines of deterministic code, but the goal here is to demonstrate tool calling with an LLM.
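For comparison, a deterministic version of the same filing rules might look like the sketch below. It mirrors the thresholds and folder names from the prompt; the function names similarity_label and classify_files are hypothetical, and treating scores of exactly 0.5 and 0.8 as "medium" is an assumption, since the prompt leaves those boundaries open.

```python
import shutil
from pathlib import Path


def similarity_label(similarity: float) -> str:
    """Map an average similarity score to the output folder name used in the prompt."""
    if similarity < 0.5:
        return "low_similarity"
    if similarity <= 0.8:
        # Assumption: the boundary values 0.5 and 0.8 count as "medium".
        return "medium_similarity"
    return "high_similarity"


def classify_files(working_directory, file_path, report_path, similarity: float) -> Path:
    """Copy the original PDF and move the report into the matching folder."""
    target = Path(working_directory) / "data" / "output" / similarity_label(similarity)
    target.mkdir(parents=True, exist_ok=True)
    # The original PDF is copied, so the source file stays in place.
    shutil.copy2(file_path, target / Path(file_path).name)
    # The report is moved ("cut and paste" in the prompt's wording).
    shutil.move(str(report_path), str(target / Path(report_path).name))
    return target
```

A call such as classify_files(".", "invoice.pdf", "invoice_report.md", 0.42) would file both documents under data/output/low_similarity, provided both files exist.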
Constraints and Outputs:
In this example we process two small PDF files, one a clean invoice and the other a noisy one. Here is the final result in the UI:
The tool's other output is located in the /data/output folder (the processed folder persists the original images and their preprocessed versions for OCR):
As you can see, Gemini Flash, combined with tool calling on the file management toolkit, has correctly sorted the files into the dedicated output folders according to the similarity computed between the two OCR extracts.
Future improvements include:
In summary, while the tool already offers robust capabilities, these improvements will significantly enhance its performance and accuracy. By refining the similarity checks, optimizing image preprocessing, and leveraging advanced models like LayoutLM, the system will be better equipped to handle a variety of invoice formats with increased efficiency and precision.
This tool is only a prototype, a heuristic way of exploring how AI can enhance some enterprise processes.
It is available in this GitHub repo.