Generative AI to improve OCR
Introduction
Optical Character Recognition (OCR) technology plays a crucial role in various industry sectors by enabling the extraction and interpretation of text from images, scanned documents, or other visual inputs.
Below are some examples of how OCR is used across a set of industries:
Many solutions address OCR use cases, both open source and proprietary, with different levels of accuracy. Below are some examples:
In some cases, OCR solutions fail to extract the expected result. This can happen for different reasons, including:
Addressing these cases is not simple, and they are the main source of errors in OCR use cases.
Empowering OCR solutions with Generative AI
If you read my previous blogs (Innovative approach to AI project delivery with Generative AI, Unlocking the power of generative AI to visualize functional requirements, Generative AI for tabular data explanation: prompt limit is not a limit, AI pipeline to "play a picture of a musical score", and its implication in generative AI, Talking with a GraphDB leveraging generative AI, Generative AI impact on data platform solutions), you have seen that generative AI is not only about text generation: it is a technology that opens the door to a large set of use cases.
Now, if we consider the OCR use case, generative AI can be used to improve the output of an OCR service: it can reconstruct the generated output, detecting and correcting OCR errors if they occur.
Consider the following handwritten example, captured with a smartphone camera:
Below is an example of the OCR results, extracted directly with the native OCR feature of the smartphone that took the picture:
Below is the extracted text:
- Draumerely muprove propuctivity
- "SUPER INTELLIGENT AN EVERYWHERE" F revolutia
- USE CASE
1) Search engive
TODAY
→ TOMORROW
1 RSSULT!
the corvet de
auswers
2) ADVERTISMENT /e - COMMEReL
Eig. AMAZON → every user will see a ol
antomaticelly jenerale
As you can see, there are a lot of errors, due to the quality of the picture and to the fact that the text was handwritten and therefore harder to analyze.
As I mention in each post of my GenAI blog series, GenAI happened!
Now we can use generative AI, in particular a large language model, to identify and correct the sentences that the OCR did not recognize properly. In this example, I'll use IBM watsonx as the generative AI tool to complete my task.
First of all, I formulate a prompt (request) for the LLM, shown below:
You have following sentences extract from an OCR process.
You need to recognize the words/sentences that have not been correctly matched from original picture and correct the overall text.
Generate an output including a list of unmanaged words or sentences with the correct reconstruction.
Below the sentences to analyze:
- Draumerely muprove propuctivity
- "SUPER INTELLIGENT AN EVERYWHERE" F revolutia
- USE CASE
1) Search engive
TODAY
→ TOMORROW
1 RSSULT!
the corvet de
auswers
2) ADVERTISMENT /e - COMMEReL
Eig. AMAZON → every user will see a ol
antomaticelly jenerale
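The prompt above is simply a fixed set of instructions followed by the raw OCR lines, so it can be assembled programmatically. Below is a minimal Python sketch of that idea. The names `build_correction_prompt`, `correct_ocr`, and the `generate` callable are hypothetical and introduced only for illustration; `generate` stands in for whatever client you use to call your LLM (watsonx or any other), and no specific SDK call is assumed here.

```python
from typing import Callable, Iterable

# Instruction header, copied verbatim from the prompt used in this article.
PROMPT_HEADER = (
    "You have following sentences extract from an OCR process.\n"
    "You need to recognize the words/sentences that have not been correctly "
    "matched from original picture and correct the overall text.\n"
    "Generate an output including a list of unmanaged words or sentences "
    "with the correct reconstruction.\n"
    "Below the sentences to analyze:"
)


def build_correction_prompt(ocr_lines: Iterable[str]) -> str:
    """Append the non-empty OCR lines to the fixed instruction header."""
    body = "\n".join(line.rstrip() for line in ocr_lines if line.strip())
    return f"{PROMPT_HEADER}\n{body}"


def correct_ocr(ocr_lines: Iterable[str], generate: Callable[[str], str]) -> str:
    """Send the assembled prompt to an LLM through a caller-supplied function.

    `generate` is a hypothetical placeholder: any function that takes a
    prompt string and returns the model's text response.
    """
    return generate(build_correction_prompt(ocr_lines))
```

Passing the model call in as a function keeps the prompt logic independent of the LLM provider, so the same prompt can be tried against different models or reformulated without touching the rest of the code.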
Below is the output generated by the LLM:
As you can see, the Large Language Model did a great job, resolving nearly all the errors from the smartphone OCR system (to be honest, one error is still present; it is your mission to detect it and reformulate the prompt to avoid it!).
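To review what the LLM changed, and to hunt for any error that is still there, it can help to compare each OCR line with its corrected version word by word. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. The function name `changed_words` is hypothetical, and the corrected line in the usage note is my own assumed reading of the OCR text, not the actual LLM output.

```python
import difflib
from typing import List, Tuple


def changed_words(ocr_line: str, corrected_line: str) -> List[Tuple[str, str]]:
    """Return (ocr_fragment, corrected_fragment) pairs where the lines differ.

    Both lines are split on whitespace and compared word by word; an empty
    string in a pair means words were only inserted or only deleted.
    """
    ocr_words = ocr_line.split()
    fixed_words = corrected_line.split()
    matcher = difflib.SequenceMatcher(a=ocr_words, b=fixed_words, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            pairs.append((" ".join(ocr_words[i1:i2]), " ".join(fixed_words[j1:j2])))
    return pairs
```

For example, `changed_words("1) Search engive", "1) Search engine")` reports the single pair `("engive", "engine")`, while two identical lines produce an empty list. Lines that come back unchanged but still look wrong are good candidates for the remaining error.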
Conclusion
In conclusion, Optical Character Recognition (OCR) is an important technology in many industry sectors, improving the speed and efficiency of processes by interpreting text from visual inputs.
While there are numerous OCR solutions available, both open source and proprietary, addressing the challenges tied to image quality and handwritten text remains crucial to improving OCR performance.
By integrating Generative AI into OCR systems, we can further enhance their capabilities and minimize errors, thereby driving more effective and reliable results. As industries continue to embrace and adopt OCR technology, the combination of OCR and Generative AI promises to unlock new levels of productivity, accuracy, and automation across a wide range of use cases and applications.
#OCR #OpticalCharacterRecognition #GenerativeAI #IndustryInnovation #Banking #Retail #Automotive #Pharmaceuticals #AIinBusiness #TextExtraction #MachineLearning #DocumentProcessing #DataAccuracy #Automation #watsonx
Software Engineer at Triodos Bank
(9 months ago) At Project Gutenberg Distributed Proofreaders, we use a page-by-page interface where volunteers correct the output of OCR manually. I would very much like to try this approach for handling the first round of corrections automatically. I will look into the degree to which this will be possible, especially with older texts, which are often hard to read and use non-standard spellings (which we want to retain).