Generative AI to improve OCR

Simone Romano

Associate Partner - AI & Analytics Practice Leader

Published: January 10, 2024

Introduction

Optical Character Recognition (OCR) technology plays a crucial role in various industry sectors by enabling the extraction and interpretation of text from images, scanned documents, or other visual inputs.

Below are some examples of how OCR is used across industries:

Banking: OCR is employed to extract information from scanned checks, invoices, and various financial documents. This improves accuracy in data entry and speeds up transaction processing.
Retail: OCR is used to digitize and manage inventory data by extracting information from product labels and barcodes. This helps in maintaining accurate stock levels and reducing manual errors. Another retail use case is related to the extraction of data from receipts, facilitating expense tracking, and improving overall financial management.
Automotive: OCR is employed in supply chain processes to extract information from shipping documents, invoices, and packaging labels, enhancing transparency and efficiency.
Pharmaceuticals: OCR assists in the extraction of information from regulatory documents, ensuring compliance with industry standards and regulations.

Many solutions, both open source and proprietary, address OCR use cases with different levels of accuracy. Below are some examples:

IBM Datacap: IBM Datacap acquires documents, extracts useful information from them, and feeds them into other business processes downstream. Its strength is its ability to complete these tasks with a high degree of automation, flexibility, and accuracy.
Amazon Textract: Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, layout elements, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract specific data from documents.
Google Cloud Vision API: Integrates Google Vision features, including image labeling, face, logo, and landmark detection, optical character recognition (OCR), and detection of explicit content, into applications.
Azure AI Vision: The cloud-based Azure AI Vision API provides developers with access to advanced algorithms for processing images and returning information. Microsoft's Read OCR engine is composed of multiple advanced machine-learning based models supporting global languages. It can extract printed and handwritten text including mixed languages and writing styles.
Tesseract: Tesseract is an open source optical character recognition (OCR) platform. OCR extracts text from images and documents without a text layer and outputs the document into a new searchable text file, PDF, or most other popular formats. Tesseract is highly customizable and can operate using most languages, including multilingual documents and vertical text.

In some cases, OCR solutions are not able to extract the expected result. This can happen for different reasons, including:

quality of the image
text that is handwritten rather than digital

Addressing these cases is not simple, and they are the main source of errors in OCR use cases.

Empowering OCR solutions with Generative AI

If you read my previous blogs (Innovative approach to AI project delivery with Generative AI, Unlocking the power of generative AI to visualize functional requirements, Generative AI for tabular data explanation: prompt limit is not a limit, AI pipeline to "play a picture of a musical score", and its implication in generative AI, Talking with a GraphDB leveraging generative AI, Generative AI impact on data platform solutions), you discovered that generative AI is not only about text generation: it is a technology that opens the door to a large set of use cases.

Now, if we consider the OCR use case, Generative AI can be used to improve the output of an OCR service: it can reconstruct the extracted text, detecting and correcting OCR errors where they occur.
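
To make the idea concrete, here is a minimal Python sketch of such a two-stage flow: an OCR step followed by an LLM correction step. It is only an illustration under my own assumptions: the names `OcrEngine`, `LanguageModel`, `ocr_then_correct`, `fake_ocr`, and `fake_llm`, the file name `notes.jpg`, and the short instruction string are all hypothetical, and the two stand-in functions replace a real OCR engine and a real model so that the sketch runs without external services.

```python
from typing import Callable, List

# Both stages are injected as plain callables, so any OCR engine and any
# LLM client could be plugged in. These type names are illustrative only.
OcrEngine = Callable[[str], List[str]]   # image path -> raw text lines
LanguageModel = Callable[[str], str]     # prompt -> model reply


def ocr_then_correct(image_path: str, ocr: OcrEngine, llm: LanguageModel) -> str:
    """Run OCR on an image, then ask a language model to repair the text."""
    raw_lines = [line.strip() for line in ocr(image_path) if line.strip()]
    if not raw_lines:
        return ""  # nothing was recognized, so there is nothing to correct
    prompt = "Correct the OCR errors in these lines:\n" + "\n".join(raw_lines)
    return llm(prompt)


# Stand-ins (assumptions) so the sketch runs without external services.
def fake_ocr(image_path: str) -> List[str]:
    return ["1) Search engive", "", "auswers"]


def fake_llm(prompt: str) -> str:
    # A real model would return corrected text; this stub only reports
    # how many OCR lines it was given.
    return f"received {len(prompt.splitlines()) - 1} OCR lines"


if __name__ == "__main__":
    print(ocr_then_correct("notes.jpg", fake_ocr, fake_llm))
```

Keeping the OCR engine and the model behind simple callables means either one can be swapped without touching the correction logic.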

Consider the following handwritten example, captured with a smartphone camera:

Below is an example of the OCR result, extracted directly with the native OCR feature of the smartphone that took the picture:

Below is the extracted text:

- Draumerely muprove propuctivity
- "SUPER INTELLIGENT AN EVERYWHERE" F revolutia
- USE CASE
1) Search engive
TODAY
→ TOMORROW
1 RSSULT!
the corvet de
auswers
2) ADVERTISMENT /e - COMMEReL
Eig. AMAZON → every user will see a ol
antomaticelly jenerale

As you can see, there are a lot of errors, due to the quality of the picture and to the fact that the text was handwritten and therefore more complex to analyze.
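
Before involving a language model, errors like these can already be spotted with a simple dictionary check. The sketch below is not part of my experiment: it is a small illustration that uses Python's standard `difflib` module to map out-of-vocabulary words to their closest known word. The `VOCABULARY` list, the function name `suggest_corrections`, and the `cutoff` value of 0.7 are assumptions chosen for this example; a real check would need a full word list.

```python
import difflib
import re
from typing import Dict, List

# Tiny illustrative vocabulary (an assumption); a real check would use a
# complete word list for the document's language.
VOCABULARY = ["search", "engine", "today", "tomorrow", "result", "answers",
              "use", "case", "productivity", "improve"]


def suggest_corrections(line: str, vocabulary: List[str],
                        cutoff: float = 0.7) -> Dict[str, str]:
    """Map each out-of-vocabulary word in `line` to its closest known word."""
    suggestions: Dict[str, str] = {}
    for token in re.findall(r"[A-Za-z]+", line):
        word = token.lower()
        if word in vocabulary:
            continue  # known word, nothing to suggest
        # Keep only the single best match above the similarity cutoff.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if matches:
            suggestions[token] = matches[0]
    return suggestions


if __name__ == "__main__":
    for ocr_line in ["1) Search engive", "auswers", "TODAY"]:
        print(ocr_line, "->", suggest_corrections(ocr_line, VOCABULARY))
```

With this vocabulary, "engive" is matched to "engine" and "auswers" to "answers". A check like this only works word by word, which is why a language model, able to use the context of the whole sentence, is a better fit for heavily damaged lines.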

As I mention in every post of my GenAI blog series, GenAI happened!

Now we can use generative AI, in particular a large language model (LLM), to identify and correct the sentences that the OCR did not handle correctly. In this example, I'll use IBM watsonx as the generative AI tool to complete my task.

First of all, I formulate a prompt (request) for the LLM, shown below:

You have following sentences extract from an OCR process. 
You need to recognize the words/sentences that have not been correctly matched from original picture and correct the overall text.
Generate an output including a list of unmanaged words or sentences with the correct reconstruction.

Below the sentences to analyze:
- Draumerely muprove propuctivity
- "SUPER INTELLIGENT AN EVERYWHERE" F revolutia
- USE CASE
1) Search engive
TODAY
→ TOMORROW
1 RSSULT!
the corvet de
auswers
2) ADVERTISMENT /e - COMMEReL
Eig. AMAZON → every user will see a ol
antomaticelly jenerale
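
If you want to reuse this prompt for other pictures, it can be assembled programmatically. The sketch below is a minimal example, assuming the OCR output is available as a list of strings; the names `INSTRUCTIONS` and `build_prompt` are hypothetical, and the instruction text is copied from the prompt above. The call that sends the prompt to the model is intentionally left out, because it depends on the model and client you use.

```python
from typing import List

# Instruction text copied from the prompt above.
INSTRUCTIONS = (
    "You have following sentences extract from an OCR process.\n"
    "You need to recognize the words/sentences that have not been correctly "
    "matched from original picture and correct the overall text.\n"
    "Generate an output including a list of unmanaged words or sentences "
    "with the correct reconstruction."
)


def build_prompt(ocr_lines: List[str]) -> str:
    """Combine the fixed instructions with the OCR lines to analyze."""
    # Skip blank lines and trailing whitespace coming from the OCR step.
    body = "\n".join(line.rstrip() for line in ocr_lines if line.strip())
    return f"{INSTRUCTIONS}\n\nBelow the sentences to analyze:\n{body}"


if __name__ == "__main__":
    print(build_prompt(["- USE CASE", "1) Search engive", "TODAY"]))
```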

Below is the output generated by the LLM:

OCR reconstruction via Large Language Models.

As you can see, the Large Language Model did a great job, resolving almost all the errors from the smartphone OCR system (to be honest, one error is still present; it is your mission to detect it and reformulate the prompt to avoid it!).
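
A practical way to hunt for a remaining error is to put the OCR text and the model's reconstruction side by side and look only at what changed. The sketch below does this with Python's standard `difflib` module. It is only an illustration: the function name `changed_lines` is hypothetical, and the `corrected` list is an invented example, not the actual output of the model.

```python
import difflib
from typing import List, Tuple


def changed_lines(before: List[str],
                  after: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Return the groups of lines that differ between two versions of a text."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep replaced, inserted, and deleted lines
            changes.append((before[i1:i2], after[j1:j2]))
    return changes


if __name__ == "__main__":
    ocr = ["- USE CASE", "1) Search engive", "TODAY"]
    # Hypothetical reconstruction, written by hand for this example.
    corrected = ["- USE CASE", "1) Search engine", "TODAY"]
    for old, new in changed_lines(ocr, corrected):
        print(old, "->", new)
```

Lines that the model left untouched do not appear in the result, so anything still wrong in them has to be checked against the original picture.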

Conclusion

In conclusion, Optical Character Recognition (OCR) is an important technology in many industry sectors: by interpreting text from visual inputs, it improves process speed and efficiency.

While there are numerous OCR solutions available, both open source and proprietary, addressing the challenges tied to image quality and handwritten text remains crucial to improving OCR performance.

By integrating Generative AI into OCR systems, we can further enhance their capabilities and minimize errors, thereby driving more effective and reliable results. As industries continue to embrace and adopt OCR technology, the combination of OCR and Generative AI promises to unlock new levels of productivity, accuracy, and automation across a wide range of use cases and applications.

#OCR #OpticalCharacterRecognition #GenerativeAI #IndustryInnovation #Banking #Retail #Automotive #Pharmaceuticals #AIinBusiness #TextExtraction #MachineLearning #DocumentProcessing #DataAccuracy #Automation #watsonx

Jeroen Hellingman

Software Engineer at Triodos Bank

9 months ago

At Project Gutenberg Distributed Proofreaders, we use a page-by-page interface where volunteers correct the output of OCR manually. I would very much like to try this out, using this approach, for handling the first round of corrections automatically. I will look into the degree this will be possible, especially with older texts, which are often hard to read, and use non-standard spellings (which we want to retain).

