Extract Numbers from Images by OCR

Dhiraj Patra

AI, ML, GenAI, IoT Innovator & Mentor | Software Architect

Published: May 28, 2023

All business organizations need to process unstructured data, especially images. Invoices, for example, are often processed manually to feed their data into a structured form such as a database or spreadsheet.

You can do this with the help of Tesseract OCR. There are also other OCR libraries and APIs that you can use to read text from images. Here are a few alternatives:

1. **Google Cloud Vision API**: Google Cloud Vision API provides powerful OCR capabilities. You can send image data to the API and receive text extraction results. It supports various languages and provides options for advanced OCR features such as document layout analysis. You’ll need to sign up for the Google Cloud platform and set up the Vision API to use this service.

2. **Microsoft Azure Computer Vision API**: Microsoft Azure offers the Computer Vision API, which includes OCR functionality. It allows you to extract text from images, supports multiple languages, and provides options for advanced OCR features. You’ll need to sign up for Microsoft Azure and create an API key to access the Computer Vision API.

3. **PyOCR**: PyOCR is a Python library that provides a simple, consistent interface to several OCR engines, including Tesseract (through both the command-line tool and libtesseract) and Cuneiform. You can install PyOCR using pip (`pip install pyocr`) and choose the OCR engine you want to use.

4. **Amazon Textract**: Amazon Textract is a fully managed OCR service provided by Amazon Web Services (AWS). It enables you to extract text and data from images and PDF documents. You can integrate Textract into your applications using the AWS SDKs or command-line interface (CLI). You’ll need to sign up for AWS and set up the Textract service to use this OCR solution.

These are just a few alternatives to Tesseract OCR. Each option has its own features, advantages, and usage requirements. You can explore these options and choose the one that best suits your needs and constraints.

However, we are going to keep our application free of cloud tool costs by using open-source libraries and Python.

The steps we will follow for this demo project are as follows:

Download any invoice image from the internet for the demo. I have taken this one:

Now we need to get the coordinates of all the data we want to extract and read. For this demo we will read only three numbers from this invoice image.

To get the coordinates of those three numbers in the invoice image, go to https://pixspy.com/ (you can use any similar tool available on your laptop).

We need the x1, y1, x2, y2 position values for each of those numbers. x1, y1 is the top-left corner of a number and x2, y2 is its bottom-right corner. Hover your pointer over each corner, keeping a small gap (not too much) from the number, read the coordinates, and write down the values.
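As a minimal sketch of how the recorded corner values could be checked before they are used, the helper below orders the two corners, clamps them to the image size, and rejects empty boxes. The function name `normalize_roi` and the 750 x 1000 image size are assumptions for illustration and are not part of the demo script.

```python
def normalize_roi(x1, y1, x2, y2, image_width, image_height):
    """Return (x1, y1, x2, y2) ordered top-left to bottom-right and clamped to the image."""
    # Accept the corners in either order
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))

    # Keep the box inside the image
    left = max(0, left)
    top = max(0, top)
    right = min(image_width, right)
    bottom = min(image_height, bottom)

    if right <= left or bottom <= top:
        raise ValueError(f"ROI {(x1, y1, x2, y2)} is empty or outside the image")
    return left, top, right, bottom


# Corners written down in the wrong order are swapped back (assumed 750 x 1000 image)
print(normalize_roi(690, 400, 640, 385, 750, 1000))  # (640, 385, 690, 400)
```

Validating the values up front makes a mistyped coordinate fail with a clear message instead of silently producing an empty crop.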

Now we need to install a few libraries for the Python application, including pytesseract and OpenCV, if they are not already installed on your system. For this example, you need:

OpenCV (pip install opencv-python)
pytesseract (pip install pytesseract), along with the Tesseract OCR engine (available at https://github.com/tesseract-ocr/tesseract)
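Before running the demo, it can help to confirm that both Python packages and the Tesseract executable are actually visible. The following is a small sketch using only the standard library; the function name `check_ocr_setup` is hypothetical.

```python
import importlib.util
import shutil


def check_ocr_setup():
    """Report which pieces of the OCR toolchain can be found on this machine."""
    return {
        # Python packages: find_spec returns None when a module is not installed
        "cv2": importlib.util.find_spec("cv2") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # The OCR engine itself is a separate executable looked up on PATH
        "tesseract binary": shutil.which("tesseract") is not None,
    }


for name, found in check_ocr_setup().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If the engine is installed but not on PATH (common on Windows), pytesseract lets you point to it by setting `pytesseract.pytesseract.tesseract_cmd` to the full path of the executable.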

We will use image processing techniques along with Optical Character Recognition (OCR) to recognize and extract text or other relevant information.

Here is a general outline of the steps involved in extracting data from specific spaces in an image:

Load and preprocess the image: Use a suitable library, such as OpenCV or PIL, to load the image. Preprocess the image as necessary, which may involve resizing, cropping, or enhancing the image to improve text recognition.

Identify and localize the specific spaces: Use image processing techniques to locate and isolate the regions of interest (ROIs) that contain the data you want to extract. This may involve techniques such as edge detection, contour detection, or template matching, depending on the specific characteristics of the spaces you want to extract data from.

Perform OCR on the ROIs: Apply OCR algorithms to the localized ROIs to recognize and extract the text or relevant information. Tesseract is a popular open-source OCR engine that you can use in combination with Python libraries like pytesseract to extract text from images.

Post-process the extracted data: Once you have extracted the text from the ROIs, you can perform additional post-processing steps to clean, validate, or format the extracted data as per your requirements.
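As one possible post-processing step, the sketch below turns raw OCR text into numeric values while keeping thousands separators and decimal parts together. It is an alternative to a plain `\d+` search, not the method used in the demo script, and the names `AMOUNT_PATTERN` and `parse_amounts` are made up for this illustration. It assumes amounts use a comma as the thousands separator and a dot as the decimal point.

```python
import re
from decimal import Decimal

# Matches amounts such as "1,234.56", "145.00" or "93"
AMOUNT_PATTERN = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def parse_amounts(ocr_text):
    """Return every number found in the OCR text as a Decimal."""
    return [
        Decimal(match.replace(",", ""))  # drop thousands separators before converting
        for match in AMOUNT_PATTERN.findall(ocr_text)
    ]


print(parse_amounts("Total: $1,234.56\n"))  # [Decimal('1234.56')]
```

`Decimal` is used instead of `float` so that currency values keep their exact digits.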

You can save the extracted data into a database (that step is not included in this demo script).
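A minimal sketch of that saving step, assuming SQLite from the Python standard library, might look like this. The table name, column names, file name, and sample values are all placeholders chosen for illustration.

```python
import sqlite3


def save_values(connection, invoice_name, values):
    """Store one row per extracted value; `values` maps a field label to its text."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS invoice_values ("
        "invoice TEXT, field TEXT, value TEXT)"
    )
    connection.executemany(
        "INSERT INTO invoice_values (invoice, field, value) VALUES (?, ?, ?)",
        [(invoice_name, field, value) for field, value in values.items()],
    )
    connection.commit()


# In-memory database for illustration; pass a file path instead to keep the data
connection = sqlite3.connect(":memory:")
save_values(connection, "invoice-001.png", {"subtotal": "145.00", "total": "154.06"})
print(connection.execute("SELECT field, value FROM invoice_values ORDER BY rowid").fetchall())
```

Parameterized queries (the `?` placeholders) keep OCR output, which can contain unexpected characters, from being interpreted as SQL.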

Keep in mind that the specific implementation may vary depending on the nature of the images and the spaces you are working with. It’s important to experiment with different image processing techniques, OCR settings, and post-processing steps to achieve accurate and reliable extraction of data from the specific spaces in your images.
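One OCR setting worth experimenting with is the `config` string that pytesseract passes through to Tesseract. The sketch below builds such a string from a page segmentation mode and an optional character whitelist. The helper name `build_tesseract_config` is hypothetical. `--psm` and `tessedit_char_whitelist` are existing Tesseract options, but whether the whitelist is honored depends on the Tesseract version and engine mode, so treat it as something to test rather than rely on.

```python
def build_tesseract_config(psm=7, whitelist=None):
    """Build the `config` string passed to pytesseract.image_to_string."""
    # Page segmentation mode, e.g. 6 = uniform block of text, 7 = single text line
    parts = [f"--psm {psm}"]
    if whitelist:
        # Ask Tesseract to restrict recognition to the given characters
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


config = build_tesseract_config(psm=7, whitelist="0123456789.,")
print(config)  # --psm 7 -c tessedit_char_whitelist=0123456789.,

# Usage with pytesseract (requires Tesseract to be installed):
# text = pytesseract.image_to_string(roi_image, config=config)
```

Restricting the character set can reduce misreads such as the letter "O" for the digit "0" when a region is known to contain only numbers.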

After installing the libraries and collecting the coordinates, we need to write the code. You can use any IDE or a Jupyter notebook for this.

import cv2
import pytesseract
import matplotlib.pyplot as plt
import re

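# Load the image; change the file name and path to match your invoice image
image = cv2.imread('invoice-template-us-neat-750px.png')
# cv2.imread returns None instead of raising when the file cannot be read
if image is None:
    raise FileNotFoundError("Could not read the invoice image; check the path")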

def read_character(roi_coordinates):
    """
    This will read the image part and find the word or number in it
    """
    # Iterate over the ROIs
    for i, (x1, y1, x2, y2) in enumerate(roi_coordinates):
        # Ensure the coordinates are within the image dimensions
        x1 = max(0, x1)
        y1 = max(0, y1)
        x2 = min(image.shape[1], x2)
        y2 = min(image.shape[0], y2)
    
        # Crop the ROI from the image
        roi = image[y1:y2, x1:x2]
    
        # Convert the ROI to grayscale
        gray_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)

        # Display the grayscale ROI
        plt.figure()
        plt.imshow(cv2.cvtColor(gray_roi, cv2.COLOR_GRAY2RGB))
        plt.axis('off')
        plt.show()
    
        # Apply image preprocessing if required
        # Example 1: Thresholding
        _, thresholded_roi = cv2.threshold(gray_roi, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    
        # Example 2: Denoising (using a bilateral filter)
        denoised_roi = cv2.bilateralFilter(thresholded_roi, 9, 75, 75)
    
        # Perform OCR on the preprocessed ROI using pytesseract
        extracted_text = pytesseract.image_to_string(denoised_roi, config='--psm 7')  # Use page segmentation mode 7 for treating the image as a single line of text
        
        # Extract numbers from the extracted text
        numbers = re.findall(r'\d+', extracted_text)
        
        # Display the extracted numbers
        if numbers:
            print(f"Numbers from ROI: {'.'.join(numbers)}")
        else:
            print("No numbers found in ROI")

# Change the coordinate values to match the location of the numbers
# you want to read in your image
image_parts = [{'x1':640, 'y1':385, 'x2':690, 'y2':400}, 
               {'x1':640, 'y1':410, 'x2':690, 'y2':435}, 
               {'x1':640, 'y1':448, 'x2':690, 'y2':470}]

# Define the regions of interest (ROIs) where you want to extract data
for coordinate in image_parts:
    x1, y1, x2, y2 = coordinate.values()
    roi_coordinates = [
        (x1, y1, x2, y2),  # Format: (top-left x, top-left y, bottom-right x, bottom-right y)
        # Add more ROI coordinates as needed
    ]
    read_character(roi_coordinates)

The output should look like this:

You can also use some other technique to optimize this code and process.

If you would like more AI/ML and other microservices template code, kindly visit my personal GitHub repo here: https://github.com/dhirajpatra.

Hope this will help you. Thank you.
