Processing with GCP Document AI: Exploring Pretrained Parsers

Vijay Chaudhary

Lead Software Engineer

Published: December 15, 2024

GCP Document AI offers several products for extracting information from documents across different use cases. The following pretrained extraction models are currently available for OCR and document-specific field extraction:

Bank Statement
W2
US Passport
Utility
Identity Document Proofing
Pay Slip
US Driver License
Expense
Invoice

These processors have two main functions: they perform OCR and extract a fixed list of entities for the given document type. Only the Identity Document Proofing parser performs OCR together with quality analysis of the input image. Some processors also offer uptraining. The APIs can be called in two ways:

Synchronous (online) requests typically handle a limited number of pages per document and are suitable for real-time processing.

Asynchronous (batch/offline) requests support processing larger volumes, with certain processors handling up to 200 pages per document.

To understand how these pretrained parsers work, we will focus on the US Driver License parser and see how fields are extracted without additional training. The processor returns a fixed set of fields. For certain field types (such as date, amount, timestamp, and address), Document AI also returns a normalized value in addition to the raw extracted text. The normalized value holds the data in a standardized format, which reduces post-processing.
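A minimal Python sketch of how the normalized value might be consumed, assuming the response entities use the JSON keys `mentionText` and `normalizedValue` as I recall them from the Document AI REST response. The helper name `best_value` and the sample entities are hypothetical and only illustrate the fallback logic.

```python
from typing import Any, Dict


def best_value(entity: Dict[str, Any]) -> str:
    """Return the normalized text when present, otherwise the raw mention text."""
    normalized = entity.get("normalizedValue") or {}
    return normalized.get("text") or entity.get("mentionText", "")


# Illustrative entities only: field names and values are made up for this sketch.
sample_entities = [
    {
        "type": "Date Of Birth",
        "mentionText": "01/02/1990",
        "normalizedValue": {
            "text": "1990-01-02",
            "dateValue": {"year": 1990, "month": 1, "day": 2},
        },
    },
    {"type": "Given Names", "mentionText": "JANE"},
]

for entity in sample_entities:
    print(entity["type"], "->", best_value(entity))
```

Preferring the normalized text keeps downstream code from parsing dates or amounts by hand, while the fallback covers fields that have no normalized form.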

Step [1] Search for Document AI in the Google Cloud console and open it. Enable the Document AI API if required. Choose US Driver License Parser from the list of General processors.

Step [2] Enter a name for the processor and select a region. Once the processor is created, copy its prediction endpoint for use in Postman. You can also test a license image directly in the GCP console and see the returned fields.
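For reference, the prediction endpoint generally follows a predictable pattern. The sketch below assembles it in Python, assuming the v1 REST URL layout with the `:process` suffix for online requests and `:batchProcess` for batch requests. The function name and the example IDs are hypothetical; the value shown in your own console is the authoritative one.

```python
def processor_endpoint(
    project_id: str, location: str, processor_id: str, batch: bool = False
) -> str:
    """Build a Document AI endpoint URL for online or batch processing."""
    # Assumption: online calls use ":process", batch calls use ":batchProcess".
    method = "batchProcess" if batch else "process"
    return (
        f"https://{location}-documentai.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"processors/{processor_id}:{method}"
    )


# Placeholder identifiers, not real resources.
print(processor_endpoint("my-project", "us", "abc123"))
print(processor_endpoint("my-project", "us", "abc123", batch=True))
```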

Step [3] Grant the access needed to authorize the POST request. Open the IAM & Admin section of the Google Cloud console and make sure your account has a Document AI role such as Document AI API User (roles/documentai.apiUser). Then run this command in Cloud Shell or a terminal with the gcloud CLI to print the authorization token: gcloud auth print-access-token
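If you prefer to script this step, the token can be fetched from Python as well. This is a rough sketch that assumes the gcloud CLI is installed, on the PATH, and already logged in. The function names `get_access_token` and `auth_headers` are hypothetical helpers, not part of any Google library.

```python
import subprocess
from typing import Dict


def get_access_token() -> str:
    """Run `gcloud auth print-access-token` and return the token, trimmed."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True,
        text=True,
        check=True,  # raise if gcloud exits with an error
    )
    return result.stdout.strip()


def auth_headers(token: str) -> Dict[str, str]:
    """Build the HTTP headers for a Document AI REST call."""
    return {
        "Authorization": f"Bearer {token.strip()}",
        "Content-Type": "application/json; charset=utf-8",
    }
```

Calling `auth_headers(get_access_token())` yields the same Authorization header you would otherwise paste into Postman. Access tokens are short-lived, so fetch a fresh one when requests start failing with an authentication error.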

Step [4] Open Postman, create a POST request, and paste the prediction endpoint as the URL. Add an Authorization header that carries the token copied in the previous step as a Bearer token.

Step [5] Get the Base64-encoded string for the license image. It is needed for the request body of the API call.
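One way to produce the Base64 string is with Python's standard library. The sketch below is an illustration only; the helper names and the file path in the usage note are hypothetical.

```python
import base64
from pathlib import Path


def encode_bytes(data: bytes) -> str:
    """Return the standard Base64 encoding of raw bytes as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def encode_file(path: str) -> str:
    """Read a file from disk and return its content as a Base64 string."""
    return encode_bytes(Path(path).read_bytes())
```

For example, `encode_file("license.jpg")` would return the string to paste into the request body, assuming a file with that name exists in the working directory.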

Step [6] Prepare the request body as JSON and set the content field to the Base64 string of your document.

Step [7] If everything is set up correctly, you should get a 200 status and a JSON response that you can analyze. It should contain the text layer, page tags, and the driver's license entity details.
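The request and response steps above can also be sketched in Python instead of Postman. This is a hedged outline, not the exact body from the original walkthrough: it assumes the v1 REST request wraps the file in a `rawDocument` object with `content` and `mimeType`, and that the response carries a `document` object with `text`, `pages`, and `entities`. All function names here are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple


def build_request_body(content_b64: str, mime_type: str = "image/jpeg") -> Dict[str, Any]:
    """Wrap the Base64 content in the (assumed) rawDocument request shape."""
    return {"rawDocument": {"content": content_b64, "mimeType": mime_type}}


def call_processor(endpoint: str, token: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """POST the body to the prediction endpoint and return the parsed JSON."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    # urlopen raises HTTPError for non-2xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(response: Dict[str, Any]) -> Tuple[str, int, List[Tuple[str, str]]]:
    """Return the text layer, the page count, and (type, text) entity pairs."""
    document = response.get("document", {})
    entities = [
        (entity.get("type", ""), entity.get("mentionText", ""))
        for entity in document.get("entities", [])
    ]
    return document.get("text", ""), len(document.get("pages", [])), entities
```

A typical flow would be `summarize(call_processor(endpoint, token, build_request_body(encoded)))`, where `endpoint`, `token`, and `encoded` come from the earlier steps. Keeping `summarize` separate from the network call makes it easy to try against a saved JSON response.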

Summary

Pretrained parsers in Google Cloud Document AI simplify document analysis by providing out-of-the-box solutions for common document types such as bank statements, W2 forms, utility bills, pay slips, US driver licenses, and more. These parsers primarily perform Optical Character Recognition (OCR) and extract predefined entities specific to the document type. For instance, the US Driver License Parser identifies key fields like name, address, and date of birth without additional training. The Identity Document Proofing Parser stands out by performing both OCR and image quality analysis. Some pretrained parsers also support uptraining, which can improve extraction in use cases where the out-of-the-box model falls short.

Setting up involves enabling the Document AI API, creating a processor for the desired parser, and obtaining a prediction endpoint. This endpoint, along with an access token, facilitates API calls for document processing. Postman or similar tools can be used to send API requests with Base64-encoded document content, returning structured JSON responses with extracted data. Pretrained parsers eliminate the need for extensive model training, offering efficient solutions for general document extraction needs.
