Processing with GCP Document AI: Exploring Pretrained Parsers
GCP Document AI offers multiple products for extracting information from documents across different use cases. The pretrained extraction models below are currently available for OCR and document-specific field extraction.
These processors have two main functions: performing OCR and extracting a fixed list of entities for each document type. Only the Identity Document Proofing parser performs OCR together with quality analysis of the input image. Some processors also offer uptraining options. These APIs can be called in two ways:
To understand how these pretrained parsers work, we will focus on the US Driver License Parser and see how fields are extracted without additional training. The fields below are part of this processor. For certain fields (such as dates, amounts, timestamps, and addresses), Document AI also returns a normalized value in addition to the raw extracted text. The normalized value holds the data in a standardized format, which reduces post-processing.
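As a rough illustration of how a normalized value sits next to the raw text, the sketch below reads a hand-written entity list shaped like the entities array of a Document AI JSON response. The sample values and the preferred_value helper are hypothetical, and the field names (mentionText, normalizedValue.text) are assumed to follow the v1 REST response format.

```python
def preferred_value(entity):
    """Return the normalized text when present, otherwise the raw mention text."""
    normalized = entity.get("normalizedValue", {}).get("text")
    return normalized if normalized else entity.get("mentionText", "")


# Hypothetical entities, hand-written for illustration (not real parser output).
sample_entities = [
    {
        "type": "Date Of Birth",
        "mentionText": "01/31/1985",
        "normalizedValue": {"text": "1985-01-31"},
    },
    {"type": "Family Name", "mentionText": "SAMPLE"},
]

# Map each entity type to its normalized value, falling back to the raw text.
fields = {entity["type"]: preferred_value(entity) for entity in sample_entities}
print(fields)
```

Falling back to the raw text keeps the lookup usable for fields such as names, where no normalized form is returned.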
Step [1] Search for Document AI in the Google Cloud console and navigate to it. Enable the Document AI API if required. Choose US Driver License Parser from the list of general processors.
Step [2] Enter a name for the processor and select a region. Once the processor is created, copy its prediction endpoint for use in Postman. You can also test a license image directly in the GCP console and see the returned fields.
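For reference, the prediction endpoint usually has a predictable shape. The sketch below assembles it from its parts, assuming the v1 REST URL format; the project, location, and processor ID values are placeholders, and the URL shown on the processor details page in the console remains the authoritative one to copy.

```python
def prediction_endpoint(project_id, location, processor_id):
    """Build a Document AI v1 process URL from its parts (assumed URL format)."""
    return (
        f"https://{location}-documentai.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"processors/{processor_id}:process"
    )


# Placeholder identifiers; replace them with the values from your own processor.
print(prediction_endpoint("my-project", "us", "abc123"))
```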
Step [3] Enable the access required to authorize the POST request. Open the IAM & Admin section in the Google Cloud console and ensure that the Document AI API User role (roles/documentai.apiUser) is granted to your account. Then print an access token to use as the authorization token by running this command in the GCP command line: gcloud auth print-access-token
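If you prefer to fetch the token from a script rather than copy it by hand, a minimal sketch is shown here. It simply shells out to the same gcloud command, so it assumes the gcloud CLI is installed and already logged in; the get_access_token helper and its command parameter are names introduced only for this example.

```python
import subprocess


def get_access_token(command=("gcloud", "auth", "print-access-token")):
    """Run the command and return its output with surrounding whitespace removed."""
    result = subprocess.run(
        list(command), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Usage (requires the gcloud CLI to be installed and authenticated):
# token = get_access_token()
```

Stripping the output matters because the command prints a trailing newline, which would otherwise end up inside the Authorization header.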
Step [4] Open Postman, create a POST request, and paste the prediction endpoint URL. Add an Authorization header, as shown here, using the token copied in the previous step.
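The same request can be put together outside Postman. The following sketch builds (but does not send) a POST request with Python's standard library, assuming the token is passed as a Bearer token in the Authorization header. The endpoint, token, and body used here are placeholders, and build_process_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_process_request(endpoint, token, body):
    """Create a POST request carrying a JSON body and a Bearer token."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_process_request(
    "https://example.invalid/v1/processors/demo:process",
    "fake-token",
    {"rawDocument": {}},
)
# To send it for real: urllib.request.urlopen(request)
```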
Step [5] Get the Base64-encoded string for the license image, which is needed for the request body of the API call. Prepare the request body in the format below, replacing the content value with the Base64 string of your document. Use these parameters in the request body JSON.
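A minimal sketch of this step follows, assuming the v1 request format in which the file is sent as rawDocument with content and mimeType fields. The skipHumanReview flag is an optional parameter included here as an assumption, the bytes are a stand-in for a real image file, and build_request_body is a hypothetical helper name.

```python
import base64
import json


def build_request_body(file_bytes, mime_type):
    """Base64-encode the file and wrap it in a process request body."""
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return {
        "skipHumanReview": True,  # assumption: no human review step is wanted
        "rawDocument": {"mimeType": mime_type, "content": encoded},
    }


# Stand-in bytes; in practice read the file, e.g. open("license.jpg", "rb").read()
body = build_request_body(b"not-a-real-image", "image/jpeg")
print(json.dumps(body, indent=2))
```

The MIME type should match the file actually sent, for example image/jpeg for a photo or application/pdf for a PDF.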
Step [6] If everything is set up correctly, you should get a 200 status in the response and be able to analyze the JSON returned by the service. You should see the text layer, the page tags, and the driver's license entity details.
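To give a feel for the response, the sketch below summarizes a small hand-written dictionary shaped like the JSON returned by the service. It assumes the v1 layout with a top-level document object holding text, pages, and entities; the sample values are invented for illustration and summarize_response is a hypothetical helper.

```python
def summarize_response(response):
    """Collect the text length, page count, and entity triples from a response."""
    document = response.get("document", {})
    return {
        "text_length": len(document.get("text", "")),
        "page_count": len(document.get("pages", [])),
        "entities": [
            (e.get("type"), e.get("mentionText"), e.get("confidence"))
            for e in document.get("entities", [])
        ],
    }


# Hand-written sample, not real parser output.
sample_response = {
    "document": {
        "text": "DRIVER LICENSE SAMPLE",
        "pages": [{"pageNumber": 1}],
        "entities": [
            {"type": "Family Name", "mentionText": "SAMPLE", "confidence": 0.98}
        ],
    }
}

summary = summarize_response(sample_response)
print(summary)
```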
Summary
Pretrained parsers in Google Cloud Document AI simplify document analysis by providing out-of-the-box solutions for common document types such as bank statements, W2 forms, utility bills, pay slips, US driver licenses, and more. These parsers primarily perform Optical Character Recognition (OCR) and extract predefined entities specific to the document type. For instance, the US Driver License Parser identifies key fields like name, address, and date of birth without additional training. The Identity Document Proofing Parser stands out by performing both OCR and image quality analysis. Some pretrained parsers also support uptraining, which can improve extraction in use cases where the out-of-the-box model falls short.
Setting up involves enabling the Document AI API, creating a processor for the desired parser, and obtaining a prediction endpoint. This endpoint, along with an access token, facilitates API calls for document processing. Postman or similar tools can be used to send API requests with Base64-encoded document content, returning structured JSON responses with extracted data. Pretrained parsers eliminate the need for extensive model training, offering efficient solutions for general document extraction needs.