Google Cloud Vision and Language APIs in Action

In my journey of exploration into the kingdom of artificial intelligence (AI), I've discovered a gateway to understanding and harnessing its power through Google Cloud APIs. As I delved into the world of Generative AI and cloud computing, these APIs emerged as invaluable tools, guiding me along the path of AI development with their wealth of features and capabilities.

With a thirst for knowledge and a desire to unlock the mysteries of Generative AI, I embarked on a quest to study and experiment with various AI models and frameworks. However, I soon realized that the true magic lay in the fusion of AI with cloud technologies, enabling me to access vast computational resources and sophisticated AI algorithms at my fingertips.

Enter Google Cloud APIs - the enchanted gateways to a world of AI-driven possibilities. Through these APIs, I discovered the power of Google's AI and machine learning services, offering a seamless integration of cutting-edge AI models and cloud infrastructure. From image recognition to natural language processing, these APIs provided me with the tools to explore the depths of AI development with ease and efficiency.

As I immersed myself in the study of Generative AI, I found that the Google Cloud APIs offered a myriad of features that brought me closer to my goals. Whether it was the Vision API for analyzing images, the Language API for understanding text, or the Speech API for processing audio, each API served as a stepping stone on my journey towards mastering AI development.

In this article, I will share some quick examples and insights gained from using Google Cloud APIs to study Generative AI. Through a practical tutorial, we will explore how these APIs can be leveraged to unlock the potential of AI and pave the way for groundbreaking innovations in AI development. Join me as we embark on a fascinating journey into the world of AI-powered creativity and discovery.

Google Cloud Vision API: A Gateway to Visual Understanding

Picture this: you have an image, and you want to unravel its secrets. With Google Cloud Vision API, it's like waving a wand to reveal the hidden treasures within images. Let's explore its enchanting features:

Magical Use Cases:

  • Spell of OCR: Imagine having the power to transform images into readable text. With Optical Character Recognition (OCR), the Vision API can extract text from images, opening doors to a world of possibilities in data entry and document digitization.
  • Glimpse into Labels: Have you ever wished for a magical assistant to tell you what's in an image? Label detection with the Vision API can do just that! It identifies objects, scenes, activities, and other general concepts in an image and returns them as easily understandable labels. (Landmarks and facial emotions are handled by the Vision API's separate landmark detection and face detection features.)

Google Cloud Language API: Unraveling the Mysteries of Language

Now, let's turn our attention to the Language API, the master of understanding the complexities of human language. It's like having a wise sage by your side, decoding the meanings behind every word:

Mystical Use Cases:

  • Emotion Interpreter: What if you could sense the emotions behind text? With sentiment analysis, the Language API can do just that! It evaluates the emotional tone of text, helping you gauge sentiments in customer feedback, social media posts, and beyond.
  • Entity Whisperer: Ever wished for a guide to navigate through the sea of names, places, and things in text? The Language API's entity recognition can identify and categorize entities, making sense of the chaos and bringing order to the linguistic realm.
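The tutorial in this article only exercises sentiment analysis, so here is a minimal sketch of what working with entity results could look like. It assumes the entities come from `LanguageServiceClient.analyze_entities()`, whose response exposes `name`, `type_` and `salience` on each entity. The helper `group_entities_by_type`, the `min_salience` parameter, and the sample values are my own illustrative choices, not real API output.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_entities_by_type(entities, min_salience=0.0):
    """Group entity names by type name, most salient first.

    `entities` is any iterable of objects exposing `name`, `type_.name`
    and `salience` (the shape of `response.entities`).
    """
    grouped = defaultdict(list)
    for entity in sorted(entities, key=lambda e: e.salience, reverse=True):
        if entity.salience >= min_salience:
            grouped[entity.type_.name].append(entity.name)
    return dict(grouped)


def _fake_entity(name, type_name, salience):
    # Hypothetical stand-in for an API entity, used only for this demo.
    return SimpleNamespace(
        name=name, type_=SimpleNamespace(name=type_name), salience=salience
    )


# Invented values for illustration; a real call would look like:
#   response = client_nlp.analyze_entities(request={"document": document})
#   print(group_entities_by_type(response.entities))
sample = [
    _fake_entity("woman's day", "EVENT", 0.3),
    _fake_entity("Dodinsky", "PERSON", 0.7),
]
print(group_entities_by_type(sample))
```

Sorting by salience before grouping keeps the most central entities at the front of each list, which is handy when you only want the top one or two per type.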

Let's Dive into the Magic: A Tutorial

Now, let's experience the magic firsthand with a captivating tutorial. We'll use the Vision API to perform OCR on an image, extract text, and then analyze its sentiment using the Language API. Ready to witness the magic unfold?

import io
import os
from google.cloud import vision
from google.auth import exceptions
from google.cloud import language_v1

# Set the path to your service account key JSON file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = './gcp-ocr-key.json'

try:
    # Set up Google Cloud Vision API client
    client_vision = vision.ImageAnnotatorClient()
except exceptions.DefaultCredentialsError as e:
    # Without a client nothing below can run, so stop here
    raise SystemExit("Error: {}".format(e))

# Read the document image
with io.open('./nlp3.jpg', 'rb') as image_file:
    content = image_file.read()

image = vision.Image(content=content)

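# Perform text detection
response = client_vision.text_detection(image=image)
# Per-image failures are reported in response.error, so check it explicitly
if response.error.message:
    raise RuntimeError(response.error.message)
texts = response.text_annotations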

for text in texts:
    print('\n"{}"'.format(text.description))

# Extract OCR text
ocr_text = texts[0].description if texts else ''

# Initialize Natural Language API client
client_nlp = language_v1.LanguageServiceClient()

# Pass OCR text to NLP for analysis
document = {
    "content": ocr_text,
    "type_": language_v1.Document.Type.PLAIN_TEXT,  # the Python client names this field type_
    "language": "en"  # Specify the language code of the OCR text (e.g., 'en' for English)
}

# Analyze sentiment
response = client_nlp.analyze_sentiment(request={'document': document})
# Print sentiment score
print('Sentiment Score: ', response.document_sentiment.score)

# Read the document image
with io.open('./nlp3.jpg', 'rb') as image_file:
    content = image_file.read()

image = vision.Image(content=content)

# Perform label detection using Google Cloud Vision API
response = client_vision.label_detection(image=image)
labels = response.label_annotations

# Print detected labels
print('Labels detected:')
for label in labels:
    print(label.description)        

Here's a breakdown of what the code does:

  1. Setting Up Credentials: The code sets the path to the service account key JSON file required for authentication with Google Cloud services.
  2. Initializing Google Cloud Vision API Client: It sets up the Google Cloud Vision API client using vision.ImageAnnotatorClient() to interact with the Vision API.
  3. Reading Image File: The code reads the image file (nlp3.jpg) using the io.open() function and stores the content in memory.
  4. Performing Text Detection: Using the Vision API client, the code performs text detection on the image by calling client_vision.text_detection(image=image). It retrieves the detected text annotations from the response.
  5. Printing Detected Texts: The code iterates over the detected text annotations and prints them to the console.
  6. Extracting OCR Text: It extracts the OCR text from the detected text annotations, which is stored in the variable ocr_text.
  7. Initializing Google Cloud Language API Client: The code initializes the Google Cloud Language API client using language_v1.LanguageServiceClient() to interact with the Language API.
  8. Passing OCR Text to NLP for Analysis: The OCR text extracted from the image is passed to the Language API for sentiment analysis. It creates a document object containing the OCR text and specifies its type as plain text.
  9. Analyzing Sentiment: The code calls client_nlp.analyze_sentiment() with the document object to analyze the sentiment of the OCR text. It retrieves the sentiment score from the response.
  10. Printing Sentiment Score: Finally, the sentiment score obtained from the Language API response is printed to the console.
  11. Performing Label Detection:
      • The code reads the image file again to prepare for label detection.
      • Using the Vision API client, it performs label detection on the image by calling client_vision.label_detection(image=image).
      • It retrieves the detected label annotations from the response and prints them to the console.
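The steps above chain two API calls, and the second one is only meaningful when the first actually found some text. Here is a small sketch of that hand-off with a guard for the empty case. The function `ocr_then_sentiment` and its two callable parameters are hypothetical adapters I introduce for illustration, so the logic can be tried without credentials.

```python
from types import SimpleNamespace


def ocr_then_sentiment(image_bytes, detect_text, score_sentiment):
    """Run OCR, then sentiment analysis, skipping the second call
    when no text was found.

    detect_text:     callable(bytes) -> list of objects with `.description`
    score_sentiment: callable(str) -> float
    Both are hypothetical adapters around the real clients.
    """
    annotations = detect_text(image_bytes)
    ocr_text = annotations[0].description if annotations else ""
    if not ocr_text.strip():
        # Nothing to analyze, so do not spend a Language API request
        return ocr_text, None
    return ocr_text, score_sentiment(ocr_text)


# With the clients from the tutorial, the adapters could be wired like this:
#   detect_text = lambda b: client_vision.text_detection(
#       image=vision.Image(content=b)).text_annotations
#   score_sentiment = lambda t: client_nlp.analyze_sentiment(
#       request={"document": language_v1.Document(
#           content=t, type_=language_v1.Document.Type.PLAIN_TEXT)}
#   ).document_sentiment.score

# Stub demo with invented values (no network access needed)
stub_annotations = [SimpleNamespace(description="hello world")]
print(ocr_then_sentiment(b"...", lambda b: stub_annotations, lambda t: 0.5))
print(ocr_then_sentiment(b"...", lambda b: [], lambda t: 0.5))
```

Passing the API calls in as functions is a design choice for testability: the same pipeline can run against stubs in a unit test and against the real clients in production.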

Running the code

This is the picture I used to explore the Vision API's capabilities:

[Image: nlp3.jpg, the input picture]

Running my Python code:

[Screenshot: console output of the script]


The output of the code reveals the following results:

  1. Detected Texts: The text detection process has successfully identified and printed the following text annotations: "Be there for others, but never leave yourself behind.", "DODINSKY", "woman's day"
  2. OCR Text: The OCR text extracted from the detected texts is: "Be there for others, but never leave yourself behind. DODINSKY woman's day"
  3. Sentiment Score: The sentiment analysis conducted on the OCR text yields a sentiment score of 0.0.
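One detail worth knowing when reading these results: in a text_detection response, the first annotation carries the entire detected text (line breaks included), and the annotations after it are the individual words. The sketch below separates the two and flattens the line breaks. The helper name `split_text_annotations` and the sample values are my own and are not real API output.

```python
from types import SimpleNamespace


def split_text_annotations(annotations):
    """Return (full_text, words) from a text_detection result.

    The first annotation holds the whole text block; the remaining
    ones hold one word each.
    """
    if not annotations:
        return "", []
    # Collapse newlines and repeated spaces into single spaces
    full_text = " ".join(annotations[0].description.split())
    words = [a.description for a in annotations[1:]]
    return full_text, words


# Invented stand-in for response.text_annotations
fake = [
    SimpleNamespace(description=d)
    for d in ("never leave\nyourself behind", "never", "leave", "yourself", "behind")
]
full_text, words = split_text_annotations(fake)
print(full_text)  # never leave yourself behind
print(words)      # ['never', 'leave', 'yourself', 'behind']
```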

The sentiment score is a number between -1.0 and 1.0 that represents the overall emotional tone of a piece of text: values below zero lean negative, values above zero lean positive, and values near zero are neutral.

In this code, the score comes from the Google Cloud Natural Language API, which examines the text extracted from the image and returns a single score for the document as a whole.

A sentiment score of:

  • 0.0 indicates a neutral sentiment, meaning that the text does not convey a particularly positive or negative emotion.
  • Positive values indicate a positive sentiment, with higher values representing stronger positive emotions.
  • Negative values indicate a negative sentiment, with lower values representing stronger negative emotions.

In this run, the sentiment score of 0.0 suggests that the text extracted from the image does not convey a strong emotional sentiment and is considered neutral.
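To make the interpretation concrete, here is a minimal sketch that turns a score into a coarse label. Besides `score`, the API's `document_sentiment` also carries a `magnitude` field (overall emotional strength), which helps tell truly neutral text apart from text whose positive and negative parts cancel out. The function name and both thresholds (`neutral_band`, `mixed_magnitude`) are assumptions of mine and would need tuning for real data.

```python
def describe_sentiment(score, magnitude=None, neutral_band=0.25, mixed_magnitude=1.0):
    """Map a sentiment score in [-1.0, 1.0] to a coarse label.

    neutral_band and mixed_magnitude are hypothetical thresholds,
    not values prescribed by the API.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1.0 and 1.0")
    if score >= neutral_band:
        return "positive"
    if score <= -neutral_band:
        return "negative"
    # Near-zero score: a high magnitude hints at mixed emotions
    if magnitude is not None and magnitude >= mixed_magnitude:
        return "mixed"
    return "neutral"


print(describe_sentiment(0.0))                  # neutral
print(describe_sentiment(0.0, magnitude=3.0))   # mixed
print(describe_sentiment(0.8))                  # positive
print(describe_sentiment(-0.6))                 # negative
```

With the tutorial's response object, this could be called as `describe_sentiment(response.document_sentiment.score, response.document_sentiment.magnitude)`.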

Overall, the code has effectively performed Optical Character Recognition (OCR) on the given image (nlp3.jpg), extracted the text, and then analyzed the sentiment of the extracted text using the Google Cloud Vision and Language APIs. The sentiment score indicates a neutral sentiment in this case.

Labels: The lines printed after "Labels detected:" are the objects and concepts that the Google Cloud Vision API identified in the image through label detection. Let's break down what each label represents:

  1. Human: This label indicates the presence of a human being or part of a human body in the image.
  2. Gesture: It suggests that there may be a hand gesture or body movement captured in the image.
  3. Thumb: This label identifies the presence of a thumb, which is a digit on the human hand.
  4. Finger: It indicates the presence of fingers, which are the digits on the human hand.
  5. Font: This label suggests that there may be text in a specific font style present in the image.
  6. Nail: It identifies the presence of fingernails, which are part of the human body.
  7. Wrist: This label indicates the presence of a wrist, which is the joint connecting the hand to the forearm.
  8. Event: It suggests that there may be elements related to an event captured in the image, such as a celebration or gathering.
  9. Fashion accessory: This label indicates the presence of an accessory worn for fashion purposes, such as jewelry or watches.
  10. Jewelry: It identifies the presence of items like rings, bracelets, or necklaces worn as adornments.
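The tutorial prints only each label's description, but every label annotation also carries a `score` field expressing the API's confidence. A small sketch of filtering on it is shown below. The helper `confident_labels`, the `min_score` threshold, and the sample scores are illustrative assumptions and not values from my run.

```python
from types import SimpleNamespace


def confident_labels(labels, min_score=0.8):
    """Return (description, score) pairs at or above min_score,
    highest score first. min_score is a hypothetical cutoff."""
    kept = [label for label in labels if label.score >= min_score]
    kept.sort(key=lambda label: label.score, reverse=True)
    return [(label.description, round(label.score, 2)) for label in kept]


# Invented scores attached to labels named in this article
fake_labels = [
    SimpleNamespace(description="Font", score=0.82),
    SimpleNamespace(description="Finger", score=0.95),
    SimpleNamespace(description="Jewelry", score=0.61),
]
print(confident_labels(fake_labels))
```

With the tutorial's variables, the call would simply be `confident_labels(labels)`.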

Conclusion

This simple yet insightful example underscores the remarkable capabilities of Google Cloud APIs, demonstrating their potential to unravel the intricacies of image data through OCR and glean valuable insights from text through sentiment analysis. Through this demonstration, we've only scratched the surface of the vast possibilities that Google Cloud APIs offer, hinting at the boundless opportunities for innovation and exploration in the realm of AI-driven solutions. As we continue to delve deeper into the realms of artificial intelligence and cloud computing, let this example serve as a testament to the transformative power of technology and the endless horizons it presents for those embarking on the journey of AI development.
