Google Cloud Vision and Language APIs in Action

Juliano Souza

Director of Information Technology | Technology Mentor for Startups in the EMEA Region.

Published: March 29, 2024

In my journey of exploration into the kingdom of artificial intelligence (AI), I've discovered a gateway to understanding and harnessing its power through Google Cloud APIs. As I delved into the world of Generative AI and cloud computing, these APIs emerged as invaluable tools, guiding me along the path of AI development with their wealth of features and capabilities.

With a thirst for knowledge and a desire to unlock the mysteries of Generative AI, I embarked on a quest to study and experiment with various AI models and frameworks. However, I soon realized that the true magic lay in the fusion of AI with cloud technologies, enabling me to access vast computational resources and sophisticated AI algorithms at my fingertips.

Enter Google Cloud APIs - the enchanted gateways to a world of AI-driven possibilities. Through these APIs, I discovered the power of Google's AI and machine learning services, offering a seamless integration of cutting-edge AI models and cloud infrastructure. From image recognition to natural language processing, these APIs provided me with the tools to explore the depths of AI development with ease and efficiency.

As I immersed myself in the study of Generative AI, I found that the Google Cloud APIs offered a myriad of features that brought me closer to my goals. Whether it was the Vision API for analyzing images, the Language API for understanding text, or the Speech API for processing audio, each API served as a stepping stone on my journey towards mastering AI development.

In this article, I will share a few quick examples and insights gained from using Google Cloud APIs to study Generative AI. Through a practical example and tutorial, we will explore how these APIs can be leveraged to unlock the potential of AI and pave the way for groundbreaking innovations in AI development. Join me as we embark on a fascinating journey into the world of AI-powered creativity and discovery.

Google Cloud Vision API: A Gateway to Visual Understanding

Picture this: You have an image, and you want to unravel its secrets. With Google Cloud Vision API, it's like waving a wand to reveal the hidden treasures within images. Let's explore its enchanting features:

Magical Use Cases:

Spell of OCR: Imagine having the power to transform images into readable text. With Optical Character Recognition (OCR), the Vision API can extract text from images, opening doors to a world of possibilities in data entry and document digitization.
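Glimpse into Labels: Have you ever wished for a magical assistant to tell you what's in an image? Label detection with the Vision API can do just that! It identifies general objects, locations, activities, and other concepts in an image and returns them as easily understandable labels, each with a confidence score. (Landmarks and facial emotion likelihoods come from separate Vision API features: landmark detection and face detection.)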

Google Cloud Language API: Unraveling the Mysteries of Language

Now, let's turn our attention to the Language API - the master of understanding the complexities of human language. It's like having a wise sage by your side, decoding the meanings behind every word:

Mystical Use Cases:

Emotion Interpreter: What if you could sense the emotions behind text? With sentiment analysis, the Language API can do just that! It evaluates the emotional tone of text, helping you gauge sentiments in customer feedback, social media posts, and beyond.
Entity Whisperer: Ever wished for a guide to navigate through the sea of names, places, and things in text? The Language API's entity recognition can identify and categorize entities, making sense of the chaos and bringing order to the linguistic realm.
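
Entity recognition is not part of the tutorial below, so here is a minimal sketch of what working with it could look like. It assumes the language_v1 client's analyze_entities call and the name, type_, and salience fields on each returned entity. The helper names (summarize_entities, fetch_entities) and the sample values are hypothetical, invented only for illustration.

```python
from types import SimpleNamespace


def summarize_entities(response, min_salience=0.0):
    """Return (name, type, salience) tuples, most salient first."""
    rows = []
    for entity in response.entities:
        if entity.salience >= min_salience:
            # In language_v1 the entity type is an enum; .name gives e.g. "PERSON"
            rows.append((entity.name, entity.type_.name, round(entity.salience, 2)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


def fetch_entities(text):
    """Call the real API (needs credentials and the google-cloud-language package)."""
    # Imported lazily so summarize_entities can be tried without the library installed
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = {"content": text, "type_": language_v1.Document.Type.PLAIN_TEXT}
    return client.analyze_entities(request={"document": document})


# Made-up stand-in for an API response (NOT real API output), usable offline
fake_response = SimpleNamespace(entities=[
    SimpleNamespace(name="Dodinsky", type_=SimpleNamespace(name="PERSON"), salience=0.31),
    SimpleNamespace(name="others", type_=SimpleNamespace(name="PERSON"), salience=0.69),
])
entity_rows = summarize_entities(fake_response)
print(entity_rows)
```

With credentials in place, summarize_entities(fetch_entities(some_text)) would apply the same formatting to a live response.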

Let's Dive into the Magic: A Tutorial

Now, let's experience the magic firsthand with a captivating tutorial. We'll use the Vision API to perform OCR on an image, extract text, and then analyze its sentiment using the Language API. Ready to witness the magic unfold?

import io
import os
from google.cloud import vision
from google.auth import exceptions
from google.cloud import language_v1

# Set the path to your service account key JSON file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = './gcp-ocr-key.json'

try:
    # Set up Google Cloud Vision API client
    client_vision = vision.ImageAnnotatorClient()
except exceptions.DefaultCredentialsError as e:
    # Stop here: without a client, every later call would fail with a NameError
    raise SystemExit("Error: {}".format(e))

# Read the document image
with io.open('./nlp3.jpg', 'rb') as image_file:
    content = image_file.read()

image = vision.Image(content=content)

# Perform text detection
response = client_vision.text_detection(image=image)
texts = response.text_annotations

for text in texts:
    print('\n"{}"'.format(text.description))

# Extract OCR text
ocr_text = texts[0].description if texts else ''

# Initialize Natural Language API client
client_nlp = language_v1.LanguageServiceClient()

# Pass OCR text to NLP for analysis
document = {
    "content": ocr_text,
    "type": language_v1.Document.Type.PLAIN_TEXT,
    "language": "en"  # Specify the language code of the OCR text (e.g., 'en' for English)
}

# Analyze sentiment
response = client_nlp.analyze_sentiment(request={'document': document})
# Print sentiment score
print('Sentiment Score: ', response.document_sentiment.score)

# Read the document image
with io.open('./nlp3.jpg', 'rb') as image_file:
    content = image_file.read()

image = vision.Image(content=content)

# Perform label detection using Google Cloud Vision API
response = client_vision.label_detection(image=image)
labels = response.label_annotations

# Print detected labels
print('Labels detected:')
for label in labels:
    print(label.description)

Here's a breakdown of what the code does:

Setting Up Credentials: The code sets the path to the service account key JSON file required for authentication with Google Cloud services.
Initializing Google Cloud Vision API Client: It sets up the Google Cloud Vision API client using vision.ImageAnnotatorClient() to interact with the Vision API.
Reading Image File: The code reads the image file (nlp3.jpg) using the io.open() function and stores the content in memory.
Performing Text Detection: Using the Vision API client, the code performs text detection on the image by calling client_vision.text_detection(image=image). It retrieves the detected text annotations from the response.
Printing Detected Texts: The code iterates over the detected text annotations and prints them to the console.
Extracting OCR Text: It extracts the OCR text from the detected text annotations, which is stored in the variable ocr_text.
Initializing Google Cloud Language API Client: The code initializes the Google Cloud Language API client using language_v1.LanguageServiceClient() to interact with the Language API.
Passing OCR Text to NLP for Analysis: The OCR text extracted from the image is passed to the Language API for sentiment analysis. It creates a document object containing the OCR text and specifies its type as plain text.
Analyzing Sentiment: The code calls client_nlp.analyze_sentiment() with the document object to analyze the sentiment of the OCR text. It retrieves the sentiment score from the response.
Printing Sentiment Score: Finally, the sentiment score obtained from the Language API response is printed to the console.
Performing Label Detection: The code reads the image file again to prepare for label detection. Using the Vision API client, it performs label detection on the image by calling client_vision.label_detection(image=image). It retrieves the detected label annotations from the response and prints them to the console.
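
The walkthrough above assumes every API call succeeds and that the image contains text. A minimal sketch of two guards follows, assuming the Vision response exposes an error status whose message is empty on success. The helper names (ensure_ok, first_text_or_none) and the stand-in responses are hypothetical and only illustrate the idea.

```python
from types import SimpleNamespace


def ensure_ok(response, step):
    """Raise if a Vision API response carries an error message; otherwise return it."""
    # Assumption: response.error.message is an empty string when the call succeeded
    message = response.error.message
    if message:
        raise RuntimeError("{} failed: {}".format(step, message))
    return response


def first_text_or_none(response):
    """Return the first text annotation's description, or None if no text was found."""
    annotations = response.text_annotations
    return annotations[0].description if annotations else None


# Made-up stand-ins for real responses (NOT real API output), usable offline
ok_response = SimpleNamespace(
    error=SimpleNamespace(message=""),
    text_annotations=[SimpleNamespace(description="Be there for others")],
)
empty_response = SimpleNamespace(error=SimpleNamespace(message=""), text_annotations=[])

print(first_text_or_none(ensure_ok(ok_response, "text_detection")))
print(first_text_or_none(ensure_ok(empty_response, "text_detection")))
```

In the tutorial code this could wrap the existing call, for example ensure_ok(client_vision.text_detection(image=image), "text_detection"), and the sentiment step could be skipped when first_text_or_none returns None.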

Running the code

This is the picture I used to explore the Vision API's capabilities:

Running my Python code:

The output of the code reveals the following results:

Detected Texts: The text detection process has successfully identified and printed the following text annotations: "Be there for others, but never leave yourself behind.", "DODINSKY", "woman's day"
OCR Text: The OCR text extracted from the detected texts is: "Be there for others, but never leave yourself behind. DODINSKY woman's day"
Sentiment Score: The sentiment analysis conducted on the OCR text yields a sentiment score of 0.0.

The sentiment score is a numerical value that represents the overall sentiment or emotional tone conveyed by a piece of text. It is calculated from the language used in the text and ranges from -1.0 (clearly negative) to 1.0 (clearly positive), with values near zero typically representing a neutral sentiment.

In this code, the sentiment score is obtained through the sentiment analysis performed using the Google Cloud Natural Language API. This API examines the text extracted from the image and assigns a sentiment score to indicate whether the text expresses a positive, negative, or neutral sentiment.

A sentiment score of:

0.0 indicates a neutral sentiment, meaning that the text does not convey a particularly positive or negative emotion.
Positive values indicate a positive sentiment, with higher values representing stronger positive emotions.
Negative values indicate a negative sentiment, with lower values representing stronger negative emotions.
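
The interpretation above can be sketched as a small helper that turns a score into a word. The thresholds are assumptions chosen for illustration (the API does not prescribe cut-off values), and the function name describe_sentiment is hypothetical. The optional magnitude argument assumes the magnitude field that the response's document_sentiment carries next to score; a score near zero with a high magnitude is treated here as mixed rather than neutral.

```python
def describe_sentiment(score, magnitude=0.0, neutral_band=0.25, mixed_magnitude=2.0):
    """Map a sentiment score (and optional magnitude) to a coarse label.

    neutral_band and mixed_magnitude are illustrative thresholds, not API constants.
    """
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    # Near-zero score: strong overall emotion suggests mixed feelings, otherwise neutral
    return "mixed" if magnitude >= mixed_magnitude else "neutral"


print(describe_sentiment(0.0))        # the score reported in this article
print(describe_sentiment(0.8, 0.9))
print(describe_sentiment(-0.6, 1.2))
```

In the tutorial code the inputs would come from response.document_sentiment.score and response.document_sentiment.magnitude.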

In this case, the sentiment score of 0.0 suggests that the text extracted from the image does not convey a strong emotional sentiment and is considered neutral.

Overall, the code has effectively performed Optical Character Recognition (OCR) on the given image (nlp3.jpg), extracted the text, and then analyzed the sentiment of the extracted text using the Google Cloud Vision and Language APIs. The sentiment score indicates a neutral sentiment in this case.

Labels: The output "Labels detected" represents the objects, concepts, or entities that have been identified within the image through label detection using the Google Cloud Vision API. Let's break down what each label represents:

Human: This label indicates the presence of a human being or part of a human body in the image.
Gesture: It suggests that there may be a hand gesture or body movement captured in the image.
Thumb: This label identifies the presence of a thumb, which is a digit on the human hand.
Finger: It indicates the presence of fingers, which are the digits on the human hand.
Font: This label suggests that there may be text in a specific font style present in the image.
Nail: It identifies the presence of fingernails, which are part of the human body.
Wrist: This label indicates the presence of a wrist, which is the joint connecting the hand to the forearm.
Event: It suggests that there may be elements related to an event captured in the image, such as a celebration or gathering.
Fashion accessory: This label indicates the presence of an accessory worn for fashion purposes, such as jewelry or watches.
Jewelry: It identifies the presence of items like rings, bracelets, or necklaces worn as adornments.
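
Not every label is equally reliable. Each label annotation also carries a confidence score, so a short filter can keep only the labels the API is more sure about. The sketch below assumes the description and score fields on label annotations. The helper name confident_labels, the 0.8 threshold, and the sample scores are hypothetical; the article's run did not report scores.

```python
from types import SimpleNamespace


def confident_labels(labels, min_score=0.8):
    """Return (description, score) pairs at or above min_score, highest score first."""
    rows = [
        (label.description, round(label.score, 2))
        for label in labels
        if label.score >= min_score
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up scores for three of the labels above (NOT real API output)
sample_labels = [
    SimpleNamespace(description="Finger", score=0.91),
    SimpleNamespace(description="Event", score=0.62),
    SimpleNamespace(description="Gesture", score=0.88),
]
kept_labels = confident_labels(sample_labels)
print(kept_labels)
```

In the tutorial code, confident_labels(response.label_annotations) would apply the same filter to the live result.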

Conclusion

This simple yet insightful example underscores the remarkable capabilities of Google Cloud APIs, demonstrating their potential to unravel the intricacies of image data through OCR and glean valuable insights from text through sentiment analysis. Through this demonstration, we've only scratched the surface of the vast possibilities that Google Cloud APIs offer, hinting at the boundless opportunities for innovation and exploration in the realm of AI-driven solutions. As we continue to delve deeper into the realms of artificial intelligence and cloud computing, let this example serve as a testament to the transformative power of technology and the endless horizons it presents for those embarking on the journey of AI development.
