Analyze Image with Azure AI Vision
Dhiraj Patra
Cloud-Native Architect | AI, ML, GenAI Innovator & Mentor | Quantitative Financial Analyst
Azure AI Vision is an artificial intelligence capability that enables software systems to interpret visual input by analyzing images. In Microsoft Azure, the Vision Azure AI service provides pre-built models for common computer vision tasks, including analyzing images to suggest captions and tags and detecting common objects and people. You can also use the Azure AI Vision service to remove the background of an image or create a foreground matte for it.
Clone the repository for this course
If you have not already cloned the Azure AI Vision code repository to the environment where you're working on this lab, follow these steps to do so. Otherwise, open the cloned folder in Visual Studio Code.
Provision an Azure AI Services resource
If you don't already have one in your subscription, you'll need to provision an Azure AI Services resource.
Prepare to use the Azure AI Vision SDK
In this exercise, you'll complete a partially implemented client application that uses the Azure AI Vision SDK to analyze images.
Note: You can choose to use the SDK for either C# or Python. In the steps below, perform the actions appropriate for your preferred language.
View the images you will analyze
In this exercise, you will use the Azure AI Vision service to analyze multiple images.
Analyze an image to suggest a caption
Now you're ready to use the SDK to call the Vision service and analyze an image.
C#
// Authenticate Azure AI Vision client
ImageAnalysisClient client = new ImageAnalysisClient(
    new Uri(aiSvcEndpoint),
    new AzureKeyCredential(aiSvcKey));
Python
# Authenticate Azure AI Vision client
cv_client = ImageAnalysisClient(
    endpoint=ai_endpoint,
    credential=AzureKeyCredential(ai_key)
)
C#
// Get result with specified features to be retrieved
ImageAnalysisResult result = client.Analyze(
    BinaryData.FromStream(stream),
    VisualFeatures.Caption |
    VisualFeatures.DenseCaptions |
    VisualFeatures.Objects |
    VisualFeatures.Tags |
    VisualFeatures.People);
Python
# Get result with specified features to be retrieved
result = cv_client.analyze(
    image_data=image_data,
    visual_features=[
        VisualFeatures.CAPTION,
        VisualFeatures.DENSE_CAPTIONS,
        VisualFeatures.TAGS,
        VisualFeatures.OBJECTS,
        VisualFeatures.PEOPLE],
)
C#
// Display analysis results
// Get image captions
if (result.Caption.Text != null)
{
    Console.WriteLine(" Caption:");
    Console.WriteLine($"   \"{result.Caption.Text}\", Confidence {result.Caption.Confidence:0.00}\n");
}

// Get image dense captions
Console.WriteLine(" Dense Captions:");
foreach (DenseCaption denseCaption in result.DenseCaptions.Values)
{
    Console.WriteLine($"   Caption: '{denseCaption.Text}', Confidence: {denseCaption.Confidence:0.00}");
}

// Get image tags

// Get objects in the image

// Get people in the image
Python
# Display analysis results
# Get image captions
if result.caption is not None:
    print("\nCaption:")
    print(" Caption: '{}' (confidence: {:.2f}%)".format(result.caption.text, result.caption.confidence * 100))

# Get image dense captions
if result.dense_captions is not None:
    print("\nDense Captions:")
    for caption in result.dense_captions.list:
        print(" Caption: '{}' (confidence: {:.2f}%)".format(caption.text, caption.confidence * 100))

# Get image tags

# Get objects in the image

# Get people in the image
C#
dotnet run images/street.jpg
Python
python image-analysis.py images/street.jpg
Get suggested tags for an image
It can sometimes be useful to identify relevant tags that provide clues about the contents of an image.
C#
// Get image tags
if (result.Tags.Values.Count > 0)
{
    Console.WriteLine($"\n Tags:");
    foreach (DetectedTag tag in result.Tags.Values)
    {
        Console.WriteLine($"   '{tag.Name}', Confidence: {tag.Confidence:F2}");
    }
}
Python
# Get image tags
if result.tags is not None:
    print("\nTags:")
    for tag in result.tags.list:
        print(" Tag: '{}' (confidence: {:.2f}%)".format(tag.name, tag.confidence * 100))
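Each detected tag pairs a name with a confidence score between 0 and 1. If you only want to display high-confidence tags, a small helper can filter the list first; this is a hypothetical addition, not part of the lab code, and the 0.75 threshold is an arbitrary assumption:

```python
def filter_tags(tags, threshold=0.75):
    # Keep only (name, confidence) pairs at or above the confidence threshold
    return [(t.name, t.confidence) for t in tags if t.confidence >= threshold]
```

You would call it as `filter_tags(result.tags.list)` before the display loop.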
Detect and locate objects in an image
Object detection is a specific form of computer vision in which individual objects within an image are identified and their location indicated by a bounding box.
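A bounding box is reported as an origin (x, y) plus a width and height. The drawing code below converts that to a pair of corner points; here is that conversion as a standalone sketch (the function name is illustrative):

```python
def box_corners(x, y, width, height):
    # Convert (x, y, width, height) to ((left, top), (right, bottom)),
    # the corner-pair form that PIL's ImageDraw.rectangle accepts
    return ((x, y), (x + width, y + height))
```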
C#
// Get objects in the image
if (result.Objects.Values.Count > 0)
{
    Console.WriteLine(" Objects:");

    // Prepare image for drawing
    stream.Close();
    System.Drawing.Image image = System.Drawing.Image.FromFile(imageFile);
    Graphics graphics = Graphics.FromImage(image);
    Pen pen = new Pen(Color.Cyan, 3);
    Font font = new Font("Arial", 16);
    SolidBrush brush = new SolidBrush(Color.WhiteSmoke);

    foreach (DetectedObject detectedObject in result.Objects.Values)
    {
        Console.WriteLine($"   \"{detectedObject.Tags[0].Name}\"");

        // Draw object bounding box
        var r = detectedObject.BoundingBox;
        Rectangle rect = new Rectangle(r.X, r.Y, r.Width, r.Height);
        graphics.DrawRectangle(pen, rect);
        graphics.DrawString(detectedObject.Tags[0].Name, font, brush, (float)r.X, (float)r.Y);
    }

    // Save annotated image
    String output_file = "objects.jpg";
    image.Save(output_file);
    Console.WriteLine("  Results saved in " + output_file + "\n");
}
Python
# Get objects in the image
if result.objects is not None:
    print("\nObjects in image:")

    # Prepare image for drawing
    image = Image.open(image_filename)
    fig = plt.figure(figsize=(image.width/100, image.height/100))
    plt.axis('off')
    draw = ImageDraw.Draw(image)
    color = 'cyan'

    for detected_object in result.objects.list:
        # Print object name
        print(" {} (confidence: {:.2f}%)".format(detected_object.tags[0].name, detected_object.tags[0].confidence * 100))

        # Draw object bounding box
        r = detected_object.bounding_box
        bounding_box = ((r.x, r.y), (r.x + r.width, r.y + r.height))
        draw.rectangle(bounding_box, outline=color, width=3)
        plt.annotate(detected_object.tags[0].name, (r.x, r.y), backgroundcolor=color)

    # Save annotated image
    plt.imshow(image)
    plt.tight_layout(pad=0)
    outputfile = 'objects.jpg'
    fig.savefig(outputfile)
    print('  Results saved in', outputfile)
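In the Python code above, matplotlib is mainly used for the text annotations. If the PIL rectangles alone are enough, you can skip matplotlib and save the annotated image directly. A minimal sketch under that assumption (function and file names are examples, not lab code):

```python
from PIL import Image, ImageDraw

def annotate_and_save(image_path, boxes, output_path, color="cyan"):
    # boxes: iterable of ((left, top), (right, bottom)) corner pairs
    image = Image.open(image_path)
    draw = ImageDraw.Draw(image)
    for box in boxes:
        draw.rectangle(box, outline=color, width=3)
    image.save(output_path)
```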
Detect and locate people in an image
People detection is a specific form of computer vision in which individual people within an image are identified and their location indicated by a bounding box.
C#
// Get people in the image
if (result.People.Values.Count > 0)
{
    Console.WriteLine($" People:");

    // Prepare image for drawing
    System.Drawing.Image image = System.Drawing.Image.FromFile(imageFile);
    Graphics graphics = Graphics.FromImage(image);
    Pen pen = new Pen(Color.Cyan, 3);
    Font font = new Font("Arial", 16);
    SolidBrush brush = new SolidBrush(Color.WhiteSmoke);

    foreach (DetectedPerson person in result.People.Values)
    {
        // Draw person bounding box
        var r = person.BoundingBox;
        Rectangle rect = new Rectangle(r.X, r.Y, r.Width, r.Height);
        graphics.DrawRectangle(pen, rect);

        // Return the confidence of the person detected
        //Console.WriteLine($"   Bounding box {person.BoundingBox.ToString()}, Confidence: {person.Confidence:F2}");
    }

    // Save annotated image
    String output_file = "persons.jpg";
    image.Save(output_file);
    Console.WriteLine("  Results saved in " + output_file + "\n");
}
Python
# Get people in the image
if result.people is not None:
    print("\nPeople in image:")

    # Prepare image for drawing
    image = Image.open(image_filename)
    fig = plt.figure(figsize=(image.width/100, image.height/100))
    plt.axis('off')
    draw = ImageDraw.Draw(image)
    color = 'cyan'

    for detected_people in result.people.list:
        # Draw person bounding box
        r = detected_people.bounding_box
        bounding_box = ((r.x, r.y), (r.x + r.width, r.y + r.height))
        draw.rectangle(bounding_box, outline=color, width=3)

        # Return the confidence of the person detected
        #print(" {} (confidence: {:.2f}%)".format(detected_people.bounding_box, detected_people.confidence * 100))

    # Save annotated image
    plt.imshow(image)
    plt.tight_layout(pad=0)
    outputfile = 'people.jpg'
    fig.savefig(outputfile)
    print('  Results saved in', outputfile)
Note: In the preceding tasks, you used a single method to analyze the image, and then incrementally added code to parse and display the results. The SDK also provides individual methods for suggesting captions, identifying tags, detecting objects, and so on, meaning you can use the most appropriate method to return only the information you need, reducing the size of the data payload that needs to be returned. See the .NET SDK documentation or Python SDK documentation for more details.
Remove the background or generate a foreground matte of an image
In some cases, you may need to remove the background of an image, or you may want to create a foreground matte of that image. Let's start with background removal.
C#
// Remove the background from the image or generate a foreground matte
Console.WriteLine($" Background removal:");

// Define the API version and mode
string apiVersion = "2023-02-01-preview";
string mode = "backgroundRemoval"; // Can be "foregroundMatting" or "backgroundRemoval"

string url = $"computervision/imageanalysis:segment?api-version={apiVersion}&mode={mode}";

// Make the REST call
using (var client = new HttpClient())
{
    var contentType = new MediaTypeWithQualityHeaderValue("application/json");
    client.BaseAddress = new Uri(endpoint);
    client.DefaultRequestHeaders.Accept.Add(contentType);
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", key);

    var data = new
    {
        url = $"https://github.com/MicrosoftLearning/mslearn-ai-vision/blob/main/Labfiles/01-analyze-images/Python/image-analysis/{imageFile}?raw=true"
    };

    var jsonData = JsonSerializer.Serialize(data);
    var contentData = new StringContent(jsonData, Encoding.UTF8, contentType);
    var response = await client.PostAsync(url, contentData);

    if (response.IsSuccessStatusCode)
    {
        File.WriteAllBytes("background.png", response.Content.ReadAsByteArrayAsync().Result);
        Console.WriteLine("  Results saved in background.png\n");
    }
    else
    {
        Console.WriteLine($"API error: {response.ReasonPhrase} - Check your body url, key, and endpoint.");
    }
}
Python
# Remove the background from the image or generate a foreground matte
print('\nRemoving background from image...')

url = "{}computervision/imageanalysis:segment?api-version={}&mode={}".format(endpoint, api_version, mode)

headers = {
    "Ocp-Apim-Subscription-Key": key,
    "Content-Type": "application/json"
}

image_url = "https://github.com/MicrosoftLearning/mslearn-ai-vision/blob/main/Labfiles/01-analyze-images/Python/image-analysis/{}?raw=true".format(image_file)

body = {
    "url": image_url,
}

response = requests.post(url, headers=headers, json=body)
image = response.content

with open("backgroundForeground.png", "wb") as file:
    file.write(image)
print('  Results saved in backgroundForeground.png\n')
Let's now generate a foreground matte for our images.
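The foreground matte uses the same segment endpoint with the mode switched to "foregroundMatting" (as noted in the mode comment in the C# example above). A hedged sketch, assuming the same endpoint, key, and image URL variables as the background-removal code; the function names and output file name are illustrative:

```python
import requests

def build_segment_url(endpoint, mode, api_version="2023-02-01-preview"):
    # mode is "backgroundRemoval" or "foregroundMatting"
    return "{}computervision/imageanalysis:segment?api-version={}&mode={}".format(endpoint, api_version, mode)

def generate_foreground_matte(endpoint, key, image_url, output_file="foreground.png"):
    # POST the image URL to the segment endpoint and save the returned matte PNG
    url = build_segment_url(endpoint, "foregroundMatting")
    headers = {"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"}
    response = requests.post(url, headers=headers, json={"url": image_url})
    response.raise_for_status()
    with open(output_file, "wb") as f:
        f.write(response.content)
```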
Clean up resources
If you're not using the Azure resources created in this lab for other training modules, you can delete them from the Azure portal to avoid incurring further charges.
More information
In this exercise, you explored some of the image analysis and manipulation capabilities of the Azure AI Vision service, including captioning, tagging, object and people detection, and background removal.
For more information about using the Azure AI Vision service, see the Azure AI Vision documentation.
Courtesy: adapted from Microsoft Azure learning materials.