Analyze Images with Azure AI Vision

Azure AI Vision is an artificial intelligence capability that enables software systems to interpret visual input by analyzing images. In Microsoft Azure, the Vision Azure AI service provides pre-built models for common computer vision tasks, including analyzing images to suggest captions and tags and detecting common objects and people. You can also use the Azure AI Vision service to remove the background of an image or create a foreground matte.

Clone the repository for this course

If you have not already cloned the Azure AI Vision code repository to the environment where you're working on this lab, follow these steps to do so. Otherwise, open the cloned folder in Visual Studio Code.

  1. Start Visual Studio Code.
  2. Open the palette (SHIFT+CTRL+P) and run a Git: Clone command to clone the https://github.com/MicrosoftLearning/mslearn-ai-vision repository to a local folder (it doesn't matter which folder). An equivalent command-line approach is shown after this list.
  3. When the repository has been cloned, open the folder in Visual Studio Code.
  4. Wait while additional files are installed to support the C# code projects in the repo.
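If you prefer working in a terminal, the same clone can be done from the command line (an equivalent alternative, not a lab requirement):

git clone https://github.com/MicrosoftLearning/mslearn-ai-vision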

Provision an Azure AI Services resource

If you don't already have one in your subscription, you'll need to provision an Azure AI Services resource.

  1. Open the Azure portal at https://portal.azure.com, and sign in using the Microsoft account associated with your Azure subscription.
  2. In the top search bar, search for Azure AI services, select Azure AI Services, and create an Azure AI services multi-service account resource. Specify your subscription, a resource group, a region, a unique name, and a pricing tier (the lab can be completed with the standard tier).
  3. Select the required checkboxes and create the resource.
  4. Wait for deployment to complete, and then view the deployment details.
  5. When the resource has been deployed, go to it and view its Keys and Endpoint page. You will need the endpoint and one of the keys from this page in the next procedure.
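Optionally, if you have the Azure CLI installed, you can retrieve the endpoint and keys from a terminal instead; the resource and resource group names below are placeholders for your own values:

az cognitiveservices account show --name <resource-name> --resource-group <resource-group> --query properties.endpoint
az cognitiveservices account keys list --name <resource-name> --resource-group <resource-group>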

Prepare to use the Azure AI Vision SDK

In this exercise, you'll complete a partially implemented client application that uses the Azure AI Vision SDK to analyze images.

Note: You can choose to use the SDK for either C# or Python. In the steps below, perform the actions appropriate for your preferred language.

  1. In Visual Studio Code, in the Explorer pane, browse to the Labfiles/01-analyze-images folder and expand the C-Sharp or Python folder depending on your language preference.
  2. Right-click the image-analysis folder and open an integrated terminal. Then install the Azure AI Vision SDK package by running the appropriate command for your language preference; example commands are shown after this list.
  3. View the contents of the image-analysis folder, and note that it contains a file for configuration settings. Update its values to reflect the endpoint and key for your Azure AI services resource.
  4. Note that the image-analysis folder also contains a code file for the client application (Program.cs for C#, image-analysis.py for Python).
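The following are representative install commands for the Azure AI Vision Image Analysis SDK; the lab files may pin a specific (possibly prerelease) version, so check them before installing:

C#

dotnet add package Azure.AI.Vision.ImageAnalysis

Python

pip install azure-ai-vision-imageanalysis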

View the images you will analyze

In this exercise, you will use the Azure AI Vision service to analyze multiple images.

  1. In Visual Studio Code, expand the image-analysis folder and the images folder it contains.
  2. Select each of the image files in turn to view them in Visual Studio Code.

Analyze an image to suggest a caption

Now you're ready to use the SDK to call the Vision service and analyze an image.

  1. In the code file for your client application (Program.cs or image-analysis.py), in the Main function, note that the code to load the configuration settings has been provided. Then find the comment Authenticate Azure AI Vision client and, under it, add the following language-specific code to create and authenticate an Azure AI Vision client object:

C#

// Authenticate Azure AI Vision client
ImageAnalysisClient client = new ImageAnalysisClient(
    new Uri(aiSvcEndpoint),
    new AzureKeyCredential(aiSvcKey));        

Python

# Authenticate Azure AI Vision client
cv_client = ImageAnalysisClient(
    endpoint=ai_endpoint,
    credential=AzureKeyCredential(ai_key)
)        
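Both snippets assume the client and credential types are imported near the top of the starter file. If they are not, these are the imports the code above (and the analysis code later in this exercise) relies on:

C#

using Azure;
using Azure.AI.Vision.ImageAnalysis;

Python

from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential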

  1. In the Main function, under the code you just added, note that the code specifies the path to an image file and then passes the image path to two other functions (AnalyzeImage and BackgroundForeground). These functions are not yet fully implemented.
  2. In the AnalyzeImage function, under the comment Get result with specified features to be retrieved, add the following code:

C#

// Get result with specified features to be retrieved
ImageAnalysisResult result = client.Analyze(
    BinaryData.FromStream(stream),
    VisualFeatures.Caption | 
    VisualFeatures.DenseCaptions |
    VisualFeatures.Objects |
    VisualFeatures.Tags |
    VisualFeatures.People);        

Python

# Get result with specified features to be retrieved
result = cv_client.analyze(
    image_data=image_data,
    visual_features=[
        VisualFeatures.CAPTION,
        VisualFeatures.DENSE_CAPTIONS,
        VisualFeatures.TAGS,
        VisualFeatures.OBJECTS,
        VisualFeatures.PEOPLE],
)        
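The Python call above assumes image_data already holds the raw bytes of the image; the starter code typically reads them from the image path passed into the function. If yours does not, a minimal sketch (assuming the path is in a variable named image_file) looks like this:

Python

# Read the image file into memory as bytes
with open(image_file, "rb") as f:
    image_data = f.read()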

  1. In the AnalyzeImage function, under the comment Display analysis results, add the following code (including the comments indicating where you will add more code later):

C#

// Display analysis results
// Get image captions
if (result.Caption.Text != null)
{
    Console.WriteLine(" Caption:");
    Console.WriteLine($"   \"{result.Caption.Text}\", Confidence {result.Caption.Confidence:0.00}\n");
}

// Get image dense captions
Console.WriteLine(" Dense Captions:");
foreach (DenseCaption denseCaption in result.DenseCaptions.Values)
{
    Console.WriteLine($"   Caption: '{denseCaption.Text}', Confidence: {denseCaption.Confidence:0.00}");
}

// Get image tags


// Get objects in the image


// Get people in the image        

Python

# Display analysis results
# Get image captions
if result.caption is not None:
    print("\nCaption:")
    print(" Caption: '{}' (confidence: {:.2f}%)".format(result.caption.text, result.caption.confidence * 100))

# Get image dense captions
if result.dense_captions is not None:
    print("\nDense Captions:")
    for caption in result.dense_captions.list:
        print(" Caption: '{}' (confidence: {:.2f}%)".format(caption.text, caption.confidence * 100))

# Get image tags


# Get objects in the image


# Get people in the image        

  1. Save your changes and return to the integrated terminal for the image-analysis folder, and enter the following command to run the program with the argument images/street.jpg:

C#

dotnet run images/street.jpg        

Python

python image-analysis.py images/street.jpg        

  1. Observe the output, which should include a suggested caption for the street.jpg image.
  2. Run the program again, this time with the argument images/building.jpg to see the caption that gets generated for the building.jpg image.
  3. Repeat the previous step to generate a caption for the images/person.jpg file.

Get suggested tags for an image

It can sometimes be useful to identify relevant tags that provide clues about the contents of an image.

  1. In the AnalyzeImage function, under the comment Get image tags, add the following code:

C#

// Get image tags
if (result.Tags.Values.Count > 0)
{
    Console.WriteLine($"\n Tags:");
    foreach (DetectedTag tag in result.Tags.Values)
    {
        Console.WriteLine($"   '{tag.Name}', Confidence: {tag.Confidence:F2}");
    }
}        

Python

# Get image tags
if result.tags is not None:
    print("\nTags:")
    for tag in result.tags.list:
        print(" Tag: '{}' (confidence: {:.2f}%)".format(tag.name, tag.confidence * 100))        

  1. Save your changes and run the program once for each of the image files in the images folder, observing that in addition to the image caption, a list of suggested tags is displayed.

Detect and locate objects in an image

Object detection is a specific form of computer vision in which individual objects within an image are identified and their location indicated by a bounding box.

  1. In the AnalyzeImage function, under the comment Get objects in the image, add the following code:

C#

// Get objects in the image
if (result.Objects.Values.Count > 0)
{
    Console.WriteLine(" Objects:");

    // Prepare image for drawing
    stream.Close();
    System.Drawing.Image image = System.Drawing.Image.FromFile(imageFile);
    Graphics graphics = Graphics.FromImage(image);
    Pen pen = new Pen(Color.Cyan, 3);
    Font font = new Font("Arial", 16);
    SolidBrush brush = new SolidBrush(Color.WhiteSmoke);

    foreach (DetectedObject detectedObject in result.Objects.Values)
    {
        Console.WriteLine($"   \"{detectedObject.Tags[0].Name}\"");

        // Draw object bounding box
        var r = detectedObject.BoundingBox;
        Rectangle rect = new Rectangle(r.X, r.Y, r.Width, r.Height);
        graphics.DrawRectangle(pen, rect);
        graphics.DrawString(detectedObject.Tags[0].Name,font,brush,(float)r.X, (float)r.Y);
    }

    // Save annotated image
    String output_file = "objects.jpg";
    image.Save(output_file);
    Console.WriteLine("  Results saved in " + output_file + "\n");
}        

Python

# Get objects in the image
if result.objects is not None:
    print("\nObjects in image:")

    # Prepare image for drawing
    image = Image.open(image_filename)
    fig = plt.figure(figsize=(image.width/100, image.height/100))
    plt.axis('off')
    draw = ImageDraw.Draw(image)
    color = 'cyan'

    for detected_object in result.objects.list:
        # Print object name
        print(" {} (confidence: {:.2f}%)".format(detected_object.tags[0].name, detected_object.tags[0].confidence * 100))

        # Draw object bounding box
        r = detected_object.bounding_box
        bounding_box = ((r.x, r.y), (r.x + r.width, r.y + r.height)) 
        draw.rectangle(bounding_box, outline=color, width=3)
        plt.annotate(detected_object.tags[0].name,(r.x, r.y), backgroundcolor=color)

    # Save annotated image
    plt.imshow(image)
    plt.tight_layout(pad=0)
    outputfile = 'objects.jpg'
    fig.savefig(outputfile)
    print('  Results saved in', outputfile)        
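The Python drawing code above depends on Pillow and Matplotlib, which the lab's starter file is expected to import already; if yours does not, these are the imports it relies on:

Python

from PIL import Image, ImageDraw
from matplotlib import pyplot as plt

The C# version uses System.Drawing, which, depending on your platform and .NET version, may require the System.Drawing.Common package.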

  1. Save your changes and run the program once for each of the image files in the images folder, observing any objects that are detected. After each run, view the objects.jpg file that is generated in the same folder as your code file to see the annotated objects.

Detect and locate people in an image

People detection is a specific form of computer vision in which individual people within an image are identified and their location indicated by a bounding box.

  1. In the AnalyzeImage function, under the comment Get people in the image, add the following code:

C#

// Get people in the image
if (result.People.Values.Count > 0)
{
    Console.WriteLine($" People:");

    // Prepare image for drawing
    System.Drawing.Image image = System.Drawing.Image.FromFile(imageFile);
    Graphics graphics = Graphics.FromImage(image);
    Pen pen = new Pen(Color.Cyan, 3);
    Font font = new Font("Arial", 16);
    SolidBrush brush = new SolidBrush(Color.WhiteSmoke);

    foreach (DetectedPerson person in result.People.Values)
    {
        // Draw object bounding box
        var r = person.BoundingBox;
        Rectangle rect = new Rectangle(r.X, r.Y, r.Width, r.Height);
        graphics.DrawRectangle(pen, rect);

        // Return the confidence of the person detected
        //Console.WriteLine($"   Bounding box {person.BoundingBox.ToString()}, Confidence: {person.Confidence:F2}");
    }

    // Save annotated image
    String output_file = "persons.jpg";
    image.Save(output_file);
    Console.WriteLine("  Results saved in " + output_file + "\n");
}        

Python

# Get people in the image
if result.people is not None:
    print("\nPeople in image:")

    # Prepare image for drawing
    image = Image.open(image_filename)
    fig = plt.figure(figsize=(image.width/100, image.height/100))
    plt.axis('off')
    draw = ImageDraw.Draw(image)
    color = 'cyan'

    for detected_people in result.people.list:
        # Draw object bounding box
        r = detected_people.bounding_box
        bounding_box = ((r.x, r.y), (r.x + r.width, r.y + r.height))
        draw.rectangle(bounding_box, outline=color, width=3)

        # Return the confidence of the person detected
        #print(" {} (confidence: {:.2f}%)".format(detected_people.bounding_box, detected_people.confidence * 100))

    # Save annotated image
    plt.imshow(image)
    plt.tight_layout(pad=0)
    outputfile = 'people.jpg'
    fig.savefig(outputfile)
    print('  Results saved in', outputfile)        

  1. (Optional) Uncomment the Console.WriteLine command (or the print statement in the Python version) under the Return the confidence of the person detected comment to review the confidence level returned for each person detected at a particular position of the image.
  2. Save your changes and run the program once for each of the image files in the images folder, observing any people that are detected. After each run, view the persons.jpg file (people.jpg for Python) that is generated in the same folder as your code file to see the annotated people.

Note: In the preceding tasks, you used a single method to analyze the image, and then incrementally added code to parse and display the results. The SDK also provides individual methods for suggesting captions, identifying tags, detecting objects, and so on, meaning that you can use the most appropriate method to return only the information you need, reducing the size of the data payload that needs to be returned. See the .NET SDK documentation or Python SDK documentation for more details.

Remove the background or generate a foreground matte of an image

In some cases, you may need to remove the background of an image, or you might want to create a foreground matte of that image. Let's start with the background removal.
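Under the hood, both of the implementations below call the Image Analysis segment REST operation directly. As a rough sketch (with placeholders for your endpoint, key, and a publicly accessible image URL), the request looks like this:

curl -X POST "<endpoint>computervision/imageanalysis:segment?api-version=2023-02-01-preview&mode=backgroundRemoval" \
  -H "Ocp-Apim-Subscription-Key: <key>" \
  -H "Content-Type: application/json" \
  -d '{"url": "<image-url>"}' \
  --output background.png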

  1. In your code file, find the BackgroundForeground function, and under the comment Remove the background from the image or generate a foreground matte, add the following code:

C#

// Remove the background from the image or generate a foreground matte
Console.WriteLine($" Background removal:");
// Define the API version and mode
string apiVersion = "2023-02-01-preview";
string mode = "backgroundRemoval"; // Can be "foregroundMatting" or "backgroundRemoval"

string url = $"computervision/imageanalysis:segment?api-version={apiVersion}&mode={mode}";

// Make the REST call
using (var client = new HttpClient())
{
    var contentType = new MediaTypeWithQualityHeaderValue("application/json");
    client.BaseAddress = new Uri(endpoint);
    client.DefaultRequestHeaders.Accept.Add(contentType);
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", key);

    var data = new
    {
        url = $"https://github.com/MicrosoftLearning/mslearn-ai-vision/blob/main/Labfiles/01-analyze-images/Python/image-analysis/{imageFile}?raw=true"
    };

    var jsonData = JsonSerializer.Serialize(data);
    var contentData = new StringContent(jsonData, Encoding.UTF8, contentType);
    var response = await client.PostAsync(url, contentData);

    if (response.IsSuccessStatusCode) {
        File.WriteAllBytes("background.png", response.Content.ReadAsByteArrayAsync().Result);
        Console.WriteLine("  Results saved in background.png\n");
    }
    else
    {
        Console.WriteLine($"API error: {response.ReasonPhrase} - Check your body url, key, and endpoint.");
    }
}        

Python

# Remove the background from the image or generate a foreground matte
print('\nRemoving background from image...')

# Define the API version and mode
api_version = "2023-02-01-preview"
mode = "backgroundRemoval"  # Can be "foregroundMatting" or "backgroundRemoval"

url = "{}computervision/imageanalysis:segment?api-version={}&mode={}".format(endpoint, api_version, mode)

headers= {
    "Ocp-Apim-Subscription-Key": key, 
    "Content-Type": "application/json" 
}

image_url="https://github.com/MicrosoftLearning/mslearn-ai-vision/blob/main/Labfiles/01-analyze-images/Python/image-analysis/{}?raw=true".format(image_file)  

body = {
    "url": image_url,
}

response = requests.post(url, headers=headers, json=body)

image=response.content
with open("backgroundForeground.png", "wb") as file:
    file.write(image)
print('  Results saved in backgroundForeground.png \n')        
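Both snippets rely on supporting imports that the starter files are expected to declare already; if yours do not, these are the likely additions (verify against your starter code):

C#

using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

Python

import requests

Note also that the C# code awaits the HTTP call, so the containing BackgroundForeground method must be declared async.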

  1. Save your changes and run the program once for each of the image files in the images folder, opening the background.png file (backgroundForeground.png for Python) that is generated in the same folder as your code file for each image. Notice how the background has been removed from each of the images.

Let's now generate a foreground matte for our images.

  1. In your code file, find the BackgroundForeground function, and under the comment Define the API version and mode, change the mode variable to foregroundMatting.
  2. Save your changes and run the program once for each of the image files in the images folder, opening the background.png file (backgroundForeground.png for Python) that is generated in the same folder as your code file for each image. Notice how a foreground matte has been generated for your images.

Clean up resources

If you're not using the Azure resources created in this lab for other training modules, you can delete them to avoid incurring further charges. Here's how:

  1. Open the Azure portal at https://portal.azure.com, and sign in using the Microsoft account associated with your Azure subscription.
  2. In the top search bar, search for Azure AI services multi-service account, and select the Azure AI services multi-service account resource you created in this lab.
  3. On the resource page, select?Delete?and follow the instructions to delete the resource.

More information

In this exercise, you explored some of the image analysis and manipulation capabilities of the Azure AI Vision service. The service also includes capabilities for reading text in images (OCR) and other computer vision tasks.

For more information about using the Azure AI Vision service, see the Azure AI Vision documentation.

Courtesy: this exercise is adapted from Microsoft Azure learning content.
