AWS AI Service: Amazon Rekognition
Amazon Rekognition is an AI service that enables users to effortlessly incorporate image and video analysis into their applications. It provides a set of tools for analyzing visual content: image classification, which identifies objects and scenes within an image; object detection, which pinpoints and labels specific items within both images and videos; and text extraction, which identifies and retrieves written content within visual media.
Additionally, Amazon Rekognition offers advanced features like facial recognition, enabling the detection, analysis, and comparison of faces across images and videos. It also supports activity detection in videos, which can identify and track specific actions or movements, making it particularly valuable in surveillance and public safety scenarios. This service is designed to streamline the implementation of complex visual analysis tasks, making it accessible and scalable for a wide range of applications.
It is important to understand the basics of image and video processing in deep learning, particularly convolutional neural networks (CNNs). CNNs are specialized deep learning models designed to recognize patterns in images by breaking them down into smaller parts. They use layers of filters to identify features like edges or textures, gradually combining these to understand more complex aspects of the image, such as identifying objects. CNNs consist of convolutional layers (applying filters) and pooling layers (reducing parameters and spatial size). Initial layers capture low-level features (edges, curves); later layers capture high-level features (object identification).
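To make the convolution-and-pooling idea concrete, here is a minimal numpy sketch (the image, kernel, and sizes are all illustrative). It applies a hand-written vertical-edge filter to a toy image, then max-pools the result, mirroring what a single convolutional layer followed by a pooling layer computes:

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid (no-padding) sliding-window filter, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """2x2 max pooling: keeps the strongest activation in each window,
    reducing the spatial size of the feature map."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    return feature_map[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# A toy 6x6 "image": dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A vertical-edge filter, similar to what early CNN layers learn.
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]])

features = convolve2d(image, edge_kernel)  # strong response where the edge is
pooled = max_pool(features)                # smaller map, edge response preserved
```

In a real CNN the filter values are learned from data, and many such filters are stacked in each layer; later layers combine these low-level responses into high-level object features.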
Popular CNN architectures include ResNet and Inception v4 for image classification, and YOLO and SSD for object detection.
Transfer learning involves using a pretrained model, freezing initial layers, and retraining the last few layers on a new dataset. Commonly used in image classification, object detection, and NLP.
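The idea can be sketched in a few lines of numpy. This is a toy illustration, not a real pipeline: the "pretrained backbone" is stood in for by a fixed random projection, and the data and sizes are invented. Only the new output head is updated, exactly as when you freeze a pretrained network's initial layers and retrain the last ones:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone: a frozen projection "learned" elsewhere.
# In a real setup this would be, e.g., a ResNet with loaded, frozen weights.
W_frozen = rng.normal(size=(16, 8)) / 4.0

def extract_features(x):
    """Frozen layers: forward pass only, no weight updates."""
    return np.maximum(x @ W_frozen, 0.0)  # ReLU activations

# Toy "new task" data: 100 samples, 16 inputs, binary labels (all illustrative).
X = rng.normal(size=(100, 16))
y = (X[:, 0] > 0).astype(float)

# The only trainable part: a fresh output head for the new task.
w, b = np.zeros(8), 0.0
feats = extract_features(X)

losses = []
for _ in range(300):  # retrain just the head with gradient descent
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid output
    losses.append(-np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)))
    w -= 0.1 * feats.T @ (p - y) / len(y)       # gradient step on the head only
    b -= 0.1 * np.mean(p - y)                   # W_frozen is never touched
```

Because the expensive feature extractor is reused as-is, only a small head needs training, which is why transfer learning works well with limited data.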
With Amazon Rekognition, developers can use pretrained models or train custom machine learning models without writing algorithm code or setting up and managing the infrastructure needed to train and deploy a deep learning model.
Amazon Rekognition Custom Labels allows you to train custom models tailored to your specific needs, but this feature is currently limited to image-based tasks only. It does not support custom training for video data or other types of media.
Amazon Rekognition processes both static images and stored videos. Image operations are synchronous, meaning you receive the results immediately. In contrast, video operations are asynchronous. When you request video processing, Amazon Rekognition will notify you upon job completion by publishing a message to an Amazon SNS topic. You will then need to call a Get* API to retrieve the outputs.
Here is a practical example of detecting objects in an image or video.
For object detection in an image, you provide either the location of the image (in JPEG or PNG format) stored in Amazon S3 or the raw image bytes as input.
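A minimal boto3 sketch of the image case might look like the following. The bucket, key, and region are placeholders; the `bounding_boxes` helper is an illustrative utility that works against the response shape shown in the sample output:

```python
def detect_labels_s3(bucket, key, max_labels=10, min_confidence=80.0):
    """Synchronous DetectLabels call on an image stored in Amazon S3."""
    import boto3  # imported here so the offline helper below runs without the SDK
    client = boto3.client("rekognition", region_name="us-east-1")  # region is illustrative
    return client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,          # caps the number of labels returned
        MinConfidence=min_confidence,  # drops low-confidence labels
    )

def bounding_boxes(response, label_name):
    """Pull the bounding boxes for one label out of a DetectLabels response."""
    for label in response.get("Labels", []):
        if label["Name"] == label_name:
            return [inst["BoundingBox"] for inst in label.get("Instances", [])]
    return []

# Offline demo against a hand-written response in the DetectLabels shape.
sample = {"Labels": [{"Name": "Car", "Confidence": 99.15,
                      "Instances": [{"BoundingBox": {"Width": 0.106, "Height": 0.185,
                                                     "Left": 0.004, "Top": 0.504},
                                     "Confidence": 99.15}],
                      "Parents": [{"Name": "Vehicle"}]}]}
boxes = bounding_boxes(sample, "Car")
```

To send image bytes instead of an S3 reference, you would pass `Image={"Bytes": image_bytes}`.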
Sample output may look like this:
{
    "Labels": [
        {
            "Name": "Vehicle",
            "Confidence": 99.15271759033203,
            "Instances": [],
            "Parents": [
                { "Name": "Transportation" }
            ]
        },
        {
            "Name": "Transportation",
            "Confidence": 99.15271759033203,
            "Instances": [],
            "Parents": []
        },
        {
            "Name": "Automobile",
            "Confidence": 99.15271759033203,
            "Instances": [],
            "Parents": [
                { "Name": "Vehicle" },
                { "Name": "Transportation" }
            ]
        },
        {
            "Name": "Car",
            "Confidence": 99.15271759033203,
            "Instances": [
                {
                    "BoundingBox": {
                        "Width": 0.10616336017847061,
                        "Height": 0.18528179824352264,
                        "Left": 0.0037978808395564556,
                        "Top": 0.5039216876029968
                    },
                    "Confidence": 99.15271759033203
                },
                {
                    "BoundingBox": {
                        "Width": 0.2429988533258438,
                        "Height": 0.21577216684818268,
                        "Left": 0.7309805154800415,
                        "Top": 0.5251884460449219
                    },
                    "Confidence": 99.1286392211914
                }
            ],
            "Parents": [
                { "Name": "Vehicle" },
                { "Name": "Transportation" }
            ]
        }
    ],
    "LabelModelVersion": "2.0"
}
By specifying MaxLabels, the number of responses can be limited, and Amazon Rekognition will synchronously return a response displaying the bounding boxes and confidence scores of the various objects detected in the image, as illustrated in the previous example. The confidence score can then be utilized for downstream actions.
In contrast, for a video job, it is not possible to pass in bytes; instead, the location of a video stored in Amazon S3 must be provided. The API used is StartLabelDetection, and it is also necessary to pass in an SNS topic for Amazon Rekognition to send a notification once the video labeling task is completed. The outputs can then be accessed by calling the GetLabelDetection API.
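A hedged sketch of that asynchronous flow follows. The ARNs and region are placeholders; `fetch_all_labels` shows the pagination loop needed because GetLabelDetection returns results in pages via `NextToken`:

```python
def start_video_label_job(bucket, key, sns_topic_arn, role_arn):
    """Kick off asynchronous label detection on a video stored in Amazon S3."""
    import boto3  # lazy import: the pagination helper below is usable offline
    client = boto3.client("rekognition", region_name="us-east-1")  # illustrative region
    resp = client.start_label_detection(
        Video={"S3Object": {"Bucket": bucket, "Name": key}},
        NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
    )
    return resp["JobId"]  # Rekognition notifies the SNS topic when the job finishes

def fetch_all_labels(client, job_id):
    """Page through GetLabelDetection results after the SNS notification arrives."""
    labels, token = [], None
    while True:
        kwargs = {"JobId": job_id}
        if token:
            kwargs["NextToken"] = token
        page = client.get_label_detection(**kwargs)
        labels.extend(page["Labels"])
        token = page.get("NextToken")
        if not token:
            return labels
```

In practice the SNS notification would typically trigger a Lambda function that calls `fetch_all_labels` and stores the results.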
A key benefit of Amazon Rekognition Video is that you can work with streaming videos. Amazon Rekognition can ingest streaming video directly from Amazon Kinesis Video Streams, process it, and publish the outputs to Amazon Kinesis Data Streams for stream processing.
Real-World Example:
Consider a scenario where an IT manager at a large retail chain is tasked with monitoring in-store security footage in real-time to identify shoplifters based on a database of known individuals. The company’s leadership, however, is concerned about the time and complexity involved in building, training, and maintaining these machine learning models due to the advanced expertise required. The challenge lies in designing a solution that addresses these needs while minimizing costs. The primary concern is that the lack of deep learning expertise within the organization may hinder the development of this solution.
Amazon Rekognition Video offers an effective solution for this scenario. The process can be broken down into the following steps:
Create a Face Collection: Start by building a collection of known faces using Rekognition Image or Video, detecting faces from an existing database of images or archived footage.
Ingest Live Security Feed: Use a service like Kinesis Video Streams to ingest the live security feed into the system.
Manage Output Data Stream: Set up Kinesis Data Streams to handle the output data stream from the video feed.
Process Video Feed: Utilize the CreateStreamProcessor API in Rekognition Video to process the incoming video feed, with the Kinesis Video stream as input.
Publish Analysis Results: The analysis results will be published to Kinesis Data Streams.
Store Outputs: AWS Lambda can then consume the data from Kinesis Data Streams and store the outputs in S3 or a key-value store like Amazon DynamoDB.
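The stream-processor step above can be sketched with boto3 as follows. All names, ARNs, the region, and the 85% match threshold are placeholder assumptions; the request-building helper is split out purely for illustration:

```python
def build_stream_processor_request(name, kvs_arn, kds_arn, role_arn, collection_id,
                                   threshold=85.0):
    """Assemble the CreateStreamProcessor request: Kinesis Video stream in,
    face-search results against a known-faces collection out to Kinesis Data Streams."""
    return {
        "Name": name,
        "Input": {"KinesisVideoStream": {"Arn": kvs_arn}},
        "Output": {"KinesisDataStream": {"Arn": kds_arn}},
        "RoleArn": role_arn,
        "Settings": {"FaceSearch": {"CollectionId": collection_id,
                                    "FaceMatchThreshold": threshold}},
    }

def create_and_start(name, kvs_arn, kds_arn, role_arn, collection_id):
    import boto3  # lazy import so the builder above works without the SDK
    client = boto3.client("rekognition", region_name="us-east-1")  # illustrative
    client.create_stream_processor(
        **build_stream_processor_request(name, kvs_arn, kds_arn, role_arn, collection_id))
    client.start_stream_processor(Name=name)  # begin consuming the live feed
```

The face collection referenced by `collection_id` would be populated beforehand with CreateCollection and IndexFaces calls against the database of known individuals.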
The following graphic illustrates the high-level architectural flow: