登录查看更多内容

Project AI Boom Gate: Part 2 (Analyze)

Rennay Dorasamy

Writer | Developer | Architect

发布日期: 2024年11月3日

Article Technical Level Guide

Based on reader feedback, I have provided a guide detailing the technical level of this article. Some sections are super-easy to follow. I am secretly hoping that my family will read these posts to understand my “AI monologues” at the breakfast table. Some are (intentionally) incredibly detailed and have links to my github page which contain step-by-step instructions that will help jog my memory for later projects and guide those who wish to follow along. As you will recall from my approach, I’m building to understand how everything fits together.

This article is pitched at a "general understanding" level.

Article Tech Level - General Understanding

Background

As you might recall in Part 1, the objective was to build a solution that used Artificial Intelligence (AI) to open a boom gate based on the vehicle registration number or biometric data of the driver. A critical element was to allow the solution to "see" the physical world. This was achieved by ingesting a video stream from a camera attached to a Raspberry Pi.

To simplify processing, I used a managed Cloud service from Amazon Web Services (AWS) , Amazon Kinesis Video Streams (KVS) to handle the "heavy lifting" of the stream. We also used available functionality to extract images from the the video data and write those through to S3. So, the point of departure for this article is a series of images on S3.

Problem Statement

Now that our solution can "see" the world, it needs to make sense of the world. Specifically, for our use-case, we need to find a way to retrieve information from the image - such as the vehicle registration number.

You might be familiar with solutions such as admyt and KaChing Parking that provide an innovative and elegant ticketless approach. Essentially, you pre-register your vehicle on their App and provide your credit card details. When you drive up to a parking boom, your vehicle is identified based on the registration and the boom automatically opens. You get to skip the queues at the pay station which tend to build up as patrons plead with the machine to accept their cash. On exit, the camera once again identifies your vehicle, charges your card and sends an instruction to the boom to open. When it works, which is around 90% of the time, it is flawless.

As a stretch target, I will also attempt to emulate the access system of my security estate that uses facial recognition. Much like the way you unlock your smartphone, the system automatically opens the boom based on the identity of the driver. Unfortunately, my success ratio with this system is closer to 10% - as much as I try to charm it with various types of smiles.

Objective

Implement a solution to determine if the boom gate should be opened based on one of the following:

Vehicle data - vehicle registration number (license plate)
Driver data - facial comparison

Machine Learning Approach

While researching the concept of machine learning a few weeks ago, I came across a super-interesting section around choosing a machine learning modelling approach. Lets discuss this a little more. From an AWS AI/ML stack perspective, there are three tiers:

AWS AI services, the simplest method you can choose and include services such as Amazon Polly for text-to-speech generation, or Amazon Comprehend for text or sentiment analysis.
AWS ML services, specialised services such as Amazon Q Developer, a service that specialises in code generation.
Customised ML model solutions using AWS infrastructure and frameworks, such as Amazon SageMaker that allows decisions such as what ML algorithms to use.

In much the same way that I selected a Raspberry Pi over an Android Phone in Part 1, I settled on Tier #1: AWS AI service for this project for the following reasons:

I want to get a better understanding of the available AI services. I was super impressed to see that there are services for Vision, Speech, Text, Search, Chatbots, Personalisation, Forecasting, Fraud, Development, Contact Centres and Generative AI.
Based on my current skillset and prior experience, this would result in the fastest turnaround.
In the same way I shifted the "heavy lifting" to AWS by using KVS, I wanted to leverage a "managed service" to simplify the implementation.
My objectives (for this project) could be satisfied by an existing Cloud service, Amazon Rekognition (details below).

AWS Cloud Services

Amazon Rekognition

Amazon Rekognition is a video analysis service that adds advanced computer vision capabilities to your applications using an API. The service is a pre-trained deep learning technology that identifies objects, people, text, scenes, and activities in images and videos stored in Amazon Simple Storage Service (Amazon S3).

Note: I intend to use Tiers 2 and 3, custom ML models, frameworks and Amazon SageMaker in future projects.

Solution Design

The diagram below illustrates the updated process flow and interaction between hardware and Cloud services.

Amazon Web Services (AWS) 5 个月前

Future-proof your AI strategy

Cohere 1 年前

?? DeepMind’s New Gemini and The $1.3 Billion…

AlphaSignal 1 年前

AI Boom Gate Solution Design (Capture + Analyse)

A motion sensor picks up movement as a car approaches the boom gate.
The sensor, connected to the Raspberry Pi (via GPIO), triggers an event which is received by a Local Controller application.
The application starts recording using the camera module connected to the Pi.
The video stream is ingested into Amazon Kinesis Video Streams (KVS) using the Amazon Kinesis Video Streams CPP Producer, GStreamer Plugin and JNI SDK.
Using the KVS built-in functionality, we extract individual images from the video steam (at a specified sampling interval) and store in an S3 bucket.
When a new object is written to an S3 bucket, it fires an event that triggers a Lambda function. See here for further details.
The Lambda function makes a call to Amazon Rekognition to process the image. I have included instructions to invoke Rekognition (see below).
The extracted text is compared against registered vehicle registration numbers in a cloud-hosted database.
If successfully verified, an instruction is sent to the Local Controller indicating access is granted.
The Local Controller sends a command to the relay (via GPIO) to raise the boom gate.

Instructions

Detect Text (CLI, NodeJS, Python) details the following:

Command Line Interface (CLI), referencing an image in S3
Sample NodeJS application, referencing an image in S3
Sample NodeJS application, referencing a local image
Sample Python application, referencing a local image, which generates an annotated image with bounding boxes. This essentially uses the response from Rekognition, with the ImageDraw library.

Observations

The initial euphoria of extracting text from an image was quickly extinguished once I found that every piece of text from the image was extracted.
This makes finding the vehicle registration number from the response a little more tricky.
If we were building for production, a potential solution would be to use only a specific portion of the image - where the vehicle number plate is likely to be found.

Facial Comparison

I must admit - I was quite impressed and entertained by Rekognition's "facial comparison" capability. As you will note from the image below, it successfully matched pictures of me taken across a 20-year time span. I used the opportunity to test a few other subjects, such as friends and family, and the results were consistent.

Observations

Using facial biometric data would also require a liveness check - to prevent someone from holding up a picture of you.
Again, if we were building for production, a potential solution would be to use Amazon Rekognition Face Liveness.

Cost (Approximate)

One of the pillars of the AWS Well-Architected Framework is Cost optimisation. I used the pricing for Amazon Rekognition to calculate the "ballpark" pricing to detect text or detect faces for different configurations. A quick note that this is a super-rough calculation based on my specific use-case.

Amazon Rekognition "Back of the napkin" calculation

Although the ZAR/USD exchange rate is still eyebrow-raising, it is incredibly encouraging to see that it is financially "feasible" to build and operate these solutions without massive outlays of initial investment. This is a massive boost for startups as tech entrepreneurs.

Result

I am super pleased and impressed that I have been able to conceptualise, design and most importantly, prototype a solution - in the space of a few weeks, that years ago would have been a feature on Beyond 2000. From my brief research, the ability to recognise and text and faces uses breakthroughs in computer science such as AlexNet.

This is my key learning from this project. In much the same way that many of us use a car without a deep understanding of the complexity of the internal combustion engine, we can use AI services to evolve and optimise the solutions that we build. As an example, creating the GPT-4 Foundation Model (FM) cost OpenAI in the region of $100m - $200m. It is unlikely that we will create a new FM - but we can use and build on existing FMs.

There are also various options and specific services already available that use AI internally.

Coming up...

In my next project/instalment, I tackle the issue that most likely sparked my "hands on" approach to learning AI to solve "real-world" problems. Chances are you've encountered it too. Here's a clue about the problem statement. Can you guess what it is?

Muhammad Kazim ?

???Helping Agencies With My UI UX Design Skills

3 周

Rennay Dorasamy, how about machines gradually enhancing everyday life? Interesting perspective.

1 次回应

Jacky Kotane

Platform Strategy|System Architecture|R&D|Design Optimization|Tech Roadmapping

3 周

Interesting project Rennay !

1 次回应

GoHuman AI Consulting

3 周

Rennay, you’re spot on about the evolutionary nature of AI. The real revolution lies in how we leverage its capabilities. At GoHuman AI, we see generative AI as a game-changer, not just a tool. While many focus on the limitations, the potential for real-time sentiment analysis and predictive insights can transform customer interactions and decision-making processes. It’s not just about using AI; it’s about understanding its superhuman strengths—like emotion detection and psychological profiling—that can redefine how we approach problem-solving in tech. The future isn’t just about adapting; it’s about thriving with AI at our side.

1 次回应

查看更多评论

要查看或添加评论，请登录

查看全部

Project AI Boom Gate: Part 2 (Analyze)