Project AI Boom Gate: Part 2 (Analyze)
Article Technical Level Guide
Based on reader feedback, I have provided a guide detailing the technical level of this article. Some sections are super-easy to follow; I am secretly hoping that my family will read these posts to understand my "AI monologues" at the breakfast table. Others are (intentionally) incredibly detailed, with links to my GitHub page containing step-by-step instructions that will jog my memory for later projects and guide those who wish to follow along. As you will recall from my approach, I'm building to understand how everything fits together.
This article is pitched at a "general understanding" level.
Background
As you might recall from Part 1, the objective was to build a solution that used Artificial Intelligence (AI) to open a boom gate based on the vehicle registration number or biometric data of the driver. A critical element was allowing the solution to "see" the physical world. This was achieved by ingesting a video stream from a camera attached to a Raspberry Pi.
To simplify processing, I used a managed Cloud service from Amazon Web Services (AWS), Amazon Kinesis Video Streams (KVS), to handle the "heavy lifting" of the stream. We also used available functionality to extract images from the video data and write those through to S3. So, the point of departure for this article is a series of images on S3.
Problem Statement
Now that our solution can "see" the world, it needs to make sense of what it sees. Specifically, for our use-case, we need a way to retrieve information from the image, such as the vehicle registration number.
You might be familiar with solutions such as admyt and KaChing Parking that provide an innovative and elegant ticketless approach. Essentially, you pre-register your vehicle on their App and provide your credit card details. When you drive up to a parking boom, your vehicle is identified based on the registration and the boom automatically opens. You get to skip the queues at the pay station, which tend to build up as patrons plead with the machine to accept their cash. On exit, the camera once again identifies your vehicle, charges your card and sends an instruction to the boom to open. When it works, which is around 90% of the time, it is flawless.
As a stretch target, I will also attempt to emulate the access system of my security estate that uses facial recognition. Much like the way you unlock your smartphone, the system automatically opens the boom based on the identity of the driver. Unfortunately, my success ratio with this system is closer to 10% - as much as I try to charm it with various types of smiles.
Objective
Implement a solution to determine if the boom gate should be opened based on one of the following:
- the vehicle registration number, read from an image of the vehicle; or
- the identity of the driver, established through facial comparison.
A conceptual sketch of this decision logic is shown below.
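To make this concrete, here is a minimal Python sketch of the gate decision. The allow-list, the similarity threshold and the function name are my own illustrative assumptions, not part of the actual build:

```python
from typing import Optional

# Conceptual sketch only. The allow-list, threshold and function name are
# illustrative assumptions rather than the real implementation.
REGISTERED_PLATES = {"CA 123-456", "ND 789-012"}  # hypothetical allow-list
FACE_SIMILARITY_THRESHOLD = 90.0                  # assumed cut-off (percent)

def should_open_gate(plate: Optional[str], face_similarity: Optional[float]) -> bool:
    """Open the boom if the plate is registered or the driver's face matches."""
    if plate is not None and plate in REGISTERED_PLATES:
        return True
    if face_similarity is not None and face_similarity >= FACE_SIMILARITY_THRESHOLD:
        return True
    return False

# A known registration opens the gate even without a face match.
print(should_open_gate("CA 123-456", None))  # True
```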
Machine Learning Approach
While researching the concept of machine learning a few weeks ago, I came across a super-interesting section on choosing a machine learning modelling approach. Let's discuss this a little more. From an AWS AI/ML stack perspective, there are three tiers:
- Tier 1: AI Services - pre-trained, API-driven services (such as Amazon Rekognition) that require no machine learning expertise.
- Tier 2: ML Services - Amazon SageMaker, used to build, train and deploy custom models.
- Tier 3: ML Frameworks and Infrastructure - frameworks such as TensorFlow and PyTorch running on compute that you manage.
In much the same way that I selected a Raspberry Pi over an Android Phone in Part 1, I settled on Tier 1 (AWS AI Services) for this project for the following reasons:
- no machine learning expertise or model training is required;
- the service is fully managed, so I could focus on the solution rather than the plumbing;
- pricing is pay-per-use, which is ideal for a prototype.
AWS Cloud Services
Amazon Rekognition is an image and video analysis service that adds advanced computer vision capabilities to your applications through an API. The service uses pre-trained deep learning technology to identify objects, people, text, scenes, and activities in images and videos stored in Amazon Simple Storage Service (Amazon S3).
Note: I intend to use Tiers 2 and 3 (custom ML models, frameworks and Amazon SageMaker) in future projects.
Solution Design
The diagram below illustrates the updated process flow and interaction between hardware and Cloud services.
Instructions
Detect Text (CLI, NodeJS, Python) on my GitHub page details the following:
- configuring AWS credentials and permissions for Amazon Rekognition;
- invoking the DetectText API against the images stored in S3 (via the AWS CLI, NodeJS, and Python);
- parsing the response to extract the vehicle registration number.
A minimal Python sketch of the core call is shown below.
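For a taste of how simple the API is, here is a minimal Python (boto3) sketch of the DetectText call. The bucket and key names are illustrative assumptions; the full walkthrough lives on the GitHub page:

```python
import boto3

BUCKET = "ai-boom-gate-frames"   # assumption: S3 bucket holding the extracted frames
KEY = "frames/vehicle-001.jpg"   # assumption: one frame written by the KVS pipeline

rekognition = boto3.client("rekognition")

# Ask Rekognition to find text in the image stored on S3.
response = rekognition.detect_text(
    Image={"S3Object": {"Bucket": BUCKET, "Name": KEY}}
)

# Keep only LINE detections above a confidence threshold; a registration
# plate usually comes back as a single line of text.
for detection in response["TextDetections"]:
    if detection["Type"] == "LINE" and detection["Confidence"] > 90:
        print(detection["DetectedText"], round(detection["Confidence"], 1))
```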
Observations
Facial Comparison
I must admit - I was quite impressed and entertained by Rekognition's "facial comparison" capability. As you will note from the image below, it successfully matched pictures of me taken across a 20-year time span. I used the opportunity to test a few other subjects, such as friends and family, and the results were consistent.
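For those curious about the mechanics, the comparison boils down to a single API call. Below is a minimal Python (boto3) sketch; the bucket, the key names and the 80% similarity threshold are my own illustrative assumptions:

```python
import boto3

BUCKET = "ai-boom-gate-frames"          # assumption: bucket holding the photos
SOURCE_KEY = "faces/rennay-2004.jpg"    # hypothetical reference photo
TARGET_KEY = "faces/rennay-2024.jpg"    # hypothetical recent photo

rekognition = boto3.client("rekognition")

response = rekognition.compare_faces(
    SourceImage={"S3Object": {"Bucket": BUCKET, "Name": SOURCE_KEY}},
    TargetImage={"S3Object": {"Bucket": BUCKET, "Name": TARGET_KEY}},
    SimilarityThreshold=80,  # only return matches at or above 80% similarity
)

# Each match reports how similar the target face is to the source face.
for match in response["FaceMatches"]:
    print(f"Similarity: {match['Similarity']:.1f}%")
```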
Observations
Cost (Approximate)
One of the pillars of the AWS Well-Architected Framework is Cost Optimisation. I used the pricing for Amazon Rekognition to calculate "ballpark" pricing for detecting text or faces under different configurations. A quick note that this is a super-rough calculation based on my specific use-case.
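As an illustration of the back-of-the-envelope sum involved, here is a sketch. The per-image rate, traffic volume and exchange rate are all assumptions for illustration; check the current Rekognition pricing page for real figures:

```python
# Back-of-the-envelope Rekognition cost estimate. Every figure here is an
# illustrative assumption, not a quote from the AWS pricing page.

PRICE_PER_IMAGE_USD = 0.001   # assumed first-tier Rekognition Image rate
VEHICLES_PER_DAY = 100        # assumed traffic through the boom gate
EVENTS_PER_VEHICLE = 2        # one image on entry, one on exit
DAYS_PER_MONTH = 30
ZAR_PER_USD = 18.50           # assumed exchange rate

images_per_month = VEHICLES_PER_DAY * EVENTS_PER_VEHICLE * DAYS_PER_MONTH
monthly_usd = images_per_month * PRICE_PER_IMAGE_USD

print(f"{images_per_month} images/month -> ${monthly_usd:.2f} (~R{monthly_usd * ZAR_PER_USD:.2f})")
# 6000 images/month -> $6.00 (~R111.00)
```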
Although the ZAR/USD exchange rate is still eyebrow-raising, it is incredibly encouraging to see that it is financially "feasible" to build and operate these solutions without a massive initial outlay. This is a massive boost for startups and tech entrepreneurs.
Result
I am super pleased and impressed that I have been able to conceptualise, design and, most importantly, prototype a solution in the space of a few weeks that, years ago, would have been a feature on Beyond 2000. From my brief research, the ability to recognise text and faces builds on breakthroughs in computer science such as AlexNet.
This is my key learning from this project. In much the same way that many of us use a car without a deep understanding of the internal combustion engine, we can use AI services to evolve and optimise the solutions that we build. As an example, creating the GPT-4 Foundation Model (FM) reportedly cost OpenAI in the region of $100m to $200m. It is unlikely that we will create a new FM, but we can use and build on existing FMs.
There are also various options and specific services already available that use AI internally.
Coming up...
In my next project/instalment, I tackle the issue that most likely sparked my "hands on" approach to learning AI to solve "real-world" problems. Chances are you've encountered it too. Here's a clue about the problem statement. Can you guess what it is?