Building a cashier-less checkout GPT
Back in 2019, I worked on a startup, we were building technology to provide a pickup and go checkout experience for retail customers. The core elements to build this system included solving the following problems -
- Track individuals with overhead depth sensing cameras - We solved this through object detection(SSD and YOLO models) and object tracking(graph theory algorithms that join tracked objects across frames) using Intel Real Sense cameras. You can see the output here - Object Tracking
- Find out who picked up what - We solved this using pose estimation(who was near the object) and object detection(which object got picked up). You can see the output in this video - Pose Estimation
We had to build different models for each element that included object tracking, object detection, pose estimation and orchestrate the same through a distributed engineering system.
In 2024, I tried the same with GPT-V and I could get most of elements to work with zero short learning using simple prompts as shown below -
AI that is capable of language understanding, human like perception and in some cases reasoning has arrived. The future is to reduce cost/request and deploy this at scale.