Navigating the Future: Spatial AR Development with AI for Object Labeling and Placement - Part 2
Preston McCauley
Director of Emerging Technologies at Tonic3 | Executive Leader in AI, AR/VR, & UX | Driving Innovative #AI Solutions / XR Labs -Teacher, Speaker, Educator - expertise in agentic solutions like #crewai #ML
Welcome to part two of my three-part article on prototyping with AR and AI. I encourage you to go back and read part one to understand the entire process.
To begin, we are going to break down the approaches we used to move the POC 1 validations through the next gateway.
With the help of my brilliant developer, we confidently moved forward to the next stage of our work. The second proof of concept in the CIC approach builds on rapid design and development cycles to further validate the techniques, tools, and methods and to surface issues. These are broken down into quadrants.
This stage was initially focused on identifying and categorizing an image, which was a micro goal. I had gained significant practical experience with spatial devices, which allowed me to map out a potential flow that could quickly lead us to the spatial classification phase.
Though working around privacy concerns presented challenges, we remained undaunted and continued our concept. After all, we could only plan based on what we knew and were determined to succeed.
The first micro objective was establishing a working spatial Unity project to understand the spatial map and map the room planes (light work on a HoloLens and with ARKit). However, it was still challenging on the Meta Quest.
Leveraging my previous spatial work experience, I knew how to accomplish this goal. Working with my fabulous UX designer, we discussed this spatial mapping functionality and drafted some highly conceptual UI mocks that we could use to build the proof of concept (POC) out further. As you can see, the interface is meant to be extremely minimal, because we needed to keep the spatial visual areas manageable; if we didn't, it would cause issues with the identification process and clutter the camera scene. You always want to manage your holographic visual data like any other code structure. This meant we wanted to be able to toggle all of the UI, and potentially any existing labeled objects, between visible and non-visible.
While I realized the end product would likely not achieve the level of polish shown in the mocks, they provided a helpful north star for the experience we wanted to aim toward while keeping an eye on how the UX would evolve.
In parallel, we were working on another quadrant of the iteration.
First, our developer and I worked to ensure we could run the spatial app experience. It didn't need to be perfect; it was enough to be functional on the Quest 3. This meant detecting the environment and room data, testing the camera, and taking the screenshot.
I was incredibly excited because the Quest 3's color pass-through made it the perfect candidate device. This advancement is a massive milestone in cost versus capability for more affordable AR pass-through devices.
The next thing to note was that, much like what I shared in article one for POC 1, we needed a method to "highlight" the environmental target areas. As previously mentioned, I would place shapes around my images in tools I built in Python and Figma and generate pseudo-highlights around objects; a rough sketch of that idea follows below.
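To give a flavor of that POC 1 tooling, here is a minimal Python sketch (using Pillow) of drawing pseudo-highlights over regions of an image. The file names and box coordinates are placeholder assumptions, not values from the actual tool.

```python
from PIL import Image, ImageDraw

# Hypothetical bounding boxes for objects of interest (left, top, right, bottom).
# In the real workflow these came from manually placed shapes, not from code.
HIGHLIGHT_BOXES = [
    (120, 200, 340, 420),
    (400, 180, 560, 360),
]

def draw_pseudo_highlights(src_path: str, dst_path: str) -> None:
    """Draw translucent highlight rectangles over target areas of an image."""
    base = Image.open(src_path).convert("RGBA")
    overlay = Image.new("RGBA", base.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)

    for box in HIGHLIGHT_BOXES:
        # Semi-transparent fill plus a solid outline, similar to a highlight shader.
        draw.rectangle(box, fill=(0, 255, 128, 60), outline=(0, 255, 128, 255), width=4)

    Image.alpha_composite(base, overlay).convert("RGB").save(dst_path)

if __name__ == "__main__":
    draw_pseudo_highlights("room_capture.jpg", "room_capture_highlighted.jpg")
```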
We needed a method to replicate this process and highlight objects in the physical space. The obvious choice was to create a resizable poly and give it a shader with the same look as in POC 1. This step required a few tests to correct the lighting after the Quest took a photo; if the image was too dark or the contrast was off, the AI could not interpret it. After a few attempts, we found a decent balance, as you can see in the video and images. We also had to play around with the permissions to determine what was possible with image capture.
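I won't reproduce our exact correction here, but conceptually the check looked something like this sketch: measure the capture's brightness and nudge it toward a usable range before handing it to the AI. The threshold values are illustrative assumptions, not our tuned numbers.

```python
from PIL import Image, ImageEnhance, ImageStat

# Illustrative thresholds only; the real values were found by trial and error.
MIN_BRIGHTNESS = 70     # mean luminance below this reads as "too dark" for the AI
TARGET_BRIGHTNESS = 110

def normalize_capture(src_path: str, dst_path: str) -> None:
    """Brighten a pass-through capture if it is too dark for reliable analysis."""
    img = Image.open(src_path).convert("RGB")
    mean_luma = ImageStat.Stat(img.convert("L")).mean[0]

    if mean_luma < MIN_BRIGHTNESS:
        factor = TARGET_BRIGHTNESS / max(mean_luma, 1.0)
        img = ImageEnhance.Brightness(img).enhance(factor)
        img = ImageEnhance.Contrast(img).enhance(1.1)  # small contrast lift

    img.save(dst_path)
```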
Image capture was also an essential CIC pass/fail gateway, and permissions made it challenging. Could we take a picture of the highlighted object, use the same method from POC 1, and verify that the image was saved from the device and loaded back into memory to be analyzed? Thankfully, the answer was yes.
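Since this article doesn't cover the specific vision service we called, the endpoint, key, and payload shape below are placeholders; the sketch only shows the general pattern of loading the saved capture back into memory and submitting it for analysis.

```python
import base64
import requests

# Placeholder endpoint and key: stand-ins for whatever multimodal API the POC called.
VISION_API_URL = "https://example.com/v1/vision/analyze"
API_KEY = "YOUR_KEY_HERE"

def analyze_capture(image_path: str, prompt: str) -> dict:
    """Load a saved headset capture back into memory and submit it for analysis."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = requests.post(
        VISION_API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": prompt, "image_base64": image_b64},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()
```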
I noticed a few things along the way that were important. Firstly, the pass-through camera's resolution was lower than I had expected. However, this was not an issue for our case since I didn't necessarily require high-quality images. I tested image fidelity in POC1 to find a baseline.
Despite these challenges, we had solid forward momentum, which was significant. But another challenge appeared, and this one required intense discussion.
From the start, I had worked a lot with multi-dimensional spatial anchors; I knew how to use them and even had patents on techniques with them. We would be in great shape if we could link the spatial ID to our pseudo-highlight, but how would we approach this?
Over a series of conversations, we broke the problem down. The prompt modeling had to change, but to what?
In addition, if multiple polys needed to be labeled in an image, how could we ensure that the highlighted image areas, which represented a one-to-many relationship, were linked to the right spatial property when the response came back from the API with the object data payload?
This conversation led us down a new path and to a decision, when my AI developer asked:
"What if we could parse the object systematically?"
We discussed a few approaches to how the AI might read the image. This meant examining the existing instructions to ensure the AI understood them, and writing new instructions so that it would analyze the photos systematically and repeatably.
I quickly returned to POC one and started revising my prompt structure to see whether we could find a reliable pattern for instructing the AI on how to analyze the image content. In a few hours, we had a new working revision.
After more discussion, the approach was to read the image consistently from left to right and assign the object highlights sequential IDs (1, 2, 3, and so on) in a JSON-like object. Of course, there are still further refinements to make; polys that overlap or touch near the edges can confuse the system.
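As a rough illustration of the idea, assuming the model is instructed to enumerate the highlighted regions left to right and return JSON keyed by those IDs, the client side can sort its own highlight polys the same way so the returned labels land on the right spatial anchors. The prompt text, field names, and data shapes here are illustrative, not our production prompt.

```python
# Illustrative instruction; the real prompt went through several revisions.
EXAMPLE_INSTRUCTION = (
    "Scan the highlighted regions strictly from left to right. "
    "Label them 1, 2, 3, ... and return JSON: "
    '[{"id": 1, "label": "...", "description": "..."}, ...]'
)

def assign_ids_left_to_right(highlights: list[dict]) -> dict[int, dict]:
    """Sort local highlight polys by their left edge so client IDs match the AI's IDs."""
    ordered = sorted(highlights, key=lambda h: h["bounds"][0])  # bounds = (x, y, w, h)
    return {i + 1: h for i, h in enumerate(ordered)}

def merge_labels(id_to_highlight: dict[int, dict], ai_objects: list[dict]) -> list[dict]:
    """Attach each returned label to the spatial anchor of the matching highlight."""
    merged = []
    for obj in ai_objects:
        highlight = id_to_highlight.get(obj["id"])
        if highlight is None:
            continue  # overlapping or edge-touching polys can still break the ordering
        merged.append({"anchor_id": highlight["anchor_id"], "label": obj["label"]})
    return merged
```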
We now had images, object highlighting, spatial identifiers, spatial vision-linked markers, and all the AI behind it working with a real camera. The last part was to enable the popup-like tooltip bubbles as the data came streaming back (like in the video). Using these steps together, we can identify and generate new tool possibilities that could be leveraged on-demand in spatial systems. So, where does this lead us next?
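The tooltip rendering itself lives on the Unity side, but the consuming loop is conceptually simple: surface each labeled object as soon as it arrives rather than waiting for the full set. The generator and display call below are stand-ins, not our actual pipeline.

```python
from typing import Iterator

def labeled_objects_stream() -> Iterator[dict]:
    """Stand-in for results arriving incrementally from the vision pipeline."""
    yield {"anchor_id": "anchor-001", "label": "coffee mug"}
    yield {"anchor_id": "anchor-002", "label": "monitor"}

def show_tooltip(anchor_id: str, label: str) -> None:
    """Placeholder for spawning a tooltip bubble at the anchored highlight."""
    print(f"[tooltip] {anchor_id}: {label}")

for item in labeled_objects_stream():
    # Each result is surfaced as soon as it arrives.
    show_tooltip(item["anchor_id"], item["label"])
```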
Please join me for my next article, where I'll share more about the CIC model (the foundation of an innovation model for prototyping in the age of AI), this particular project, and where we go from here!