Location Enabling AI without Computer Vision
The Question
In our previous work at Pixel8earth and Snap we spent a lot of time trying to drive down the cost of city-scale Visual Positioning Systems (VPS) for Augmented Reality (AR). While we were able to reduce the cost of compute and enable the use of commodity cameras, there is no way around the fact that perpetually mapping the earth in 3D is resource intensive and expensive.
As we’ve worked on improving positioning accuracy with Zephr, a little voice has kept asking whether it could enable AR without computer vision and VPS. Is it possible to construct a geographic pose with just the sensors on a phone and/or smart glasses?
The Background
The question has become particularly intriguing with Ray-Bans by Meta quickly surpassing sales of mixed reality headsets.
While the progress by Meta and Snap on true AR glasses is incredible, a low-cost and fashionable product doesn’t look imminent. In addition, there is a growing sentiment that “The coolest thing about smart glasses is not the AR. It’s the AI.” This sentiment further refined our question: can we location enable AI for portable compute?
The Challenge
In order to location enable AI we need to be able to determine what the user is looking at. In computer vision parlance this means determining the “geographic pose” of the user. To do so we need highly precise geographic coordinates (latitude, longitude, altitude) as well as orientation (pose). Both geographic location and orientation are available on smartphones today, but they are notoriously inaccurate, especially in urban areas. This was one of the primary drivers for creating VPS. As we’ve been building, the team has pushed to see if software-augmented GPS + IMU could deliver a viable geographic pose. Based on our latest testing, the answer looked to be yes. So we built an app to test the concept.
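To make the idea concrete, here is a minimal sketch of what a geographic pose and a viewshed probe could look like, assuming position comes from (software-augmented) GPS and orientation from the IMU and compass. The names and the flat-earth math are illustrative only, not Zephr's implementation.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; fine for short-range math

@dataclass
class GeoPose:
    lat: float      # degrees, from GPS
    lon: float      # degrees, from GPS
    alt: float      # meters, from GPS
    heading: float  # degrees clockwise from true north, from IMU/compass
    pitch: float    # degrees above the horizon, from IMU

def look_at_point(pose: GeoPose, distance_m: float) -> tuple[float, float]:
    """Project a point `distance_m` ahead along the view direction.

    Uses a flat-earth approximation, which is adequate for the tens of
    meters a viewshed query cares about.
    """
    ground_range = distance_m * math.cos(math.radians(pose.pitch))
    d_north = ground_range * math.cos(math.radians(pose.heading))
    d_east = ground_range * math.sin(math.radians(pose.heading))
    d_lat = math.degrees(d_north / EARTH_RADIUS_M)
    d_lon = math.degrees(d_east / (EARTH_RADIUS_M * math.cos(math.radians(pose.lat))))
    return pose.lat + d_lat, pose.lon + d_lon
```

The better the underlying position and heading, the tighter the probe, which is why the accuracy of the GPS + IMU solution is the whole ballgame here.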
The App
Given the lack of smart glasses SDKs, we opted for a smartphone-based demonstration. The goal is simple: point your phone at a “place” and see if the app can determine what you are looking at. Specifically, the app queries the excellent Google Places API as a function of the viewshed given by the geographic pose. Then we have a Large Language Model (LLM), ChatGPT, generate a custom profile for the place. Last but not least, we let the user ask the LLM ad hoc questions about the “place”. The video below shows the app in action:
[Video: the app in action]
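For a sense of how the pieces fit together, here is a rough sketch of that flow: probe the viewshed with the geographic pose, ask Google Places what sits there, and build a prompt for the LLM. It reuses the GeoPose sketch above; the endpoint and fields come from the public Places Nearby Search API, but the ranking, error handling, and the app's actual logic are omitted.

```python
import requests

PLACES_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"

def place_in_view(pose: GeoPose, api_key: str, radius_m: int = 50) -> dict | None:
    """Ask Google Places what sits where the user is looking."""
    lat, lon = look_at_point(pose, distance_m=30)  # probe ~30 m ahead
    resp = requests.get(PLACES_URL, params={
        "location": f"{lat},{lon}",
        "radius": radius_m,
        "key": api_key,
    })
    results = resp.json().get("results", [])
    return results[0] if results else None  # naive: take the top hit

def profile_prompt(place: dict) -> str:
    """Build the kind of prompt an LLM could turn into a custom profile."""
    return (
        f"Write a short profile of {place['name']} at "
        f"{place.get('vicinity', 'an unknown address')} for someone "
        "standing in front of it. Mention what it is and why it matters."
    )
```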
For a smart glasses use case we’d leverage the IMU from the glasses for orientation and the position from the phone for location, as sketched below. Then we’d use audio for the interactions: “Glasses, what place am I looking at?” The possibilities of location enabling AI-powered smart glasses are super exciting. Audio is a wonderful interface and format, and it is developing quickly. Features like routing and tours work exceptionally well with this combination of location, AI and audio immersion. The opportunities for gaming are also quite exciting.
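The sensor split is simple to express, again reusing the GeoPose sketch above. The dictionaries here stand in for whatever the eventual smart glasses SDKs expose; they are placeholders, not a real API.

```python
def pose_from_glasses_and_phone(phone_fix: dict, glasses_imu: dict) -> GeoPose:
    """Fuse phone position and glasses orientation into one geographic pose."""
    return GeoPose(
        lat=phone_fix["lat"],
        lon=phone_fix["lon"],
        alt=phone_fix["alt"],
        heading=glasses_imu["heading"],  # where the head is pointed
        pitch=glasses_imu["pitch"],
    )
```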
The Long Term
Removing the need for computer vision as the primary layer for geographic interaction can really drive down the cost of smart glasses and the requisite infrastructure. Street view capture for the world is incredibly expensive. Processing global- or city-scale feature databases to power VPS systems takes a tremendous amount of compute and energy. Constantly running the camera to operate a VPS is a significant battery drain.
One of the most precious commodities for smart glasses is battery life. This is a big reason GPS is often left off early smart glasses initiatives. In Zephr’s work with IoT partners we’ve discovered that moving the GPS solver to the cloud, and using the local GPS chip just to send raw satellite measurements, can significantly reduce battery consumption while improving accuracy. Given the near future where constellations like AST and Starlink can provide 5G connectivity to devices anywhere in the world, this approach could hold promise.
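An illustrative sketch of that cloud-solver pattern: the device ships raw satellite observables instead of running the position solver locally. The endpoint and payload shape here are hypothetical, not Zephr's actual API.

```python
import requests

SOLVER_URL = "https://solver.example.com/v1/fix"  # hypothetical endpoint

def request_fix(raw_measurements: list[dict], device_id: str) -> dict:
    """Send raw GNSS observables (e.g. pseudorange, carrier phase, Doppler
    per satellite) to a cloud solver and get a computed position back."""
    payload = {"device": device_id, "measurements": raw_measurements}
    resp = requests.post(SOLVER_URL, json=payload, timeout=5)
    resp.raise_for_status()
    # e.g. {"lat": ..., "lon": ..., "alt": ..., "accuracy_m": ...}
    return resp.json()
```

The radio transmission is cheap relative to running a full solver on-device, and the cloud solver can bring corrections and models to bear that a phone chip cannot.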
Last but not least, none of this means abandoning AR. You can use a geographic pose generated with sensors to render AR objects on a device, and it is possible to provide occlusion using the vast 3D building databases available today. Will sensors alone solve all the problems? No. But they can provide an important bridge between the demo-able and the deployable. I think the biggest lesson in the success of Ray-Bans by Meta and the struggle of mixed reality to reach mainstream adoption is the importance of practicality over gadgetophilia. The future is exciting, but there is a lot of opportunity for location enabling AI that consumers are hungry for today.
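Even a sensor-only geographic pose is enough to place geo-anchored content. As a minimal flat-earth sketch, reusing the GeoPose and EARTH_RADIUS_M names from earlier, here is how an object's coordinates could be turned into the camera-relative east/north/up offsets a rendering engine needs:

```python
import math

def geo_to_enu(pose: GeoPose, obj_lat: float, obj_lon: float,
               obj_alt: float) -> tuple[float, float, float]:
    """Offsets in meters from the camera to the object: (east, north, up)."""
    d_lat = math.radians(obj_lat - pose.lat)
    d_lon = math.radians(obj_lon - pose.lon)
    north = d_lat * EARTH_RADIUS_M
    east = d_lon * EARTH_RADIUS_M * math.cos(math.radians(pose.lat))
    up = obj_alt - pose.alt
    return east, north, up
```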