Building an Augmented Reality App in 2018: Technical Immaturity

I develop AR apps for iOS using ARKit 2.0 and Swift. In Part 1 I explored the practical problems of developing augmented reality apps in 2018. Here’s what’s on my mind for the near future of mobile AR.

Part 2: Technical Immaturity

Last week I walked through some of the everyday, fundamental issues of developing, testing, and using AR apps built on top of iOS and ARKit in 2018. Those issues ranged from the mundane (you can’t test AR apps while sitting down, nor in the dark, nor without some remembrance of geometry) to the systemic (testing shared AR experiences is complex and awkward).

Those practical problems may lead to significant issues at different stages of the product lifecycle, but many of them are innately tied to the technology underpinning this early version of ARKit and mobile augmented reality on iOS. When Apple moves to the next generation of sensors and base technology powering ARKit in 1–5 years, several of those failings (the inability to work in low light and the movement required to scan an area) will likely go away.

But I’ve encountered more interesting problems that illustrate the rough edges of present-state augmented reality development. These shortcomings aren’t failures of the underlying hardware, but rather consequences of iOS AR tooling and development being so new. I expect these flaws and gaps to be resolved either by Apple in the coming 1–2 years, via software and the release of ARKit 3.0/4.0, or by third-party tools, libraries, and products:

Low-Fidelity Persistence

One of the glaring omissions of ARKit 1.0 was that, generally speaking, once you closed the app, whatever AR objects you had placed or scanned or created were gone. AR apps were a single-session affair. ARKit 2.0 introduced Persistent AR Experiences, which means you can now save the new layout and design of your living room that you spent ten minutes creating and reopen that exact design next month. Persistence seems like it should have been table stakes from the outset so people didn’t lose their progress or place, but its absence is another indicator of just how early we are for mobile AR software.

However, even Apple’s first stab at persistence leaves a lot to be desired. At its core, ARKit works by scanning the world around you and finding planes. Those planes could be the ground, tables, countertops, or walls. Once it recognizes those planes, ARKit lets you place 3D objects on them, anchor content in open 3D space, or do whatever else the app’s functionality entails.
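To make that plane-scanning step concrete, here’s a minimal sketch of enabling plane detection and visualizing what ARKit finds. The class name and the translucent overlay are illustrative choices on my part, not code from any particular demo.

```swift
import UIKit
import SceneKit
import ARKit

// Minimal sketch: enable plane detection and draw a translucent overlay on
// each plane ARKit finds. Names here are illustrative placeholders.
class PlaneDetectionViewController: UIViewController, ARSCNViewDelegate {
    @IBOutlet var sceneView: ARSCNView!

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        sceneView.delegate = self

        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = [.horizontal, .vertical] // floors, tables, walls
        sceneView.session.run(configuration)
    }

    // Called whenever ARKit detects a new plane anchor.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let planeAnchor = anchor as? ARPlaneAnchor else { return }

        // Draw a translucent rectangle matching the plane's current extent.
        let plane = SCNPlane(width: CGFloat(planeAnchor.extent.x),
                             height: CGFloat(planeAnchor.extent.z))
        plane.firstMaterial?.diffuse.contents = UIColor.cyan.withAlphaComponent(0.3)

        let planeNode = SCNNode(geometry: plane)
        planeNode.simdPosition = planeAnchor.center
        planeNode.eulerAngles.x = -.pi / 2 // SCNPlane stands vertical by default; lay it flat
        node.addChildNode(planeNode)
    }
}
```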

Persistence would then, ideally, remember exactly where those planes are, so that when you place a virtual vase of flowers right in the middle of your dining table, close the app, and reopen the app, that vase is precisely where you left it. The flowers aren’t floating six inches above the table, and they aren’t offset three inches from the table’s center: the vase is exactly where you left it.

This doesn’t happen quite yet in practice. I’ve posted some demo code that visualizes detected planes and then lets you reload the scene within the same session to compare how accurate persistence is, or you can check out a video demonstration of what the demo does here. The app is just a technical demo, meant to highlight both plane detection in general and the built-in persistence’s accuracy (or lack thereof). While the persistence accuracy may be fine for pathfinding inside a building (e.g. “Follow the AR arrows on the grocery store floor to find the Mountain Dew” or “Follow the path to find a specific conference room in an event space”), in my testing it’s not accurate enough to persist a 3D representation of my home. After scanning and reloading a room, and sometimes even after just walking in circles a couple of times without any reloading, a virtual wall will be off by a foot or a virtual shelf will be shifted six inches from where the real shelf actually sits. This isn’t a dealbreaker for using persistence in apps, but it is a significant limitation that needs to be accounted for when designing a product and its capabilities.
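For context on what the built-in persistence entails, here’s a rough sketch of the ARKit 2.0 flow: capture the session’s ARWorldMap, archive it to disk, and hand it back to a later session as the initial world map. The file location and error handling are simplified placeholders.

```swift
import ARKit

// Sketch of ARKit 2.0 persistence: archive the current ARWorldMap to disk,
// then hand it back to a new session as the initial world map.
// The file location and error handling are simplified for illustration.
let mapFileURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
    .appendingPathComponent("room.worldmap")

func saveWorldMap(from session: ARSession) {
    session.getCurrentWorldMap { worldMap, error in
        guard let map = worldMap else {
            print("World map unavailable: \(error?.localizedDescription ?? "unknown error")")
            return
        }
        do {
            let data = try NSKeyedArchiver.archivedData(withRootObject: map, requiringSecureCoding: true)
            try data.write(to: mapFileURL, options: .atomic)
        } catch {
            print("Failed to save world map: \(error)")
        }
    }
}

func restoreWorldMap(into session: ARSession) {
    do {
        let data = try Data(contentsOf: mapFileURL)
        guard let map = try NSKeyedUnarchiver.unarchivedObject(ofClass: ARWorldMap.self, from: data) else { return }

        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = [.horizontal, .vertical]
        configuration.initialWorldMap = map // relocalize against the saved scan
        session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
    } catch {
        print("Failed to load world map: \(error)")
    }
}
```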

I suspect persistence accuracy will be greatly improved in the next year or two max. I’ve encountered startups using custom software running on current hardware to scan and persist AR objects at much greater fidelity than ARKit, so Apple and Google won’t be far behind in either improving the software themselves or bringing in the people who can. The ability to reliably scan and reload objects or locations with precision and accuracy to the nearest inch, and then millimeter, won’t be something that inherently impresses the general public — but the apps, products, and companies built upon those advancements will.

Lack of Standard AR UI Elements

It’s pretty easy to build a barebones skeleton of an app that lives only on your phone and doesn’t need to talk to a server somewhere to pull in pictures from a social network or whatever. If you’re looking to create the world’s greatest self-contained timer app or non-cloud-syncing journaling app of 2019, you could start learning iOS development today and have your prototype ready next week. A big reason making a simple app is so easy is that Apple provides basic UI building blocks to cobble together your product.

Apple provides the groundwork so you can add a button to your app in 1 second. Or a table that will display some data in an organized manner. They also provide modules to easily implement retrieving a picture from your camera roll, finding yourself on a map, or posting something to a social network. Not all of these building blocks were around on Day 1 of the App Store in 2008 — they were added over the years as the mobile platform matured.
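To give a sense of how little code those 2D building blocks require, here’s a quick sketch wiring up a stock button, Apple’s photo picker, and the system share sheet. The view controller is a hypothetical host, not part of any real app.

```swift
import UIKit

// Sketch of the 2D building blocks Apple hands you out of the box:
// a button, the system photo picker, and the system share sheet.
class BuildingBlocksViewController: UIViewController, UIImagePickerControllerDelegate, UINavigationControllerDelegate {

    override func viewDidLoad() {
        super.viewDidLoad()

        // A standard button wired up in a few lines.
        let button = UIButton(type: .system)
        button.setTitle("Pick a Photo", for: .normal)
        button.addTarget(self, action: #selector(pickPhoto), for: .touchUpInside)
        button.frame = CGRect(x: 20, y: 100, width: 200, height: 44)
        view.addSubview(button)
    }

    // Apple's stock photo picker: no custom UI required.
    @objc func pickPhoto() {
        let picker = UIImagePickerController()
        picker.sourceType = .photoLibrary
        picker.delegate = self
        present(picker, animated: true)
    }

    // Apple's stock share sheet for posting the picked image elsewhere.
    func imagePickerController(_ picker: UIImagePickerController,
                               didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
        picker.dismiss(animated: true)
        guard let image = info[.originalImage] as? UIImage else { return }
        let shareSheet = UIActivityViewController(activityItems: [image], applicationActivities: nil)
        present(shareSheet, animated: true)
    }
}
```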

While ARKit provides quick access to amazing technology, what developers have access to right now is essentially that base technology and very few building blocks on top of it. Apple’s Human Interface Guidelines provide 76 pages of guidance on all the different types of UI/UX and their corresponding Dos and Don’ts. Augmented Reality is a single page in those Guidelines. If AR is going to be a new mode of interaction and a new platform that breeds entire companies, industries, and changes to our daily lives (I think it will be all of the above), then getting some core AR UI elements in place will be an early step toward hastening that future. Such elements would provide clarity around the best way to present data or text in AR, the best way to handle rotation of 3D objects, or the best way to translate a drawing traced by a finger on a phone’s screen onto the 3D plane that ARKit detected. I think a few tentative steps will be taken toward providing these UI/UX elements in the next 1–2 years, and doing so will unleash a rejuvenated wave of AR apps as more developers build upon that core foundation.
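As one example of a building block developers currently hand-roll, here’s a rough sketch of translating a tap on the phone’s screen onto a plane ARKit has already detected; the class and node names are illustrative.

```swift
import UIKit
import SceneKit
import ARKit

// Sketch of a UX pattern every AR app currently reimplements by hand:
// translating a 2D tap on the screen onto a 3D plane detected by ARKit.
class TapToPlaceViewController: UIViewController {
    @IBOutlet var sceneView: ARSCNView!

    override func viewDidLoad() {
        super.viewDidLoad()
        let tap = UITapGestureRecognizer(target: self, action: #selector(handleTap(_:)))
        sceneView.addGestureRecognizer(tap)
    }

    @objc func handleTap(_ gesture: UITapGestureRecognizer) {
        let screenPoint = gesture.location(in: sceneView)

        // Ask ARKit where this 2D screen point lands on an already-detected plane.
        guard let hit = sceneView.hitTest(screenPoint, types: .existingPlaneUsingExtent).first else { return }

        // Place a small marker at the 3D position the tap mapped to.
        let marker = SCNNode(geometry: SCNSphere(radius: 0.01))
        marker.simdPosition = simd_make_float3(hit.worldTransform.columns.3.x,
                                               hit.worldTransform.columns.3.y,
                                               hit.worldTransform.columns.3.z)
        sceneView.scene.rootNode.addChildNode(marker)
    }
}
```

Translating a full finger-traced drawing is essentially this same hit test repeated for every point along the stroke, which is exactly the kind of logic a standard component could absorb.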

I don’t think Apple & Google are dragging their feet here. If I had to guess, I think they’re building things internally, testing things internally, and — like any tech titan — they’re watching what the millions of developers on their platforms are doing in order to crib a particular part of some random app’s AR UX and declare it the “best,” sanctioned way of handling ___ in AR for all future apps. There’s no R&D team quite like an ecosystem of millions of developers providing you with proprietary usage data.

Heavily Constrained Shared AR Experiences

This one’s simple. Currently, using Apple’s recommended toolkits, those ultra-cool shared AR experiences can only be shared with 5 other people max. Outside of infrequent large events, festivals, and conferences, does the ability to share a mobile-phone-based AR experience with 5 other people suffice? Sure. But it won’t suffice when we’re no longer looking through a smartphone’s screen at an AR experience and instead looking through some other device, whatever that may be, in a more natural way.

But I think that glosses over an issue. While a cap of six people in the same shared AR world/experience may generally suffice (especially since that limit also requires all six people to be in the same location), it would be awesome for a hundred people to play an AR game together in real time. Whether that game board is shared by a hundred people in the same space or a hundred players spread around the globe, the promise of massively multiplayer shared AR experiences is tremendous.

I don’t think allowing hundreds of players to see each other on the same AR board is an easy issue to tackle, but I do think it’s one that can be surmounted using existing backend technology and infrastructure. While such a game isn’t the most meaningful thing to build, such experiences would expose AR capabilities to tens, hundreds, or thousands of people. Plus…it would be really cool.
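For reference, the local sharing flow that the current limit applies to looks roughly like this: capture the session’s ARWorldMap and push it to nearby peers over MultipeerConnectivity. This is a minimal sketch; peer discovery, invitations, and the receiving side are omitted.

```swift
import UIKit
import ARKit
import MultipeerConnectivity

// Sketch of the local sharing flow: serialize the current ARWorldMap and send
// it to nearby peers over MultipeerConnectivity so they can relocalize into
// the same shared space. Discovery and the receiving side are omitted.
let myPeerID = MCPeerID(displayName: UIDevice.current.name)
let mcSession = MCSession(peer: myPeerID, securityIdentity: nil, encryptionPreference: .required)

func shareWorldMap(from arSession: ARSession) {
    arSession.getCurrentWorldMap { worldMap, error in
        guard let map = worldMap,
              let data = try? NSKeyedArchiver.archivedData(withRootObject: map, requiringSecureCoding: true)
        else {
            print("Could not capture world map: \(error?.localizedDescription ?? "unknown error")")
            return
        }
        do {
            // Every connected peer relocalizes against this map to join the shared experience.
            try mcSession.send(data, toPeers: mcSession.connectedPeers, with: .reliable)
        } catch {
            print("Failed to send world map: \(error)")
        }
    }
}
```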

Tracked Images Toolkit

Tracked images allow AR apps to recognize 2D artwork, photos, packaging, posters, graphics, billboards, or any type of image and launch an AR experience when a specific image is detected. That could entail an animated 3D model of a car driving out of the automobile ad plastered to the side of a building. Or it could be Harry-Potter-style moving pictures that replace a static photo with video (that demo’s code is here). Or it could be a real-world ad-blocker that replaces billboards with content you want to see.
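For context, the detection side of a tracked-image app is only a handful of lines; most of the real work goes into preparing the reference images, as discussed below. A rough sketch, assuming the images live in an asset catalog group named "AR Resources" (that group name is a placeholder):

```swift
import UIKit
import SceneKit
import ARKit

// Sketch of the detection side of a tracked-image experience.
// Assumes reference images live in an asset catalog group named "AR Resources"
// (the group name is a placeholder for illustration).
class TrackedImageViewController: UIViewController, ARSCNViewDelegate {
    @IBOutlet var sceneView: ARSCNView!

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        sceneView.delegate = self

        guard let referenceImages = ARReferenceImage.referenceImages(inGroupNamed: "AR Resources",
                                                                     bundle: nil) else { return }
        let configuration = ARImageTrackingConfiguration()
        configuration.trackingImages = referenceImages
        configuration.maximumNumberOfTrackedImages = 1
        sceneView.session.run(configuration)
    }

    // Fires when one of the reference images is spotted in the camera feed.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let imageAnchor = anchor as? ARImageAnchor else { return }
        print("Detected \(imageAnchor.referenceImage.name ?? "image"); launch the AR experience here")
        // e.g. attach an SCNPlane with video content sized to imageAnchor.referenceImage.physicalSize
    }
}
```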

If you’re working on a marketing campaign, getting access to the images that kick off the AR experience is straightforward. However, if you’re creating custom tracked-image behavior based on pictures you take out in the real world, or you’re a student tinkering with the concept using pictures of things around the house, your images aren’t ready to use in an AR app without some preparation. Most importantly, perspective distortion needs to be removed to yield a perfectly rectangular image (which is best for a properly functioning app). The trapezoidal geometry that results from photographing a rectangular object (e.g. shooting a 100-foot-wide billboard while standing 50 feet below and to the right of it) isn’t ideal when creating a tracked-image app.

I looked around, and outside of an old, unmaintained Mac app (with poor reviews) that removes perspective distortion, I found few other options. While testing the technology, I dumped my images into GIMP and manipulated them until they “looked right.” I look forward to someone with more interest and expertise in graphics manipulation creating a GitHub library or simple product for easily removing image perspective distortion.
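In the meantime, one rough starting point is Core Image’s built-in CIPerspectiveCorrection filter, provided you can supply the four corners of the region you want flattened. This is a sketch, not a polished tool, and the corner coordinates in the usage example are made up.

```swift
import UIKit
import CoreImage

// Sketch of removing perspective distortion with Core Image's built-in
// CIPerspectiveCorrection filter. You still have to supply the four corners of
// the billboard/poster in the photo; the coordinates below are placeholders.
func correctPerspective(of image: UIImage,
                        topLeft: CGPoint, topRight: CGPoint,
                        bottomLeft: CGPoint, bottomRight: CGPoint) -> UIImage? {
    guard let ciImage = CIImage(image: image),
          let filter = CIFilter(name: "CIPerspectiveCorrection") else { return nil }

    filter.setValue(ciImage, forKey: kCIInputImageKey)
    filter.setValue(CIVector(cgPoint: topLeft), forKey: "inputTopLeft")
    filter.setValue(CIVector(cgPoint: topRight), forKey: "inputTopRight")
    filter.setValue(CIVector(cgPoint: bottomLeft), forKey: "inputBottomLeft")
    filter.setValue(CIVector(cgPoint: bottomRight), forKey: "inputBottomRight")

    guard let output = filter.outputImage else { return nil }
    let context = CIContext()
    guard let cgImage = context.createCGImage(output, from: output.extent) else { return nil }
    return UIImage(cgImage: cgImage)
}

// Example call with made-up corner points (note: Core Image uses a bottom-left origin):
// let flattened = correctPerspective(of: billboardPhoto,
//                                    topLeft: CGPoint(x: 120, y: 900),
//                                    topRight: CGPoint(x: 1800, y: 1050),
//                                    bottomLeft: CGPoint(x: 200, y: 150),
//                                    bottomRight: CGPoint(x: 1750, y: 80))
```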


The four shortcomings highlighted above are less about the “duh” differences in AR development explored in Part 1 and more about how early things are for mobile AR. These things will get better — and soon. Some will be resolved in the next release of ARKit or ARCore while others will be solved by a third party and promptly acquired (or lifted) by Apple/Google. On that note, in Part 3 I’ll jump into some of the AR dev ecosystem opportunities I’ve encountered, including some that could be prime markets for a startup to build a product around.
