The Freedom of Constraints: AI in 3D Interfaces

In the future predicted by Apple, Meta, and dozens of other huge companies sinking billions into R&D on VR and AR headsets, the interactive world will no longer be restricted to the flat plane. So why should websites still be the same 2D experiences we were making back in the Web 2.0 days? We decided to play around with the idea: what if 3D websites were really easy to build?

A little while ago I wrote about AI as the future of interface. I think that is absolutely going to be the best way for users to interact with tools in particular, and I’ve recently had the opportunity to test that theory with a project I’ve been working on with a few friends.

We’re extremely early in this project - these are dirty prototypes meant to prove out theories, not launchable code. But I think there are still some interesting things to learn from the work we’ve done so far, particularly around helping people interact with the 3D world and using AI to make that stress-free.

3D in the Second Dimension

Those of us with experience working in games can probably remember the first time we ever tried working in a 3D engine. Getting things perfectly aligned in one view, only to rotate the scene and realize you’re way off along the axis you couldn’t see, was a new brand of frustrating. The fact is that humans are pretty good at interacting in 3D spaces - we live in them 24/7 - but put that 3D space behind a 2D plane (your monitor) and it becomes much more difficult.

Some companies, like ShapesXR, try to solve this problem by moving the creation process into 3D, inside the headset. This works pretty well, but it still requires a steady hand and a great deal of patience - and so far you still need to move your creations into another engine to actually deploy them anywhere, which is a headache all its own. So we treated it as an interface problem.

Meet our test site: Alien Sushi. It demonstrates some of the ways we tried to solve these problems using common website elements. It’s a fun little site built as if aliens had their own sushi restaurant.

The first problem to solve is the third axis. People are not great at depth - they can typically manage width and height decently well. We considered what made tools like Squarespace and Wix so successful: they don’t allow you to do just anything. They give you constraints that help you get the best-looking result as fast as possible. So we came up with something we called the Ananke System. It’s really just a fancy name for a 3D template, but it provides a set of constraints that let the user focus on what they’re better at - height and width.

There are four areas where users can adjust the elements of their site: the Table, Near (or Arm’s Length), Mid-Range, and Far (Horizon). The Table is for bringing things close to you - grab a picture and set it down, keep some 3D models to interact with in front of you, or highlight a special item you want the user to see front and center. Near is like a fighter-jet cockpit - everything is within reach and immediately in your view. Mid-Range is like having a bank of 80-inch TVs encircling you, for elements that need scale, or for displaying a group of decently sized images together. Far is for background elements - not displayed in AR, but a way to spruce up your scene in VR. You can also mix and match content between layers, which can produce some nice interfaces in itself.

As we go, we can also create multiple types of templates - or allow users to create their own. We pin all content to one of these Ananke shapes and let the user manipulate their elements through easy-to-use controls - and an AI interface.
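To make that concrete, here’s a rough sketch of how constraints like these might be expressed in code. The layer names come from above, but the distances, shapes, and helper function are my own illustration, not the prototype’s actual values.

```js
// Hypothetical sketch of the Ananke layers as fixed depth bands.
// The layer names match the article; distances and shapes are
// illustrative assumptions, not the real prototype's numbers.
const ANANKE_LAYERS = {
  table:    { distance: 0.6,  shape: 'flat',     visibleInAR: true  }, // tabletop right in front of you
  near:     { distance: 1.2,  shape: 'cylinder', visibleInAR: true  }, // "fighter jet cockpit" reach
  midRange: { distance: 4.0,  shape: 'cylinder', visibleInAR: true  }, // bank of big screens around you
  far:      { distance: 20.0, shape: 'sphere',   visibleInAR: false }, // VR-only backdrop
};

// Pinning an element only needs the two axes people are good at:
// the layer supplies depth, the user supplies horizontal and vertical placement.
function pinToLayer(layerName, horizontalDeg, verticalMeters) {
  const layer = ANANKE_LAYERS[layerName];
  const rad = (horizontalDeg * Math.PI) / 180;
  return {
    x: Math.sin(rad) * layer.distance,
    y: verticalMeters,
    z: -Math.cos(rad) * layer.distance, // -z is "in front of the viewer" in A-Frame/THREE.js
  };
}
```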

Ghost in the Machine

AI as an interface has several problems - foremost of which is accuracy. LLMs are not perfect at understanding what you mean, and the way you structure the prompt has an enormous impact on the result you get. With the Alien Sushi prototype, I wanted two main things: for the AI to be able to configure the site to the user’s requests, and for it to understand where other objects were in the scene and be able to read characteristics about them.

We built this prototype in A-Frame, a framework built on top of THREE.js, meaning the whole thing is more or less made with JavaScript and HTML. Our early experiments were much more naive about how easy it would be to get an LLM (OpenAI’s ChatGPT, for the purposes of this experiment) to generate code in the very unusual way we were trying to get it to work. John, our engineering expert, spent a ton of time trying to retool projects like Web2VR - the idea being that if we used as flexible a framework as possible, we could do things like easily convert existing HTML sites to 3D.
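For anyone who hasn’t touched A-Frame, here’s a toy example of the kind of thing it makes easy: adding a panel and a label to a scene from plain JavaScript. The ids and text are placeholders, not pieces of the Alien Sushi code.

```js
// Minimal A-Frame example: build a text panel from JavaScript.
// Assumes the A-Frame <script> tag is on the page and the document
// contains an <a-scene> element; ids and copy are placeholders.
const scene = document.querySelector('a-scene');

const panel = document.createElement('a-plane');
panel.setAttribute('id', 'menu-panel');
panel.setAttribute('color', '#8b0000');
panel.setAttribute('width', '1.2');
panel.setAttribute('height', '0.8');
panel.setAttribute('position', '0 1.6 -1.2'); // roughly arm's length, at eye height

const label = document.createElement('a-text');
label.setAttribute('value', "Today's Specials");
label.setAttribute('align', 'center');
label.setAttribute('position', '0 0.25 0.01'); // just in front of the plane
panel.appendChild(label);

scene.appendChild(panel);
```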

It worked - sort of. We could get a 3D world running, and we could get the LLM to do… something, but the LLM just wasn’t trained to do this task well. We also had expectations about how it would work that the LLM couldn’t easily figure out how to satisfy.

It took us a few iterations to realize that the problem 3D spaces on 2D planes pose for humans and the problem of getting the AI to do what we wanted had the same solution: drastically limit what the AI has to think about. We were trying to boil the ocean, and the AI couldn’t do it. So we took another look at our core interface idea - figure out what makes services like Squarespace so effective at letting non-engineers produce a complex technical result.

The solution to both problems was constraints. Ananke is the ancient Greek goddess of necessity and constraint, and we had to apply the idea not just to the layout but to the tooling as well. Don’t ask the LLM to generate the HTML element - just tell it what values it needs to return to get the result you’re looking for. We built a template for each kind of module in the Alien Sushi site, and it boiled down to a handful of values the LLM had to understand: vertical position, horizontal position, which Ananke layer the element belonged to, font names, colors, caption positioning, and so on. This time it worked beautifully!

Mostly!
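To show what I mean by “a handful of values,” here’s a hedged sketch of that kind of constrained contract. The field names, ranges, and the little apply function are my own illustration, not the prototype’s actual schema.

```js
// Hypothetical sketch of the "constrained values" approach: the LLM never
// writes markup, it only fills in a small, validated set of fields.
// EDIT_FIELDS doubles as the description of the contract handed to the LLM.
const EDIT_FIELDS = {
  elementId:       'id of the module being edited',
  layer:           'one of: table | near | midRange | far',
  horizontalDeg:   'number from -90 to 90',
  verticalMeters:  'number from 0 to 3',
  backgroundColor: 'hex color string, e.g. #8b0000',
  captionPosition: 'one of: top | bottom | none',
};

// Clamp and apply whatever comes back, so a bad response can at worst
// nudge a value out of place -- it can never produce broken markup.
function applyLlmEdit(edit) {
  const el = document.getElementById(edit.elementId);
  if (!el) return;
  if (edit.layer) el.dataset.anankeLayer = edit.layer;
  if (edit.backgroundColor) el.setAttribute('color', edit.backgroundColor);
  if (typeof edit.horizontalDeg === 'number') {
    el.dataset.horizontalDeg = Math.max(-90, Math.min(90, edit.horizontalDeg));
  }
  if (typeof edit.verticalMeters === 'number') {
    el.dataset.verticalMeters = Math.max(0, Math.min(3, edit.verticalMeters));
  }
  // ...a separate layout step turns layer + the two axes into a 3D position.
}
```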

I won’t go through every iteration, but we learned a lot. We had to build and constantly refresh lists of characteristics about every element and push those into the LLM’s instructions. We had to figure out why the LLM would sometimes interpret “move this left a little” as moving it halfway across the screen. What we found with the base controls we wanted to create was that we needed to keep the instructions simple and the options clear. Eventually it worked pretty much every time.
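As an example of that first point, keeping the model informed can be as simple as serializing the current elements and sending that along with every request. The attribute names below are assumptions carried over from the earlier sketches.

```js
// Hedged sketch of refreshing the "scene facts" handed to the LLM before
// each request, so it can resolve things like "the one on the Chef Bio".
function describeScene() {
  return Array.from(document.querySelectorAll('[data-ananke-layer]')).map((el) => ({
    id: el.id,
    layer: el.dataset.anankeLayer,
    horizontalDeg: Number(el.dataset.horizontalDeg ?? 0),
    verticalMeters: Number(el.dataset.verticalMeters ?? 1.5),
    backgroundColor: el.getAttribute('color'),
  }));
}

// Sent as context with every user request, e.g. as part of the system message:
// "Here are the current elements: <JSON>. Return only the fields you change."
const sceneContext = JSON.stringify(describeScene(), null, 2);
```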

But pretty much is NOT every time. I think we can improve this a lot with better fine-tuning of the model and probably a few other optimizations, but we’re still faced with the problem that sometimes AI just doesn’t do what you want, due to the nature of how these models work. That’s a problem for an interface - how frustrating would it be if you told your car to turn left and it suddenly slammed on the brakes instead? So we decided to always keep the manual controls accessible for fine-tuning, and perhaps even pop up the actual control for the value being edited right after the LLM edits it, for ease of use.
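As a purely illustrative sketch of that last idea, surfacing the matching manual control right after an AI edit might look something like this (none of these selectors exist in the real prototype):

```js
// Rough sketch: after the AI applies an edit, reveal the matching manual
// control so fine-tuning by hand is one step away. Selectors are hypothetical.
function revealControlFor(elementId, fieldName) {
  const control = document.querySelector(
    `#controls-${elementId} [data-field="${fieldName}"]`
  );
  if (control) control.hidden = false;
}

// e.g. right after applying an LLM edit to a background color:
// applyLlmEdit(edit);
// revealControlFor(edit.elementId, 'backgroundColor');
```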

It’s just like asking an artist for a drawing when they’re not getting what you have in mind - sometimes it’s more efficient to sketch it yourself to get the idea across. Along with training improvements that should make this less and less necessary, I think it will always be good interface design to let humans adjust things to their own liking instead of just shouting “No! I mean left. Left!” at the LLM.

So, did this work? Did we successfully marry AI and interface?

Honestly, yeah, I think it worked great! Mostly in lining things up - I didn’t have to turn on “snap” or try to nudge things into line with each other; I just asked the system to do it, and it did. I still wanted to move things and ideate with my hands - see how this might look over there, or what if I made this or that a different color - but the small, precise work was easier with AI help.

For example, with the manual tools I could choose the red I wanted for a background color on one element, copy the hex code, and move it over to another element of the same type (or use the eyedropper, which gets a little funky when opacity is involved). With the AI, I could just say “make this background the same color as the one on the Chef Bio,” and the system simply copied the values. What will make this really effective later on is the ability to say “make all the info panel backgrounds this shade of red,” but for now our prototype only allows editing one element at a time. And of course, a more mature version of this tool would let you establish color palettes, but even then, adjusting across multiple elements will be easier with help from the AI.
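Under the hood, resolving the “same color as the Chef Bio” request is refreshingly boring once the LLM has named a source and a target - something along these lines, with hypothetical element ids:

```js
// Toy illustration of "make this the same color as the Chef Bio":
// once the LLM identifies source and target, copying is just a lookup.
function copyBackgroundColor(targetId, sourceId) {
  const source = document.getElementById(sourceId);
  const target = document.getElementById(targetId);
  if (!source || !target) return;
  target.setAttribute('color', source.getAttribute('color'));
}

// e.g. the LLM returns { action: 'copyColor', target: 'menu-panel', source: 'chef-bio' }
copyBackgroundColor('menu-panel', 'chef-bio');
```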

Summary

I think when many people start fiddling with AI and LLMs, they have a very starry-eyed idea of what these tools can accomplish. And don’t get me wrong: when you aren’t looking for a very specific outcome, the results can be stunning. But in terms of interface, people attempting to implement AI assistance may find that freedom can be the enemy of precision. As these tools get better and are trained on data more specific to these tasks, this problem may fade, but for now I think the best way to build an AI-assisted interface is to start small and use it to attack very specific problems with straightforward outcomes.

I’m eager to keep experimenting and see how these capabilities evolve!


