How Far Are We From Being Able to Generate Whatever 3D Objects On the Fly?

Welcome to my bi-weekly newsletter, “I’ll Keep This Short,” where I navigate the less-traveled paths of AI, building new insight beyond the banal, mainstream chatter.

Step Into a New Dimension

Walking across your living room floor, coffee in hand, you ready yourself to sit down in your nice, relaxing avocado-shaped easy chair for some well-earned rest after a hard day’s work of writing prompts and generating images.

[Image] Dall-E 2 Prompt: “3d render of a chair that looks like an avocado digital art”

While you sit there drinking your very real coffee, staring off into space, probably what you don’t think to yourself is, “Whew, I sure am glad that this chair in fact exists in physical reality.”

But in fact that’s precisely where we’re at with the vast majority of AI-generated content on the internet today: we’re a heck of a long way off from creating actual 3D content on the fly. Even the avocado chair above, while it certainly looks 3-dimensional, is really a 2-dimensional rendering from a model trained on 2-dimensional snapshots of 3D renders that humans made.

For those who have used 3D modeling software, it’s likely eminently clear what I am talking about. 3D CAD software has been ubiquitous since the 1980s as something used to model virtually everything, from furniture here on Earth to furniture on the International Space Station.

[Image] Z-Plane Desk on the International Space Station (as opposed to a normal XY-Plane Desk on Earth). Credit: Paolo Nespoli and Roland Miller

Since I’m not sure how familiar everyone reading this article might be with the nuances of 3D objects vs. 3D pictures in 2D space, I drew a little demonstration below to show what I mean by the difference between illusory 3D objects and real 3D objects.

[Image: hand-drawn demonstration of rotating a real 3D object vs. a picture of one]

If you rotate a real 3D object in some kind of software environment, you can see the other side of it. If you rotate a picture of a 3D object, an illusory 3D object, you just see the back of the picture frame, and the illusory object does not change.
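To make that concrete, here is a minimal sketch in plain NumPy (the coordinates below are made up for illustration): a real 3D object is ultimately just a set of points you can multiply by a rotation matrix, and after a half-turn about the vertical axis, the points that used to face away from the camera now face toward it.

```python
import numpy as np

# A "real" 3D object: a few made-up points, one facing the camera (z > 0), one facing away (z < 0).
points = np.array([
    [0.0, 0.0,  1.0],   # front of the object
    [0.5, 0.2, -1.0],   # back of the object, hidden from the viewer
])

def rot_y(theta):
    """Rotation matrix for a rotation of `theta` radians about the vertical (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[  c, 0.0,   s],
                     [0.0, 1.0, 0.0],
                     [ -s, 0.0,   c]])

rotated = points @ rot_y(np.pi).T  # spin the object 180 degrees
print(rotated.round(2))            # the "back" point now has z > 0: we really do see the other side
```

A picture of a 3D object, by contrast, has no z coordinates at all; there is simply nothing to bring around to the front.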

So where the heck are we as a species in terms of being able to generate some sweet, sweet, real 3D objects? There have got to be tons of uses for text-generative 3D objects, from being able to generate and 3D-print your own personal toe-door-opener things you see in bars, to a plastic bust of Karl Marx that fits over the end of your toothpaste tube, so that Karl Marx can spit toothpaste onto your brush every night.

What Peak Performance Looks Like

This is what peak 3-dimensional performance looks like. You may not like it, but this is one of the first famous 3D objects created purely using mathematics - the Utah Teapot, first modeled in 1975 by a researcher at the University of Utah.

[Image: the Utah Teapot]

Short and stout, with a handle and a spout; when you tip it over, you realize it’s actually rendered via Bézier Curves rather than just a bunch of points manually placed by hand in a grid. Bézier Curves are smooth curves defined by a small set of control points through simple polynomial functions, like these. You can imagine how stitching several of these together in a defined way can be used to build up surfaces and whole objects.

[Image: examples of Bézier curves]
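As a minimal sketch of the idea (the control points below are made up, not taken from the actual teapot data), here is how a single cubic Bézier curve is evaluated from four control points:

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=50):
    """Sample n points along a cubic Bezier curve defined by four control points,
    using the Bernstein polynomial form."""
    t = np.linspace(0.0, 1.0, n)[:, None]  # curve parameter in [0, 1]
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

# Hypothetical control points for one arched segment, e.g. the profile of a spout
curve = cubic_bezier(np.array([0.0, 0.0]),
                     np.array([0.3, 1.0]),
                     np.array([0.7, 1.0]),
                     np.array([1.0, 0.0]))
print(curve[:3])  # the first few (x, y) samples along the curve
```

The Utah Teapot itself is built from Bézier patches, the two-dimensional surface analogue of these curves, stitched together into the body, spout, handle, and lid.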

Let’s contrast this to an illusory 3D avocado teapot, as interpreted by Dall-E 2, just for kicks:

[Image: Dall-E 2’s avocado teapot]

While cool, it’s a hallucination without any real physical embodiment; that is to say, there isn’t really a 3D point cloud or mesh dictating how those shadows fall and how that light bounces off the surface. There would be no way to “rotate” these on the screen; they are purely illusory 3D objects, not real 3D objects.

The above gives us a foundational understanding of where 3D graphics came from in the first place. So how about generative 3D objects?

Enter Shap-E

Perhaps you’ve heard about Dall-E, but how about Shap-E? Recently, a paper came out from OpenAI researchers called Shap-E, which is a 3D object generator. From the paper, Shap-E is an improvement over a previous model called Point-E. Whereas Point-E modeled explicit point clouds, Shap-E generates the parameters of implicit functions, including Neural Radiance Fields (NeRFs), which represent a scene as a function rather than a set of points. Never mind what NeRF is for a moment.
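(For those who do want a rough mental model anyway: an implicit representation is just a function you can query at any 3D coordinate. The toy network below, with made-up layer sizes, shows the general shape of a NeRF-style function; it is not Shap-E’s actual architecture, just an illustration.)

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """A toy NeRF-style implicit function: it maps a 3D coordinate (and, in full
    NeRF, a viewing direction) to a color and a density. The "scene" lives
    entirely in the network's weights rather than in any stored geometry."""

    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (r, g, b, density)
        )

    def forward(self, xyz):
        out = self.mlp(xyz)
        rgb = torch.sigmoid(out[..., :3])   # colors in [0, 1]
        density = torch.relu(out[..., 3:])  # non-negative opacity
        return rgb, density

# Query the "scene" at arbitrary 3D points -- no mesh or point cloud stored anywhere.
rgb, density = TinyNeRF()(torch.rand(5, 3))
print(rgb.shape, density.shape)  # torch.Size([5, 3]) torch.Size([5, 1])
```

Rendering then amounts to shooting camera rays through the scene and accumulating these colors and densities along each ray.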

What you get as a result of NeRF, in contrast to point clouds, is something like this:

[Image: an example NeRF-style render]

As opposed to point-cloud renders like the following, which carry a lot of detail but lack realistic surface interpretation:

[Image: an example point-cloud render]

  • So here's the tricky part. While this does seem to be an interesting approach, part of the value of Dall-E and other generative AI is the capability to create "whatever," but you don't seem to be able to do that with Shap-E; it makes all sorts of mistakes, e.g.:

[Image: an example of a Shap-E sample that does not match its prompt]

  • The resulting samples also tend to look rough or lack fine details, or are outright hallucinations, in the sense of not being what was requested.
  • Further, this is not covered in the paper, but the architecture seems to be quite resource-heavy as far as I can tell; I tried to run it in a Colab notebook and it took forever, so this might not be "cheap." I’ll go over this in the next section.

My Attempt At Running Shap-E in a Colab Notebook

I was able to render an image of a dog with a HuggingFace demo:

[Image: a dog generated with the HuggingFace Shap-E demo]

  • I ran the model with a prompt (see the code sketch at the end of this section):

[Image: the Colab cell running the model with a prompt]

  • …and I was able to successfully convert this into a GLTF file, which looks like the following:

[Image: the resulting GLTF model]
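For reference, the Colab run roughly followed the text-to-3D example notebook in the openai/shap-e repository; the sketch below is based on that notebook as it existed at the time of writing, so treat the exact imports and arguments as approximate rather than gospel.

```python
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import decode_latent_mesh

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Two pretrained pieces: the text-conditioned latent diffusion model and the
# "transmitter" that decodes latents into an implicit function.
model = load_model('text300M', device=device)
xm = load_model('transmitter', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

latents = sample_latents(
    batch_size=1,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=["a dog"]),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# Decode the latent into a triangle mesh and save it; converting that mesh to
# GLTF is a separate step done with an external tool.
mesh = decode_latent_mesh(xm, latents[0]).tri_mesh()
with open('dog.ply', 'wb') as f:
    mesh.write_ply(f)
```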

Mathematical Background

Point-E Math

I'm skipping the math section for the LinkedIn article. To view the math, go to the Substack version of this article.

Point-E Result

As a result, Point-E was able to generate point-cloud renders which are very detailed, like the following avocado chair:

[Image: Point-E’s avocado chair, rendered as a point cloud]

Shap-E Math

I'm skipping the math section for the LinkedIn article. To view the math, go to the Substack version of this article.

Shap-E Result

Shap-E was able to generate results which were "pleasing" and “smooth,” and which, unlike Point-E, did not skip out on parts of the model, like the following:

[Image: a smooth Shap-E sample]

What About Just Rendering with Code with a Large Language Model?

As I have mentioned in a previous post, large language models have a problem with factual knowledge alignment, and this goes in particular for more specific, niche topics.

We can observe that the best-in-class LLM as of May 2023, GPT-4, does not deliver even the simplest everyday object:

Prompt: “Create a house in OpenScad”
[Image: the rendered result of GPT-4’s OpenSCAD house attempt]

Imagine trying to build an actual house with this technology. Gah! What happened to my roof? I appreciate that my car is dry, but really it would have been much better to protect my living room.
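If you want to poke at this yourself programmatically rather than through the chat interface, a minimal sketch with the pre-1.0 openai Python client looks roughly like the following (this is an illustration of the workflow, not a capture of exactly how the screenshot above was produced, and the client interface has since changed):

```python
import openai  # the pre-1.0 interface of the openai package

openai.api_key = "YOUR_API_KEY"  # placeholder; requires GPT-4 access on your account

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Create a house in OpenScad"}],
)

# The reply is OpenSCAD source code, which you then paste into OpenSCAD and render
# yourself. The model never sees the resulting geometry, which is part of why the
# roof can end up hovering over the driveway instead of the living room.
print(response["choices"][0]["message"]["content"])
```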

There’s a Market for That

  • So of course I set up a betting market on Manifold, my favorite prediction marketplace, to try to get an idea of where this will go.
  • As of the time of writing, this is sitting at about 65%, but you can check the current probability below or by following the link to the market itself.
  • I will not bet on this market myself, but I will have to figure out a way to resolve it, which may become contentious if there is a large trading volume by the time it needs to be resolved, prior to June 2024.

Market on Non-Crappy 3D Generative Objects

[Image: screenshot of the Manifold market]

