Game GANs
I was on a call with a group of CTOs working on advanced new AI models. I’m not one of them; my interest in neural networks died of boredom back in the 1980s. I was invited as a resident authority on gaming, crypto, and media technologies. During the introductions, I said I was just there to learn about the state of the industry. What followed, however, surprised me. It was a lot of discussion about how limited and stuck modern AI models are. Apparently, the magic feature of generative AI models is that they can train themselves given enough properly classified information to learn from. The internet provides a near-bottomless supply of photos surrounded by related text. It was possible to train an AI on the scale of ChatGPT because the network had a bottomless supply of contextualized photos and text to draw on. We don’t have such a vast, qualified supply of human-characterized training data in many other areas.
Take the idea of creating AIs that can code, for example. It’s wonderful that we have this vast sea of open source code on GitHub to draw on, but there isn’t a vast body of qualifying information about WHAT that code is supposed to do that would let a GAN self-train against it. The model needs lots of working code and a lot of context for WHAT that code is supposed to do in order to “learn” from it automatically. In a sense, ChatGPT is a novel achievement in highly specialized AI functionality given that it was born and raised inside a solitary confinement cell where its exposure to learnable information was highly constrained. It’s a sort of technological Helen Keller story.
I had an epiphany during the conversation… “Oh, that’s why it’s simultaneously able to produce these amazing works of art while never quite managing to get hands or eyes right. ChatGPT also appears to be able to interpret text but can’t read it in images. It really doesn’t understand depth or lighting!” Hands are famously hard for humans to learn to draw because they are such complex 3D objects with very complex lighting properties. What followed was some discussion about how to train an AI to perform photogrammetry in order to bring AI into our world of 3D depth and interactivity.
One famous CTO discussed creating a reward system to build a network of mobile phone users who would scan 3D scenes and objects with their phones and tag them, to help train a depth- and lighting-aware GAN. As soon as he said it, I realized that I knew how to solve the problem without any of that. Since I’ve written enough patents for one lifetime, I’m just going to put it out there.
Here is how to construct a self-training GAN that can extract 3D depth and lighting data from any arbitrary video. You use a modern game engine or photorealistic renderer to generate random scenes of 3D objects, materials, and lighting. Fly the camera around the scene randomly and record it all as a video. Automatically tag the video with the parameters the renderer used to generate the scene. Use that video to train the adversarial CNN models. Now you have an infinite supply of highly qualified 3D video suitable for unsupervised learning. Better still, the video is very compact to store and train with because it can be regenerated from its parameters on demand, vastly reducing the storage and network infrastructure required to train such a model.
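As a rough illustration, here is a minimal sketch of that generation loop. The `Renderer` object is a stand-in for whatever engine you drive (Unreal, Unity, Blender, a path tracer); `render_video` is a hypothetical call, not a real API. The point is that the label set is just the parameter dictionary, so only the tiny parameter records need to be archived and the video can be regenerated on demand.

```python
import json
import random

def random_scene_params(seed: int) -> dict:
    """Randomize the scene: objects, materials, lights, and a camera path."""
    rng = random.Random(seed)
    return {
        "seed": seed,
        "objects": [
            {
                "mesh": rng.choice(["cube", "sphere", "torus", "teapot"]),
                "position": [rng.uniform(-5, 5) for _ in range(3)],
                "material": {"roughness": rng.random(), "metallic": rng.random()},
            }
            for _ in range(rng.randint(3, 12))
        ],
        "lights": [
            {"position": [rng.uniform(-10, 10) for _ in range(3)],
             "intensity": rng.uniform(100, 2000)}
            for _ in range(rng.randint(1, 4))
        ],
        "camera_path": [[rng.uniform(-8, 8) for _ in range(3)] for _ in range(120)],
    }

def generate_sample(seed: int, renderer) -> dict:
    """Render one tagged clip. The labels ARE the scene parameters, by construction."""
    params = random_scene_params(seed)
    frames = renderer.render_video(params)       # hypothetical engine call
    return {"frames": frames, "labels": params}  # perfectly tagged, no human in the loop

if __name__ == "__main__":
    # Only the small JSON parameter records need to be stored; the video is
    # reproducible from them whenever the trainer asks for it.
    for seed in range(3):
        print(json.dumps(random_scene_params(seed))[:120], "...")
```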
The resulting GAN should be able to do things like watch all the videos on the internet and extract the models, textures, lights, and materials from them. It should also be able to generate very realistic, entirely artificial videos.
I’ve done a ton of work on artificial vision systems over the years, and the problem with seeing things in drone or satellite video is never having enough sample video to test with. The way I’ve always solved it is by building a virtual model in a game engine like Unreal or Unity first and using that to generate training video for the vision system. This is classical CNN-style training. Because you know the absolute location of every object and feature in the virtual scene, you can automatically measure how well any human-authored algorithm performs at identifying them. It’s a natural step further to apply the idea to training a GAN.
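That automatic measurement step might look something like the sketch below: because the engine knows the exact bounding box of every object it placed, a detector’s output can be scored without any human labeling. The box format and the names here are my own assumptions, not part of any particular toolchain.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def score_frame(predicted_boxes, ground_truth_boxes, threshold=0.5):
    """Fraction of ground-truth objects the detector found (recall at IoU >= threshold)."""
    hits = 0
    for gt in ground_truth_boxes:
        if any(iou(pred, gt) >= threshold for pred in predicted_boxes):
            hits += 1
    return hits / max(len(ground_truth_boxes), 1)

# Example: ground truth exported straight from the engine, predictions from any detector.
gt = [(10, 10, 50, 50), (100, 120, 160, 200)]
pred = [(12, 11, 49, 52)]
print(score_frame(pred, gt))  # 0.5 -- one of the two placed objects was found
```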
We humans have figured out how to generate very realistic artificial 3D worlds for games and movies. We don’t have to use real tagged video to train a depth-aware GAN when we have powerful tools that can generate an infinite body of realistic 3D video, randomly varied and perfectly tagged.
The same idea works for teaching a GAN to read text in the real world. You render text in random fonts over random backgrounds into textures and place them in a 3D scene. Fly the virtual camera around the scene to produce a perfectly tagged video, then train the GAN on that tagged video.
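A minimal sketch of the texture-generation half of that idea, using Pillow; the resulting image would then be mapped onto a surface in the 3D scene. The font choice is an assumption and depends on what is installed locally.

```python
from PIL import Image, ImageDraw, ImageFont
import random, string

def random_text_texture(width=512, height=256, seed=0):
    """Render a random string on a random background; the string is the label."""
    rng = random.Random(seed)
    text = "".join(rng.choices(string.ascii_letters + string.digits + " ",
                               k=rng.randint(5, 20)))
    background = tuple(rng.randint(0, 255) for _ in range(3))
    foreground = tuple(rng.randint(0, 255) for _ in range(3))
    img = Image.new("RGB", (width, height), background)
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # swap in ImageFont.truetype(<random font file>, size)
    draw.text((rng.randint(0, width // 2), rng.randint(0, height // 2)),
              text, fill=foreground, font=font)
    # The ground-truth string is known exactly at generation time -- no human tagging.
    return img, text

img, label = random_text_texture(seed=42)
img.save("texture_42.png")
print("ground-truth string:", label)
```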
Speech is a little more interesting to think about, because when we watch a movie the dialog is contextual to the entire movie, not just the frame of video the speech occurs in. Before a GAN could be trained to recognize and author plausible movie dialog, it would need to understand enough about the overall context of the movie and its setting to have a real chance at producing something plausible. So I would theorize that, in addition to solving some other GAN training problems, you need to solve the 3D lighting and depth one before a GAN can contextualize movie dialog against the movie’s video content. Fortunately, we already have great speech-to-text technology, and we have GANs that understand text, so you could watch all the YouTube and Twitch videos, extract the speech into text, and then tag the original video with that text to help the 3D-aware GAN relate the 3D scene context to what is being said.
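One way that tagging pass could be wired up is sketched below: run any speech-to-text model over the audio track, then attach each transcript segment to the frames it overlaps, so the 3D-aware model trains on (frames, spoken text) pairs. `transcribe_audio` is a placeholder for whatever STT system you plug in, not a real API.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds
    end: float
    text: str

def transcribe_audio(video_path: str) -> list[Segment]:
    """Placeholder: return time-aligned transcript segments for the video's audio."""
    raise NotImplementedError("plug in a real speech-to-text model here")

def tag_frames_with_dialog(segments: list[Segment], fps: float, n_frames: int) -> list[str]:
    """For each frame index, collect the dialog being spoken at that moment."""
    labels = [""] * n_frames
    for seg in segments:
        first = int(seg.start * fps)
        last = min(int(seg.end * fps), n_frames - 1)
        for i in range(first, last + 1):
            labels[i] = (labels[i] + " " + seg.text).strip()
    return labels

# Example with fake segments: a 10-frame clip at 2 fps.
fake = [Segment(0.0, 1.5, "hello there"), Segment(3.0, 4.0, "watch out")]
print(tag_frames_with_dialog(fake, fps=2.0, n_frames=10))
```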
It follows that in order to create AIs that can truly relate to humans on our terms, we need a way to train them to understand our physical world. Doing that with real hardware would require a lot of expensive robots, so a better approach might be to train them inside a game world with game physics and materials. You give the AI a virtual robot body and let it interact with virtual 3D objects, lighting, and physics in order to teach it to relate to our reality automatically.
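In reinforcement-learning terms, that virtual robot body is just an environment with the usual reset()/step() interface, where the simulator behind it is the game engine. The toy below is a deliberately trivial 1-D stand-in for that loop; the physics, observations, and reward are placeholder assumptions, not a real embodiment setup.

```python
import random

class VirtualRobotEnv:
    """A toy 1-D stand-in for an embodied agent living in a game-engine world."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.robot_pos = 0.0
        self.target_pos = self.rng.uniform(-10.0, 10.0)
        return self._observation()

    def step(self, action: float):
        # A real setup would integrate engine physics, update lighting, render frames, etc.
        self.robot_pos += max(-1.0, min(1.0, action))
        distance = abs(self.target_pos - self.robot_pos)
        reward = -distance
        done = distance < 0.5
        return self._observation(), reward, done

    def _observation(self):
        return {"robot": self.robot_pos, "target": self.target_pos}

env = VirtualRobotEnv(seed=1)
obs = env.reset()
for step_count in range(30):
    action = 1.0 if obs["target"] > obs["robot"] else -1.0  # trivial hand-written policy
    obs, reward, done = env.step(action)
    if done:
        print("reached target after", step_count + 1, "steps")
        break
```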
Anyway, here’s a freebie for you AI guys trying to figure this out. I'm not an expert on adversarial AI yet, but... give me 5 minutes. I’ve got a thought on how to generate actual games this way as well, but that’s another article.