Generative models only Generate
There are lots of challenges in building more complex and robust applications that incorporate generative models. One of the more frustrating ones is that it *seems* like you’re talking to a reasonable “person”. It’s ok that they make a few mistakes, right? We do what we’d do with an actual person - ask them to fix the mistake.
Except that a generative model doesn’t just fix the mistake - it (re)generates the entire piece of work, from scratch, trying to fix the mistake. It’s easy to see with DALL-E. Tell it to make a sign with some words, and it will usually misspell or misform some of the letters. That’s ok! Just tell it to redo one word - except it can’t. You’ll get an entirely new image every time.
We want, and need, these systems to be reliable in order for them to be valuable. People aren’t reliable, but we get along just fine. What’s the difference? At least part of it is that we can *iterate* with people (and with ourselves when doing a task). And that iteration can move flexibly between scales and scopes. If we have a big task to do, we start by sketching out the overall flow, then we work on smaller pieces (sometimes linearly, sometimes not). Then we refine and gradually get to smaller and smaller pieces. Sometimes we have to back up - we rewrite or rework a big piece of content - and that’s annoying! Imagine trying to work if you had to do that every time.
What does this tell us about building with generative models? That scoping the work is really important. You have to restrict the model to generating only what you want it to generate. You can’t give it a large artifact and expect it to only change part of it and preserve the rest, like a human would. This is a great job for code, in the “think with the model, plan with code” sense - using code to break up and isolate parts of the problem so the model can’t “get into trouble”.
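To make that concrete, here’s a minimal sketch in Python of what “scoping with code” can look like. Everything here is illustrative - `call_model` is a hypothetical stand-in for whatever model API you use, and the section structure is invented for the example. The point is that code decides what the model sees, and code guarantees the rest of the artifact is preserved verbatim.

```python
# A minimal sketch of "scope the generation with code": code selects the one
# piece to redo, the model generates only that piece, and code splices the
# result back in. call_model is a hypothetical stand-in for any LLM API.

def call_model(prompt: str) -> str:
    # Placeholder for a real model call (e.g., an HTTP request to your provider).
    raise NotImplementedError("wire this to your model of choice")

def regenerate_section(sections: dict, name: str, instruction: str) -> dict:
    """Regenerate exactly one named section; code keeps the rest untouched."""
    prompt = (
        f"Rewrite the following section. {instruction}\n"
        f"Return only the rewritten section text.\n\n{sections[name]}"
    )
    fixed = dict(sections)  # copy: every other section is preserved byte-for-byte
    fixed[name] = call_model(prompt)
    return fixed

# Usage: only "summary" is ever exposed to the model; "intro" and "body"
# cannot be mangled, because the model never sees them.
doc = {"intro": "...", "body": "...", "summary": "Drafty summry with a typo."}
# doc = regenerate_section(doc, "summary", "Fix the spelling mistakes.")
```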
It’s hard to keep this in mind because the interaction feels so natural, but generative models can ONLY generate. They can’t read, they can’t modify. They can only take some input and generate some output. Everything else - the iteration, the selection of scope and context, the construction of the prompt - all of it has to come from outside the model somehow. Right now that’s mostly human effort; hopefully, more and more of it will come from better coding practices.
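As one illustration of the iteration living outside the model, here’s a hedged sketch of a retry loop: the model still only maps a prompt to text, while code supplies the acceptance check, the feedback, and the decision to stop. Again, `call_model` and `parses_as_json` are invented names for the example, not any particular library’s API.

```python
import json

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call, as in the sketch above.
    raise NotImplementedError("wire this to your model of choice")

def parses_as_json(text: str) -> bool:
    # Example acceptance check: the output must be valid JSON.
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

def generate_until_valid(prompt: str, is_valid, max_attempts: int = 3):
    """Code, not the model, owns the retry loop and the success criterion."""
    for _ in range(max_attempts):
        candidate = call_model(prompt)
        if is_valid(candidate):
            return candidate
        # Feed the rejection back in; the model still only generates.
        prompt = f"{prompt}\n\nA previous attempt was rejected:\n{candidate}\nTry again."
    return None  # the caller decides what failure means

# Usage (once call_model is wired to a real model):
# result = generate_until_valid(
#     "Return a JSON object with keys 'title' and 'body'.", parses_as_json
# )
```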
Comments

CTO | CPO | Chief AI Officer | AI | Blockchain | Cryptocurrency | Technology Strategy | Scaling Startups & Unicorns | Global Operations | Investor | VC | Advisor | Speaker | Board Member (8 months ago):
I believe future generations of GenAI models will offer robust editing capabilities. Think of today’s GenAI models as writing on paper -- sequential, forward-only, and difficult to edit. In the future, with advanced model structures and inference control, GenAI models will be like word processors -- seekable, insertable, and updatable -- allowing for easy editing.
That Product Guy (8 months ago):
I’ve found the experience you described confounding in ChatGPT and DALL-E - I mean, the model just generated the text, yet it misspells that same text in the generated image. Do you think this is simply a beta issue for generative AI that will improve over time, or is AGI with better reasoning the only fix?
Data & Analytics Sr. Manager | MS, Data Analytics | USMC Veteran (8 months ago):
Generative models provide functionality we are still figuring out how to best implement inside larger solutions. Design patterns for real-world solutions using generative AI are still being developed and field-tested. The technology itself is advancing faster than our collective ability to implement it, test it, document our findings, and understand those findings at scale.