Generative models only Generate
Sometimes DALL-E really nails it

There are lots of challenges in building more complex and robust applications that incorporate generative models. One of the more frustrating is that it *seems* like you’re talking to a reasonable “person”. It’s OK that they make a few mistakes, right? So we do what we’d do with an actual person - ask them to fix the mistake.

Except, what a generative model does isn’t just fix the mistake - it (re)generates the entire piece of work, from scratch, while trying to fix the mistake. It’s easy to see with DALL-E. Ask it to make a sign with some words and it will usually misspell or malform some of the letters. That’s OK! Just tell it to redo one word - except it can’t. You’ll get an entirely new image every time.

We want, and need, these systems to be reliable in order for them to be valuable. People aren’t reliable, but we get along just fine. What’s the difference? At least part of it is that we can *iterate* with people (and with ourselves when doing a task). And that iteration can move flexibly between scales and scopes. If we have a big task to do, we start by sketching out the overall flow, then we work on smaller pieces (sometimes linearly, sometimes not). Then we refine and gradually get to smaller and smaller pieces. Sometimes we have to back up - we rewrite or rework a big piece of content - and that’s annoying! Imagine trying to work if you had to do that every time.

What does this tell us about building with generative models? That scoping the work is really important. You have to restrict the model to generating only what you want it to generate. You can’t give it a large artifact and expect it to only change part of it and preserve the rest, like a human would. This is a great job for code, in the “think with the model, plan with code” sense - using code to break up and isolate parts of the problem so the model can’t “get into trouble”.
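As a concrete (and entirely hypothetical) sketch of that “plan with code” idea: the orchestration code below owns the artifact and its structure, and the model is only ever handed one narrowly scoped piece. `call_model` is a placeholder for whatever generative-model API you actually use, not a real library call.

```python
def call_model(prompt: str) -> str:
    # Placeholder for a real generative-model call. Here it just tags
    # the input so the sketch runs without any API.
    return "[regenerated] " + prompt


def revise_section(sections: dict[str, str], name: str, instruction: str) -> dict[str, str]:
    """Regenerate exactly one named section; code preserves the rest."""
    prompt = f"Rewrite the following section. {instruction}\n\n{sections[name]}"
    # Only the targeted section is replaced. The other sections stay
    # intact by construction, not because we asked the model nicely.
    return {**sections, name: call_model(prompt)}
```

The point of the design is that the model physically cannot touch the sections it wasn’t given - the preservation guarantee lives in the code, not in the prompt.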

It’s hard to keep this in mind because the interaction feels so natural, but generative models can ONLY generate. They can’t read, they can’t modify. They can just take some input and generate some output. Everything else - the iteration, the selection of scope and context, the construction of the prompt - all of it has to come from outside the model somehow. Right now, that’s mostly human effort; hopefully, more and more of it will come from better coding practices.
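One way to picture that outside scaffolding is a retry loop like the sketch below, where the iteration, the prompt construction, and the checking all live in ordinary code and the model only maps input to output. `build_prompt`, `call_model`, and `passes_check` are hypothetical stand-ins, not any particular API.

```python
def generate_until_valid(build_prompt, call_model, passes_check, max_tries=3):
    """The iteration happens here, in code - never inside the model."""
    feedback = None
    for _ in range(max_tries):
        # Code constructs the prompt, folding in feedback from the
        # last failed attempt.
        candidate = call_model(build_prompt(feedback))
        # Code, not the model, decides whether the output is acceptable.
        ok, feedback = passes_check(candidate)
        if ok:
            return candidate
    raise RuntimeError("model never produced a valid result")
```

Each call to the model is stateless generation; the memory of what went wrong, and the decision to try again, are supplied entirely from outside.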

Zhenbin Xu

CTO | CPO | Chief AI Officer | AI | Blockchain | Cryptocurrency | Technology Strategy | Scaling Startups & Unicorns | Global Operations | Investor | VC | Advisor | Speaker | Board Member

8 months ago

I believe future generations of GenAI models will offer robust editing capabilities. Think of today’s GenAI models as writing on paper -- sequential, forward-only, and difficult to edit. In the future, with advanced model structures and inference control, GenAI models will be like word processors -- seekable, insertable, and updatable -- allowing for easy editing.

Akin Akinwumi

That Product Guy

8 months ago

I’ve found the experience you described confounding in ChatGPT and DALL-E - I mean, the model just generated the text, yet it misspells that same text in the generated image. Do you think this is simply an early-stage issue for generative AI that will improve over time, or is AGI with better reasoning the only fix?

Lance Hughes

Data & Analytics Sr. Manager | MS, Data Analytics | USMC Veteran

8 months ago

Generative models provide functionality we are still figuring out how best to implement inside larger solutions. Design patterns for real-world solutions using generative AI are still being developed and field-tested. The technology itself is advancing faster than our collective ability to implement it, test it, document our findings, and understand those findings at scale.
