Generative AI Storyboards & Regional Prompting
I used to make films and music videos. I wasn't particularly good at it. One of my favorite parts was pre-production. I understood that I couldn't compete on resources, so my best chance was to compete on preparation: if everyone knew exactly what we were trying to accomplish, we could do more with less.
Storyboarding was absolutely essential to this process. Storyboarding is simply laying out the shots of your film (usually comic-book style) such that your crew knows explicitly what types of coverage (shots) you are trying to achieve in your scene.
Storyboards can be incredibly detailed:
Or they can be essentially glorified stick figures:
Style can help - but storyboards are all about substance.
Rich details, specifically those pertaining to frame selection (Medium Shot? Wide Shot? ECU?) and perspective (OTS, Dutch Angle, etc.), suggest to the cinematographer what lenses they may want to use and how they might light the scene. The grips and gaffers will understand how to navigate the set to light it and how to set up dollies effectively and (ideally) out of frame. The editor can understand how she is eventually going to cut the scene (assuming, of course, that all coverage is captured). The director can lay out their vision and articulate it to the actors and producers.
Storyboards are incredibly valuable. They are also incredibly time-consuming. So when I saw that Storyboarder.ai was available, I got extremely excited! Like most former filmmakers, I still have the bug, alive and well. I have several ideas ready to rock, so I figured I'd give this software a chance. Ultimately, I was disappointed and felt that my needs were insufficiently met... but I think this is an opportunity for them to drive real value.
Before I dive into what I found on Storyboarder.ai, I want to highlight what I believe are the three "must haves" in any storyboarding software, whether it be GenAI-driven or conventional (like FrameForge).
FrameForge (storyboardsmarter.com) does all of these things very well. It just takes a very long time to learn to use the software proficiently. The beauty of GenAI is that it has the potential to significantly reduce the time it takes to generate imagery, and that's why I was so interested in Storyboarder.ai.
Here's the problem: despite claims that they can offer stylistic unity and asset preservation within a logical framing environment... the evidence I found is that they're not there yet. I'll get to prescriptive solutions later, but for now, here's what I saw:
I began by uploading details on a story I had written about Japanese internees in Minidoka playing baseball. Their software (seemingly powered by ChatGPT, which isn't a bad thing) concocted a shooting script and then broke it down into shots. Pretty neat!
The problem is that, while it advertised stylistic continuity, the result was quite different. As you can see, the "stylistic continuity" is basically centered on one thing: monochrome. We have a blend of live action and animation here, which isn't particularly useful stylistically. Beyond that, what I needed to understand was how well the software accommodates specific shot requests when I change the direction.
To do this, I clicked into Scene 1, Shot 3: "Tommy Ohara confidently playing shortstop".
I changed the shot size from a Medium Shot to an Extreme Wide Shot, and this is what I received:
Before I get into why I hate this generation, let's talk about some decent features. The ability to select various shot sizes, perspectives, focal lengths, and aspect ratios is useful. In fact, I'd go so far as to say it's necessary. Where FrameForge succeeds, and where I strongly suspect this product falls short, is that FrameForge's focal lengths actually matter: the focal length of the lens explicitly determines what is and is not "capturable" within the environment of the playscape.
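I don't know how FrameForge implements this internally, but the underlying optics are simple enough to sketch. Assuming a full-frame sensor 36 mm wide and an ideal rectilinear lens (both assumptions, not details from either product), the horizontal angle of view is 2·atan(w / 2f), and the width of the scene you can capture at distance d is d·w / f:

```python
import math

# Assumptions (not taken from FrameForge or Storyboarder.ai): a full-frame
# sensor 36 mm wide and an ideal rectilinear lens focused far beyond its
# focal length, so the simple pinhole relationship holds.
SENSOR_WIDTH_MM = 36.0


def horizontal_fov_deg(focal_length_mm: float,
                       sensor_width_mm: float = SENSOR_WIDTH_MM) -> float:
    """Horizontal angle of view in degrees: 2 * atan(w / (2 * f))."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))


def frame_width_m(focal_length_mm: float, subject_distance_m: float,
                  sensor_width_mm: float = SENSOR_WIDTH_MM) -> float:
    """Width of the scene captured at a given distance: d * w / f."""
    return subject_distance_m * sensor_width_mm / focal_length_mm


if __name__ == "__main__":
    # Hypothetical setup: camera 30 m from the shortstop.
    for f in (24, 50, 135):
        print(f"{f:>3} mm lens: {horizontal_fov_deg(f):5.1f} deg wide, "
              f"covers {frame_width_m(f, 30.0):5.1f} m at 30 m")
```

In this sketch, a 50 mm lens 30 m away takes in a slice of the field roughly 22 m wide, while a 24 mm lens takes in about 45 m. So if the shot size changes while the focal length stays at 50 mm, the camera (or the subject) has to have moved, and a tool that models the playscape should know where.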
But as you can see here, the actor (asset) from the first iteration is not preserved in this variant. And while there does appear to be stylistic continuity between this shot and the previous one, it's also worth pointing out that this is unlike any infield configuration I've ever seen: Tommy Ohara does not appear to be playing any position, let alone shortstop.
And so I decided to change one parameter - and one parameter only: I changed the shot size to Medium Close-Up.
Tommy is definitely doing something having to do with baseball. Shortstop? I'm not sure. This is still 50mm? That's interesting. As you can see, Stylistic Consistency has been completely taken out to the woodshed. Now we're looking at Lone Wolf and Cub meets Eight Men Out.
So now I wanted to try to get back to the initial styling, and this time I changed the perspective to a Dutch angle (tilted camera):
So this isn't a Dutch angle. This is a straight-on perspective with the actor turned. GenAI struggles mightily with different perspectives, so this isn't a surprise, but if the tech isn't "there", the feature should be turned off.
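The difference is geometric, and easy to check. A Dutch angle rolls the camera about its lens axis, so level lines in the world (the horizon, a dugout roof) tilt in the frame; turning the actor leaves those lines level. Here's a minimal pinhole-camera sketch of that, assuming camera-relative coordinates (x right, y up, z forward) and a horizon at camera height; the function names are illustrative only:

```python
import math


def project(point, roll_deg=0.0, focal=1.0):
    """Project a 3-D point onto the image plane of a pinhole camera.

    Coordinates are relative to the camera: x right, y up, z forward
    (z > 0). roll_deg rolls the camera about its optical (z) axis, which
    is what a Dutch angle is.
    """
    x, y, z = point
    u, v = focal * x / z, focal * y / z
    # When the camera rolls by +theta, the scene appears rotated by -theta in frame.
    t = math.radians(-roll_deg)
    return (u * math.cos(t) - v * math.sin(t),
            u * math.sin(t) + v * math.cos(t))


def horizon_tilt_deg(roll_deg):
    """Angle of a level horizon line in the frame, in degrees (0 = level)."""
    # Two points on the horizon, assumed to sit at camera height (y = 0), 100 units ahead.
    left = project((-50.0, 0.0, 100.0), roll_deg)
    right = project((50.0, 0.0, 100.0), roll_deg)
    return math.degrees(math.atan2(right[1] - left[1], right[0] - left[0]))


if __name__ == "__main__":
    # Turning the actor doesn't move the horizon, so only camera roll tilts it.
    print("straight-on camera, actor turned:", round(horizon_tilt_deg(0.0), 1))
    print("camera rolled 15 degrees (Dutch angle):", round(horizon_tilt_deg(15.0), 1))
```

Put simply: if the level lines in a generated "Dutch angle" frame are still level, the camera never rolled.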
That was the end of my journey with Storyboarder.ai because, to be honest, they made it clear that they weren't able to meaningfully expedite the storyboarding process in a way that would be scalable and repeatable while respecting the playscape and the assets.
Now, while this was happening, I was also communicating with one of my artists about a scene we were trying to animate for a project I was working on.
Her first draft? It was quite good... but it was also drastically different from what I was hoping to create in terms of look, framing, and feel.
I'm not a skilled enough illustrator to show her how her rendition differed from my desired outcome, but I was able to go into PowerPoint and lay out what I thought the scene should look like.
That is to say, I was "regionally prompting" her. I took various components of the frame that I wanted to see and "prompted" them... but constrained each prompt to a specific region. Beyond this regional prompting, I gave only a few other directions (more animated fish, Goro dressed more disheveled).
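If you think in code, what I did in PowerPoint amounts to a simple data structure: a list of prompts, each tied to a rectangle of the frame. The sketch below is hypothetical (Region, region_masks, prompt_at, and ascii_layout are illustrative names, not any tool's API) and covers only the layout side of regional prompting, not image generation. It resolves which prompt governs each pixel, with later regions overriding earlier ones where they overlap; the box positions are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A prompt constrained to one rectangle of the frame (hypothetical format).

    Coordinates are normalized: (0, 0) is the top-left corner of the frame
    and (1, 1) the bottom-right.
    """
    prompt: str
    x0: float
    y0: float
    x1: float
    y1: float


def region_masks(regions, width, height):
    """Rasterize each region into a height x width grid of booleans.

    A pixel belongs to a region if the pixel's center falls inside its box.
    """
    masks = []
    for r in regions:
        mask = [[r.x0 <= (col + 0.5) / width < r.x1
                 and r.y0 <= (row + 0.5) / height < r.y1
                 for col in range(width)]
                for row in range(height)]
        masks.append(mask)
    return masks


def prompt_at(regions, masks, col, row, fallback):
    """Return the prompt governing one pixel; later regions win where boxes overlap."""
    for region, mask in zip(reversed(regions), reversed(masks)):
        if mask[row][col]:
            return region.prompt
    return fallback


def ascii_layout(regions, masks, width, height):
    """Draw the layout as text: each region's first letter, '.' where unprompted."""
    lines = []
    for row in range(height):
        line = ""
        for col in range(width):
            p = prompt_at(regions, masks, col, row, fallback=None)
            line += "." if p is None else p[0].upper()
        lines.append(line)
    return "\n".join(lines)


# Illustrative layout loosely based on the scene described above.
layout = [
    Region("women in the background", 0.0, 0.0, 1.0, 0.4),
    Region("Goro, disheveled, in the foreground", 0.1, 0.3, 0.5, 1.0),
    Region("animated fish", 0.6, 0.5, 0.95, 0.9),
]
masks = region_masks(layout, width=16, height=9)
print(ascii_layout(layout, masks, 16, 9))
```

Letting the last region win is just one design choice; a real generator might blend overlapping prompts instead.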
She took this regional prompting and provided:
...and this is significantly closer to what I wanted.
Note how:
a.) the assets are preserved (the women in the back, the main male protagonist)
b.) the environment (playscape) was redrawn to conform to my demands. I would expect additional shots in this scene to preserve this dynamic and structure.
c.) it is stylistically consistent in terms of overall artwork and sketching.
This is where AI storyboarding needs to go. I should be able to direct the camera with regional prompts within an interface that preserves both assets and backgrounds. But the generative aspect needs to be self-contained:
The question I have is whether GenAI can get to this point. In the past two years, I have seen mild (at best) improvement in image quality and in how accurately models respond to prompts. What I have not seen is any ability whatsoever to preserve assets, preserve an environment, and maintain a discernibly consistent style between scenes.
I want to see it! If there is a company that is doing that today, please tell me, and I'll happily give them my money. But until then... unfortunately... I've got to slog through the old-fashioned way.