Creating a Shirley Card for AI
Digging into MidJourney, I have been engrossed in the process of how I engage with the lab, what prompting the system looks like, and how the community shares the words that form the prompts. I have been pulled into the act more so than the output itself. The output is, I will note, super interesting, and you don’t need to scroll far on LinkedIn to see amazing work (I follow too many people to tag). Perhaps it’s the Super 8 enthusiast in me, or all of those photography classes, but I keep trying to think about how prompts are notes, frames, or units of currency that trigger mechanical iterations.
One way I’ve looked at a set of prompts is as a composition, like how a musician might improvise. In the Context of Poetry: 1946-1950, poet Robert Creeley wrote:
"Writing is the same as music. It's in how you phrase it, how you hold back the note, bend it, shape it, then release it. And what you don't play is as important as what you do play."
And it is that last part, “what you don’t play is as important as what you do play,” that I want to focus on. For example, short poetic prompts entered into MidJourney can produce incredibly inspiring images, while an overly detailed entry will often result in major concepts or objects being ignored by the machine. Because of that, I fight the urge to control the output with too many prompts. I am becoming more comfortable expecting my prompts to yield a surprise. The process is a sort of corralling, or building on a digital exquisite corpse. As a creative, this process frees me from the usual deadline-inspired sprint to deliver. I expect to ideate, to explore, to meander. I am trying to see my engagements as a meditation, akin to journaling. Building on that, MidJourney’s first piece of advice in Tip for Text-Prompts is “Anything left unsaid may surprise you.”
In the 1950s, Kodak introduced the Shirley Card, a physical printed photograph, as a means of calibrating photographic equipment for skin tone. As a business, consistency of equipment is very important if you expect your customers to connect your brand name with keeping memories. In addition, any business focuses its advertising on the customers it believes are most likely to pay for its product. In this case, Kodak saw its customer as light-skinned, and therefore had an incentive to calibrate its film and equipment to serve lighter skin tones, while underserving customers with darker skin tones. (If you want an interesting podcast looking into the history of the Shirley Card, check out 99% Invisible.) Fujifilm made some adjustments over the years to expand the range of skin tones captured. While digital cameras were developed from a similarly flawed starting point, they have made big leaps in rendering a diversity of skin tones. In addition, the internet is full of articles giving tips on how to use technology to better capture a wider range of authentic skin tones.
Coming back to AI and “what is left out will surprise you”: perhaps it won’t surprise you that AI programs seem to be built on an additive model. You start with a neutral object, person, or location and then build it out with descriptors. For example, if you do not specify race, sex, or age, prompting something as simple as “human” will most likely yield a 30-year-old white man, and prompting “a woman” will result in a 20- to 30-year-old white woman. Michael Senkow does a great job showing how this works by prompting “neutral” terms like good, bad, lawyer, etc.
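If you want to try that “neutral term” test yourself, here is a minimal sketch. It assumes the pre-1.0 openai Python client with DALL-E 2 as a stand-in backend, since MidJourney exposes no public API; the terms and parameters are illustrative, not the exact ones Senkow used.

```python
# A minimal sketch of the "neutral term" test, assuming the pre-1.0 openai
# Python client and DALL-E 2 as a stand-in backend (MidJourney has no
# public API). Terms and parameters are illustrative.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Deliberately unqualified prompts: no race, sex, or age descriptors.
neutral_terms = ["a human", "a woman", "a lawyer", "a good person", "a bad person"]

for term in neutral_terms:
    # Whatever comes back reflects the model's statistical defaults.
    response = openai.Image.create(prompt=term, n=4, size="512x512")
    for i, item in enumerate(response["data"]):
        print(f"{term!r} sample {i}: {item['url']}")
```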
Anyone who has dealt with stock imagery understands how to live and work in a racially compromised space. This is a subject many people have worked to address, but stock searches still require qualifiers like race, gender, or age to reflect the world we live in.
I can imagine that AI is built on the dominant culture, or possibly on a feedback loop of a perceived dominant culture. And while every engagement is an opportunity to make the digital reflection more inclusive, humans seem to lean toward stereotypes. So does AI. In July, DALL-E 2 added a technique to diversify outcomes, and I would assume that other AI labs are working to address these biases. Representation matters, especially in art and advertising.
Like everyone else who engages with MidJourney or DALL-E 2, I see the tools at my disposal as a way to see what is in my mind that I have never actually “seen” before. It is also a space to challenge my assumptions, my expectations, and my biases. For one of my morning AI meditations, I wanted to see how I could develop a series of racially diverse Shirley Cards without overly defining race or age. That didn’t work. While I got a tone similar to the original Shirley Card, all of my models were light-skinned. It is possible that the Shirley Card prompt was either too weak or too strong, but I needed to add other qualifiers to see women of other ethnicities. I found that the more generally I prompted, the more closely the results mirrored the stereotypes I see in stock photography. It makes sense that a more generalized term, like Asian, yielded an amalgamation from a larger subset and was thus more generic, while a more specific prompt of Hmong, Tamil, Indonesian, or Mongolian drew on a smaller subset and showed more specificity, including cultural details in the clothing. In this case, the specificity mirrors Google searches, but definitely outperforms stock imagery sites.
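For anyone who wants to run a similar specificity gradient programmatically, here is a minimal sketch under the same assumptions: the pre-1.0 openai Python client with DALL-E 2 standing in for MidJourney, and a base prompt of my own invention rather than the exact wording I used that morning.

```python
# A minimal sketch of the specificity-gradient experiment, assuming the
# pre-1.0 openai Python client and DALL-E 2 as a stand-in backend
# (MidJourney has no public API). Prompt wording is illustrative.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

BASE = "studio portrait of a woman, Shirley card style, 1950s Kodachrome lighting"

# From broad to narrow: the hypothesis is that narrower cultural terms
# draw on a smaller training subset and return more specific results.
qualifiers = ["", "Asian", "Hmong", "Tamil", "Indonesian", "Mongolian"]

for q in qualifiers:
    prompt = f"{q} {BASE}".strip()
    response = openai.Image.create(prompt=prompt, n=2, size="512x512")
    urls = [item["url"] for item in response["data"]]
    print(prompt, "->", urls)
```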
As people engage more with AI, will the machine learn to bring in more diversity without the additional prompts? Or perhaps the additional qualifiers will become more important. Either way, I will continue to explore the process as it evolves. And every morning, as I open MidJourney, I remind myself: “And what you don't prompt is as important as what you do prompt.”