Better prompts for AI image generators
Jan Tissler
CONTENTMEISTER: AI Content Strategy, AI Content Creation, Generative AI Workshops and Trainings, German Content, Translation and Transcreation.
AI image generators like Stable Diffusion, MidJourney, or Dall-E can produce amazing results. But as anyone who has tried these tools will tell you: It's not as simple as pushing a button.
If you want to get good results reliably and regularly, you have to deal with the idiosyncrasies of these tools. As with AI text generators, vague prompts will bring forgettable and random results.
Here are some general tips for creating better prompts:
As you can see, ideally you already have an idea of what you want the result to look like. However, you don't need to have all of the above in mind for every image.
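To illustrate the difference between a vague and a specific prompt, here is a minimal Python sketch. The function name `build_prompt` and its fields (subject, style, lighting, composition, mood) are assumptions chosen for illustration, not a fixed recipe and not part of any generator's API. Parts you leave out are simply skipped, which mirrors the point that you don't need every detail for every image.

```python
def build_prompt(subject, style=None, lighting=None, composition=None, mood=None):
    """Join the parts that were provided into one comma-separated prompt.

    All field names are hypothetical; image generators accept free text,
    so this only helps you think through what you want to describe.
    """
    parts = [subject, style, lighting, composition, mood]
    # Skip anything that was left out or is empty.
    return ", ".join(p.strip() for p in parts if p and p.strip())


# A vague prompt: the generator has to guess almost everything.
vague = build_prompt("a dog")

# A more specific prompt: mood is omitted on purpose.
specific = build_prompt(
    "a border collie jumping over a stream",
    style="35mm film photograph",
    lighting="golden hour backlight",
    composition="low angle, wide shot",
)

print(vague)
print(specific)
```

The resulting text can be pasted into any of the tools mentioned above. How each tool weighs the individual terms differs, so treat the output as a starting point for experiments.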
At the same time, the more precise your idea is in your head, the more frustrating the process can be. This is because today's AI tools stumble, for example, on image ideas for which there are not enough (or any) examples in their training material.
Over time, you will learn what works reliably well and what doesn't.
A few more tips on how to proceed at this point:
The providers of these tools themselves have realized by now that many users have difficulty putting their image idea into words.
That's why Dall-E, for example, uses ChatGPT: you tell the chatbot what you need as precisely as possible, and it converts that into an appropriate prompt. Ideogram also does this automatically, always showing you which extended prompt it used.
And Google is experimenting with a user interface that lets you choose alternatives to your chosen terms from dynamically generated pull-down menus. This will help you come up with new ideas.
In the not-too-distant future, then, your prompting skills may matter less than they do today, because the tools will actively assist you. However, I believe it will still be worth understanding these details in order to work towards specific results.
T O O L S
MidJourney adds feature for consistent characters
Until now, image generators like MidJourney have not been very helpful if you wanted to use a specific, fictitious person in several images. However, such consistency is essential for many applications. Think of comics or marketing materials.
With Stable Diffusion, this can be solved with LoRAs, but the technical effort is not trivial. Another option is to use celebrities from the AI training material, though this is not always useful or legally sound.
MidJourney is now experimenting with a feature that should make this possible with little effort: In your prompt, you reference another image that already shows the person you want. You can then use that person in your new work, adjusting the background, image composition, facial expression, posture, and more.
An article on VentureBeat explains it and shows examples.
Claude 3 Haiku is fast and affordable
Anthropic recently introduced Claude 3, the latest version of its AI language model. After the mid-range "Sonnet" and the largest and most expensive "Opus", "Haiku" is now available. It offers low prices and fast responses.
Unfortunately, Claude is still not officially available in Europe.
Inflection 2.5 promises GPT-4-like performance
Inflection has released Inflection 2.5, an updated version of the language model behind its Pi personal AI assistant. According to the company, it can compete with leading language models such as GPT-4: Inflection 2.5 achieves high performance on a number of benchmarks while using only 40% of the computing power that went into training GPT-4.
The basic idea behind Inflection is to make the Pi AI assistant more personal. It aims to be "useful, friendly and fun." It is available for Android, iOS, web and desktop.
Source: VentureBeat
Other tools in brief
This new AI model is particularly obedient. While commercial AI offerings have many guardrails and barriers to protect them from misuse, the open model Liberated-Qwen1.5-72B advertises that it has no such restrictions. Instead, it is specially trained to strictly follow the system prompt. This makes it harder to “jailbreak”. At the same time, you have to decide for yourself which answers and topics are allowed. Source: VentureBeat
OpenAI plans to launch the video AI Sora “this year”. The service has caused a stir with its test videos, which promise a significant leap in quality. Source: The Verge
Command-R is a new language model designed specifically for enterprise use. With it, the startup Cohere is targeting the enterprise market and aims to impress with the model's flexibility. It offers a large context window of 128,000 tokens and can access external information via retrieval-augmented generation (RAG). Source: VentureBeat
Skyvern aims to automate browser-based tasks. The idea: You give the AI a task in natural language and it gets to work on its own. Source: Hacker News
Kolena is a testing platform for AIs. Any company that wants to offer a chatbot or other AI services will want to put it through its paces beforehand. The Kolena platform promises to do this work for you. Source: VentureBeat
KL3M is a language model that only uses content from documented, legal sources for training. It is the first such model to receive the “Fairly Trained” seal. Source: Wired
Create custom chatbots with Microsoft Copilot GPT Builder. Microsoft's partner OpenAI already offers the ability to create variants of chatbots for specific purposes (“GPTs”). A similar feature is now available to all Copilot Pro users, as VentureBeat reports.
Video AI Story.com is promoting longer clips. While many AI videos are only a few seconds long, Story.com allows up to 1 minute. A storyboarding feature is supposed to ensure that the clips ultimately match the user's ideas and needs.
Video AI Pika adds sound. Pika already offers a “lip sync” feature that makes people speak in generated videos. Now there is an option to add sound to a generated clip, such as background noises and effects. Source: VentureBeat
Google researchers show VLOGGER, which can create lifelike videos of people speaking, gesturing and moving from a single photo. This opens up a range of potential applications, but also raises concerns about forgery and misinformation. Source: VentureBeat
Stable Video 3D creates 3D models from a single photo. It can be used for free for non-commercial applications.
Amazon AI aims to make life easier for merchants. Amazon shows a small but neat example of a practical AI application: The assistant will generate an Amazon product page from a link to a product in a merchant's own online store, reports The Verge. The service is initially available in the US.
N E W S
AWS, Accenture and Anthropic join forces for enterprise AI
Amazon Web Services (AWS), Accenture, and AI startup Anthropic (makers of Claude) are joining forces to help organizations in highly regulated industries, such as healthcare, government, and banking, deploy customized AI models quickly and responsibly.
The partnership will enable organizations to access Anthropic's AI models, including the entire Claude 3 family, through AWS' Bedrock platform. Accenture will provide the technical and industry expertise to refine the models. More than 1,400 Accenture engineers will be trained on how to use Anthropic's models on AWS and will provide implementation support.
More news in brief
Jailbreak with ASCII trick. Researchers from Washington and Chicago have developed “ArtPrompt”, a new method to bypass security measures in language models. Using ASCII art prompts, chatbots such as GPT-3.5, GPT-4, Gemini, Claude, and Llama2 can be tricked into responding to requests they are supposed to reject. These include requests for advice on how to make bombs and counterfeit money. Sources: Tom's Hardware, Ars Technica
OpenAI's GPT store is full of spam. It seems the startup doesn't have much time to control the individual chatbots in its store, as TechCrunch shows.
Stability AI stumbles. The British startup has popularized the “diffusion” technique for AI image generation developed by Munich students. Now three of the original five people involved in that research project have left the company: Robin Rombach, Andreas Blattmann and Dominik Lorenz. Source: Forbes
G O O D   R E A D
These eight people made ChatGPT possible - at Google
This Wired article tells the story behind the development of the “Transformer,” a revolutionary AI architecture that powers modern language models like ChatGPT. A team of eight Google researchers developed the Transformer in 2017, based on the concept of attention. The team worked hard to complete the paper before an important conference deadline, and their breakthrough was initially overlooked by Google's leadership. The Transformer has since become a foundational technology, and the researchers have gone on to start successful AI companies of their own.
C U R I O U S   F I N D
Another thing AI doesn't understand: Mirrors
AI image generators often fail because they don't understand (yet?) what they're creating. Mirrors are a good example. Source: Reddit
G L O S S A R Y
Merging
Merging is the process of combining two or more AI models to create a new model. The results can be surprisingly good and do not require expensive hardware.
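One simple form of merging is linear interpolation: each weight of the new model is a weighted average of the corresponding weights in the source models. The following Python sketch illustrates the idea under loud assumptions: both models share the same architecture and parameter names, and plain dictionaries of lists stand in for real checkpoints. The function `merge_linear` and the parameter `alpha` are hypothetical names for this illustration. Real merging tools offer more elaborate methods.

```python
def merge_linear(weights_a, weights_b, alpha=0.5):
    """Blend two sets of model weights: (1 - alpha) * A + alpha * B.

    Assumes both models have identical parameter names and shapes.
    alpha=0.0 returns model A unchanged, alpha=1.0 returns model B.
    """
    if weights_a.keys() != weights_b.keys():
        raise ValueError("Models must have the same parameter names")

    merged = {}
    for name in weights_a:
        a, b = weights_a[name], weights_b[name]
        if len(a) != len(b):
            raise ValueError(f"Shape mismatch in parameter {name!r}")
        # Element-wise weighted average of the two parameter vectors.
        merged[name] = [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return merged


# Toy "checkpoints" (made-up numbers, not real model weights).
model_a = {"layer1": [0.0, 2.0], "layer2": [4.0]}
model_b = {"layer1": [1.0, 0.0], "layer2": [0.0]}

merged = merge_linear(model_a, model_b, alpha=0.25)
print(merged)  # {'layer1': [0.25, 1.5], 'layer2': [3.0]}
```

Because this is only arithmetic on existing weights and involves no training, it hints at why merging does not require expensive hardware.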