Prompt Crafting
Opinions expressed here are my own.

My Imperfect Approach After 20,000 Images

Using Midjourney is like painting with words. In a prompt, each word acts like a faint picture that layers and mixes with many others, gradually forming a clearer image. The more words you use, the less direct influence any one of them may have on the final image, but a longer prompt may also give you more interesting results.

I started using it in the summer of 2022 and have since continued to experiment with it for all kinds of personal projects, totaling over twenty thousand images. Controversies aside, creating images with generative models feels as powerful as instantly tapping into nearly all of the planet's fictional and non-fictional multi-generational inspiration. All the while, the results are sometimes far from ideal, and the tool can make you feel as if you are working with sticks and glue, producing silly visual disasters at any turn of the crank.

Aside from Adobe, which has integrated its own model (Firefly) into familiar tools such as Photoshop, Illustrator and other applications, the user experience of creating with AI has so far been generally cumbersome. And for those attempting to train their own models, patience and technical expertise are required. For most people, Midjourney isn't frictionless either, but the learning curve is manageable. It provides thorough documentation as well as an engaging creator community from whom you can learn. Above all, it has been fine-tuned to create what are arguably the best-looking results of generative AI out there.

This article is my attempt to share a bit about my own approach: how I organize my thoughts and how I went about creating the kinds of images you see below.


A compilation of some of the images I have created with Midjourney in the last year. They are mostly of photographic quality, with a few more graphical exceptions.


Verbal Vision: The Art of Directing with Words

Before we begin, if you have never used Midjourney or Discord, have a look at this Quick Start guide as it will walk you through all you need to know.

I should begin by saying that there isn't a right or wrong approach to putting together a text prompt. But, if you have been creating your images with a single phrase or two, it is likely that you have felt frustrated with the lack of control and a sense that there is so much more creativity to unlock. An example of an ineffective approach I have seen people take goes something like this:

A laughing emoji made out of pastel-toned rubber on a colorful background.        
A row of 4 images depicting a series of vibrant illustrations of a laughing emoji in a three-dimensional setting.
Images created in Midjourney with a simple phrase.

If you like the results above, it is because Midjourney uses all of its might to give you something good-looking. But the results will most likely resemble images you have seen across your feed because the prompt lacked originality. Just like in any kind of art direction, being opinionated and having a clear, detailed point of view will always help you take things further.

The approach I will walk you through can be a little wordy, but it has worked for me so far. Have a look and if it doesn't work for you, consider browsing the Explore page on MJ's site as it exposes the prompt for each image.

I start by assembling a loose sequence of words to get a feel for things and begin defining the look I have in mind.

smiling with hearts emoji, object, sculptural, goo, liquid, Zen, kawaii, immersive, foggy environment, pastel palette, tone on tone, pastel tones, whimsical, aerogel, inflatable, fluid, cymatics, bubbles, Voronoi, macro photography, miniature, graphical, friendly, highly detailed, RedShift 3d        
A recorded animation of Midjourney's rendering process depicting a 2x2 grid of emojis going from blurry to sharp.
A recording of Midjourney's rendering process.

As I continue to experiment, I will generally combine two prompting strategies into one larger prompt. The first is Descriptive Prompting. This involves providing highly detailed and specific descriptions of the desired image, including colors, objects, setting, lighting, and mood. The more precise the details, the closer the resulting image is likely to match your vision.

The second is Abstract Prompting. Here, I use abstract concepts, emotions, or themes rather than concrete visual descriptions. I find it useful for generating a more artistic and interpretative result in my images.

Within these strategies, these are the different areas I pay most attention to:

  1. Objects: These first words will dictate the primary subject of your image. Whether it is a portrait of a dog or a lunar landscape, this is where you will want to place those first words to guide the prompt. It is also where you will swap words or add new ones to create more images in the same aesthetic. In my case, as I experimented with emojis, it was easy to swap the name of one symbol for another. (Emojipedia is a great cheat sheet.)
  2. Setting: Describe what the environment looks like. Is it a room with specific architectural influences, a lush landscape or a minimalistic studio environment? Try to envisage as many specific details as possible and see how the tool attempts to capture them.
  3. Aesthetic Influences: This is where most of my Abstract Prompting approach comes to life, as I throw in lots of obscure references and ideas that come to mind as potential influences. Try experimenting and see how it affects your images. For instance, I have noticed that words such as "tsunami", "simulation", "soundwaves", and many others like them tend to add interesting movement and distortion to the subject matter.
  4. Lighting & Camera: It doesn't take much experience with photography to notice that lighting is as important as the subject of your photograph. When creating with generative AI, the same idea rings true. Having a strong point of view regarding the general mood and atmosphere of your scene, as well as technical photographic details, will go a long way. Therefore, aim for words that add insightful detail and that go beyond technical ways of describing your lighting influences. There are endless articles and videos out there on this topic. Enjoy exploring! It is also worth noting that once an image is generated, you will have the ability to extend it further with pre-canned and custom camera controls.
  5. Style: Here you will want to describe the aesthetic, artistic, or visual approach you want the generated image to embody. This can include a wide range of elements such as artistic styles or even historical or cultural influences. My preference is to find ways to mix realistic with graphical styles, as the collision of the two tends to produce more surprises.
  6. Other Details: I enjoy continuously building on prompts by adding more influences that might add texture and detail to the image. As the results come through, I might decide to move them to the top, or further down in the prompt, as a way to maintain the theme's original integrity.
  7. Rendering: Consider specifying some of the rendering characteristics of your image. I enjoy adding names of software that has a specific aesthetic or capability, as it may add more character to the image or borrow the look and feel associated with certain 3D engines.


Graphic summarizing my approach regarding how I structure my prompts.

See below how my prompt evolved after using this process. You will notice how I tried to cluster the words together in the order I deem most important. Part of the fun, however, is to re-order them again and again, bringing to the top the words you wish to play a stronger role in your results.

smiling emoji, object, sculptural, 

tone on tone background, gooey background, goo, immersed in liquid, immersed in milk, bright space, depth, foggy environment,

muted palette, gradient, minimal use of color, tone on tone, pastel tones, pastel materials, range of color, lightly colored particles, matte material, not shiny, invisibility, aerogel, subsurface scattering, rubber, clay, puffy, fingerprint, smooth, hydrophobic, latex, wax, porous, multi-layered, whimsical, inflatable, lightweight, cosmetics, see-through, translucent, minimalistic, fluid, flow, explosive, vibrations, decomposition, impact, waves, tsunami, cymatics, pattern, moiré, procedural, simulation, speed of light, effervescent, bubbles, sparkles, labyrinth, Voronoi, crescendo, wrath, wispy, dynamic, oscilloscope, trigonal, ferromagnetic, soundwaves, acoustic irradiation, liquification, constructed, disintegration,

volumetric light, voluminous, back-lit, silhouette, prismatic depth, glistening details, prisms, sharp, close-up shot, macro photography, tilt-shift lenses, miniature, depth of field, bokeh,

graphical, illustrative, bold shapes, geometric, beauty, beautiful, exuberant, irreverent, disproportionate, symmetrical, centered, realism, graphical, undertone, dynamic, friendly, positive, highly detailed, detail, precise, intricate, crisp, abstract, small cracks, fissures, debris, ash, dust, microscopic ambers, golden specks, torn, flag, smoke,

microscopic details, filament, fiber, strand, thread, hair, hairy, fuzzy, furry, waves, 

Houdini, RedShift 3d render, Octane render, 8K, post-processing        
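
The grouping above can also be kept as data, which makes reordering groups or swapping the subject less error-prone than editing one long string. What follows is a minimal Python sketch of that idea and not anything Midjourney provides: the names `PROMPT_GROUPS` and `build_prompt` are hypothetical, and each group is shortened to a handful of words taken from the prompt above.

```python
# Hypothetical helper: keeps prompt words in named groups so they can be
# reordered or swapped before pasting the result into Midjourney.
PROMPT_GROUPS = {
    "objects": ["smiling emoji", "object", "sculptural"],
    "setting": ["tone on tone background", "immersed in liquid", "foggy environment"],
    "aesthetic": ["muted palette", "pastel tones", "aerogel", "cymatics", "Voronoi"],
    "lighting_camera": ["volumetric light", "back-lit", "macro photography", "bokeh"],
    "style": ["graphical", "illustrative", "friendly", "highly detailed"],
    "details": ["microscopic details", "filament", "fuzzy"],
    "rendering": ["Houdini", "RedShift 3d render", "Octane render", "8K"],
}


def build_prompt(groups, order=None, subject=None):
    """Join word groups into one comma-separated prompt.

    order   -- group names, most important first (defaults to dict order)
    subject -- optional replacement for the first word of the "objects" group
    """
    order = list(order) if order is not None else list(groups)
    words = []
    for name in order:
        group = list(groups[name])  # copy, so the original groups stay intact
        if name == "objects" and subject is not None:
            group[0] = subject  # swap the primary subject, keep the aesthetic
        words.extend(group)
    return ", ".join(words)


# Same aesthetic, different emoji, with lighting promoted ahead of the setting.
print(build_prompt(
    PROMPT_GROUPS,
    order=["objects", "lighting_camera", "setting", "aesthetic",
           "style", "details", "rendering"],
    subject="ghost emoji",
))
```

Keeping the groups separate from the order is a deliberate choice here: the words stay put while you experiment with which cluster leads the prompt.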

Long and complicated prompts like this one come with disadvantages, because adding a lot of words lessens the impact of each one. It is like trying to take direction from lots of voices at once. At the same time, a noisier approach such as this, where you reinforce the same direction in different ways, often results in a more complex image that you may not reliably get otherwise.

Visit this page to see many of my results up-close.

A grid of 3x4 images featuring my results in highly stylized and vibrant emoji illustrations.
Some of my favorite samples in this exploration.


Here is a close-up look at one of the images. It is easy to see how this level of expression and detail far exceeds what we had at the beginning.

Recently, the ability to change specific areas of an image was introduced. By using the "Vary Region" feature, you can select an area of your image to be re-created. So, in my case, I removed the little blob over the right eye.

Vary Region

Parameters

At the end of your prompt, you have the option to include more instructions to change how your image generates. These will give you finer creative and technical control to take your images to the next level. Visit this page to learn about everything that is currently offered and have a look below where I have included a breakdown of some of the most consequential ones.




Versions (--v): This refers to the version of Midjourney's AI Model. The experience in Discord will default to the very latest, but it still allows you to choose previous models. As seen below, despite the lack of sharpness and definition, something funky and beautiful happened in versions 2 and 3, which somehow got diluted in later versions. Go back in time and explore your ideas in older models for some delightful surprises.

The same prompt put through Midjourney's v1 through its latest, v5.2.


Midjourney also offers an entirely separate specialized algorithm called Niji, which means rainbow in Japanese. It is specifically trained on anime aesthetics and will give you entirely different results from your prompts. See below how my prompt came out.

Samples of the same prompt using Midjourney's Niji Mode, which show 4 images in an anime-esque look.


Stylize (--s): All the images I generated used the highest Stylize setting, which is 750. You can change this value in settings, with 50 being the lowest preset. To better understand what Stylize is capable of, see what --s 50 did to the same ghost emoji prompts. This approach is ideal when you are aiming for an iconographic look.

Samples of the same prompt at the lowest stylize level, which show 4 ghost characters in a simplistic, minimal look.


Chaos (--c): This is one of my favorite parameters because it increases the variability of the results, and it is a great way to go if you are on the hunt for something unexpected. Lay the foundation with your prompt construction, but throw in --c 100 at the end for a surprise. Here are some of the places I landed with Chaos at its maximum value.

A grid of 3x4 images created with the Chaos parameter at 100.


Weird (--weird): Equally interesting, but in my view less effective, is the Weird parameter. It is highly exploratory and will likely keep evolving drastically over time. In fact, the results I got one day were entirely different from the next.

What it essentially does is to influence the degree of uniqueness of an image in relation to previously generated images. This is how you can introduce an even more unconventional aesthetic into your creations. See below how my explorations came out when I used values between 1000 and 3000 (its max).

Samples created with --weird between 1000 and 3000.


Image Prompts: Using an image as part of your prompt can give you an even greater sense of control. I have had some fun in the past by creating self-portraits with this feature. But I also tested my smiling emoji prompt by removing the words that described the emoji itself and replacing them with an image of the Fluent Emoji we designed for Microsoft (seen on the left).

In this case, it appears to have borrowed from its centered composition, the color palette, and a more simplified, graphic aesthetic.


Aspect Ratio (--ar): The ratio of all your images will be square (1:1) by default, unless you customize it to your needs. The explorations below were created with --ar 2:1, and as you can see, ultra-wide ratios will be more predisposed to cinematic results, where the backdrop plays a stronger role alongside the main subject matter. Even then, if what you need is a square image, you can always use the "Make Square" or "Zoom Out" features to expand and adjust your image after the fact.
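
Since the flag-style parameters covered in this section all go at the end of the prompt, they can be appended programmatically when you are preparing several variations of the same prompt. The sketch below is a hypothetical Python helper (`with_params` is my own name, not a Midjourney API). It only formats text, and it checks the upper bounds mentioned in this article (100 for Chaos, 3000 for Weird); the lower bound of 0 is an assumption on my part.

```python
# Hypothetical helper: formats Midjourney-style flags; it does not call Midjourney.
# Upper bounds come from the article text; the lower bound of 0 is an assumption.
MAX_VALUES = {"c": 100, "weird": 3000}


def with_params(prompt, **params):
    """Append `--name value` flags to a prompt, e.g. with_params(p, ar="2:1", c=100)."""
    flags = []
    for name, value in params.items():
        limit = MAX_VALUES.get(name)
        if limit is not None and not 0 <= value <= limit:
            raise ValueError(f"--{name} must be between 0 and {limit}, got {value}")
        flags.append(f"--{name} {value}")
    # Drop surrounding whitespace and any trailing comma before adding the flags.
    return " ".join([prompt.strip().rstrip(",")] + flags)


base = "smiling emoji, object, sculptural, pastel tones"

# One wide, high-chaos variation.
print(with_params(base, ar="2:1", c=100))

# The same prompt across model versions, to compare their looks.
for version in ["2", "3", "5.2"]:
    print(with_params(base, v=version, s=750))
```

The strings this prints are meant to be pasted into Midjourney by hand, so the helper is just a way to keep a batch of variations consistent.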


Upscaler: This new feature not only enlarges your image up to 4K resolution, but also dramatically enhances its details, as the example below illustrates. I love it!

A sample of what the Upscaler feature is capable of.


This is it for now! I hope you have enjoyed this little breakdown. As you can see, there are not any secrets to creating great images in Midjourney, or any other tool for that matter. It simply takes basic research, tinkering, and patience.

If you find this approach helpful, please share your results in the comments.

