Instructional Designer, Respect Yourself: Steer Clear of Stock Photos and AI Image Generation
Recently, I've been contemplating establishing a new community named 'Dilettante's Advice.' Laypeople seem to have justified opinions and insights regarding what's considered the "professional" standard across various industries, particularly in areas such as UI and UX. This is because the prevailing standards often come across as banal and tasteless, potentially eliciting irritation from a thoughtful and aesthetically discerning audience.
Because of the benchmarks established by industry leaders, many designers are thoughtlessly adopting these "standards". Take Articulate 360, for example, which boasts about its stock photo collection—depersonalized emanations that are the opposite of art, mere sterile grimaces. These photos are photography without art, dehumanizing products for anyone ready to purchase and use them regardless of how fitting they are. By doing this, by using lifeless stock photos, instructional designers overlook so many artists, sketch specialists, illustrators, and digital artists who are alive and hungry for work, and in many cases, literally hungry for food.
Additionally, consider how inconsiderate it is to the people who are used as models for future stock photo products. Consider the scenario of being invited for a cameo role but not even acknowledged as a cameo in the final credits—unless the stock photo is of Harold, no one knows them. Harold was an exception, as his smile couldn't hide his life experience.
Textbooks are riddled with these stock photos of smiling people in well-lit artificial environments. They may take up two-thirds of the textbook, sometimes more, even in textbooks for advanced, proficient learners. And for what? A proficient learner needs a stock photo above the text about family relationships or corporate culture? Do they really need corresponding stock photos featuring a happy, smiling no-name family in a clichéd picture, or a handshake between smiling men in suits in an office environment? Is this what being proficient in a language means—to engage with a certain theme only with visual reinforcement? What do they take advanced and proficient learners for?
Even children's textbooks feature customized illustrations, meticulously crafted by artists for their intended purposes.
In the past, even though some learning materials might not have been the most visually appealing, publishers often hired artists or included illustrations from important cultural works, like Edward Hopper's art used in "English File" textbooks. Unfortunately, many instructional designers today seem uninterested in using artworks, or may not know which ones to choose. This often leads to relying solely on stock photos, which can feel like a lazy solution.
Picture this: an instructional designer who believes that allocating over two-thirds of valuable space in educational materials merely for stock images—images easily plucked from an Articulate 360 bank with a single click—is adequate. Can you hear the intellectual gears turning? Neither can I. The task of slotting yet another generic photo of smiling people, chosen solely to represent smiling people, hardly requires much cognitive effort. Sure, these images may convey different 'emotions' on occasion, but where's the depth?
In UNICEF's Agora course, instead of using real footage to feature people in need or children deprived of basic human rights, they might opt for neutral photos from a stock image bank representing 'happy' representatives of third-world countries.
Realities are kept hidden. Instead, they may emphasize the word 'sustainability' in size 96 font alongside images of smiling African children in neutral environments, rather than amidst the haze of smog or heaps of garbage brought from Western countries.
E-learning resources and courses prioritize stock photos over providing rich information about real-world situations. Why add personality and informativeness or proper references to the course when templates, stock photos, and sliding virtual pages with animation effects suffice? Keep the fields wide, the text sparse; otherwise, learners might experience information overload—God forbid!
Considering this perspective, one might assume that stock photos, being repetitive, phony, annoying, bland, and unremarkable, are probably the worst things Instructional Designers could use. They seem to assume that readers' ability to understand plain text is close to nonexistent. Consequently, they believe that text alone will make learners bored and unproductive. However, AI image generation has emerged as a new evil.
Of course, without "reinforcing emojis," it would likely never have garnered so many reactions or reposts. Without emojis, texts from reputable institutions might not celebrate their values as effectively. It's similar to the jingles in podcasts that sometimes take up to half a minute to play. Why? To set the mood? And again at the end. Or the intros on YouTube videos showing cropped moments at the beginning to capture the viewer's attention. In my opinion, this can be so distracting that you want to skip it. What's generally happening is a simplification of content, and people are participating in it willingly with their likes, reposts; they don't comment about the overuse of imagery and superficial appeals to "embrace" various values, nor do they demand higher quality informativeness of the learning materials.
Not only has AI image generation inherited the depersonalized nature of stock photos, since AI image generators were trained on them, but it also produces senseless, absurdist realms with a plethora of semi-abstract elements hovering somewhere, intertwined with imitations of figures, shapes, and symbols, producing odious visual noise.
AI-generated images en masse often contain a multitude of glitches, distortions, and errors. Try giving AI a prompt to depict the Media Player Shuffle arrow and see what happens. If you don't notice these irritating glitches immediately, either you aren't particularly demanding when it comes to graphic imagery (which I doubt), or you've become accustomed to these glitches due to the poor quality of the prompts—'at least they tried'. Why even post these images if they only serve to contaminate the materials? If professional artists are beyond your budget, consider collaborating with emerging artists. This can provide them with valuable exposure while fulfilling your visual needs.
When instructional designers proudly post AI generated material—'Look, I have generated an image myself based on a fancy prompt!' (apparently idiotic, judging from what we often see)—they often end up with an image that goes beyond surreal; it is clumsy, yet due to a multitude of gradients and microobjects, extremely pointless in its excessiveness of what it is trying to depict. Such depictions are a huge disrespect toward the readers, but instructional designers seem to be proud and content with such an überhuman work they have managed to summon from the realm of AI.
As a stark contrast, consider some examples of textbooks that are more than a century old.
They didn't contain any images at all, but were quite informative and contained four times more exercises and nuances.
Reflecting on such materials brings to mind the good old textbooks with simple black and white sketches or even caricatures. They were so charming and appropriate that they remain dearly engraved in our memories for years.
In the educational materials of yesteryear, one would not find stock photographs or computer-generated visuals; instead, there was text—organized, neatly formatted, enlightening, and rich. Such text possessed its own significance and heft. Even for young learners, there were far fewer images, as authors and resource teachers of past decades were confident that templates and colorful photo bank images were not mandatory prerequisites to keep the reader engaged and focused. They knew how to work with text, making it systematic and, in many cases, humorous enough. They understood that text itself and its alignment mattered. They didn't worry that simply displaying textual data would bore or disengage the readers.
Yet today's expectations, which modern instructional designers have instilled in their audiences, appear to eventually affect the end users. Most of our tasks at Grandomastery are text-based, without pictograms, photographic stock (God forbid), or AI image generation. Whenever our trainees encounter them for the first time, they find themselves stuck in the first few minutes. Despite having clear instructions and well-formulated questions, they initially wonder: Where are the photos, illustrations, templates, and animations? How can a text be based on just a few words or on just one abbreviation, one acronym, two abstractions, or one ideogram? How? Just a word and a question? And nothing else?
This is how bad the situation has become. The stagnation in this domain is propelled by the standards employed by instructional designers, which may stem from negligence, organizational directives, the influence of marketing teams like those at Articulate, or their own educational background. This inertia often continues even beyond the completion of certain instructional design courses and degrees. These educational programs frequently resort to uninspiring templates, overused images, and minimal text. Consequently, they produce content that is uniformly "light" yet insipid, superficial, and easily consumable, leaving little to question about alternative interpretations of what constitutes an effective educational tool. It is absurd that instructional designers are universally expected to master tools such as Articulate 360 and to conform to methods like ADDIE, without genuine consideration of whether these practices actually improve the learning environment, especially in the areas of execution and evaluation.
Yet, many designers seem reluctant to break away from the reliance on stock photos or AI-generated images and bland templates, neglecting to trust their audience's capacity to engage with substantive content beyond mere bullet points and superficial visuals.
Even Wix, when you are trying to post an article in a blog section purely dedicated to some abstract scientific notion, indicates that an image is almost half of the post's SEO success.
A serious reader who is keen on this term or concept won’t care about the image above the article; they may even be insulted by an irrelevant image, for it does nothing but question their ability to visualize for themselves. It challenges their competencies in abstract thinking, especially if it is redundant. However, companies insist on its inclusion.
You may also know that in Cambridge exams, for example, as well as in many other international exams, candidates are required to compare and contrast stock photo images, which is rather antihumanistic on the part of the examination centers and publishing houses.
You may say that this is the layman's viewpoint and that images are used to illustrate and grab attention (what are we, crows or magpies?), but the point is exactly this: I started the article with the idea of dilettante's advice. Dilettante means, by the way, 'an admirer of fine art, literature, science, etc., one who cultivates art or literature casually and for amusement.' Art, not AI-generated images. Art, not hackneyed stock photos. And I don't mind fitting sketches, illustrations, paintings, artsy and heartfelt photography featuring realistic environments. I have nothing against artworks that have authors, names, descriptions, metadata, and an original purpose or a story behind them.
The disregard for chosen images or a careless attitude toward their informational value and significance is what infuriates me when I encounter yet another overused stock photo, likely among the top five search results on such services. Even the once ubiquitous clip art from Microsoft Word 97 wasn't as flagitious, since those pictograms bordered on symbols, and at least the brain perceived them as standardized signs.
Come to think of it, the entire photobank industry is one of the oddest emanations of commerce that shouldn't even exist in the first place. It is not art and never will be, especially considering that many illustrators and graphic designers are not being contracted and continue to seek the cheapest orders on platforms like Upwork or Fiverr. The same applies to stock videos—8K quality of beaten and often pointless slow-motion scenes, such as a man counting money on a calculator while yet another video blogger is mentioning expenses, or a man in a doctor's coat checking a scan while the vlogger is narrating about healthcare. And one would watch such instructional videos or even travel shows and think: do they really consider us such idiots that without these sterile, supposititious, depersonalized, non-informative, clichéd triggers we would be unable to grasp what they are talking about? You might say they are doing this to create a colorful video range—otherwise, what would they fill the video with? With a talking head? Well then, don't make a video at all if half the footage comes from stock videos, introducing people from completely different cultures and staged studio-lit environments.
And the term 'talking head'? It's a crystallized, simplified perception of what people see, assuming they are unable to just listen. The term implies that the audience by default is supposed to need more than just a close-up of a video presenter, and surely photo or video stocks and hackneyed templates are the only solutions, right? Instead of redirecting attention back to text, nuances, hermeneutics, facial expressions, acting, enunciation, pauses, and reading between the lines, this approach dumbs down both the creators and their audience. Is that what ADDIE is about?
Why hasn't it become clear to so many instructional designers that stock photos are the weakest tools for making a point or a statement? Due to their stylistic similarity, consistency, unnaturalness, and professionally blurred backgrounds, our brains easily recognize them as stock photos, especially when they lack references or inscriptions below. The brain realizes that if it is a stock photo, it was made beforehand to serve as a generic image for a wide variety of occasions. And this is when our brain realizes that it has been tricked into initially believing the image was an illustration created specifically for the material, but it is just a filler of empty space to avoid overwhelming the reader with an "unbearable" load of information. More than ten lexical units per lesson? No way!
I remember my cousin asking me not to show a cartoon to his 3-year-old daughter because it contained animation, and they preferred to show her texts and images to maintain her interest in static content for as long as possible before allowing her to watch cartoons. At first, I thought it didn't make much sense, but actually, it seems that with this obligatory overabundance of dynamic content, animated awards, artificial gratifications, badges, and useless images that are supposed to be supportive, many of us lose the ability to deal with text-based tasks alone.
Whenever I give students, for example, the Grandomastery Random Saying task (which is exclusively text-based), they get completely stuck at first. They ask, 'So, what do I have to do with it?' While with images, they seem to engage much faster, though they often provide arbitrary responses based on the images rather than the text-based task itself, ignoring the actual questions.
However, trainees who receive solely text-based tasks blankly look at them and ask, 'What do I have to do with this?' I respond, 'Please, read the task.' Then they say, 'Oh, but where?'
In my opinion, many people nowadays seem to be unable to work with text, incapable of analyzing etymology, prefixes, roots, and suffixes, or reading between the lines. Many do not even know the origins of their own name. They are hopeless at hermeneutics. At the same time, they may be equally bad at image analysis. They cannot read sfumato, chiaroscuro, composition, allusions, symbolism, or archetypes because they consider any combination of symbols or imagery as something accessory, not as the crux of the matter itself. They are unable to view a simple word combination or an image as a distinct self-sufficient task. They are unable to make sense of it or even view it as something worth considering. This dumbing down, I suspect, has to do with a decade or two of instructional design materials and quizzes filled with primitive stock photos and over-instructed, simplified text.
As an instructional designer, I often receive offers from companies seeking professionals with expertise in Articulate 360 and its extensive collection of no-name stock photos. This article serves as a testament that I will never work for a company that relies on stock photos or accepts works with AI-generated images, a company that does not hire decent illustrators and artists.
Please do not consider my candidacy unless your company is willing to invest at least $30 in a skilled illustrator to produce a few quality sketches, because a few is usually more than enough, so why don't you go, "embrace" yourselves! ;)
The invention of written language dates back to ancient civilizations around 3400 to 3000 BCE. I can assure you, people are and can be quite comfortable with reading text without any additional no-name visual support, especially when the visuals are included solely for the sake of "support" and "visual appeal".
If this fixation on superficial visual appeal persists, one must question the quality of knowledge we're imparting. In the future, will it be genuine intellectual substance like a century ago, or merely shallow marketing jargon and empty buzzwords passed down to future generations?
As an artist and former teacher who is working on specialising in instructional design, seeing AI being heavily pushed in education now is both amusing and worrying for me. I wonder how far this reliance on AI will/can go for education, because while I love technology, losing the human touch in a humanities field is a bit unsettling. It seems I'm not alone in feeling this way. Thanks for sharing your thoughts!