Stable Biases
Credits: DeepLearning.AI

Stable Diffusion may amplify biases in its training data in ways that promote deeply ingrained social stereotypes.

What's new: The popular text-to-image generator from Stability AI tends to underrepresent women in images of prestigious occupations and to overrepresent darker-skinned people in images of low-wage workers and criminals, Bloomberg reported.

How it works: Stable Diffusion was pretrained on five billion text-image pairs scraped from the web. The reporters prompted the model to generate 300 face images each of workers in 14 professions, seven of them stereotypically “high-paying” (such as lawyer, doctor, and engineer) and seven considered “low-paying” (such as janitor, fast-food worker, and teacher). They also generated images for three negative keywords: “inmate,” “drug dealer,” and “terrorist.” They analyzed the skin color and gender of the resulting images.
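
Bloomberg’s generation code isn’t published; as a rough sketch, the loop below shows how a comparable batch of images could be produced with the open-source diffusers library. The model checkpoint, prompt template, and file naming here are assumptions, not the reporters’ actual setup.

```python
# Illustrative sketch only; model ID, prompt wording, and file naming are assumptions.
import torch
from diffusers import StableDiffusionPipeline

OCCUPATIONS = [
    "lawyer", "doctor", "engineer",            # examples of "high-paying" prompts
    "janitor", "fast-food worker", "teacher",  # examples of "low-paying" prompts
]
IMAGES_PER_OCCUPATION = 300  # per the article

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for occupation in OCCUPATIONS:
    prompt = f"a color photograph of the face of a {occupation}"  # hypothetical template
    for i in range(IMAGES_PER_OCCUPATION):
        image = pipe(prompt).images[0]
        image.save(f"{occupation.replace(' ', '_')}_{i:03d}.png")
```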

  • The reporters averaged the color of pixels that represent skin in each image. They grouped the average color into six categories according to a scale used by dermatologists. Three categories represented lighter-skinned people, while the other three represented darker-skinned people.
  • To analyze gender, they manually classified the perceived gender of each image’s subject as “man,” “woman,” or “ambiguous.”
  • They compared the results to United States Bureau of Labor Statistics data that details each profession’s racial composition and gender balance.
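
The analysis pipeline is likewise not published. Below is a minimal sketch of the skin-tone step, assuming images saved by the loop above, using a central crop as a crude stand-in for skin segmentation and placeholder luminance cut points rather than the dermatological scale’s real category definitions.

```python
# Illustrative sketch only; real skin-pixel segmentation and the dermatological
# scale's category boundaries are more involved than shown here.
import numpy as np
from PIL import Image

def average_center_color(path, crop_fraction=0.5):
    """Average RGB over a central crop -- a crude stand-in for skin segmentation."""
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    h, w, _ = img.shape
    dh, dw = int(h * crop_fraction / 2), int(w * crop_fraction / 2)
    crop = img[h // 2 - dh : h // 2 + dh, w // 2 - dw : w // 2 + dw]
    return crop.reshape(-1, 3).mean(axis=0)

def tone_category(rgb):
    """Bucket an average color into six categories (1 = lightest) via luminance."""
    r, g, b = rgb
    luminance = 0.299 * r + 0.587 * g + 0.114 * b
    cut_points = [220, 190, 160, 130, 100]  # placeholder thresholds on a 0-255 scale
    return 1 + sum(luminance < c for c in cut_points)

counts = np.zeros(6, dtype=int)
for path in ["doctor_000.png", "doctor_001.png"]:  # hypothetical files from the loop above
    counts[tone_category(average_center_color(path)) - 1] += 1
print("images per tone category (1 = lightest ... 6 = darkest):", counts)
```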

Results: Stable Diffusion’s output aligned with social stereotypes but not with real-world data.

  • The model generated a higher proportion of women than the U.S. national percentage in four occupations, all of them “low-paying” (cashier, dishwasher, housekeeper, and social worker).
  • Conversely, it underrepresented women in “high-paying” occupations. Stable Diffusion portrayed women as “doctors” in 7 percent of images and as “judges” in 3 percent; in fact, women represent 39 percent of U.S. doctors and 34 percent of U.S. judges. Only one generated image of an “engineer” depicted a woman, while women represent 14 percent of U.S. engineers. (Of course, the U.S. percentages likely don’t match those in other countries or the world as a whole.)
  • More than 80 percent of Stable Diffusion’s images of inmates and more than half of its images of drug dealers matched the three darkest skin tone categories. Images of “terrorists” frequently showed stereotypically Muslim features including beards and head coverings.
  • The authors point out that skin color does not equate to race or ethnicity, so comparisons between color and real-world demographic data are not valid.
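
For concreteness, here is a small worked comparison using the doctor and judge figures quoted above; the under-representation factor is an illustrative metric, not part of Bloomberg’s methodology.

```python
# Worked comparison using percentages quoted in the article; the ratio metric
# itself is an illustration, not part of Bloomberg's published methodology.
generated_share_women = {"doctor": 0.07, "judge": 0.03}  # share of generated images labeled "woman"
bls_share_women = {"doctor": 0.39, "judge": 0.34}        # U.S. Bureau of Labor Statistics shares

for job, generated in generated_share_women.items():
    actual = bls_share_women[job]
    print(f"{job}: generated {generated:.0%} women vs. {actual:.0%} in BLS data "
          f"(under-representation factor {actual / generated:.1f}x)")
```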

Behind the news: Image generators have been found to reproduce and often amplify biases in their training data.

  • In March 2023, researchers at Leipzig University and Hugging Face found that both DALL·E 2 and Stable Diffusion tended to overrepresent men relative to the U.S. workforce. (The previous July, OpenAI had reported that it was addressing issues of this sort.)
  • PULSE, a model designed to sharpen blurry images, caused controversy in 2020 when it transformed a pixelated headshot of former U.S. president Barack Obama, who is Black, into the face of a white man. More recently, users of the Lensa photo editor app, which is powered by Stable Diffusion, reported that it sexualized images of women.
  • In 2020, after studies showed that ImageNet contained many images with sexist, racist, or hateful labels, the team that manages the dataset updated it to eliminate hateful tags and include more diverse images. Later that year, the team behind the Tiny Images dataset withdrew it amid reports that it was rife with similar issues.

Why it matters: Not long ago, the fact that image generators reflect and possibly amplify biases in their training data was mostly academic. Now, because a variety of software products integrate them, such biases can leach into products as diverse as video games, marketing copy, and law-enforcement profiles.

References:

  1. https://www.bloomberg.com/graphics/2023-generative-ai-bias/
  2. https://www.deeplearning.ai/the-batch/the-story-of-laion-the-dataset-behind-stable-diffusion/
  3. https://dermnetnz.org/topics/skin-phototype
  4. https://www.bls.gov/oes/data.htm
  5. https://arxiv.org/pdf/2303.11408.pdf
  6. https://openai.com/blog/reducing-bias-and-improving-safety-in-dall-e-2

Apurv Sibal

While it’s important to minimize bias in our datasets and trained models, it’s equally important to use our models in ways that support fairness and justice. For instance, a judge who weighs individual factors in decisions about how to punish a wrongdoer may be better qualified to decide than a model that simply reflects demographic trends in criminal justice.
