The Secret Life of Data Labelers
Credits: DeepLearning.AI

The business of supplying labeled data for building AI systems is a global industry. But the people who do the labeling face challenges that impinge on the quality of both their work and their lives.

What’s new: The Verge interviewed more than two dozen data annotators, revealing a difficult, precarious gig economy. Workers often find themselves worn down by low pay, uncertain schedules, escalating complexity, and deep secrecy about what they’re doing and why.

How it works: Companies that provide labeling services, including Centaur Labs, Surge AI, and Remotasks (a division of data supplier Scale AI), use automated systems to manage gig workers worldwide. Workers undergo qualification exams, training, and performance monitoring to perform tasks like drawing bounding boxes, classifying the sentiment of social media posts, evaluating video clips for sexual content, sorting credit-card transactions, rating chatbot responses, and uploading selfies showing various facial expressions.

  • Pay varies widely depending on the worker’s location and the task, from $1 per hour in Kenya to $25 or more per hour in the U.S. Tasks that require specialized knowledge, sound judgment, and/or extensive labor can pay up to $300 each.
  • To protect their clients’ trade secrets, employers dole out assignments without identifying the client, application, or function. Workers don’t know the purpose of the labels they’re called upon to produce, and they’re warned against talking about their work.
  • The assignments often begin with ambiguous instructions. They may call for, say, labeling actual clothing that might be worn by a human being, so clothes in a photo of a toy doll or a cartoon drawing clearly don’t qualify. But do images of clothing reflected in a mirror? And does a suit of armor count as clothing? How about swimming fins? As developers iterate on their models, rules that govern how the data should be labeled become more elaborate, forcing labelers to keep in mind a growing variety of exceptions and special cases. Workers who make too many mistakes may lose the gig.
  • Work schedules are sporadic and unpredictable. Workers don’t know when the next assignment will arise, how long it will last, whether it will be interesting or soul-crushing, or whether it will pay well or poorly. Such uncertainty, along with the gap between their wages and their employers’ revenue as reported in the press, can leave workers demoralized.
  • Many labelers cope by gathering in clandestine WhatsApp groups to share information and advice about how to find good gigs and avoid undesirable ones. There, they learn tricks like using existing AI models to do the work, connecting through proxy servers to disguise their locations, and maintaining multiple accounts as a hedge against suspension if they’re caught breaking the rules.

What they’re saying: “AI doesn’t replace work. But it does change how work is organized.” (Erik Duhaime, CEO, Centaur Labs)

Behind the news: Stanford computer scientist Fei-Fei Li was a pioneer in crowdsourcing data annotations. In 2007, she led a team at Princeton to scale the number of images used to train an image recognizer from tens of thousands to millions. To get the work done, the team hired thousands of workers via Amazon’s Mechanical Turk platform. The result was ImageNet, a key computer vision dataset.

Why it matters: Developing high-performance AI systems depends on accurately annotated data. Yet the harsh economics of annotating at scale encourage service providers to automate the work and workers to either cut corners or drop out. Notwithstanding recent improvements (for instance, Google raised its base wage for contractors who evaluate search results and ads to $15 per hour), everyone would benefit from treating data annotation less like gig work and more like a profession.

References:

  1. https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots
  2. https://www.deeplearning.ai/the-batch/google-contractors-get-a-raise/
