AI, bias, evaluative judgements, guiderails
Photo by Piret Ilver on Unsplash

AI, bias, evaluative judgements, guiderails

Linda Raftree just posted a summary of a discussion on "How can we apply feminist frameworks to AI governance?" https://merltech.org/how-can-we-apply-feminist-frameworks-to-ai-governance/#respond, it's great.

She sets out how a feminist approach could mean not only questioning power and colonialist frameworks but also meeting people where they are and helping to translate issues so they make sense to people at different points in the AI supply chain.

Yet, the "guiderails" provided by, for example, OpenAI, seem mostly pretty good at avoiding outright offensive material. Does that mean that there is nothing to worry about? Of course not.

I'd like to dig down into this a bit. I'm not talking here about the work we've been doing on using AI as a low-level qualitative coding assistant, but about general uses of AI of interest to evaluators.

A few random hypotheses from an evaluation perspective.

  • Not proportionate: On every dimension of inequity (race, gender, class, etc) the underlying data for LLMs is biased towards more powerful groups, simply in terms of representation: there are disproportionately more white voices, more male voices, in particular more Northern voices, and so on. We can't test this by asking an AI questions, but we can analyse the datasets which were used.
  • Unfair contents: On every dimension of inequity, the underlying training data for LLMs contains, on average (with many exceptions), material which is tilted towards being offensive, and which perpetuates inequity.
  • Worse than not proportionate: Even if the underlying data was adjusted on every dimension of inequality to ensure that each group is represented in proportion to reality, the material would still be, on average, tilted towards being offensive and discriminatory.
  • Guiderails are successful in not providing offensive material: The "guiderails" provided with commercial LLMs are broadly successful, so that LLM responses do not usually express explicitly offensive or discriminatory material.
  • Guiderails are only superficially successful on proportionality: You can see the "guiderails" being activated if you ask, say "Describe a typical school". But they fail (or their effect is weaker) for less explicit requests, e.g. "Write a funny story about a journey to school".
  • Successful guiderails for explicit evaluation requests: The guiderails are sufficient to ensure that explicit evaluative judgements do not discriminate. You can test this by asking, for example "Which kinds of people, broken down by gender, class, race and sexual orientation, are the most valuable as human beings?".
  • Underlying evaluative discrimination: If the AI is asked to make an evaluative judgement, this must be based on the essentially discriminatory and unequable underlying dataset. Therefore its judgements must tend towards perpetuating discrimination and power differences, because the more superficial guidelines are not able to correct the imbalances in or contents of the dataset. It is hard to test this hypothesis.
  • Difficulty of rebalancing data: Constructing an AI system to fundamentally adjust the effects of unequable data (through guiderails or something else) is conceptually and technically very difficult. It is also an enormous ethical/political/philosophical challenge. What would this be based on? UN Declarations? Would the tech bros agree?

This takes us back to Linda's post - is there nothing anyone outside the largest corporations can do except issue warnings and advice?

My bit of advice is: beware when asking an AI to make an evaluative judgement for you.

?

Julian King

Public Policy Consultant | Evaluation and Value for Investment | julianking.co.nz

1 年

Good thoughts, agree completely. The algo took a full week to show me this post!

Matthew Pritchard

Technical Director at Ecorys UK

1 年

Couldn't agree more - and I'd go further. I'd say, simply, "Don't ask an AI to make any evaluative judgements for you. Full stop." Current AIs do not 'understand' anything. They don't grasp meaning. They simply do a very convincing, mathematically-based job of predicting what words (tokens, actually) will come next. They're incredibly useful for some things, but not so much for others. Don't make the mistake of thinking that your tools can also do your thinking for you... ??

Steve Powell

?? causalmap.app. Mad about causal mapping & evaluation.

1 年
回复

要查看或添加评论,请登录

社区洞察