Those undeniable algorithmic harms - Part One.
Meenakshi (Meena) Das
CEO at NamasteData.org | Advancing Human-Centric Data & AI Equity
Welcome to Data Uncollected, a newsletter designed to enable nonprofits to listen, think, reflect, and talk about data we missed and are yet to collect. In this newsletter, we will talk about everything the raw data is capable of – from simple strategies of building equity into research+analytics processes to how we can make a better community through purpose-driven analysis.
Last week I shared with you how my work and research around AI is shaping the next workshop I am working on, i.e., 'towards human-centric AI'. That informed the seven tenets you and I reflected on in the 27th edition. Today I want to unpack the same topic further, though from a different angle.
Through some hypothetical and real-life examples, today you and I will hold up a symbolic microscope to understand what biases look like in algorithms.
Why this topic? Well, partly because I am challenged with finding creative ways to squeeze all of it into workshop materials. But primarily because I realize the word "data" inaccurately holds too much power these days. "Data" alone is neither the sole problem nor the entire solution. Here is a better articulation of what I mean, in the words of Jer Thorp: "…data is never perfect, never truly objective, never real. Data is, after all, not a heartbeat or a hippo sighting, or a river's temperature. It is a measurement of a heartbeat. A document of hippo's sighting. A record of the river's temperature."
It is you and me, our choices, decisions, and intentions around data, that truly give the word "data" the privilege it enjoys. It is, therefore, even more essential to understand what algorithms mean and where they can and do create harm. And that is what you and I will do today.
Let's start with a hypothetical example. Next week, we will see real-life examples of these algorithmic biases and harms.
Imagine a classroom full of 80 students with their class teacher. The students sit in 8 rows, with 10 students in each row. The students come from all different backgrounds (in terms of race, ethnicity, nationality, and immigration status). The teacher is currently working on assigning a class monitor for the entire year (i.e., someone who will have the responsibility and authority to supervise, manage, and guide students when the teacher is unavailable. The class monitor is also the class representative for the entire school and its official events).
Several scenarios can play out in this assignment:
1. The teacher picks their favorite student.
2. The teacher decides based on their flowchart:
(image description: a flowchart used in a hypothetical situation to determine who should be nominated for class monitor)
3. The teacher collects minimal demographic background information on each student (like race, ethnicity, and nationality) to shortlist interview candidates representing students from various communities.
4. The teacher makes an educated guess by observing their class, based on popularity, influence, extroverted nature, etc.
5. The teacher asks each student to nominate one person from the class (other than themselves), tallies the nominations, and creates an open-for-all-to-vote list.
6. The teacher goes around the school interviewing their colleagues, extra-curricular committees, and other departments to evaluate the best and the brightest names for the final interview.
7. The teacher asks their school/department leadership to pick a class from which the teacher can borrow the selection procedure for a class monitor.
More scenarios are possible here, but do you see the potential issues in these? In all of them, the teacher either picks someone they already know or relies on others to suggest names those others already know.
A truth we fail to acknowledge formally is that we humans like to use our cognitive and emotional influence in decision-making to justify and satisfy the logic we already understand, mainly to safeguard the perceptions we hold about ourselves.
All of the scenarios above introduce biases. And should we use these scenarios to design an algorithm to recruit the next class monitor, we will feed those biases into the algorithm as well.
Now, assume we did build such an algorithm (a.k.a. a model) that takes data about the students and recommends the three top names for class monitor selection. Let us track where the sources of bias are in this algorithm:
1. Say 90% of the students are native English speakers, White, and come from wealthy families; the algorithm will likely pick recommendations that fit that description. (This is selection/sampling bias)
2. Say 70% of the students are extroverts, participating in and winning all sorts of group activities around the school; the algorithm will likely favor those traits and habits as markers of a successful class monitor. (This is measurement bias)
3. If 5% of the students are the most-voted "popular choice", the algorithm picks up on this trend and places weight on it. Imagine someone using force and fear to win that "popular choice" label. Or imagine that an uneven availability of opportunities amongst students (e.g., male vs. female students) led to them becoming so popular. (This is representation bias)
4. Imagine all those data points that may have gone uncollected about the students, especially if the algorithm is not appropriately designed to factor in the experiences of both international and local/domestic students. (This is omitted variable bias)
5. Take the flowchart above (where the teacher decides on interviewees). Imagine it comes from a decision-tree algorithm (i.e., an algorithm that produces its final output from a cascade of yes/no decisions, like a tree). That algorithm would then be flawed: what if 70% of the students are native English speakers, but someone far better suited to be class monitor sits in the remaining 30%? (This is algorithm-design bias; see the first sketch after this list)
6. Say the algorithm (a flawed algorithm, in this case) produces three recommended names, holding all the biases coming from the data and the design. The department/school leadership and other teachers will likely start interacting more with these three recommended students. In the next cycle, when recommendations from this algorithm are needed again, the updated dataset will bring back the very candidates those teachers have now "favored" and made "popular". (This is social bias; see the second sketch after this list)
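To make the algorithm-design bias (point 5) concrete, here is a minimal, hypothetical Python sketch of a hard-filtering decision rule. The student names, attributes, and suitability scores are entirely invented for illustration; the point is only that a rule which branches on one attribute first never even looks at stronger candidates on the other side of that branch.

```python
# Hypothetical sketch: a decision-rule "recommender" that hard-filters on one
# attribute before ever looking at suitability. All names, attributes, and
# scores are invented for illustration only.

students = [
    {"name": "A", "native_english": True,  "suitability": 0.62},
    {"name": "B", "native_english": True,  "suitability": 0.58},
    {"name": "C", "native_english": True,  "suitability": 0.71},
    {"name": "D", "native_english": False, "suitability": 0.93},  # best suited overall
    {"name": "E", "native_english": False, "suitability": 0.88},
]

def recommend_top3(students):
    # Design flaw: the first yes/no "branch" of the tree eliminates everyone
    # who is not a native English speaker, so student D is never considered.
    eligible = [s for s in students if s["native_english"]]
    ranked = sorted(eligible, key=lambda s: s["suitability"], reverse=True)
    return [s["name"] for s in ranked[:3]]

print(recommend_top3(students))  # ['C', 'A', 'B'] — D's 0.93 never surfaces
```

Even though student D has the highest suitability score, the rule never considers them, because the first branch of the "tree" was chosen before anyone asked whether that attribute should matter at all.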
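And here is an equally hypothetical sketch of the social-bias feedback loop (point 6): the three recommended students receive extra attention each cycle, that attention flows back into the next cycle's data, and the initial ranking locks in. Again, every name and number is invented.

```python
# Hypothetical sketch of the feedback loop: recommended students get more
# interaction ("visibility"), which inflates the data the algorithm sees in
# the next cycle. All values are invented for illustration only.

scores = {"A": 0.70, "B": 0.69, "C": 0.71, "D": 0.68, "E": 0.67}
visibility = {name: 0.0 for name in scores}

def recommend(scores, visibility, k=3):
    combined = {n: scores[n] + visibility[n] for n in scores}
    return sorted(combined, key=combined.get, reverse=True)[:k]

for cycle in range(4):
    top3 = recommend(scores, visibility)
    print(f"cycle {cycle}: {top3}")
    for name in top3:
        # Teachers and leadership interact more with the recommended students,
        # boosting their standing in the next cycle's dataset.
        visibility[name] += 0.10
```

Students D and E start only a hair behind the others, yet after the first cycle they can never catch up: the algorithm's own output keeps widening the gap it created.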
Can we do something about this? Yes, we can.
Can we build this algorithm better? We can and we must!
In one of the near-future editions, you and I will explore how these biases are managed and mitigated in the algorithms.
*********************************
You and I live in a time when algorithms all around us want to collect data constantly and continuously about us, through us, claiming it is for us. As much as the recommendations, auto-completions, and predicted values here and there offer a personalized experience, we need to remember that algorithms are not magic. They are human-designed code intended to learn from the repeatable behaviors and patterns of the world – our complex and imperfect world. And that is all the more reason for you and me to learn about the nuances of these algorithms while we learn to develop a collaborative relationship with them (as we laid out in the 7 tenets of "human-centric" AI last week).
*** So, what do I want from you today (my readers)?
Today, I want you to share your thoughts on biases in algorithms. Do you think they exist? To what extent do you think they can cause harm?
*** Here is the continuous prompt for us to keep the list of community-centric data principles alive.