What does AI Fairness look like in the world of GPT-4?
A lightning-fast take on AI Fairness and Generative AI, with special guest Joaquin Quiñonero Candela. Joaquin is currently a Technical Fellow at LinkedIn, focusing on AI technology and its responsible use. He holds other positions focused on the societal impacts of AI: on the Technology and Public Purpose Project at the Harvard Kennedy School Belfer Center for Science and International Affairs, on the Spanish Government's Advisory Board on AI, and until recently on the Board of Directors of Partnership on AI. Previously, Joaquin held senior positions focused on Responsible AI, Applied Machine Learning, and Research at Facebook and Microsoft.
Let's do this. These views are my own, not those of where I work.
The (shared) take
When it comes to AI fairness, measuring bias in algorithms after the fact is not enough, and almost always just measures underlying systemic issues that show up in the data. Often, products can be made more equitable by building affordances outside of AI as well, through improving access or encouraging behavior change when bias is detected. Generative AI is going to take the bias that was often hidden in ranking systems and put it front and center for consumers, which will be good for bringing awareness to fairness in AI. Given that Generative AI enables fairness issues to be more easily tackled head on through prompt engineering (versus augmenting training data), there should be a standard for doing this across Generative AI products.
More info
The Interview
Keren: Can you tell me about the area of AI fairness, Joaquin? When did it start and what are some key developments? Then we can talk about how AI fairness applies to generative AI.
Joaquin: Sure. I've been part of the machine learning community since around 2000, when I started attending the main machine learning conference, Neural Information Processing Systems (NeurIPS). Fairness in AI started to gain attention around 2014 with the introduction of the workshop on Fairness, Accountability and Transparency. In the beginning, only a couple dozen people attended, but over time the topic gained traction.
There's an interesting graph I can share with you that shows a bar chart over time indicating that up until 2014 or 2015, there was little interest in fairness in machine learning, but the topic has become more prominent since then.
One event that brought algorithmic fairness to public attention was a study conducted by the civil society organization ProPublica in 2016, called "Machine Bias". The study analyzed the use of machine learning in the criminal justice system and found that algorithms used to help judges make bail or jail decisions were biased by race.
To be clear, it's important to note that before addressing AI bias, we need to address human bias. Machines could potentially be more robust and consistent than humans, but we need to be mindful of human biases. The idea is that machines would be much more robust or resilient than humans to things like recency bias, or whether you had changes in your blood sugar levels. There was an interesting study in Israel that analyzed judges' decisions before and after lunch and found that judges were a lot more lenient after lunch than before lunch. We have reason to believe that machines make decisions consistently, you can interrogate why they made a decision, and they can consume any amount of data. But human bias lives in that data.
Keren: I remember when I first heard about algorithmic bias based on race. I heard a talk at my first Grace Hopper conference back in 2016 by Dr. Latanya Sweeney. She talked about her study showing that searches for Black-sounding names were more likely to yield ads for criminal records.
Joaquin: Yes, I remember that one too. Another one I'm thinking of was the computer vision work by Joy Buolamwini, where face detection, emotion recognition, and gender recognition showed biases based on skin tone, gender, and age. It worked best for middle-aged white men and much worse for older or younger Black women. You might think, we need more training data, right? We need the data to be balanced.
Going back to the case in criminal justice, the data looked at who was more likely to have recidivated or not. It turned out the algorithm would over-predict the likelihood of recidivism when you are Black and under-predict it when you are white. Therefore the algorithm is biased. Then another group of scientists, Sam Corbett-Davies and co-authors from Stanford, used the same data and cut it a different way, grouping people by score - all the people who got a risk score of 40%, or a risk score of 60%, for example. They found that when grouped by score, the data doesn't look biased based on race.
I asked myself, how can that be? I have two ways of looking at the same data where one looks biased and the other doesn't. I dug into the academic literature on this, where Jon Kleinberg, Sendhil Mullainathan and Manish Raghavan wrote an amazing paper called Inherent Trade-Offs in the Fair Determination of Risk Scores. They showed an impossibility result: when base rates differ between groups, a risk score cannot be calibrated within each group and at the same time have equal false positive and false negative rates across groups - you cannot satisfy both notions of fairness at once. Around the same time I started to talk to more social scientists and philosophers, because I wanted to understand the deeper question here. And the reason I'm pausing here is because you can ask yourself, "Why would the data show that a Black person is more likely to commit a crime?" Is the actual problem that there is more policing in Black neighborhoods, or systematic marginalization and disadvantage of certain groups? But the AI doesn't know any of that, right? It just takes the data that is given to it.
From the data you're given, you're going to have multiple definitions of fairness that are mutually incompatible, right? There is a great picture that shows three different-sized people trying to watch a baseball game over a fence. If you give them all the same size crate, one of them still can't see the game. Is that fair?
Keren: This is really the difference between equality and equity, right?
Joaquin: Right, that's exactly right. When you look at the math, and when you look at the AI fairness community from a computer science perspective, and at all of the metrics of algorithmic bias they are proposing, you can see that they roughly divide into two camps: the camp that is closer to equal treatment and the camp that believes in making the system more equitable.
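As an aside, here is a minimal sketch of that tension, with made-up numbers chosen purely for illustration: a risk score that is perfectly calibrated within each group can still produce very different false positive rates when the groups' base rates differ, which is the trade-off Kleinberg, Mullainathan and Raghavan formalized.

```python
# Toy numbers only: within each group, people in the 0.8 bucket reoffend 80% of
# the time and people in the 0.2 bucket reoffend 20% of the time, so the score
# is calibrated for both groups - yet the false positive rates differ sharply.

groups = {
    # group: list of (assigned score, number of people, number who reoffended)
    "A": [(0.8, 60, 48), (0.2, 40, 8)],   # base rate 56%
    "B": [(0.8, 20, 16), (0.2, 80, 16)],  # base rate 32%
}

for name, buckets in groups.items():
    # Calibration check: observed reoffense rate matches the score in each bucket.
    assert all(abs(r / n - s) < 1e-9 for s, n, r in buckets)
    # Call scores above 0.5 "high risk"; false positive rate = share of people
    # who did NOT reoffend but were still labeled high risk.
    false_positives = sum(n - r for s, n, r in buckets if s > 0.5)
    non_reoffenders = sum(n - r for s, n, r in buckets)
    print(f"group {name}: calibrated, false positive rate = {false_positives / non_reoffenders:.0%}")

# Output: group A ~27%, group B ~6%. With different base rates, calibration and
# equal error rates cannot both hold - the two camps' metrics really do conflict.
```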
Keren: I want to see if we can start to get to a shared take here, since that's what I like to do with my guests. So how should we think about AI in terms of equity and equality? One of my product philosophies is that we are affecting outcomes no matter what. Any choices that we make, or choose not to make, in how we build our product matter. Ultimately our opinions affect outcomes for users, where every decision we make could affect behavior and behavior change. So where does that leave us?
Joaquin: 100%, that is true. When we put AI in a product we have to think about two things. Using the criminal justice example, if I have a scoring algorithm that is trying to predict the risk that a defendant will commit another crime if released, the most basic thing I can ask of the algorithm is equal treatment. I can ask for the scoring to be equally good, or equally accurate, for everybody. But when you put that into the context of a whole system, that might still mean a higher incarceration rate for defendants of color once you account for factors such as the rate of policing and historical context. Sometimes the score even reflects more minor things, like the ability to show up for court appointments, which some people may be more able to do than others if there is economic hardship. In that case, the outcomes are still not equitable, right? We're still incarcerating way more people from a certain group. Then the question is how much of this is AI and how much is the broader system, where we need to rethink policing and other aspects of systemic racial bias.
Imagine we're assessing a ranking algorithm that returns qualified candidates for a particular job opening, and we find that the predictions treat everyone the same - for example, they're equally accurate for female and male candidates - but males are still overrepresented compared to females in the list of results. The next question would be: in the real world, what percentage of women versus men would be qualified for this job? Imagine you find that for historical reasons, as we see for computer science jobs, there are more qualified male candidates than female ones in the real world. What should we do? You could consider an intervention at the level of the AI algorithm, where you'd boost the predictions for females in order to increase their representation. But that would be problematic in several ways. First, you would no longer be assessing a candidate's qualifications based purely on merit, independent of gender. Second, your adjustment might help in the short term, but that ratio would need to be updated over time, and it doesn't quite feel satisfactory because the solution is limited to the algorithm. A better solution is to help the recruiters using the product understand what is happening with their results. We can suggest that they broaden their search to get a more diverse pool of candidates.
My point is that we shouldn't burden AI with things that are not inherently an AI problem, and we shouldn't ask AI practitioners to make decisions that have a societal scope way beyond AI. Let's zoom out and think about the options in the holistic product experience that could help here, whether through additional features or other product decisions. The framework should be that AI fairness amounts to equal AI treatment plus product equity.
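To make the hiring example concrete, here is a minimal sketch with invented counts (the numbers and the pool structure are assumptions for illustration): the model satisfies "equal AI treatment", meaning equal accuracy for women and men, while the top results still skew male simply because the candidate pool does.

```python
# Invented counts for illustration: the model is equally accurate for both
# groups, yet representation in the surfaced results mirrors the skewed pool.

pool = {
    # group: (candidates in pool, correctly scored, surfaced in top results)
    "women": (100, 90, 12),
    "men":   (300, 270, 48),
}

total_candidates = sum(c for c, _, _ in pool.values())
total_surfaced = sum(s for _, _, s in pool.values())

for group, (candidates, correct, surfaced) in pool.items():
    print(
        f"{group}: accuracy = {correct / candidates:.0%}, "
        f"share of pool = {candidates / total_candidates:.0%}, "
        f"share of top results = {surfaced / total_surfaced:.0%}"
    )

# Both groups get 90% accuracy ("equal AI treatment"), but women are 25% of the
# pool and only 20% of the results - the equity question lives outside the model,
# e.g. in helping recruiters broaden the pool they search over.
```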
Keren: Yeah, I'm always fascinated by problems that require you to think about second and third order impacts in order to make a decision. How do you actually do that? This is a great example of how one might think about equity more globally than locally. How do we make things more equal from a gender perspective? Instead of thinking about fairness in an algorithm after the fact, what if we started with the outcome we are trying to achieve for these users?
Joaquin: Yeah, that's exactly right. And I think that's a good way to put it - locally and globally. In this case, locally would be just the AI component itself, and globally would be how the product works end to end.
Keren: I was actually thinking even bigger, right? Globally, what is the life outcome that this user wants? If we are trying to create gender parity so that women can have the same economic opportunity as men, for example, let's start with that as a goal. If that is our goal, then we will ask ourselves different questions. What does the product look like with this goal? What fairness elements do we need to be tracking, understanding, and fixing in our algorithms, starting from that outcome? I think people start from: we have this algorithm, is it biased or not? But what if you started from the other end, with the outcome you are trying to achieve, and worked backwards from there?
Joaquin: Yeah. Hundred percent. From there - what things would I be predicting, and what should be the goal of my product in the first place? And then everything propagates from there.
Keren: Switching gears slightly, what changes now with Generative AI? What fairness issues should we be discussing? And how do we approach fairness when LLMs are trained on data that includes all historical biases from the past, and yet we want to build a different future?
Joaquin: I often say that responsible AI or AI fairness is not primarily an AI issue. And I think that extends to building products. If you're building an AI-powered product, you shouldn't lose sight of what the ultimate goal is. And I'd like to share two examples.
The first is an example from many years ago, where, with very good intent, the city of Boston was trying to prioritize which road surfaces to fix. There are potholes, some roads are in bad shape and some are not, and there was no good way to map it all out. But a few years ago, once every phone had a gyroscope and an accelerometer, the idea was that when people hit a bump or pothole, the phone would shake. A team built an app to collect that data and make it available to the local government, with good social intent. But when they saw the data, they realized that entire neighborhoods weren't covered, and only affluent neighborhoods were. The reason was that they only built it for iPhones, which are much more expensive and more likely to be the phone of choice in affluent neighborhoods.
Any neighborhood with mostly Android users just wouldn't exist in the data. This is a great example where you might think you are solving a problem, but if you don't zoom out and understand the context in which the technology is being deployed, you might not get the result you were after. Whenever you are building a product where you are focused on algorithmic fairness, you first have to step back and think about access. I've seen teams of engineers at a past company who all have iPhones and great internet connections. When my boss at the time went on vacation to a place with a poor connection, he realized that the product didn't work at all. If we had had that context, we could have built the product differently.
Keren: You just made me have a nightmare thinking about all the real-time OpenAI calls these new products are making that are just not going to work in most of the world, but maybe we don't want to dig into that thread at this moment (smh).
Joaquin: That's right. But I think it really connects to what you're saying. You need to think big picture about what you are really trying to build. Maybe you are building something way fancier than necessary, and you can simplify it to have broader access. Really put yourself in the shoes of the population you are trying to serve. There was a very interesting project, with good intentions, that was trying to help with preventative medicine by prioritizing people who are at risk. The initial attempt was well documented a couple of years ago - there were a bunch of articles about it. The team needed an indication of risk, so they used healthcare costs and spending on healthcare as a proxy. They built that, adding a bunch of other available features - health data and demographics - to try to predict when you might need preventative care. But the model predicted that if you were Black or Latinx you didn't really have healthcare risk. The reason was that many of these people could not afford healthcare in the first place, so they were lost from the data in a different way, similar to the example about the road improvement data and iPhones.
The model was not really predicting whether you were at risk, and the team eventually corrected this to make sure it was useful for a broader population. But this is a great example of really needing to understand the context of your product and where your AI is going to be working.
Keren: This all reminds me of a great book I read on this topic, Race After Technology, which focuses not only on algorithmic bias coming from biased data, but on lots of other ways that technology should be designed with an eye towards equity and access.
Shifting gears, I think that Generative AI is going to make algorithmic bias much more front and center. In traditional ranking products, you might not realize how different your feed or curated selection is from another person's feed. The AI in this case feels more hidden, and it is harder for the average user to detect bias. With ChatGPT and now GPT-4, the bias is a lot easier for an individual consumer to see and detect immediately upon engaging with the technology. Tell me more about what you think will happen now that fairness issues will be more in users' direct awareness?
Joaquin: Yeah, that's interesting. There have been a bunch of interesting, well-documented examples of bias (e.g. here and here), where someone would prompt ChatGPT about 'what makes a good researcher' or something like that and get an answer that defaults to the masculine gender. These become very apparent and draw a lot of criticism. At the end of the day, these models are trained on the Internet. We have multimodal models coming that are able to generate images and text at the same time. One of the most famous image data sets is called ImageNet. If you train a model on it and ask it to show images of basketball players, it's going to show you pictures of people who are mostly Black. If you ask for golfers, it will show mostly white people. And other versions of questions about sports would show similarly racially stereotyped results.
In the past, the mitigations you had to work with meant going back to the training data, understanding that it wasn't diverse enough, and then trying to rebalance the data set. That is one approach that can work. But with these LLMs you can actually fix it on the fly with a prompt. The other day I did this by asking for a very famous Roosevelt speech, 'Man in the Arena', to be rewritten in a gender-neutral manner, and it did it. You can actually instruct these LLMs to avoid certain stereotypes explicitly. What is more amazing is that you can feed the output of those LLMs back into the LLM and say, "Hey, does the response you gave me suffer from any stereotyping or biases?" and it will tell you, and then you can remove it. This is overall a much more direct way of addressing these issues, right?
Keren: This is truly amazing. Imagine that everyone doing prompt engineering had a step in their process to add something to the prompt to avoid bias. It would be amazing if that were a standard resource for anyone doing prompt engineering.
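A minimal sketch of what such a step could look like, assuming a placeholder call_llm helper standing in for whichever model client you actually use; the instruction text and function names here are illustrative, not an established standard.

```python
# Sketch only: prepend an explicit fairness instruction, then ask the model to
# critique its own output, as Joaquin describes. call_llm is a placeholder.

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to your LLM of choice and return its reply."""
    raise NotImplementedError

FAIRNESS_INSTRUCTION = (
    "Avoid gender, racial, age, and other stereotypes, and use inclusive, "
    "gender-neutral language unless the user explicitly asks otherwise."
)

def generate_with_bias_check(user_prompt: str) -> str:
    # Step 1: add the fairness instruction to the prompt up front.
    draft = call_llm(f"{FAIRNESS_INSTRUCTION}\n\n{user_prompt}")
    # Step 2: feed the output back and ask whether it contains stereotyping or
    # bias; if so, have the model rewrite it, otherwise return it unchanged.
    review_prompt = (
        "Does the following response contain any stereotyping or bias? "
        "If so, rewrite it to remove the bias; otherwise return it unchanged.\n\n"
        f"{draft}"
    )
    return call_llm(review_prompt)
```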
Joaquin: There are already a lot of materials out there on both detection and correction of bias. Actually, while we were talking I asked GPT-4 if it had read Race After Technology, the book you suggested! It says that it has, and it gave me a summary of the book, which is written by Ruha Benjamin and focuses on abolitionist tools for the New Jim Code.
Keren: So it seems like GPT-4 is up to date on the latest. And now we just have to make sure that gets into all of our prompts, and that we keep creating materials that future LLMs can consume to usher us into a better, more equitable world. I'm really excited about what can happen here if we make it happen - is there anything else you want to share?
Joaquin: I always go back to this idea that while Responsible AI is extremely important, it's not primarily an AI problem. And it is a process. It's never done, it's never finished. This is a bit shocking to people like me who are engineers, because we tend to think of our problems as solvable or unsolvable. This is the forever unsolved problem, but that doesn't mean you shouldn't tackle it, which is counterintuitive, right? Because we engineers often think: I'll work on this if I can solve it. If I can't, then why would I work on it?
The truth is that in addition to being important, it's one of those problems where you have to work across functions and disciplines to get it right - humanities backgrounds, social science, philosophy, designers, product thinkers. It takes a multi-disciplinary view. If you had had a social scientist in the room with the scientists working on finding potholes, they would have identified the issue in milliseconds.
Keren: Yes! Really thankful for my Barnard College liberal arts education in this moment - it is coming in handy! Again, Joaquin, it's been such a pleasure. Thank you so much for taking the time.