Beyond AI: Investigating Algorithmic and Data Bias

Thank you for reading my article. I regularly write about digital transformation and how it is reshaping the future of the enterprise.


To read my future content, follow me on LinkedIn.

------

Recently, BuzzFeed published an article featuring AI-generated images of Barbie dolls from different countries. The images were generated using the famous generative AI model MidJourney. They drew a huge backlash on X (formerly Twitter), as the majority depicted glaring biases and stereotypes. For instance, a South Sudanese Barbie was holding a gun, and a German Barbie was reminiscent of a Nazi soldier. The article appears to have since been removed from BuzzFeed.


Algorithmic bias is very real and poses an even bigger threat than human bias, because we tend to ‘legitimize’ or ‘validate’ results, and our own thinking, with technology. The ‘unconscious’ bias that algorithms carry can remain undetected, working in the background to create a systemic unfair advantage or disadvantage for a specific group of people. Worse still, it can multiply and scale quickly as the algorithm consumes more and more data carrying biased results.


In a recent study, researchers at the AI start-up Hugging Face and Leipzig University generated 96,000 images using Stable Diffusion models. The results were disturbing, if unsurprising. For instance, when tasked with crafting images portraying CEOs or directors, the models predominantly generated images of white males. The researchers have since made the study available to the public as an interactive tool.


In the past, Amazon had to shut down its AI-based recruiting tool because it showed bias against women candidates. Similarly, in 2016 Microsoft launched a Twitter chatbot named Tay for casual, playful interaction with users. Within 24 hours, however, it ‘learned’ from public comments and interactions and started tweeting racist and antisemitic remarks. It had to be shut down!


With a vast population exposed to AI today, in one form or another, understanding algorithmic and data bias is a critical part of our digital literacy. It ensures we do not take the results generated by the likes of ChatGPT at face value, and that we do sufficient conscious, contextualised human fact-checking.


Algorithmic Bias

Kate Crawford, a Distinguished Research Professor at New York University, explains algorithmic bias in terms of allocative and representational bias. Allocative bias occurs when an algorithm unfairly allocates an opportunity to a specific group, e.g. Amazon’s recruitment algorithm favouring men over women. Representational bias, on the other hand, happens when an algorithm stereotypes or ‘represents’ a certain group with a specific trait. For instance, COMPAS, an algorithm widely used in the U.S. criminal justice system, has been found to be representationally biased against Black defendants: it is far more likely to flag a Black defendant as being at high risk of recidivism than a white defendant.
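To make the allocative case concrete, here is a minimal sketch of how such a bias can be surfaced as a gap in selection rates between groups. It uses the open-source Fairlearn library and an entirely synthetic, hypothetical hiring dataset of my own; it is an illustration, not a reconstruction of Amazon’s system.

# A minimal sketch: surfacing allocative bias as a gap in selection rates
# between groups, using the open-source Fairlearn library. The hiring data
# below is synthetic and purely illustrative.
import pandas as pd
from fairlearn.metrics import MetricFrame, selection_rate, demographic_parity_ratio

# Hypothetical screening outcomes: 1 = shortlisted, 0 = rejected
candidates = pd.DataFrame({
    "gender":      ["male", "male", "male", "female", "female", "female"] * 50,
    "shortlisted": [1, 1, 0, 1, 0, 0] * 50,
})

# Selection rate per group: the share of each group that was shortlisted.
# selection_rate only reads the predictions, so the outcomes are passed twice.
by_group = MetricFrame(
    metrics=selection_rate,
    y_true=candidates["shortlisted"],
    y_pred=candidates["shortlisted"],
    sensitive_features=candidates["gender"],
)
print(by_group.by_group)    # female ~0.33, male ~0.67

# Demographic parity ratio: lowest group selection rate / highest.
ratio = demographic_parity_ratio(
    candidates["shortlisted"],
    candidates["shortlisted"],
    sensitive_features=candidates["gender"],
)
print(f"demographic parity ratio: {ratio:.2f}")    # ~0.50 here

In this toy data the ratio comes out around 0.50, well below the ‘four-fifths’ (0.8) rule of thumb often used to flag adverse impact in US employment screening.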


Data Bias

Another aspect of bias in AI models is caused by data. For simplicity’s sake, I will broadly categorize it into two types: source and diversity.


LLMs use vast amounts of previously human-generated data as training input. The ‘source’ and content of the data these models consume are mostly unknown, and in many cases this is proving detrimental to the neutrality of AI algorithms. Take the example of MidJourney. The company is under heavy criticism for massive copyright infringement, as it trained its models on a huge dataset of human-created images. The sources of, and biases embedded in, these images are unknown, and whether they will systematically and unfairly include or exclude certain groups of people is still a subject of debate.


Similarly, if an algorithm is trained on data that is not representative of the population, it develops tendencies and produces results that are skewed towards a specific point of view. The earlier example of Microsoft’s Tay is a case in point: the data was not diverse enough to produce representative results.
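As a simple illustration (my own sketch, with invented numbers rather than data from any study cited here), a first-pass diversity check can be as basic as comparing group shares in the training data against a reference population:

# A minimal sketch: a first-pass diversity audit that compares group shares
# in a training set against reference population shares. All numbers invented.
from collections import Counter

training_labels = ["male"] * 820 + ["female"] * 180    # hypothetical training set

observed = {g: n / len(training_labels) for g, n in Counter(training_labels).items()}
reference = {"male": 0.49, "female": 0.51}             # e.g. a census baseline

for group, expected in reference.items():
    actual = observed.get(group, 0.0)
    flag = "UNDER-REPRESENTED" if actual - expected < -0.10 else "ok"  # arbitrary 10-point threshold
    print(f"{group}: {actual:.0%} in data vs {expected:.0%} in population -> {flag}")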


Aftermath

Stanford HAI, drawing on the AI, Algorithmic, and Automation Incidents and Controversies (AIAAIC) database, recently reported that over the past 10 years (2012-2021) the number of ethical issues associated with AI increased 26-fold, from 10 cases to 260. This rise in ethical concerns is a direct result of a lack of responsibility in AI algorithm development and of biases being introduced, consciously or unconsciously, into AI models.


A recent study by Bloomberg used Stable Diffusion, a text-to-image AI model, to generate 5,100 images of people as a representation of workers in 14 jobs. The analysis suggested that the images generated for people in high-paying jobs (lawyers, engineers, etc.) were mostly of lighter skin tones, while images of people with darker skin tones were predominantly generated for prompts like “janitor” and “dishwasher”.


This is far from reality. The same report looked at how many images of women were generated for the keyword “judge”: only 3%, while in reality 34% of US judges are women! How this will change the worldview we hold today is, you guessed it, yet to be known!


What is next?

A report by Gartner indicates that by 2025, 30% of outbound marketing messages from large organizations will be AI-generated. Another report, by Bloomberg, suggests the generative AI market is set to reach $1.3 trillion by 2032.


This places a huge responsibility on large organizations, governments and academia to join hands in championing ethical and responsible AI practices. A global standard providing data-lifecycle frameworks to validate and authenticate data sources needs to be introduced. Blockchain can play a major role in this data governance and modelling by providing an immutable, distributed and trusted network for validating content generated or consumed by AI models.
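As one hypothetical illustration of what such validation could look like, each training artefact could be fingerprinted before use, so that its provenance record can later be anchored to, and re-verified against, an immutable ledger. The sketch below is my own, with an invented file name and source label; it is not an existing standard:

# A minimal sketch: fingerprinting a training data file so its provenance
# record could be anchored on an immutable ledger and re-verified later.
# Illustrative design only; the file name and source label are invented.
import hashlib
import json
from datetime import datetime, timezone

def provenance_record(path: str, source: str) -> dict:
    """Hash the file's bytes and wrap the digest in a provenance record."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return {
        "sha256": digest.hexdigest(),       # any later edit changes this value
        "source": source,                   # declared origin of the data
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

record = provenance_record("training_images.tar", source="licensed-archive-v1")
print(json.dumps(record, indent=2))         # this record would go on the ledger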


As for the general public, as I always advocate, improving digital literacy is the only way to make informed choices about the use of AI. Digital literacy no longer means simply knowing how to use a computer; it extends to how we interpret, understand and connect the results generated by AI algorithms. With the power to reshape perspectives, AI holds the key to forging a brighter future, provided we remain vigilant and aware of its limitations.

Nancy Chourasia

Intern at Scry AI

4 months

Great share. Ensuring fairness in AI models involves addressing bias, which is defined as unequal treatment based on protected attributes like gender or age. Fairness metrics, integrated into many AI systems or computed externally, include favorable percentages for each group, the distribution of data for protected groups, and combinations of features related to one or more protected groups. To some extent, open-source libraries like Fairlearn and AI Fairness 360 help achieve fairness by computing metrics such as the disparate impact ratio, statistical parity difference, equal opportunity, and equalized odds. It is worth noting that fairness and bias differ, because biases can be hidden, while fairness requires unbiased treatment with respect to defined attributes. For example, training data may introduce biases of its own, often called algorithmic biases. Finally, given the dynamic nature of fairness, jurisdictions may alter the definition of fairness over time, making the task of updating AI models quite challenging. More about this topic: https://lnkd.in/gPjFMgy7
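For instance, a minimal sketch of two of these metrics with Fairlearn, on entirely invented predictions, might look like this:

# A minimal sketch (invented data, not from the linked article) of two
# fairness metrics computed with the open-source Fairlearn library.
import pandas as pd
from fairlearn.metrics import demographic_parity_difference, equalized_odds_difference

y_true = pd.Series([1, 0, 1, 1, 0, 1, 0, 0] * 25)    # actual outcomes
y_pred = pd.Series([1, 0, 1, 0, 0, 1, 1, 0] * 25)    # model predictions
gender = pd.Series(["f", "f", "f", "f", "m", "m", "m", "m"] * 25)

# Statistical parity difference: gap in positive-prediction rates between groups
spd = demographic_parity_difference(y_true, y_pred, sensitive_features=gender)

# Equalized odds difference: worst-case gap in true/false positive rates
eod = equalized_odds_difference(y_true, y_pred, sensitive_features=gender)

print(f"statistical parity difference: {spd:.2f}")   # 0.0 would be perfectly fair
print(f"equalized odds difference:     {eod:.2f}")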

Angela Hayes

Talent Acquisition Specialist

1 year

Thanks for posting this interesting article highlighting that bias is still a concern even when we remove the individual from the context! Something I will take from this when considering AI-generated information might be along the lines of “buyer beware”: all is not necessarily truth, and it is open to misinterpretation!


There are a number of things that can be done to address bias in AI systems, such as:
- Using more diverse data: this can help to reduce data bias.
- Using more transparent algorithms: this can help to identify and address algorithmic bias.
- Involving people from different backgrounds: this can help to reduce human bias.
- Developing debiasing techniques: this can help to remove bias from AI systems.
The bias present in the current AI landscape is a serious problem, but it is one that can be addressed. By taking steps to mitigate bias, we can ensure that AI systems are fair and equitable for everyone.

Andrew Cunningham

Consultant, lead developer at Enabling languages

1 year

I would suggest that the programming languages we use also introduce bias, especially when you are working with natural languages. A programming language will have varying support for text across natural languages; in essence, natural language support is tiered. This affects the pre-processing of text, tokenisation, etc.
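A minimal illustration of that tiering, using invented strings: whitespace tokenisation works for English but silently fails for languages written without spaces, such as Thai.

# A minimal sketch: naive whitespace tokenisation privileges
# space-delimited languages like English.
english = "algorithms can be biased"
thai = "ขอบคุณมาก"           # "thank you very much", written without spaces

print(english.split())      # ['algorithms', 'can', 'be', 'biased'] -- 4 tokens
print(thai.split())         # ['ขอบคุณมาก'] -- one "token", word structure lost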
