The Unconscious Bias Influencing AI
Definition of AI: “Artificial Intelligence is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using rules to reach approximate or definite conclusions) and self-correction.” – TechTarget
Definition of Unconscious Bias: “Learned stereotypes that are automatic, unintentional, deeply ingrained within our beliefs, universal, and have the ability to affect our behaviour”. “For example, if you’re stuck in a car park with a flat tyre, chances are you’d be most likely to approach a man, rather than a woman for help” - Virgin
The last decade has seen incredible advances in technology, coupled with a massive fall in the cost of processing and storing data, which has driven a rapid expansion of AI capabilities across industries. Although this period of rapid growth has brought significant benefits, it has also highlighted a quietly growing and previously ignored consequence: data bias.
Where is AI today?
We are seeing a sharp increase in AI applications and a corresponding demand for Machine Learning (ML) solutions. This is due to a growing reliance on computers to solve increasingly complex problems that traditional programming can no longer handle.
Although we are still in the early stages of developing AI, it is already helping businesses to increase sales, detect fraud, automate work processes and improve customer experiences. AI is helping businesses across a wide range of industries, including healthcare (better diagnostic tools), automotive (autonomous cars), financial services (fraud detection and process automation) and logistics (better delivery and inventory management).
Why is this important?
This is the beginning of the future… The data sequences and algorithms which will influence the technology of tomorrow are being written today. If we are unconsciously inputting biased or prejudiced data into the algorithms, we can only expect that biased or prejudiced predictions will come out.
ML systems (hereafter known as ‘intelligents’) are already being used to make life-changing decisions. Examples include: which applicants to grant mortgages to, which prisoners to release on parole and which job seekers to employ. Designed and implemented correctly, intelligents have the potential to eliminate the human bias in decision-making that society is working hard to erase.
The more AI systems find human errors and inconsistencies in decision-making, the more they reveal about the way we as humans think, and they could even lead us to adopt more egalitarian views. However, it is also possible for the intelligents to reinforce systemic bias and prejudice.
As the use of AI increases, so does the need for governments and businesses to consider the social impact of machine learning. The social and economic potential that AI is likely to realise in the very near future depends on the public’s trust in it, and negative outcomes or prejudiced results from machine learning systems will undermine that trust. Recent political events have already gained notoriety due to the impact of AI and data manipulation, for example the recent Facebook data scandal and the political fallout that followed.
The Problem
Biases are being unconsciously created and finding their way into AI systems, which are then used to make decisions by everyone from businesses to governments. The problem, however, lies within the data sets themselves, not the algorithms. For example, if a company has hired men into 80% of its tech roles over the last ten years, and these hiring records are fed into the algorithm, the machine learning system will use them as the basis for future hires, effectively cementing a bias against female applicants.
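To make this concrete, here is a minimal sketch (in Python, using scikit-learn) of how a model trained on skewed historical hiring data picks up gender as a predictor. The features, figures and data are invented for illustration and do not represent any real company’s system.

```python
# A minimal sketch (invented data, not any real recruiting system) showing how
# a model trained on skewed historical hiring records learns to use gender.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
gender = rng.integers(0, 2, n)       # 1 = male, 0 = female (hypothetical coding)
skill = rng.normal(0, 1, n)          # the genuinely job-relevant signal

# Historical labels: skill matters, but past hiring also favoured men,
# so gender leaks into the outcome.
hired = (skill + 1.5 * gender + rng.normal(0, 0.5, n) > 1.0).astype(int)

model = LogisticRegression().fit(np.column_stack([skill, gender]), hired)
print("learned weights [skill, gender]:", model.coef_[0])
# The gender weight comes out strongly positive: equally skilled women are
# scored lower, cementing the historical bias.
```

Nothing in the code asks the model to discriminate; the bias is inherited entirely from the historical labels.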
Evidence shows that problems with bias are not limited to gender but can also extend to race. One example is Google’s well-publicised photo app tagging scandal in 2015, in which a photograph of a black couple was tagged as “gorillas”.
A recent example of data bias comes with the announcement that Amazon has scrapped its AI recruiting tool because it demonstrated bias against CVs containing the word “women’s”. The system had been trained to vet applicants by observing patterns in CVs submitted to Amazon over the previous ten years. However, because most of the applicants had been men (a reflection of male dominance in the tech industry), the machine learning system had taught itself that men were better equipped than women for technical roles.
How do biases occur?
There are several reasons for the growing emergence of bias in ML systems, including:
• Eliminating training bias requires a lot of planning and design. When companies are keen to release the next round of technology, however, they can rush data sets through without scanning them completely for bias, foregoing thorough algorithmic training. If not properly trained, the intelligent can pick up on politically incorrect language and incorporate it into its learning database;
• A lack of diversity in the data sets being used to train algorithms, i.e. some members of a population being sampled more or less frequently than others (see the sketch after this list);
• A lack of diversity in the teams reviewing the data, which results in biases being created and not recognised at a higher level;
• ML systems using only readily available data, partly due to the expense and difficulty of obtaining new data sets;
• On occasion, the data needed to train the algorithm for its future uses might not exist, and bias creeps in from using biased data drawn from historical pools;
• Skewed learning arising from interactions over time; and
• Discrimination risk in algorithm design and deployment, arising from choosing the wrong ML model, building a model with discriminatory features, an absence of human oversight and involvement, and having unpredictable and inscrutable systems.
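As a concrete illustration of the sampling-bias point above, here is a minimal sketch (in Python, using pandas) that compares a group’s share of the training sample with its share of the population. The column name and figures are invented for illustration.

```python
# A minimal sketch of sampling bias: checking whether each demographic group's
# share in the training sample matches its share in the population.
import pandas as pd

population = pd.DataFrame({"gender": ["M"] * 500 + ["F"] * 500})
sample = pd.DataFrame({"gender": ["M"] * 320 + ["F"] * 80})  # a skewed draw

pop_share = population["gender"].value_counts(normalize=True)
sample_share = sample["gender"].value_counts(normalize=True)
print(pd.DataFrame({"population": pop_share, "sample": sample_share}))
# Women make up 50% of the population but only 20% of the sample, so any model
# trained on it sees four male examples for every female one.
```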
So how do we prevent these biases…
The Solution
For starters, companies need to build more diverse data teams and use up-to-date data sets. If one or more demographics are under-represented or missing from a data set, the ML system may fill in the gaps from outside sources that have the potential to be biased.
Using a diverse team brings different skill sets, ways of thinking, approaches and backgrounds to the table and delivers a more holistic result. A variety of perspectives gives a better chance of a positive outcome.
Machines can learn to operate without human bias if constantly tested and trained with unbiased data samples. Consequently, we need to pay extra attention to cultural representation within our teams and sample pools.
Biases are as common in machines as in humans. When these biases are discovered, there are multiple ways to rectify them, including exposing the machine to additional unbiased data, selecting a different algorithm, and retraining the machine to eliminate the biased outcomes.
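One simple rectification technique along these lines is to rebalance the training data before retraining. Below is a minimal sketch (in Python, using pandas) that oversamples the under-represented group until both groups are equally sized; the columns and figures are invented for illustration.

```python
# A minimal sketch of one rectification approach: oversampling the
# under-represented group before retraining (invented data).
import pandas as pd

train = pd.DataFrame({
    "gender": ["M"] * 320 + ["F"] * 80,
    "hired":  [1, 0] * 160 + [1, 0] * 40,
})

# Resample each gender group to the same size so the retrained model
# sees a balanced view of both groups.
target = train["gender"].value_counts().max()
balanced = pd.concat([
    grp.sample(target, replace=True, random_state=0)
    for _, grp in train.groupby("gender")
])
print(balanced["gender"].value_counts())  # both groups now have 320 rows
```

Oversampling is only one option; under-sampling the majority group or reweighting examples achieve a similar effect, and all of them require knowing where the imbalance lies in the first place.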
Bloomberg Technology has stated that it “will take years to solve the bias problem”, but the future may not be as bleak as previously thought. There are many tools available, and more being produced, to help combat bias in AI systems, for example IBM’s ‘AI Fairness 360’, which can scan for bias and recommend adjustments. Specifically, IBM’s tool can analyse the ‘how’ and ‘why’ behind an algorithm’s decision in real time. Google has also introduced its ‘What-If’ tool, which allows users to visualise their data sets to better see the demographics in the data. Additionally, users can manually manipulate the data to foresee the effects such changes would have. Other toolkits have also been announced.
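For a flavour of the kind of check these toolkits automate, here is a minimal, hand-rolled sketch (in Python, using pandas) of the ‘disparate impact’ ratio, a common fairness metric. This is not IBM’s or Google’s code, and the data and column names are invented for illustration.

```python
# A hand-rolled sketch of the "disparate impact" ratio: the favourable-outcome
# rate for the unprivileged group divided by that of the privileged group.
import pandas as pd

decisions = pd.DataFrame({
    "gender":   ["M"] * 100 + ["F"] * 100,
    "approved": [1] * 70 + [0] * 30 + [1] * 45 + [0] * 55,
})

rates = decisions.groupby("gender")["approved"].mean()
disparate_impact = rates["F"] / rates["M"]
print(f"approval rates:\n{rates}\ndisparate impact: {disparate_impact:.2f}")
# A widely cited rule of thumb (the "four-fifths rule") flags ratios below 0.8;
# here 0.45 / 0.70 ≈ 0.64, so this system would be flagged for review.
```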
With the right toolkits and diverse teams combined with new and diverse data pools, it isn’t unreasonable to expect data bias to be a problem of the past in just a few short years.
To discuss this paper and understand how Helix Insight can support your organisation through our Insight or Executive Search solutions, please do not hesitate to get in touch with Alexandra Milligan.
020 3146 8440
London Bridge House, 181 Queen Victoria Street, London, EC4V 4DD