How to surf the AI wave safely - Part 2
Image via leonardo.ai
If you missed my last newsletter, you can find it here.

As always, I’d love to hear what’s on your mind. Feel free to DM me at [email protected]


My last newsletter was Part 1 on the big topic of how to surf the AI wave safely, covering the big, philosophical questions about AI: are you in the ‘all AI is evil’ camp, or curious to learn more before judging the AI tool in front of you?

In this newsletter I want to cover some key points related to evaluating AI tools.

On that topic, it’s important to recognise that buying AI tools is unlike buying any other HR tech. Most of the HR tech we buy, with or without AI, is going to be software (in the not-too-distant future we may be faced with buying robots at work too). But AI-based solutions differ fundamentally in two ways:

  1. Learning vs. pre-defined rules: Unlike typical software built to human-written specifications, AI systems learn their rules from large datasets using machine learning algorithms. What this means is that you might not fully understand how these systems work, and the data used in training makes all the difference. For example, think of GPT-4 being trained on trillions of words.
  2. Chance vs. deterministic: A consequence of the above is that AI systems tend to output a probability (a likelihood or chance) of whatever they are trained to infer (e.g. the next word in a sentence, whether a CV is a good fit for a role, whether an email is spam), rather than a definite outcome derived from pre-specified logic (e.g. the tax you are supposed to pay, whether a CV contains predefined keywords). The sketch after this list illustrates the difference.
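To make the distinction concrete, here is a minimal, hypothetical Python sketch. The keywords, scoring logic and example CV are all invented for illustration and are not any vendor’s actual method: a rule-based screen gives a definite answer, while an ML-style screen returns a likelihood that still needs a human or a policy to decide the threshold.

```python
import re

def rule_based_screen(cv_text: str) -> bool:
    """Deterministic: the same input always gives the same yes/no answer,
    based on pre-specified logic (here, simple keyword matching)."""
    required = {"python", "sql"}                        # hypothetical criteria
    words = set(re.findall(r"[a-z]+", cv_text.lower()))
    return required.issubset(words)

def ml_style_screen(cv_text: str) -> float:
    """Probabilistic: a trained model would return a likelihood (0.0-1.0)
    that the CV fits the role. This toy scorer stands in for a real model."""
    signal = {"python", "sql", "stakeholder", "analytics"}
    words = set(re.findall(r"[a-z]+", cv_text.lower()))
    return len(words & signal) / len(signal)

cv = "Experienced analyst skilled in Python, SQL and stakeholder management"
print(rule_based_screen(cv))   # True - a definite outcome
print(ml_style_screen(cv))     # 0.75 - a likelihood, not a verdict
```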


It is these opaque internal workings and likelihood-based outcomes that put AI in a special category requiring deeper scrutiny before you buy. But it is also important to remember that the reason you are considering an AI-based system is that traditional software has been unable to solve, or provide a meaningful solution to, the problem at hand. Complex problems may require solutions with complex inner workings.

Of the many aspects you should be looking into, I want to focus on three, arguably the most important and the ones to examine first. If your findings are weak in these three areas, chances are you have not found a sound and responsibly built AI solution.

Those are:

Data: where do your AI’s learnings come from?

Model: what is the AI really learning?

Bias testing: what are you doing to mitigate relevant biases?

For more comprehensive coverage of the aspects you should be looking into, you can read Sapia.ai’s FAIR framework and responsible AI papers.


Doing your due diligence (DD)

Thanks to the EU GDPR, there is high awareness in enterprises around data privacy, protection and consent. This makes life a lot easier for vendors like Sapia.ai, as much of the level 1 DD is all around ‘the data’.

This means scrutinising:

  1. The data

The data used to train an AI model is like a guide that teaches the model about the problem it is trying to solve. The choice of data, and the assumptions the developers make about it, can greatly affect the fairness of the model. If the data is biased or flawed, the model’s results will likely be biased too. This is why data quality is crucial to fair and accurate AI outcomes.
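As a small illustration of why this matters, here is a hypothetical Python sketch (the column names, groups and counts are invented) showing two basic checks you could expect a vendor to run before any model is built: how well each demographic group is represented in the training data, and whether the outcome label differs sharply across groups.

```python
import pandas as pd

# Hypothetical training data with an invented demographic column.
train = pd.DataFrame({
    "candidate_id": range(1, 9),
    "gender": ["F", "F", "M", "M", "M", "M", "M", "M"],
    "hired":  [1, 0, 1, 1, 0, 1, 1, 0],
})

# Share of each group in the training data ...
representation = train["gender"].value_counts(normalize=True)

# ... and the positive-label (hired) rate within each group.
hire_rate_by_group = train.groupby("gender")["hired"].mean()

print(representation)       # M: 0.75, F: 0.25 -> one group dominates the data
print(hire_rate_by_group)   # if rates differ a lot, the model may learn that gap
```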


Some questions to ask:

What data sets are being collected and how is this data used?

Is the data sourced ethically?

Who owns the data ?

Where is the data stored?

How is the data stored?

Is any data retained and if so, in what form e.g. is it de-identified, anonymised?

How is demographic data used in the system, how does it link to the models and how long is that data stored for? Can demographic data be traced back to an applicant (indirectly or directly)?

Is data adequately protected from unauthorised access?

Is the data tested for potential and relevant inherent biases?

Is data always encrypted at rest and in transit?

Is there a logging and event management system for application platforms and security events? How long are logs retained?

Is the service penetration-tested at least annually?

Is data logically or physically isolated from that of other customers in a multi-tenant or shared service?


When assessing a vendor’s treatment of data, you want to get really forensic and distinguish between each of these components:

- storage location
- identifiability
- access rights
- retention rights
- usage
- bias testing


  2. The model build

The next area to be deeply forensic is on the model build process.

A model (implemented with algorithms) is like a tool that finds patterns in data and uses them to make predictions about new, unseen information. These tools can be simple or complex, ranging from basic models to advanced deep neural networks such as those found in generative AI like ChatGPT. Each type of model works on certain assumptions about how patterns are found, and aims to maximise accuracy or, equivalently, minimise errors.

Are you going to be using simpler rule-based models or more complex machine learning models?

The more complex the model is, the harder it gets to explain the outcomes. The question is really about using a model that gives the right balance of accuracy and explainability for the problem at hand. A simpler rule-based model ensures a human is in the loop right from the outset. It enables the employer to know, and share internally, exactly what the technology is looking for. This is very different to the standard approach, which involves building a machine learning model from a historical dataset of hires. Historical data is problematic because it usually comprises a high-performing group of hires, leading to a model that, for example, learns the latent patterns separating higher performers from lower performers among past hires.
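For a sense of what “knowing exactly what the technology is looking for” can mean in practice, here is a minimal, hypothetical sketch of a transparent, weighted-criteria scorer. The criteria and weights are invented for illustration and are not any vendor’s actual rubric; the point is that a hiring team can read, question and change every line of this logic.

```python
# Hypothetical, transparent scoring rules an employer could review and share.
SCORING_RULES = [
    # (criterion shown to the hiring team, points)
    ("demonstrates relevant domain experience", 40),
    ("communicates a clear example of problem solving", 30),
    ("shows evidence of teamwork", 30),
]

def score_response(criteria_met: dict[str, bool]) -> int:
    """Combine pre-agreed, human-readable criteria into a single score (0-100)."""
    return sum(points for criterion, points in SCORING_RULES
               if criteria_met.get(criterion, False))

example = {
    "demonstrates relevant domain experience": True,
    "communicates a clear example of problem solving": True,
    "shows evidence of teamwork": False,
}
print(score_response(example))   # 70 - and you can explain exactly why
```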

That is when you risk amplifying historical biases in hiring, and when explainability becomes practically impossible. Not to mention that by building models off the performance variation among people already hired, you are working with a restricted or filtered sample (referred to more technically as restriction of range) that is not representative of the true candidate population. This can lead to machine learning models that miss out on talent. There is a misconception in the market that to have an AI model, you must inevitably rely on a historical ‘people’ dataset. You can dive deeper into this topic here.
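To see why restriction of range matters, here is a small simulated Python sketch. All numbers are made up, not from any real hiring dataset: the relationship you measure among past hires is much weaker than the one that exists in the full applicant pool, so a model trained only on hires learns from a distorted picture.

```python
# Toy simulation of restriction of range using numpy.
import numpy as np

rng = np.random.default_rng(42)

n = 10_000
ability = rng.normal(size=n)                              # true underlying signal
screen_score = ability + rng.normal(scale=1.0, size=n)    # noisy screening score
job_performance = ability + rng.normal(scale=1.0, size=n) # noisy outcome measure

full_corr = np.corrcoef(screen_score, job_performance)[0, 1]

hired = screen_score > np.quantile(screen_score, 0.9)     # only the top 10% get hired
hired_corr = np.corrcoef(screen_score[hired], job_performance[hired])[0, 1]

print(f"correlation in the full applicant pool: {full_corr:.2f}")   # around 0.5
print(f"correlation among hires only:           {hired_corr:.2f}")  # much weaker
```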

Another expectation you should have is access to the model card for every model.

The lack of transparency related to training data and behavioural characteristics of predictive models is a key concern raised when using machine learning-based applications. If there is no documentation around intended/unintended use cases, training data, performance, model behaviour, and bias-testing, step away!

Google introduced the concept of a model card as a standard template for reporting important information about a model, helping users make informed decisions around the suitability of the model. Sapia.ai uses model cards to document how the model was built, what assumptions were made during its development, what type of model behaviours could be experienced by different cultural or demographic population groups, and an evaluation of how well the model performs with respect to those groups.
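As a rough idea of what to ask for, here is a hypothetical, heavily abbreviated model card expressed as a Python dict. The field names follow the spirit of Google’s model card proposal, but every value is invented for illustration and is not Sapia.ai’s actual documentation.

```python
# Hypothetical, abbreviated model card; all values are illustrative only.
model_card = {
    "model_name": "interview-fit-scorer-v3",            # hypothetical
    "intended_use": "Rank written interview responses for recruiter review",
    "out_of_scope_use": "Automated rejection without human review",
    "training_data": "De-identified interview responses; no demographic inputs",
    "evaluation": {
        "overall_accuracy": 0.81,
        "accuracy_by_group": {"women": 0.80, "men": 0.82, "40_plus": 0.81},
    },
    "bias_testing": "Adverse impact ratios checked pre-release and quarterly",
    "limitations": "English-language responses only; not validated for exec roles",
}

for field, value in model_card.items():
    print(f"{field}: {value}")
```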

You can dive deeper into this here.

  3. Bias testing

We have always had our own AI governance framework, which we made public four years ago. We call it FAIR - Fair AI in Recruitment. You can find it here. It’s our most downloaded whitepaper.

Whilst AI-specific regulation is emerging in some markets, e.g. New York City Local Law 144 and the EU AI Act, anyone using or selling AI should conduct bias tests, ideally pre-go-live and then on an ongoing basis. Personally, I do not think an annual audit amounts to transparency. It is a once-a-year snapshot, and companies ought to demand access to more real-time bias testing.

The key questions to ask include:

What testing do you do, and at what point, to ensure the models are not biased or do not result in biased outcomes?

What steps are taken to ensure that the AI system does not produce biased or unfair outcomes?

Is there alerting and/or controls to prevent biased models from continuing to run?

How frequently are the underlying models updated/retrained, and what QA/testing is undertaken with releases? Are releases visible to customers?

How do you mitigate and manage the potential for data drift and concept drift over time, which can lead to unreliable and unstable AI system behaviour?
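One concrete check you can ask a vendor to demonstrate is an adverse impact calculation based on the four-fifths rule commonly used in employment selection. Here is a minimal Python sketch; the group names and counts are made up for illustration.

```python
# Minimal sketch of an adverse impact ratio check (the "four-fifths rule").
# All counts below are invented.

selections = {
    # group: (number recommended by the model, number of applicants)
    "group_a": (120, 400),
    "group_b": (70, 350),
}

rates = {g: selected / applicants for g, (selected, applicants) in selections.items()}
highest = max(rates.values())

for group, rate in rates.items():
    impact_ratio = rate / highest
    flag = "OK" if impact_ratio >= 0.8 else "REVIEW: possible adverse impact"
    print(f"{group}: selection rate {rate:.2f}, impact ratio {impact_ratio:.2f} -> {flag}")
```

In this toy example, group_b’s selection rate is only about two-thirds of group_a’s, which falls below the 0.8 threshold and would warrant investigation.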


Following is a simple framework to guide your evaluation that complements what I have discussed above. It is a deep learning exercise (excuse the pun) that you need to embark on if you are to reap the benefits of AI, which is becoming an imperative if you are a business leader. Restating what I said in my last newsletter, get the help of AI in that learning journey. Have a chat with your favourite generative AI chatbot about evaluating AI.


I would love to hear from you if you are going through this journey of DD.

Let’s all contribute to the responsible use of AI in HR.

The next two segments are about:

  • Operationalising AI
  • Optimising the AI over time

I will cover these in a subsequent newsletter.

Until then …
