Superintelligence alignment and AI Safety
Recently, OpenAI, the creators of ChatGPT, published ‘Introducing Superalignment’ beneath the flagship message: “We need scientific and technical breakthroughs to steer and control AI systems much smarter than us.”
Alignment refers to aligning the capabilities and behaviours of artificial intelligence (AI) systems with our own (human) expectations and standards.
“We’re talking about AI systems here, so you need automated alignment systems to keep up – that’s the whole point: automated and fast,” says Advai’s Chief Researcher, Damian Ruck.
This clearly resonates with OpenAI’s own views:
“Superintelligence alignment is fundamentally a machine learning problem.”
Forget, for one second, General Intelligence and “AI systems much smarter than us”, as OpenAI’s page forewarns: we need scientific and technical breakthroughs to steer and control the AI systems we already have!
"No one predicted Generative AI would take off quite as fast as it has,” Damian freely admits, “things are moving very quickly. Things that didn’t seem possible even a few months ago are very much possible now.”
If it’s hard for technical people to keep up, you bet it’s hard for business leaders to keep up.
If you’re a business manager, you might be thinking ‘well, we don’t work with advanced AI, so this doesn’t concern us’.
But trust us, it does.
Even if you only work with simple AI tools, such as those provided by third parties, it’s equally important to understand what their vulnerabilities are and whether they can be made more resilient (don’t worry, we will finish this article with some actionable advice for you).
Put simply, how can you trust a tool if you don’t know its failure modes?
Damian leads a team that researches AI robustness, safety and security. What does this mean? They spend their time developing breakthrough methods to stress test and break machine learning algorithms.
This, in turn, shows us how to protect these same algorithms from intentional misuse and from natural deterioration. It also lets us understand how to strengthen their performance under diverse conditions.
That is to say, to make them ‘robust’.
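To make this concrete, here is a minimal sketch of one widely used stress test, the Fast Gradient Sign Method (FGSM), which nudges an input in the direction that most increases a model’s loss and checks whether the prediction flips. The tiny model and random data below are placeholders for illustration, not Advai’s actual tooling.

```python
# Minimal FGSM-style stress test (illustrative only, not Advai's tooling).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1, 20, requires_grad=True)  # stand-in for a real input
y = torch.tensor([1])                       # its true label

# Gradient of the loss with respect to the input.
loss = loss_fn(model(x), y)
loss.backward()

# Perturb the input by a small step in the direction that increases the loss.
epsilon = 0.1
x_adv = x + epsilon * x.grad.sign()

with torch.no_grad():
    before = model(x).argmax(dim=1).item()
    after = model(x_adv).argmax(dim=1).item()
print(f"prediction before: {before}, after perturbation: {after}")
```

If the prediction flips under such a tiny, targeted perturbation, the model is fragile in a way that ordinary accuracy metrics would never reveal.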
The Superalignment initiative aligns well with our research at Advai. Manually testing every algorithm for every facet of weakness isn’t feasible, so – just as OpenAI have planned – we’ve developed internal tooling that performs a host of automated tests to indicate the internal strength of AI systems.
“It’s not totally straightforward to make these tools.”
Damian’s fond of an understatement.
The thing is, testing for when something will fail means trying to say what it can’t do.
You might say ‘this knife can cut vegetables’. But what if you come across more than vegetables? What can’t the knife cut? Testing when a knife will fail means trying to cut an entire world of materials, separating ‘things that can be cut’ from ‘everything else in the universe’. The list of things the knife can’t cut is almost endless. Yet, to avoid breaking your knife (or butchering your item), you need to know what to avoid cutting!
To make these failure mode tests feasible, one needs shortcuts. This is where automated assurance mechanisms and Superalignment come in: there are algorithmic approaches to testing what we might call the ‘negative space’ of AI capabilities.
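As a rough illustration of what an automated sweep over this ‘negative space’ might look like, the sketch below runs a small battery of corruptions at increasing severity and records where accuracy collapses. The corruptions, severities and dummy model are assumptions chosen purely for the example.

```python
# Illustrative failure-mode sweep: apply corruptions of increasing severity
# and record where accuracy collapses (dummy model and data, for the sketch).
import numpy as np

rng = np.random.default_rng(0)

def dummy_model(batch):
    # Placeholder classifier: thresholds the mean pixel value.
    return (batch.mean(axis=(1, 2)) > 0.5).astype(int)

def gaussian_noise(batch, severity):
    return np.clip(batch + rng.normal(0, severity, batch.shape), 0, 1)

def occlusion(batch, severity):
    out = batch.copy()
    size = int(severity * batch.shape[1])
    out[:, :size, :size] = 0  # black out a corner of each image
    return out

images = rng.random((100, 32, 32))   # stand-in dataset
labels = dummy_model(images)         # 'ground truth' for the sketch

for corrupt in (gaussian_noise, occlusion):
    for severity in (0.1, 0.3, 0.5):
        preds = dummy_model(corrupt(images, severity))
        accuracy = (preds == labels).mean()
        print(f"{corrupt.__name__:15s} severity={severity:.1f} accuracy={accuracy:.2f}")
```

The point isn’t these specific corruptions; it’s that the search for failure modes can be made systematic and automated rather than manual.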
This might sound difficult, and it is: controlling what an algorithm does is hard, but controlling what it doesn’t do is harder. We’ve been sharing our concerns about AI systems for a few years now: they have so many failure modes. These are things businesses should be worrying about, because there is pressure to keep up with innovations.
There are so many ways that a seemingly accurate algorithm can be vulnerable and can subsequently expose its users to risk. Generative AI and large language models like GPT-4 make it harder still, because these models are so much more complex and guardrail development is correspondingly much more challenging.
So, kudos to OpenAI for taking the challenge seriously.
From their website:
-- “We are dedicating 20% of the compute we’ve secured to date over the next four years to solving the problem of superintelligence alignment.”
-- “Our goal is to solve the core technical challenges of superintelligence alignment in four years.”
What’s next, we ask Damian.
“The importance of AI Robustness is only going to increase. We're expecting stricter regulations on the use of AI and machine learning (ML) based models.”
Strict legislation is designed to protect people against breaches of privacy, as with GDPR, and soon against breaches of fairness too, as with the AI Act. Bias is one example of an AI failure mode, leading, for instance, to unfair credit score allocations.
Take the infamous example of Apple’s credit rating system, which did exactly this, favouring males: it shows that a failure mode is about more than model accuracy. For all intents and purposes, Apple’s algorithm worked correctly: it found a pattern in its training data that suggested men could be entrusted with greater credit. It wasn’t a failure of the algorithm; it was a weakness of the data.
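As a hedged illustration of how that kind of data-driven bias can be surfaced, the snippet below compares approval rates across groups, a simple demographic parity check. The group labels and approval rates are invented purely to demonstrate the calculation; they are not Apple’s figures.

```python
# Demographic parity check on invented credit decisions (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
group = rng.choice(["male", "female"], size=1000)  # protected attribute
approved = np.where(group == "male",
                    rng.random(1000) < 0.65,       # assumed approval rates,
                    rng.random(1000) < 0.45)       # not real-world figures

rates = {g: approved[group == g].mean() for g in ("male", "female")}
parity_gap = abs(rates["male"] - rates["female"])
print(rates, f"demographic parity gap: {parity_gap:.2f}")
# A large gap flags a normative failure mode even when the model is 'accurate'.
```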
Or take another infamous example, when Microsoft’s Tay, a chatbot, began to espouse egregious views. It wasn’t a failure of the algorithm, which was clearly designed to adapt to the conversational tone and messaging themes of its fellow conversationalists, but it was nevertheless a massive failure!
The distinction between engineering failure modes and normative failure modes is a crucial one.
So, we need guardrails in place for both engineering and normative failure modes. That, in practice, is what Superalignment is designed to do: train automated systems to support us in detecting and mitigating normative failure modes.
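As a toy sketch of the idea, a guardrail layer might route every model output through an automated check before it reaches the user. Real guardrails rely on trained classifiers and policy models rather than keyword lists; the blocklist and stand-in generator below are only placeholders.

```python
# Toy guardrail layer: screen every model output before returning it.
BLOCKLIST = {"credit limit by gender", "home address"}  # assumed policy terms

def generate(prompt: str) -> str:
    # Stand-in for a call to a real generative model.
    return f"model answer to: {prompt}"

def guarded_generate(prompt: str) -> str:
    answer = generate(prompt)
    if any(term in answer.lower() for term in BLOCKLIST):
        return "[withheld: response violated policy]"
    return answer

print(guarded_generate("What is superalignment?"))
```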
It’s a challenge.
To finish, some advice for commercial business managers:
Controlling AI systems presents a huge challenge to managers today. The competitive drive to adopt productivity-enhancing tools will only increase, and there will be a temptation to rush the development of guardrails.
But here’s the thing: sometimes AI tools ‘fail’ in totally unexpected ways! Ensuring you have a system of processes and tools that helps you reduce failure modes is the first step for any business concerned with keeping its AI behaviour aligned with company goals.
The openly stated difficulty of OpenAI’s Superalignment initiative, together with our own research at Advai, emphasises the urgency of investing in AI alignment and robustness initiatives.
It may be to prevent bias, to ensure security, or to maintain privacy. Or it could be a totally different and unforeseen consequence that you avoid.
We must not lose sight of the importance of creating reliable and controlled tools.
So, while the task may seem daunting, a proactive approach to aligning AI systems with your business’s needs and society’s expectations is sure to pay dividends in the long run.
Here are a few ways to kickstart the alignment of your AI:
Start preparing now; don’t wait for the regulations. If you build or deploy AI in any way, you should either begin creating these alignment tools internally or commission them as soon as possible.
Get in touch!