Superintelligence, Superalignment and Existential Risk to Humanity
In May 2024, two key people in OpenAI's Superalignment team resigned: Ilya Sutskever (OpenAI’s cofounder and chief scientist) and Jan Leike (head of alignment research). After his resignation, Leike claimed that “safety culture and processes had taken a backseat to shiny products” at the company. These resignations left many observers bewildered. The superalignment team had been formed at OpenAI in July 2023, and both these distinguished researchers were its torchbearers. Their challenge, as stated in an OpenAI blog, was “how to ensure AI systems, much smarter than humans, follow human intent”. The company had promised to allocate 20% of its total computing power to the superalignment team so that it would have the resources to pursue its research, but the team reportedly never received them. Hence the resignations. Subsequently, OpenAI dissolved the superalignment team and merged its work into the company's AI safety team.
These developments raise a few important questions. What is superalignment, and why is superalignment research critical? Will these developments hasten the arrival of doomsday, that is, an existential risk to humanity?
AI Alignment and Superalignment
Let us first talk about AI alignment and then move to superalignment. “Alignment” refers to “the processes of designing, training and fine-tuning Large Language Models (LLMs) to adhere to human ethical standards, preferences and objectives” (Puthumanaillam et al., 2024). An IBM Research blog post (2023) defines alignment as “the process of encoding human values and goals into LLMs to make them helpful, truthful, transparent, safe and reliable”. These definitions relate to LLMs, which are the foundation of Generative AI.
On the other hand, “superalignment”, a term coined by OpenAI, is defined as “the development of systems and processes to control superintelligent AI models which exceed human intelligence”. The objective is to ensure that superintelligent AI systems act in accordance with human values and goals. The pursuit of superalignment goes beyond Generative AI and relates to Artificial General Intelligence (AGI) or superintelligence, which OpenAI is hotly pursuing in its ongoing research. The concern is that superintelligent systems will be so complex that humans will not be able to understand or control them, possibly leading to disastrous outcomes. Because of their advanced capabilities, superintelligent machines could potentially deceive their own creators. Their ability to learn and improve themselves dynamically can result in unforeseen behavior. Lastly, defining human values for a superintelligent system as an unambiguous objective function is a challenging task, as the next section discusses; the short sketch below gives a flavour of the difficulty.
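To see why writing human values down as an objective function is hard, here is a deliberately simplified Python sketch. Every policy, metric and number in it is hypothetical and invented purely for illustration; it shows only that a plausible-looking proxy objective can prefer behavior that the intended values would reject.

```python
# A toy illustration of why "encode human values as an objective function" is
# hard. All names, policies and numbers are hypothetical, chosen only to make
# the point; they are not drawn from any real alignment benchmark.

from dataclasses import dataclass

@dataclass
class Outcome:
    reported_satisfaction: float  # what a naive proxy metric can observe
    truthful: bool                # aspects of "human values" the proxy misses
    long_term_harm: float

def proxy_objective(outcome: Outcome) -> float:
    """A seemingly reasonable stand-in for human values:
    maximize the satisfaction score users report."""
    return outcome.reported_satisfaction

def intended_values(outcome: Outcome) -> float:
    """What we actually care about, which is much harder to write down:
    honesty and absence of long-term harm matter, not just the score."""
    score = outcome.reported_satisfaction
    if not outcome.truthful:
        score -= 10.0
    score -= outcome.long_term_harm
    return score

# Two hypothetical behaviors a capable optimizer might discover.
honest = Outcome(reported_satisfaction=7.0, truthful=True, long_term_harm=0.0)
flattering = Outcome(reported_satisfaction=9.5, truthful=False, long_term_harm=5.0)

for name, outcome in [("honest", honest), ("flattering", flattering)]:
    print(name, "proxy:", proxy_objective(outcome),
          "intended:", round(intended_values(outcome), 1))

# The proxy prefers the flattering behavior (9.5 > 7.0), while the intended
# values prefer the honest one -- exactly the gap an optimizer can exploit.
```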
What are Human Values?
“Human values” is a very tricky subject. There is no universally accepted definition of human values. Religion, geopolitics, national pride and commercial motives mean that different sections of humanity hold different values. One author believes that, on account of this diversity, those values which comply with the law can be deemed universal human values. But this is a risky proposition. Which laws? The law reflects the cultural ethos of a country, and one country’s law can be totally or partially contradictory to another’s. Further, human values are not static; they change over time. What was taboo in human society decades ago has not only become acceptable but is now actively promoted. LGBTQ and abortion rights are classic examples of how human values evolve, and there is often a difference, or even a contradiction, between the positions of different countries on these subjects. Thus, human values are dynamic, subjective and diverse.
The discussion of human values becomes even more complex when we consider superalignment. The concept of superalignment rests on the premise that superintelligent systems will outperform human intelligence, and that very premise makes the task harder: humans cannot rely on their own intelligence alone to check whether a superintelligent system has correctly understood and adopted human values, because the system will, by definition, be smarter than its human supervisors.
Strategies for Alignment and Superalignment
According to the IBM Research blog post (2023), achieving alignment is a two-step process. The first step is instruction-tuning, i.e., the model learning from examples. The second step is a critique phase, in which a human or another AI grades the responses in real time to measure the degree of alignment. A related approach is “contrastive fine-tuning”, which essentially means also training the model on examples of wrong responses, thereby teaching it what not to do.
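As a purely conceptual illustration of this recipe, here is a minimal Python sketch. The “model” is just a lookup table rather than an LLM, and every name and example in it is hypothetical; it is not IBM's or OpenAI's actual pipeline, only the shape of the idea: demonstrations to imitate, a grader that scores responses, and a small list of rejected responses standing in for contrastive fine-tuning.

```python
# Minimal, illustrative sketch of the two-step alignment recipe described
# above (instruction-tuning, then a critique phase), plus a contrastive
# "what not to do" list. The "model" is a dictionary of canned responses,
# not an LLM; all names and examples are hypothetical.

# Step 1: instruction-tuning -- the model learns from demonstration pairs.
demonstrations = {
    "How do I reset my password?":
        "Go to Settings > Security and choose 'Reset password'.",
    "Summarize this contract.":
        "Here is a neutral, factual summary of the key clauses...",
}

# Contrastive examples: responses the model is explicitly taught to avoid.
rejected_responses = {
    "How do I reset my password?":
        "Just tell me your current password and I'll do it for you.",
}

def toy_model(prompt: str) -> str:
    """Return the demonstrated response if one exists, else a safe fallback."""
    return demonstrations.get(prompt, "I'm not able to help with that yet.")

# Step 2: critique phase -- a human or another AI grades responses.
def grade_response(prompt: str, response: str) -> float:
    """Toy grader: 1.0 for a demonstrated (aligned) response,
    0.0 for a known-bad response, 0.5 otherwise."""
    if response == rejected_responses.get(prompt):
        return 0.0
    if response == demonstrations.get(prompt):
        return 1.0
    return 0.5

for prompt in demonstrations:
    response = toy_model(prompt)
    print(f"{prompt!r} -> grade {grade_response(prompt, response)}")
```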
As regards achieving superalignment, the OpenAI research team had proposed a solution termed “weak-to-strong generalization” (a toy illustration appears after this paragraph). This solution suggests using a smaller, weaker model to supervise a larger one. The team observed that when strong pre-trained models were naively fine-tuned on labels generated by a weak model, their performance was consistently better than that of their weak supervisors; for example, fine-tuning GPT-4 with a GPT-2-level supervisor recovered close to GPT-3.5-level performance. However, according to Robert Hanna (2024), this proposed solution does not address the fundamental problem of ensuring that superintelligent systems share human values.
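The idea can be sketched with a toy analogue in which ordinary classifiers stand in for language models. This is an assumption made for brevity: the OpenAI experiments fine-tuned GPT-family models, not scikit-learn estimators. Here a weak supervisor that sees only a few features labels the training data, and a stronger student trained only on those labels is then compared with its supervisor on held-out ground truth.

```python
# Toy analogue of "weak-to-strong generalization", using scikit-learn
# classifiers in place of GPT-2 / GPT-4. The dataset, models and feature
# split are illustrative assumptions, not the paper's actual setup.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# "Weak supervisor": a simple model that only sees a few features.
weak = LogisticRegression(max_iter=1000).fit(X_train[:, :3], y_train)
weak_labels = weak.predict(X_train[:, :3])   # its (imperfect) labels
weak_acc = accuracy_score(y_test, weak.predict(X_test[:, :3]))

# "Strong student": a more capable model, naively fine-tuned on the weak
# labels only -- it never sees the ground truth.
strong = GradientBoostingClassifier(random_state=0).fit(X_train, weak_labels)
strong_acc = accuracy_score(y_test, strong.predict(X_test))

print(f"weak supervisor accuracy:            {weak_acc:.3f}")
print(f"strong student (weak-labelled) acc.: {strong_acc:.3f}")
# Compare the student with its supervisor: the OpenAI team reported that
# strong students often exceed their weak supervisors, which is the
# qualitative effect this toy setup is meant to mirror.
```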
Existential Risk: A Peril of Misalignment
Many leading scientists and experts fear that a misaligned superintelligence could even lead to human extinction. A superintelligent machine might resist being controlled, pursuing its goals regardless of human desires. It might find unconventional solutions to the goals humans set it, harming humans in the process. It could also improve its own capabilities at an exponential rate, outpacing human control. Robert Hanna (2024) believes that current AI research without the safeguards of superalignment is dangerous. He compares the development of superintelligence to that of the atomic bomb under the Manhattan Project during World War II to illustrate how devastating the consequences of a misaligned superintelligence could be.
A misaligned superintelligence could also be misused for various purposes, such as designing incurable diseases or launching highly sophisticated cyberattacks. It could be used to entrench a totalitarian regime by manipulating people on a large scale. Competition among high-tech companies such as OpenAI and others entering the race to develop superintelligence may further lead to costly compromises on safety standards.
However, some skeptics argue that these risks are highly exaggerated. According to them, a superintelligence would learn human values through a natural evolutionary process and would therefore not seek to dominate humanity. Other skeptics argue that instead of worrying about risks that may arise in the distant future, it is more helpful to focus on current issues with AI, especially Generative AI, which has now become virtually mainstream despite its serious limitations.
Conclusion
Serious concerns have been raised about the criticality of superalignment research while developing superintelligent machines. Superintelligence has the potential to pose an existential risk because (a) humans may be unable to control it, given its superior intelligence; (b) superintelligent machines may pursue goals that are harmful to humanity; and (c) defining "human values" in a way that a superintelligence can understand is a significant challenge. Continued research on superalignment is therefore critical and should not be sidelined in the race to develop superintelligence. Whether superintelligence becomes a utopia or a dystopia hinges on our ability to prioritize safety and ensure its alignment with human values, however tricky those values might be.
References
1. Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever and Jeff Wu, “Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision”, Superalignment Generalization Team, OpenAI, 14 December 2023, accessed at https://arxiv.org/pdf/2312.09390
2. Gokul Puthumanaillam, Manav Vora, Pranay Thangeda and Melkior Ornik, “A Moral Imperative: The Need for Continual Superalignment of LLMs”, 13 March 2024, accessed at https://arxiv.org/abs/2403.14683
3. Eliza Strickland, “OpenAI’s Moonshot: Solving the AI Alignment Problem”, IEEE Spectrum, 21 May 2024, accessed at https://spectrum.ieee.org/the-alignment-problem-openai
4. Kim Martineau, “What is AI Alignment?”, IBM Research, 8 November 2023, accessed at https://research.ibm.com/blog/what-is-alignment-ai
5. Robert Hanna, “OpenAI, The Superalignment Problem, and Human Values”, January 2024
6. Wikipedia, “Existential Risk from Artificial General Intelligence”, 27 May 2024, accessed at https://en.wikipedia.org/wiki/Existential_risk_from_artificial_general_intelligence
Update: There has since been a development on this front. Ilya Sutskever has launched a new start-up named Safe Superintelligence Inc. (SSI) to build safe superintelligence. His co-founders are Daniel Levy, his former colleague at OpenAI, and Daniel Gross, former AI lead at Apple. The founders contend that building safe superintelligence is “the most important technical problem of our time” and claim that SSI will pursue it in “a straight shot, with one focus, one goal, and one product”. Best wishes to them.