AI Alignment
Past (context)
Alignment is one topic in the broader field of AI safety. As with nuclear safety, the question is: if humans create technologies that could pose an existential threat to humanity, what safeguards are we willing and able to put in place to protect ourselves from those threats? One of the best known examples of the alignment problem was posed in 2003 by Oxford professor Nick Bostrom. He describes a seemingly innocuous program, the "paperclip maximizer," that is directed simply to improve the efficiency of paperclip manufacturing. But without any other directives to constrain the paperclip goal, the machine could take destructive actions that, while maximizing paperclip production, redirect resources from other necessary human activities and ultimately threaten life itself. (Here is Nick Bostrom's 2015 TED Talk - worth watching.)
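As a loose illustration (my own toy sketch, not from Bostrom; the plan names, numbers, and penalty weight are all invented for the example), the snippet below contrasts a planner whose score counts only paperclips with one whose score also penalizes side effects on other human values. The misspecified objective picks the most destructive plan; the constrained one does not.

```python
# Toy illustration of objective misspecification (assumed example, not from the source).
# A "planner" picks the plan with the highest score. If the score counts only
# paperclips, the most destructive plan wins; adding a term for other human
# values changes the choice.

from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    paperclips: float        # paperclips produced (arbitrary units)
    human_value_lost: float  # side effects on everything else (arbitrary units)

PLANS = [
    Plan("tune the existing factory", paperclips=1.0, human_value_lost=0.0),
    Plan("convert nearby farmland to factories", paperclips=5.0, human_value_lost=3.0),
    Plan("convert all available matter to paperclips", paperclips=100.0, human_value_lost=1000.0),
]

def misspecified_score(plan: Plan) -> float:
    # Only the stated goal is rewarded; side effects are invisible to the agent.
    return plan.paperclips

def constrained_score(plan: Plan, side_effect_weight: float = 10.0) -> float:
    # Side effects enter the objective, so destructive plans score poorly.
    return plan.paperclips - side_effect_weight * plan.human_value_lost

if __name__ == "__main__":
    print("Misspecified objective picks:", max(PLANS, key=misspecified_score).name)
    print("Constrained objective picks: ", max(PLANS, key=constrained_score).name)
```

The point of the sketch is only that the danger comes from what the objective leaves out, not from any malice in the optimizer.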
Mathematician and cybernetics pioneer Norbert Wiener put this problem succinctly as long ago as 1960 when he wrote:
“If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively … we had better be quite sure that the purpose put into the machine is the purpose which we really desire.”
The problem of alignment is complicated, however, by the fact that it is very difficult for people to agree on what "we really desire." People have quite different objectives from one another, and this produces behaviors that maximize individual outcomes without regard for the needs and wants of others. We see this in industry (Martin Shkreli), in politics (what does Putin want?), and even in everyday interactions with neighbors (road rage). As researchers have pointed out, our global response to Covid-19 is very informative about the challenges we face in responding to other existential threats, such as a runaway superintelligence (see the AI Alignment Forum post by Victoria Krakovna).
In 2004, Machine Intelligence Research Institute co-founder Eliezer Yudkowsky proposed a concept for addressing this challenge that he calls coherent extrapolated volition. The basic idea is to build into advanced AI systems the capacity to analyze, and "recursively iterate" on, the converging desires of humanity as a whole, so that supporting those desires becomes part of any course of action the system takes. In effect: can we develop in an AI the capacity to understand and respect collective human needs as a counterbalance to whatever other goals are set for it? More recently, Yudkowsky has expressed doubt that we will achieve our AI safety objectives (a more demanding read here if you want to explore his views).
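As a very loose sketch (my own toy example, not Yudkowsky's proposal; the stakeholder groups, approval numbers, threshold, and function names are all invented for illustration), the code below shows an agent that acts only on options where the modeled preferences of every stakeholder group converge, and defers to humans otherwise.

```python
# Toy sketch loosely inspired by the "converging desires" idea (assumed example,
# not an implementation of coherent extrapolated volition). The agent acts only
# when the modeled preferences of every stakeholder group agree above a threshold;
# otherwise it defers to human judgment.

# Hypothetical modeled preferences: stakeholder group -> {option: approval in [0, 1]}
MODELED_PREFERENCES = {
    "group_a": {"expand_clinics": 0.90, "divert_water_supply": 0.10},
    "group_b": {"expand_clinics": 0.80, "divert_water_supply": 0.70},
    "group_c": {"expand_clinics": 0.85, "divert_water_supply": 0.20},
}

APPROVAL_THRESHOLD = 0.75  # arbitrary cutoff for "converged" support

def choose_action(options: list[str]) -> str:
    """Return an option every group supports, or defer if none converges."""
    converged = [
        option
        for option in options
        if all(prefs[option] >= APPROVAL_THRESHOLD for prefs in MODELED_PREFERENCES.values())
    ]
    if not converged:
        return "defer_to_humans"
    # Among converged options, pick the one with the highest minimum approval.
    return max(converged, key=lambda o: min(p[o] for p in MODELED_PREFERENCES.values()))

if __name__ == "__main__":
    print(choose_action(["expand_clinics", "divert_water_supply"]))  # expand_clinics
    print(choose_action(["divert_water_supply"]))                    # defer_to_humans
```

The hard part that the sketch glosses over is, of course, where trustworthy models of humanity's extrapolated preferences would come from in the first place.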
How we got here: "The potential for superintelligence lies dormant in matter." - Nick Bostrom. If this idea is right, and human beings are capable of creating such a superintelligence, we should reasonably be concerned about whether it will have humanity's best interests in mind. But even if we can devise approaches, such as coherent extrapolated volition, that are capable of understanding our best interests, it will still be difficult to ensure that the people who build such machines give them the capacity to respect all of humanity's desires (and not just those of some small group or individual). If you agree that superintelligence is possible, then it seems logical that there is a danger of misaligned AI, whether through unanticipated consequences or through deliberate design by parties pursuing their own objectives. Research organizations like MIRI are seeking to raise awareness of these issues and to help create normative standards in the international research community for designing solutions. And the larger AI development and research companies such as OpenAI have dedicated alignment research programs. We should all hope for their success.