AI Containment: Step 1) Technical Safety
Robert H. Eubanks
Human and Machine Collaboration | AI Engineer | Democratizing Generative AI | Building with Large Language Models (LLMs)
Recently, I've immersed myself in Mustafa Suleyman's thought-provoking tome, "The Coming Wave." The book serves as an in-depth analysis and roadmap for navigating the complex future of two pivotal technologies: Artificial Intelligence and Synthetic Biology. In his work, Suleyman meticulously lays out a 10-step containment strategy aimed at both mitigating risks and advocating for ethical evolution in these technological fields. This article marks the kick-off of a series where I will distill these ten key points for a more general readership. Today, we discuss the first step he outlines, which he titles "Safety: An Apollo Program for Technical Safety."
Within this initial step, Suleyman highlights the critical importance of placing technical safety at the forefront of any containment strategy for emerging technologies. He recounts the transformative strides made in addressing the biased and potentially harmful outputs once associated with large language models (LLMs). According to Suleyman, this leap forward has been largely facilitated by reinforcement learning from human feedback (RLHF), a technique in which human reviewers compare and rate a model's outputs, and those judgments are then used to fine-tune the model away from biased or inaccurate responses.
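For readers curious what this machinery looks like under the hood, here is a heavily simplified sketch of the reward-modelling step that sits at the heart of RLHF. It is a toy example under stated assumptions, not a production pipeline: random tensors stand in for embeddings of human-preferred and human-rejected responses, and a tiny scoring network stands in for a real LLM backbone.

```python
# Minimal sketch of the reward-modelling step at the heart of RLHF.
# Toy data and a toy network; real pipelines use a pretrained LLM
# backbone and large datasets of human preference labels.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores a response embedding; higher = more preferred by humans."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scorer(x).squeeze(-1)

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

# Each training pair: (embedding of the response a human preferred,
#                      embedding of the response they rejected).
chosen = torch.randn(32, 128)    # stand-in for preferred responses
rejected = torch.randn(32, 128)  # stand-in for rejected responses

for _ in range(100):
    optimizer.zero_grad()
    # Pairwise preference loss: push the preferred response's score
    # above the rejected one's.
    loss = -torch.nn.functional.logsigmoid(
        reward_model(chosen) - reward_model(rejected)
    ).mean()
    loss.backward()
    optimizer.step()

# In a full RLHF pipeline, this reward model would then guide a policy
# optimization step (e.g., PPO) that fine-tunes the language model itself.
```

The key idea is the pairwise preference loss: the reward model learns to score human-preferred responses higher than rejected ones, and that learned signal is what later steers the fine-tuning of the language model itself.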
Yet, Suleyman underscores that focusing solely on algorithmic refinements falls short of true technical safety. He introduces the concept of multiple layers of containment as integral to a robust safety strategy. One such foundational layer is "hard physical control," which involves sequestering AI systems in controlled, restricted settings to preempt any unintended consequences in the real world. As a case in point, he cites "air gaps," a design choice that disconnects a system from the internet, thereby constraining its ability to interact with the outside world.
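Suleyman treats air gapping chiefly as a physical and network-architecture decision, but the underlying intuition can be illustrated in a few lines of code. The snippet below is a toy, process-level analogue rather than a substitute for genuine physical isolation: it disables outbound socket connections within a single Python process so that any model or tool code run afterwards fails loudly if it tries to reach the internet. The NetworkDisabledError name is my own illustrative choice.

```python
# Toy, process-level analogue of an "air gap": disable outbound network
# connections before any model code runs. A real air gap is physical or
# network-level isolation; this guard only covers one Python process.
import socket

class NetworkDisabledError(RuntimeError):
    """Raised when sandboxed code attempts to open a network connection."""

def _blocked(*args, **kwargs):
    raise NetworkDisabledError("Network access is disabled in this sandbox.")

# Patch the low-level connection primitives so any library that tries to
# reach the internet fails loudly instead of silently succeeding.
socket.socket.connect = _blocked
socket.create_connection = _blocked

if __name__ == "__main__":
    import urllib.request
    try:
        urllib.request.urlopen("https://example.com", timeout=2)
        print("Unexpectedly reached the network.")
    except Exception as exc:
        print(f"Blocked as expected: {exc!r}")
```

The point is not that a monkey-patched socket is a containment strategy, but that the design principle, deny external interaction by default and make violations visible, can be applied at every layer of a system.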
He draws insightful analogies to the safety standards of the nuclear power and biotechnology sectors, expressing concern about the existing underinvestment in the field of AI safety. Pushing the envelope, Suleyman calls for an "Apollo Program" scale of commitment, enlisting hundreds of thousands of experts to contribute to this vital cause. He goes a step further, advocating for regulatory mandates that compel companies to earmark at least 20% of their R&D funds solely for safety-centric initiatives and to make their research findings publicly available for collective advancement.
In addition to offering a critique, Suleyman delves into ongoing cutting-edge research aimed at bolstering technical safety. This includes the development of "critic AIs," specialized models that audit and enhance the output of other AI systems, and an array of methods for instilling robust ethical guidelines and error-correction protocols within AI architectures. Importantly, he notes that the objective is not simply to build 'fail-safes' or 'off switches,' but to develop a holistic safety architecture uniquely suited to tackle the challenges and potentials of nascent technologies.
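To make the "critic AI" idea concrete, here is a minimal sketch of the pattern: one model drafts an answer, a second model audits the draft against a safety rubric, and the answer is only released (or revised) once the critic signs off. The generate_draft and critique_draft functions below are stand-ins I have invented for illustration; in a real system each would call a separate language model, and the rubric would be far richer than a single keyword check.

```python
# Illustrative sketch of a "critic AI" loop: one model drafts an answer,
# a second model audits it, and the draft is only released or revised
# based on the critique. The functions below are stand-ins, not any
# specific vendor API.
from dataclasses import dataclass

@dataclass
class Critique:
    approved: bool
    feedback: str

def generate_draft(prompt: str) -> str:
    # Stand-in for a call to a generator LLM.
    return f"Draft answer to: {prompt}"

def critique_draft(draft: str) -> Critique:
    # Stand-in for a call to a separate critic LLM that checks the draft
    # against policy rules (bias, harmful instructions, factual red flags).
    if "danger" in draft.lower():
        return Critique(approved=False, feedback="Contains unsafe content.")
    return Critique(approved=True, feedback="No issues found.")

def answer_with_oversight(prompt: str, max_revisions: int = 2) -> str:
    draft = generate_draft(prompt)
    for _ in range(max_revisions):
        review = critique_draft(draft)
        if review.approved:
            return draft
        # Feed the critic's feedback back into the generator for a revision.
        draft = generate_draft(f"{prompt}\nRevise to address: {review.feedback}")
    return "I can't provide a safe answer to that request."

if __name__ == "__main__":
    print(answer_with_oversight("Explain what an air gap is."))
```

Even in this toy form, the architectural point carries through: oversight is built into the generation loop itself rather than bolted on as an afterthought.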
In summary, the first step in Suleyman's containment framework articulates that achieving technical safety is not a one-dimensional task. Rather, it necessitates a multi-pronged approach that spans legislative action, R&D funding, and meticulous attention to both physical controls and algorithmic adjustments. Keep an eye out for subsequent articles where we will cover the remaining containment strategies from Suleyman's insightful book.