Analogous Risks and Lessons for AI Model Red Teaming from Gain-of-Function Virology Research
Aaron Roberts, PhD
Research, advisory, and advocacy on ethics, policy, and governance regarding emerging technologies (expertise on A.I. and synthetic biology)
I. Introduction
Recently, while studying AI governance, policy, and regulation in BlueDot Impact’s AI Governance course, I became absorbed in reading about AI model red teaming (AIMRT), an adversarial practice used to probe weaknesses and discover unknown capabilities in advanced AI models. Coming from a bioethics and global health policy background, I immediately perceived parallels with gain-of-function research (GoFR) in virology, a field notorious for its high-risk experiments on pathogens. Both domains grapple with balancing innovation and safety, and as I considered the two practices side by side, I drew more analogies between them. One dissimilarity, however, struck me: GoFR is widely considered dangerous and controversial and, as a result, is highly regulated. Meanwhile, to my knowledge, AIMRT has received nothing but positive regard and encouragement, even in policy and regulatory circles. I found this discrepancy in how two seemingly deeply analogous practices are perceived and treated intriguing. I wondered whether the analogies I perceived were merely superficial, or whether lessons from GoFR could inform safer and better-regulated practice in AIMRT. I decided to take a closer look. The following summarizes what I found.
Was I alone in perceiving these parallels?
A little Googling revealed that I was not the first to draw this connection, though in-depth discussion of these two research areas as analogous does not seem commonplace.
In what follows I…
What is Gain-of-Function Research (GoFR)?
GoFR involves scientific experiments where organisms, often viruses or bacteria, are genetically modified to acquire new abilities or "functions" that they do not naturally possess. These functions might include enhanced transmissibility, altered host range (the types of species they can infect), or increased pathogenicity (disease-causing ability). The primary aim is to better understand these organisms and the factors that contribute to their evolution, potentially improving our preparedness for pandemics.
Aims of GoFR include:
What is AI Model Red Teaming (AIMRT)?
AI model red teaming is a systematic and adversarial testing process designed to uncover vulnerabilities, biases, and unintended behaviors in advanced artificial intelligence systems. By simulating potential misuse or failure scenarios, red teaming evaluates how AI systems might perform under adversarial or unexpected conditions. This practice is increasingly vital as AI systems, particularly large language models (LLMs), gain significant societal influence and are deployed in sensitive or high-stakes contexts.
Aims of AIMRT include:
II. Risk Parallels Between Gain-of-Function Research and AI Model Red Teaming
A. Risk of Unintended Consequences:
In both GoFR and AIMRT, the aim is to improve our understanding of systems we do not fully understand and, in some cases, to trigger capabilities we do not know, or even suspect, are present. Because of these inherent unknown unknowns, both practices are predisposed to unforeseen and unintended consequences. In GoFR, for example, researchers could accidentally create more virulent strains of pathogens, which then pose the risk of lab escapes (e.g., the H5N1 controversy and, possibly, COVID-19). Analogously, AIMRT could lead to the discovery and amplification of emergent AI model capabilities (e.g., manipulation, advanced reasoning) that exceed anticipated control mechanisms.
“AI companies are already evaluating their most advanced models to identify dual-use capabilities, such as the capacity to conduct offensive cyber operations, enable the development of biological or chemical weapons, and autonomously replicate and spread. These capabilities can arise unpredictably and undetected during development and after deployment.” Source: An Early Warning System for AI-Powered Threats to National Security and Public Safety
Similar to how GoFR can create new virus capabilities, AIMRT may inadvertently unlock or exacerbate unintended behaviors in AI systems, including unsafe responses or "jailbreak" vulnerabilities. For instance, it has been demonstrated that, despite safety alignments, many AI models remain vulnerable to manipulative prompts designed to evade safety protocols.
B. Dual-Use Dilemmas:
Just as knowledge gained via GoFR could be used for bioterrorism or military applications, vulnerabilities or capabilities discovered in AI systems via AIMRT may be weaponized by malicious actors (e.g., using AI for disinformation, automated cyberattacks, or the creation of pathogens).
C. Difficulty in Containment and Proliferation Risks:
GoFR makes it difficult to contain pathogenic material within secure labs, with attendant risks of leaks and accidents. Similarly, AIMRT presents challenges and ethical conundrums around securing sensitive AI findings, since adversarial techniques, vulnerabilities, or emergent capabilities could leak and be exploited. In both GoFR and AIMRT, it can be difficult to decide whether and when to be open and transparent about findings, and when doing so creates an information hazard or security risk.
“The nation’s leading AI labs treat security as an afterthought. Currently, they’re basically handing the key secrets for AGI to the CCP on a silver platter. Securing the AGI secrets and weights against the state-actor threat will be an immense effort, and we’re not on track.” Source: Situational Awareness
D. Escalation and Arms Race Dynamics:
GoFR can feed into competition among states or labs to develop advanced biotechnological capabilities. AIMRT is arguably prone to even more extreme competitive dynamics, given the enormous economic advantages and political power to be gained by controlling the most powerful AI systems. These dynamics drive an AI arms race that pushes the boundaries of safe and responsible testing practices and incentivizes speed of development and expansion of capabilities above all other considerations, including safety.
III. Existing Gaps in the Policy and Regulatory Environment for AI Model Red Teaming
I wish to preface the following by acknowledging that AI systems plausibly approaching anything we might define as AGI or ASI are extremely new. Even the people most attuned to AI developments have only woken up to the possibility that AGI or ASI could be a near-term phenomenon since ChatGPT made a splash with its public debut at the end of 2022. So, while much remains to be improved and developed on the policy and regulatory front, these are areas that famously take a long time to develop. People working in policy who are becoming aware of the magnitude of revolutionary change (and potentially catastrophic risk) this technology is likely to bring about are doing a relatively admirable job of responding swiftly and thoroughly to this sudden, new, and rapidly evolving technology. All this to say, the following is not meant to disparage, but rather to point out gaps (mind the gap!) as opportunities for improvement.
A. Lack of Comprehensive Regulatory Frameworks:
Current AI policy and regulation are fragmented, with no cohesive strategy specifically addressing the complexities of AIMRT. Early examples of AI policy and regulation pertaining to AIMRT (e.g., the US Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence) call for red teaming as part of responsible AI model development, but do not include detailed guidance or standards for how AIMRT itself should be conducted safely and responsibly. In other words, AIMRT is presented as a tool for responsible AI development, without always accounting for the risks inherent in using that tool, which must also be managed responsibly.
Some of the most developed documents address only some of the risks posed by AIMRT. For instance, NIST released an Initial Public Draft of Managing Misuse Risk for Dual-Use Foundation Models, a welcome move in the right direction, but far from complete in identifying and addressing the full scope of risks posed by AI red teaming. Neither this document nor any other NIST publication appears to cover how to manage the risks of a highly capable AI agent going rogue.
At present, there are no international standards or agreements to guide best practices in AIMRT. For a useful and relatively up-to-date summary of the current regulatory landscape for AIMRT, see here.
B. Insufficient Oversight Mechanisms:
There is a lack of independent oversight bodies dedicated to monitoring red teaming activities and ensuring adherence to safety protocols. The result is weak enforcement of existing safety and ethical guidelines and practices in AI development and red teaming.
“The current environment has some parts of a functional early-warning system, such as reporting requirements for AI developers described in Executive Order 14110, and existing interagency mechanisms for information-sharing and coordination like the National Security Council and the Vulnerabilities Equities Process. However, gaps exist across the current system: There is a lack of clear intake channels and standards for capability reporting to the government outside of mandatory reporting under EO14110. Also, parts of the Executive Order that mandate reporting may be overturned in the next administration, or this specific use of the Defense Production Act (DPA) could be successfully struck down in the courts. Various legal and operational barriers mean that premature public disclosure, or no disclosure at all, is likely to happen. This might look like an independent researcher publishing details about a dangerous offensive cyber capability online, or an AI company failing to alert appropriate authorities due to concerns about trade secret leakage or regulatory liability. BIS intakes mandatory dual-use capability reports, but it is not tasked to be a coordinator and is not adequately resourced for that role, and information-sharing from BIS to other parts of government is limited. There is also a lack of clear, proactive ownership of response around specific types of AI-powered threats. Unless these issues are resolved, AI-powered threats to national security and public safety are likely to arise unexpectedly without giving defenders enough lead time to prepare countermeasures.” Source: An Early Warning System for AI-Powered Threats to National Security and Public Safety
C. Inadequate Risk Assessment Protocols:
There are limited frameworks for assessing risks associated with emergent capabilities in AI systems, leading to potential blind spots in safety. Additionally, there are few existing policies for responsible disclosure of vulnerabilities discovered during red teaming.
D. Ineffective Coordination Among Stakeholders:
When I began this investigation, I feared I would find completely fragmented efforts among government agencies, research institutions, and industry bodies, leading to overlapping or conflicting guidelines and an absence of collaborative platforms for sharing findings or best practices related to AIMRT. However, when I investigated, I found…
“On February 8, 2024, U.S. Secretary of Commerce Gina Raimondo announced the creation of the U.S. AI Safety Institute Consortium (AISIC). Housed under NIST, the Consortium will unite AI creators and users, academics, government and industry researchers, and civil society organizations in support of the development and deployment of safe and trustworthy artificial intelligence (AI).” Source: Artificial Intelligence Safety Institute Consortium (AISIC)
Of course, with the recent US election and the forthcoming change of administration, the future of these efforts is uncertain:
“There will likely be a day one repeal of the Biden executive order on AI,” Samuel Hammond, a senior economist at the Foundation for American Innovation, told me, though he added, “what replaces it is uncertain.” The AI Safety Institute created under Biden, Hammond pointed out, has “broad, bipartisan support” — though it will be Congress’s responsibility to properly authorize and fund it, something they can and should do this winter. There are reportedly drafts in Trump’s orbit of a proposed replacement executive order that will create a “Manhattan Project” for military AI and build industry-led agencies for model evaluation and security.” Source: Vox - AI is powerful, dangerous, and controversial. What will Donald Trump do with it?
While it is reassuring to see these coordinating bodies develop rapidly in the US, home to the world’s leading AI companies, it would be beneficial to see similar mechanisms form internationally and, ideally, globally.
IV. Lessons from Gain-of-Function Research History and Policy
From 2014 to 2017, there was a moratorium on US funding for gain-of-function pathogen research due to public safety and security concerns. Some of this research resumed once a sufficient regulatory regime had been established. The economic and regulatory considerations facing leading AI development firms are somewhat different, given that most leading AI R&D (at least the portion that is publicly known) does not rely on government funding. Nonetheless, lessons can surely be learned from the precautions, regulatory processes, and mechanisms that researchers conducting GoFR must undergo. Below is a summarized flow chart of such a process.
A. The Role of Oversight Bodies and International Collaboration:
B. Development of Risk-Tiered Frameworks:
C. Ethical and Transparency Standards:
D. Responsible Disclosure Practices:
V. Proposed Oversight and Regulatory Mechanisms for AI Model Red Teaming
VI. Challenges and Considerations in Implementing Oversight Mechanisms
A. International Cooperation and Diverging National Agendas: Difficulty in establishing unified global standards due to conflicting interests and geopolitical considerations.
B. Balancing Innovation and Safety: Ensuring regulatory mechanisms do not stifle innovation but provide adequate safety nets for high-risk activities.
C. Ensuring Compliance and Avoiding Loopholes: Addressing the risks of organizations evading regulations or finding loopholes to avoid oversight.
D. Cost and Accessibility: Mitigating the impact of compliance costs and ensuring that small and developing organizations can still engage in safe AI research.
VII. Conclusion
AIMRT, though indispensable for responsible AI development and deployment, carries significant risks that are magnified by the scale and societal impact of increasingly large, complex, and inscrutable foundation models. Lessons from GoFR suggest that without robust oversight, AIMRT could in some instances exacerbate these risks rather than mitigate them. As governments increasingly recognize the need for regulation, it is imperative to build a coherent and responsive policy framework that reflects the high stakes of AI technology. Effective regulation of AIMRT can strike a balance between safety and innovation, ensuring that these powerful tools serve humanity’s best interests without exposing society to undue risks.