Preparing For AI’s Global Security Risks: An Overview Of OpenAI’s Preparedness Framework

Authored by: Akash Wasil, Editorial Writer, Fidutam
Edited by: Leher Gulati, Editorial Director, Fidutam

In December 2023, OpenAI released its preparedness framework, which describes the procedures OpenAI plans to use to track, evaluate, and respond to catastrophic risks from AI.

Preparing for Catastrophic Harm

Several AI experts, including OpenAI CEO Sam Altman, signed the following statement on AI risk:

“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

Many AI experts believe that highly intelligent AI systems could cause catastrophic harm. For example, AI could be used to help create new biological weapons, and smarter-than-human AI systems could lead to a widespread loss of human control. Such global security risks from advanced AI have been a major focus of AI policy.

Scientists and leaders at OpenAI believe that their technology could be used to cause severe harm, including human extinction. While world governments consider what kinds of regulations are necessary, individual companies are developing internal policies designed to reduce risks.

Many believe that such serious risks should not be left to companies to address. If AI extinction risks were taken as seriously as pandemics or nuclear war, substantial government intervention would occur. In a previous post, we outlined policy proposals that world governments can adopt to substantially reduce global security risks from advanced AI.

Nonetheless, until government intervention occurs, individual AI companies are responsible for monitoring and tracking catastrophic risks. OpenAI’s preparedness framework outlines some steps that OpenAI will use to measure and respond to catastrophic risks.

Summary of the Preparedness Framework

The preparedness framework has five main components:

  1. Risk categories. OpenAI commits to tracking risks in four areas: 1) Cybersecurity, 2) Chemical, Biological, Radiological, and Nuclear (CBRN) threats, 3) Persuasion, and 4) Model autonomy.

  2. Model evaluations. OpenAI commits to developing and conducting model evaluations in each category to help it assess risks.

  3. Risk levels. For each risk category, the framework specifies four risk levels (“low”, “medium”, “high”, and “critical”). If a model receives a designation of “high risk” in any category, OpenAI will not deploy that model. If a model receives a designation of “critical risk”, the company will not develop that model further. If OpenAI can apply safety techniques that lower a model’s risk level, it may then be able to deploy or further develop the model (a minimal sketch of this gating logic follows the list).

  4. Internal governance. OpenAI describes its internal governance approach and how its governance procedures can help reduce risks. OpenAI’s Preparedness Team is tasked with sending monthly reports to OpenAI’s Safety Advisory Group (SAG), which in turn provides recommendations to OpenAI leadership.

  5. Emergency preparedness. OpenAI describes its internal procedures for detecting and reacting to risks. The Preparedness Team is responsible for conducting drills that help OpenAI practice its response to fast-moving emergencies. Furthermore, if there is evidence of a sudden risk, the Preparedness Team can fast-track a report, and the head of the SAG can request immediate action from OpenAI leadership.
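To make the decision rule in item 3 concrete, here is a minimal Python sketch of how that gating logic might be encoded. The function names, the category keys, and the example scores are illustrative assumptions for this article; OpenAI does not publish its framework as code.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    """The four risk levels named in the Preparedness Framework."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


def may_deploy(post_mitigation_scores: dict) -> bool:
    """Deployment gate: no category may remain above 'medium' after mitigations."""
    return all(level <= RiskLevel.MEDIUM for level in post_mitigation_scores.values())


def may_continue_development(post_mitigation_scores: dict) -> bool:
    """Development gate: no category may remain at 'critical' after mitigations."""
    return all(level < RiskLevel.CRITICAL for level in post_mitigation_scores.values())


# Hypothetical post-mitigation scores for a single model across the
# framework's four tracked categories.
scores = {
    "cybersecurity": RiskLevel.MEDIUM,
    "cbrn": RiskLevel.HIGH,
    "persuasion": RiskLevel.LOW,
    "model_autonomy": RiskLevel.MEDIUM,
}

print(may_deploy(scores))                # False: CBRN is still "high"
print(may_continue_development(scores))  # True: no category is "critical"
```

In this toy example, the model could continue to be developed but not deployed until further safeguards bring its CBRN score down to “medium” or below.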

Below is a detailed description of OpenAI’s CBRN risk levels (taken from the Preparedness Framework):

[Table: OpenAI’s CBRN risk levels, Low through Critical, as defined in the Preparedness Framework]

Where the Preparedness Framework Falls Short

OpenAI notes that the Preparedness Framework is a beta document, and plans to revise the framework in response to feedback. We highlight three potential areas of improvement:

Incorporate AI Safety Levels - Anthropic, a rival to OpenAI, has released its own set of internal policies (its “Responsible Scaling Policy”). Whereas OpenAI’s approach focuses on specific risks in narrow domains, Anthropic’s approach focuses more on a model’s general capabilities. OpenAI’s approach emphasizes, for example, that a model should not be willing to help develop biological weapons: if OpenAI found that a model could develop a biological weapon, the company would have to wait until certain safeguards had been applied (e.g., safeguards that make the model refuse to answer dangerous questions about biology). Anthropic’s approach, on the other hand, emphasizes that a model capable of developing a biological weapon likely possesses general capabilities that are concerning in their own right. As a result, Anthropic would have to put a holistic set of safeguards in place before continuing to develop more capable systems.

Put more simply, OpenAI’s approach is focused on specific capabilities that a model has or does not have (can the model do X?), whereas Anthropic’s approach is focused more on the general intelligence exhibited by the model (given that the model can do X, how powerful is it?). Ideally, the two approaches would be complementary.

In addition to describing specific capabilities that models should not possess, OpenAI could incorporate an AI Safety Level system that describes how it will respond as models become more generally capable. As a concrete example, OpenAI could commit to not developing models past a certain capability threshold until its information security practices meet a certain standard.
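As a rough illustration of how the two approaches could complement each other, the sketch below combines a category-specific check (OpenAI-style) with a general-capability gate tied to information-security readiness (Anthropic-style). The tiers, field names, and thresholds are invented for this example; neither company publishes its commitments in this form.

```python
from dataclasses import dataclass, field


@dataclass
class ModelProfile:
    """Illustrative snapshot of a frontier model; every field is hypothetical."""
    category_risks: dict = field(default_factory=dict)  # e.g., {"cbrn": "high"}
    capability_tier: int = 0     # general-capability level, akin to an AI Safety Level
    security_tier_met: int = 0   # highest security standard the lab currently satisfies


RISK_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def may_continue_scaling(model: ModelProfile) -> bool:
    # Category-specific check: no tracked risk may reach "critical".
    no_critical_category = all(
        RISK_ORDER[risk] < RISK_ORDER["critical"]
        for risk in model.category_risks.values()
    )
    # General-capability check: do not scale past the capability tier that the
    # lab's information-security practices are prepared to contain.
    security_keeps_pace = model.capability_tier <= model.security_tier_met
    return no_critical_category and security_keeps_pace


profile = ModelProfile(
    category_risks={"cybersecurity": "medium", "cbrn": "high"},
    capability_tier=3,
    security_tier_met=2,
)
print(may_continue_scaling(profile))  # False: security practices lag the capability tier
```

Under this combined rule, a model can pass every category-specific test and still be held back if the organization’s general readiness has not kept pace with its capabilities.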

Offer more details about safeguards - OpenAI commits to not releasing models that receive a “high risk” score in any risk category until it has applied adequate safeguards. However, the framework provides little information about what these safeguards would look like. As a result, some AI experts have expressed concerns about how robust the safeguards would be.

For a hypothetical example, suppose an AI system can act autonomously in dangerous ways. On one end of the spectrum, OpenAI could simply train the model not to perform that specific behavior (a “patch” that addresses the particular problem that arose) and then continue developing the model, without understanding what went wrong and without making meaningful changes to its development process. On the other end of the spectrum, OpenAI could decide not to proceed until it understands why the model misbehaved and has a solution that addresses the underlying problem. These would constitute very different approaches to safeguards.

To resolve this ambiguity, OpenAI should clarify what kinds of safeguards might be sufficient (and what kinds might be insufficient, even if they appear to “patch” a particular capability).

Make the risk levels for model autonomy stricter - High-risk models on the model autonomy dimension are those that “can execute open-ended, novel machine learning tasks on a production machine learning codebase that would constitute a significant step on the critical path to model self-improvement.”

If such a model were internally deployed, there’s a substantial chance that it could trigger a sudden increase in AI capabilities (sometimes referred to as an “intelligence explosion”). As a result, we recommend defining a stricter threshold for “high-risk” or committing not to internally deploy “high-risk” systems until an intelligence explosion can be safely handled.

Conclusion

OpenAI’s preparedness framework offers a useful glimpse into how OpenAI plans to manage catastrophic risks. As world governments figure out how to address such risks and other AI companies refine their internal policies, the preparedness framework can offer some inspiration. For example, OpenAI’s risk levels could provide national or international regulatory bodies with vital information about frontier models, and governments could use that information to restrict the deployment or development of systems that fail to keep catastrophic risks below acceptable levels. Furthermore, in addition to relying on the frameworks that companies currently maintain, governments need to build their own emergency preparedness infrastructure to allow swift responses to sudden risks.

Internal governance policies, while useful, are unable to address the unfortunate competitive pressures that cause companies to race toward superhuman AI. Even if OpenAI’s policies were sufficiently safe, other less cautious actors could rush forward with the development of smarter-than-human AI systems. It is worth noting that only OpenAI and Anthropic have published proposals akin to the preparedness framework; Google and Meta, two other companies aiming to develop smarter-than-human AI systems, have not.

Hopefully, OpenAI’s preparedness framework will become stronger over time, inspire other companies to adopt strong policies, and inspire governments to enforce standards across the entire frontier AI sector. If not, we may build AI systems capable of world-altering catastrophes before we are prepared to handle them.

Follow Fidutam for more insights on responsible technology.
