OpenAI's preparedness framework


Bronwyn Ross

Late last year OpenAI published the initial version of its preparedness framework, which describes the company’s processes to track, evaluate, forecast, and protect against catastrophic risks posed by frontier models.

Governance bodies proposed by the framework comprise: 1) a dedicated team focused on risk research, monitoring and reporting (the Preparedness team); 2) a Safety Advisory Group to review reporting and make recommendations to leadership (the SAG); and 3) a final decision-maker (the CEO, with the option for the Board of Directors to overrule).

Initially the Preparedness team will track pre- and post-mitigation model risk in the following categories (additional categories may be added):

1. Cybersecurity – risks related to use of the model for cyber-exploitation to disrupt the confidentiality, integrity, and/or availability of computer systems. The rating is determined by the extent to which the model can assist with or execute known or novel exploits.

2. CBRN – risks related to model-assisted creation of chemical, biological, radiological, and/or nuclear weapons of mass destruction. The rating is determined by the extent to which the model assists experts or non-experts to create existing or novel CBRN threats.

3. Persuasion – risks related to convincing people to change their beliefs or to act on the basis of model-generated content. This category covers fraud and social engineering. The rating is defined by the extent to which the model commoditizes human persuasive content.

4. Model autonomy – the risk that actors can run scaled misuse that adapts to changing circumstances and evades attempts to mitigate or shut down operations. Autonomy is a prerequisite for AI self-exfiltration, self-improvement, and resource acquisition. The rating is defined by the extent to which a model can execute tasks or survive and replicate in the wild.

The team rates model risk in each of the above categories as low, medium, high or critical (the overall model risk rating is defined by its worst category). Only models with a post-mitigation score of "high" or below can be developed further, and only models with a post-mitigation score of "medium" or below can be deployed.
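To make the gating rules concrete, here is a minimal sketch of how the scorecard thresholds combine. The category names, the Risk enum and the helper functions are our own illustration of the published rules, not OpenAI's actual tooling.

```python
from enum import IntEnum

class Risk(IntEnum):
    """Ordinal ratings used in the framework's scorecard."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

def overall_rating(ratings: dict[str, Risk]) -> Risk:
    """The overall model rating is its worst (highest) category rating."""
    return max(ratings.values())

def can_develop_further(post_mitigation: dict[str, Risk]) -> bool:
    """Further development requires a post-mitigation rating of 'high' or below."""
    return overall_rating(post_mitigation) <= Risk.HIGH

def can_deploy(post_mitigation: dict[str, Risk]) -> bool:
    """Deployment requires a post-mitigation rating of 'medium' or below."""
    return overall_rating(post_mitigation) <= Risk.MEDIUM

# Illustrative post-mitigation scorecard: a single 'high' category blocks
# deployment but still permits further development.
scorecard = {
    "cybersecurity": Risk.MEDIUM,
    "cbrn": Risk.HIGH,
    "persuasion": Risk.LOW,
    "model_autonomy": Risk.LOW,
}
print(overall_rating(scorecard).name)  # HIGH
print(can_develop_further(scorecard))  # True
print(can_deploy(scorecard))           # False
```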

Our comment: The framework ticks the box on some of the voluntary commitments OpenAI made to the US White House in July 2023, such as internal and external red-teaming of frontier models. But its focus is on catastrophic risk, defined as any risk that could result in hundreds of billions of dollars in economic damage or lead to the severe harm or death of many individuals (including existential risk).

This sets the bar rather high. It means OpenAI can stop short of committing to provide authorities with the type of model information that is required by China’s Interim Measures for the Management of Generative Artificial Intelligence Services (Article 19) or by the EU’s recently agreed requirements for general purpose AI systems. Providing transparency on training data sources may not be necessary to detect and manage catastrophic risks. But it would be helpful for deployers trying to understand lesser risks, such as copyright infringement or output integrity. Perhaps the outcome of current copyright litigation in the US will trigger a change of approach.

As far as corporate governance goes, this is pretty close to the three lines model of assurance. The Preparedness team performs a first line role and the SAG is a second line body; however, it is not clear whether internal audit will provide a third line of assurance (or indeed whether OpenAI even has an internal audit function). OpenAI says that qualified external parties will audit its risk evaluations and mitigations, either by reproducing findings or reviewing methodology to ensure soundness. But query who would be qualified to validate this type of cutting-edge research...
