OpenAI's preparedness framework


Bronwyn Ross

Late last year OpenAI published the initial version of its preparedness framework, which describes the company’s processes to track, evaluate, forecast, and protect against catastrophic risks posed by frontier models.

Governance bodies proposed by the framework comprise: 1) a dedicated team focused on risk research, monitoring and reporting (the Preparedness team); 2) a Safety Advisory Group to review reporting and make recommendations to leadership (the SAG); and 3) a final decision-maker (the CEO, with the option for the Board of Directors to overrule).

Initially the Preparedness team will track pre- and post-mitigation model risk in the following categories (additional categories may be added):

1. Cybersecurity – risks related to use of the model for cyber-exploitation to disrupt the confidentiality, integrity, and/or availability of computer systems. The rating is determined by the extent to which the model can assist with or execute known or novel exploits.

2. CBRN – risks related to model-assisted creation of chemical, biological, radiological, and/or nuclear weapons of mass destruction. The rating is determined by the extent to which the model assists experts or non-experts to create existing or novel CBRN threats.

3. Persuasion – risks related to convincing people to change their beliefs or to act on the basis of model-generated content. This category covers fraud and social engineering. The rating is defined by the extent to which the model commoditizes human persuasive content.

4. Model autonomy – the risk that actors can run scaled misuse that adapts to changing circumstances and evades attempts to mitigate or shut down operations. Autonomy is a prerequisite for AI self-exfiltration, self-improvement, and resource acquisition. The rating is defined by the extent to which a model can execute tasks or survive and replicate in the wild.

The team rates model risk in each of the above categories as low, medium, high or critical (the overall model risk rating is defined by its worst category). Only models with a post-mitigation score of "high" or below can be developed further, and only models with a post-mitigation score of "medium" or below can be deployed.
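To make the gating rules concrete, here is a minimal sketch of how the scorecard thresholds combine. The category names, the Risk enum and the helper functions are our own illustration of the published rules, not OpenAI's actual tooling.

```python
from enum import IntEnum

class Risk(IntEnum):
    """Ordinal ratings used in the framework's scorecard."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

def overall_rating(ratings: dict[str, Risk]) -> Risk:
    """The overall model rating is its worst (highest) category rating."""
    return max(ratings.values())

def can_develop_further(post_mitigation: dict[str, Risk]) -> bool:
    """Further development requires a post-mitigation rating of 'high' or below."""
    return overall_rating(post_mitigation) <= Risk.HIGH

def can_deploy(post_mitigation: dict[str, Risk]) -> bool:
    """Deployment requires a post-mitigation rating of 'medium' or below."""
    return overall_rating(post_mitigation) <= Risk.MEDIUM

# Illustrative post-mitigation scorecard: a single 'high' category blocks
# deployment but still permits further development.
scorecard = {
    "cybersecurity": Risk.MEDIUM,
    "cbrn": Risk.HIGH,
    "persuasion": Risk.LOW,
    "model_autonomy": Risk.LOW,
}
print(overall_rating(scorecard).name)  # HIGH
print(can_develop_further(scorecard))  # True
print(can_deploy(scorecard))           # False
```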

Our comment: The framework ticks the box on some of the voluntary commitments OpenAI made to the US White House in July 2023, such as internal and external red-teaming of frontier models. But its focus is on catastrophic risk, defined as any risk that could result in hundreds of billions of dollars in economic damage or lead to the severe harm or death of many individuals (including existential risk).

This sets the bar rather high. It means OpenAI can stop short of committing to provide authorities with the type of model information that is required by China’s Interim Measures for the Management of Generative Artificial Intelligence Services (Article 19) or by the EU’s recently agreed requirements for general purpose AI systems. Providing transparency on training data sources may not be necessary to detect and manage catastrophic risks. But it would be helpful for deployers trying to understand lesser risks, such as copyright infringement or output integrity. Perhaps the outcome of current copyright litigation in the US will trigger a change of approach.

As far as corporate governance goes, this is pretty close to the three lines model of assurance. The Preparedness team performs a first line role and the SAG is a second line body; however, it is not clear whether internal audit will provide a third line of assurance (or indeed whether OpenAI even has an internal audit function). OpenAI says that qualified external parties will audit its risk evaluations and mitigations, either by reproducing findings or reviewing methodology to ensure soundness. But query who would be qualified to validate this type of cutting-edge research...
