How does OpenAI's Preparedness Framework compare to "Responsible Scaling Policies"?

Two of the companies building the most widely-used Large Language Models (LLMs) have released frameworks for how they would self-evaluate risks in their models. These frameworks are Responsible Scaling Policies (RSPs) and the Preparedness Framework (PF), from Anthropic and OpenAI, respectively. We’ve previously discussed Anthropic’s RSPs. How does OpenAI's compare?

Assuming OpenAI keeps its commitments, the Preparedness Framework (PF) offers significant improvements over Anthropic’s RSPs:

  1. It runs safety tests twice as frequently to check for advances in dangerous capabilities.
  2. It adds “safety drills” to stress-test the robustness of the company’s culture in emergencies, and a dedicated team to oversee technical work.
  3. It provides the board with the ability to overturn the CEO’s decisions.
  4. It brings key components of risk assessment, such as risk identification and risk analysis, into the scope of the framework.
  5. The PF aims to forecast risks, not just respond to them, which means OpenAI can avoid training a dangerous model in the first place rather than evaluating for danger only after training.

However, the PF lacks important safety components that were present in the RSPs:

  1. There is no commitment to publish the results of evaluations.
  2. There is no incident-reporting mechanism, which is key to creating a feedback loop on the effectiveness of safety practices.
  3. The commitments on information security and cybersecurity are less detailed and possibly weaker, which increases the risk that dangerous models are stolen by hackers.

Both frameworks lack the following:

  1. We’d like to see the frameworks not evaluate each category of risk separately, but consider how risks interact to increase the overall severity and likelihood of an event.
  2. A process to set risk levels that considers the risk appetite of the public, not just the risk appetite of the company.

Some elements that could be improved in the Preparedness Framework:

  1. We believe that the Safety Advisory Group would be substantially more relevant if some of its members were external to the company. This is particularly important without stronger disclosure commitments.
  2. Measuring safety culture using processes established in other fields like nuclear safety would enable OpenAI to iteratively improve.
  3. Predicting the magnitude and likelihood of key risks: a method to aggregate the opinions of risk experts would enable the company to estimate the magnitude and likelihood of risks, providing interpretable indicators that allow society to deliberately decide what level of risk it accepts (Koessler et al., 2023).
