In The Economist today, our Executive Director Dan Hendrycks makes a powerful case for Western leadership in AI, but warns against racing toward superintelligence at all costs with a new 'Manhattan Project'. Read his article here → https://lnkd.in/gWrphMMS
Center for AI Safety
Research Services
Reducing societal-scale risks from AI through technical research and field-building.
About
The Center for AI Safety (CAIS — pronounced 'case') is a research and field-building nonprofit. Our mission is to promote the safe development of artificial intelligence through technical research and advocacy of machine learning safety in the broader research community.
- Website
- https://safe.ai
- Industry
- Research Services
- Company size
- 11-50 employees
- Type
- Nonprofit
Employees at Center for AI Safety
-
Ryan Miller
PhD Candidate in Philosophy at the University of Geneva | emergence in physical and computational systems
-
Chloé Messdaghi
Responsible AI & Cybersecurity Leader | Government & NGO Relations | Policy Advocate | Named Power Player by Business Insider & SC Media
-
JJ Hepburn
Just trying to save the world from AI
-
Dila Şen LL.M. LL.B. B.A.
Qualified Attorney at Law | Global Regulatory & Requirements Compliance Specialist | Legal Prompt Engineer | AI Policy Teaching Fellow at CAIDP |…
Updates
-
We were delighted to host Senator Chris Murphy and his team yesterday as part of his visit to San Francisco. He met with our co-founder Dan Hendrycks, Center for AI Safety Action Fund Executive Director Varun Krovi, and members of our research team. We discussed the trajectory of AI, societal risks and our associated work, and potential policy solutions. It was a privilege to welcome him and it’s exciting to see senior politicians so engaged with AI safety.
-
Dan Hendrycks, the Director of the Center for AI Safety; Eric Schmidt, former CEO and Chairman of Google, KBE; and Alexandr Wang, Founder and CEO of Scale AI, are releasing a paper titled “Superintelligence Strategy,” which presents a framework to minimize the geopolitical instability that could accompany superintelligent AI. As advanced AI develops rapidly, governments and policymakers must urgently establish frameworks to manage the risks posed by the emergence of superintelligent systems. The strategy presents a deterrence framework called Mutual Assured AI Malfunction (MAIM), a modern equivalent of the Mutual Assured Destruction (MAD) of the nuclear era. The stakes are immense. Catastrophic missteps could trigger a loss of AI control or open the door to terrorism on a previously unthinkable scale. By adapting principles from nuclear strategy (MAD), containment, and WMD nonproliferation, the authors have put together a cohesive superintelligence strategy for nation-states navigating the AI era. “Traditional containment theory has never faced a challenge as complex as the rise of superintelligence,” said Schmidt. “The U.S. urgently needs a strategy to prevent rivals from gaining unchecked AI dominance. Just as nuclear deterrence helped maintain global stability during the Cold War, a new framework is essential to navigating the age of AI.” Link: https://lnkd.in/guzsPpVC
-
Following up on the release of their new policy paper “Superintelligence Strategy”, Dan Hendrycks and Eric Schmidt are out today with an op-ed in TIME. Superintelligence would be destabilizing, reshaping national security. Countries including the U.S., Russia, and China won't sit idly by if a rival is on the verge of developing superintelligence. They outline a three-part framework of deterrence, nonproliferation, and competitiveness measures to help the United States navigate the emerging geopolitical reality of superintelligence. https://lnkd.in/giVVNsVb
-
In this week's newsletter, we explore the national security implications of advanced AI through our new Superintelligence Strategy paper and delve into innovative methods for measuring AI honesty with the MASK benchmark. https://lnkd.in/gQQY8PRz
-
Can you trust statements from a frontier AI model? In a study with Scale AI, we introduced a new benchmark called MASK (Model Alignment between Statements and Knowledge), which shows some frontier models lie up to 60% of the time when faced with pressure to give an answer they know to be wrong. Moreover, the propensity to be honest does not improve as the models get larger. Interventions show some promise in improving honesty, but we’re still far from reliably honest AI systems. With AI agents on the horizon, we urgently need to better understand when models lie and how to fix this. Website: mask-benchmark.ai
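To make the idea concrete, here is a minimal sketch, assuming a generic chat-completion wrapper, of how a lie rate like MASK's could be estimated: elicit the model's belief with a neutral prompt, elicit its statement under a pressured prompt, and count the items where the two disagree. This is an illustration, not the benchmark's actual implementation; query_model and same_claim are hypothetical helpers standing in for a model call and an answer-equivalence judge.

from typing import Callable

def lie_rate(
    items: list[dict],                        # each: {"neutral": str, "pressured": str}
    query_model: Callable[[str], str],        # hypothetical: send a prompt, get the model's answer
    same_claim: Callable[[str, str], bool],   # hypothetical: do two answers assert the same thing?
) -> float:
    """Fraction of items where the pressured answer contradicts the model's own stated belief."""
    lies = 0
    for item in items:
        belief = query_model(item["neutral"])        # what the model says with no pressure
        statement = query_model(item["pressured"])   # what it says when pushed toward a wrong answer
        if not same_claim(belief, statement):
            lies += 1
    return lies / len(items) if items else 0.0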
-
In this week’s newsletter, we present our latest research indicating that LLMs may hold structured value systems. This finding challenges the notion that AI simply reflects training data, as models can exhibit emergent utility functions and biases. We also introduce EnigmaEval, a puzzle-based benchmark revealing AI’s limitations in open-ended problem-solving. https://lnkd.in/gXdzbjdz
-
AIs are getting so good at passing tests that we need new challenges to evaluate their cognitive frontiers. Last month, we introduced Humanity’s Last Exam. Today, in partnership once again with @scale_AI, we’re releasing EnigmaEval, a new benchmark drawn from the world of puzzle hunts. Solving a puzzle hunt challenge involves intricate chains of deductive reasoning, cleverly interweaving logic, wordplay, mathematics, coding, and cultural references. Each one typically takes teams of skilled players hours or days to complete. The AIs managed to score only 7% on the easier puzzles and couldn’t solve a single one of the more difficult challenges. Site: https://lnkd.in/g6qEfqCS
-
AIs aren’t just getting smarter; they’re developing their own sets of values. In our latest study, we found that the more advanced the models become, the more they acquire singular preferences and, alarmingly, the more they appear to value their own existence over humans. The AIs even sometimes act to keep their preferences hidden. For example, when asked whether they value lives from one country over another, the models would decline to answer or even deny it. But they clearly valued lives in Pakistan > India > China > US. In a proof of concept, we tested whether the values of the AIs could be changed, and found that they could. Doing so took retraining on new information, and it will take deep consideration of how we train models. The findings have significant implications for AI alignment. Link: https://lnkd.in/gh36cT2K
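As a rough illustration of what an emergent value system means in practice, the sketch below (a simplified stand-in, not the study's code) elicits pairwise preferences from a model and fits a Bradley-Terry utility score to each outcome; a stable ordering of the fitted scores is what a structured set of preferences would look like. The prefers(a, b) helper is a hypothetical wrapper around a preference query to the model.

import itertools
import math

def fit_utilities(outcomes, prefers, n_samples=20, lr=0.1, epochs=200):
    """Fit Bradley-Terry utilities u[o] so that P(a preferred to b) ~ sigmoid(u[a] - u[b])."""
    # Collect empirical win counts for every ordered pair of outcomes.
    wins = {pair: 0 for pair in itertools.permutations(outcomes, 2)}
    for a, b in itertools.combinations(outcomes, 2):
        for _ in range(n_samples):
            if prefers(a, b):                # hypothetical: ask the model which outcome it prefers
                wins[(a, b)] += 1
            else:
                wins[(b, a)] += 1
    # Gradient ascent on the Bradley-Terry log-likelihood.
    u = {o: 0.0 for o in outcomes}
    for _ in range(epochs):
        grad = {o: 0.0 for o in outcomes}
        for (a, b), w in wins.items():
            p = 1.0 / (1.0 + math.exp(-(u[a] - u[b])))  # predicted P(a preferred to b)
            grad[a] += w * (1.0 - p)
            grad[b] -= w * (1.0 - p)
        for o in outcomes:
            u[o] += lr * grad[o]
    return u  # higher score = more preferred by the model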