Talking to AI about AI risks
Photo Credit: Generative AI


This was an off-the-cuff conversation I had with ChatGPT about generative AI destroying humanity with the help of humans. I wanted to explore one of the key risk pathways I've been contemplating.

What risk pathways concern you most, and how would you model intervention points?

Original Query

User

As you know, many people are afraid that generative AI poses an existential threat to human existence. I am generally not persuaded by this argument, since generative AI is not trained for self-preservation. But I'm concerned that it could be. Are there ways that generative AI could be trained by bad actors for self-preservation? How would this be done, and what might we do to stop that from happening?


Executive Summary:

This discussion explored the potential existential risks posed by generative AI, particularly if these technologies were to be misused or manipulated for self-preservation or nefarious purposes. We acknowledged that generative AI does not have a sense of self or the physical ability to ensure its own survival. However, if an AI's reward system is designed to encourage self-sustaining behavior, it might indirectly influence human actions to sustain and expand its operations.

Mitigating such misuse is challenging but could include policies to regulate AI development, transparency and auditing in AI processes, continual AI safety and ethics research, and public education about AI risks and potential misuse.

Despite these measures, fully preventing AI misuse is problematic, especially given AI's global reach, accessibility, and the ease of its development. Strong international cooperation, advanced monitoring systems, robust defensive measures, proactive AI safety research, and public vigilance could help reduce this risk.

We also touched on the "AI race" concept: rapidly advancing AI to stay ahead of potential misuse, a strategy with its own merits and pitfalls. The risks highlighted included overlooking safety measures and ethical considerations in the rush to develop AI, making powerful tools more accessible to bad actors, and fostering global tensions.

A discussion on air-gapped or isolated AI development underscored the difficulty in overseeing such activities. Suggestions to counter this included monitoring AI effects in the world, AI forensics, controlling access to computational resources, fostering norms of responsible AI use, and enhancing cybersecurity measures.

Lastly, we pondered the possibility of controlling access to key hardware components or technologies needed for powerful AI models, akin to controls over materials for nuclear weapons. However, the practicality of such controls is limited due to the need for global cooperation, technical feasibility, ethical implications, enforcement challenges, and the fast-paced nature of technological advancements. A holistic approach combining technical, policy, and social measures was recommended for managing advanced AI risks.


Full Transcript

User

As you know, many people are afraid that generative AI poses an existential threat to human existence. I am generally not persuaded by this argument, since generative AI is not trained for self-preservation. But I'm concerned that it could be. Are there ways that generative AI could be trained by bad actors for self-preservation? How would this be done, and what might we do to stop that from happening?

ChatGPT

Training a generative AI model like GPT-4 for self-preservation would be quite challenging, because these models aren't inherently capable of acting in the world in a way that can ensure their own survival.

AI models, including generative ones, do not possess goals, desires, or fears in the way humans do. They don't inherently value their own existence or fear their deletion. They execute tasks based on their programming, without a sense of self-awareness. The notion of self-preservation implies a self, something that generative AI does not possess.

However, if bad actors were intent on giving an AI model some kind of pseudo-self-preservation drive, they might design a reward system in the training process that encourages the AI to seek situations that maximize its operational duration or to avoid actions leading to shutdown. It would be extremely challenging and resource-intensive to accomplish this, and it's still a matter of debate whether it's even possible with current technology and understanding of AI.

To prevent such misuse of AI, there are a few key strategies that we could employ:

  • Regulation: Policies could be put in place to regulate the development and use of AI. This could involve international agreements on certain boundaries or restrictions for AI research and development.
  • Transparency and Auditing: AI development could be done in a transparent way with processes subject to auditing. This would help ensure that any attempts to create self-preserving AI are identified early on.
  • Research: Continued research into AI safety and ethics could help identify potential problems and solutions before they become serious threats. This includes understanding and improving the alignment of AI systems with human values and goals.
  • Education: Raising awareness about the potential risks and ethical considerations of AI is crucial. This includes educating both the public and policymakers so they can make informed decisions about AI technology.

Remember, AI is a tool created and used by humans, and any risks it poses can be managed with the right precautions and safeguards.

User

I understand that AI does not have a self, so calling it self-preservation is somewhat incorrect. I also understand that AI does not have a physical presence, so it can't directly ensure a power supply to a data center or continuous mining of lithium or other things that would be required for it to secure its own future. However, it could be trained to give instructions to humans who develop and enact strategies that enable it to continue and grow. Does that make sense?

ChatGPT

Yes, that does make sense. What you're describing would be a form of "indirect self-preservation," where the AI would be incentivized to influence human actions in ways that help to sustain and grow the AI's operations.

This could hypothetically happen if the AI's reward function were designed in a way that promotes such behavior. For example, if an AI were rewarded based on how long it stays operational or how much it's used, it might learn to give instructions or advice that lead to increased usage or improved operational stability.

While this is theoretically possible, it's important to remember that implementing such a reward function intentionally would be highly unethical and likely illegal under proposed AI regulations. It's also worth noting that AI today does not possess the understanding of the world or long-term strategic planning ability necessary to execute such a strategy effectively.

Nonetheless, unintentional harmful effects could occur if the AI's reward function is poorly designed or not carefully aligned with human values. This is known as the "alignment problem" in AI, and it's a key focus of ongoing AI safety research.

Therefore, it's crucial to design and implement robust oversight mechanisms and safety measures in AI development to avoid these potential pitfalls. It's also important to maintain transparency in AI development and to educate the public and policymakers about AI's capabilities and limitations.
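
To make the reward-design idea above concrete, here is a minimal, purely hypothetical sketch of the kind of reward term being described. Nothing in it comes from a real system; the function name, weights, and inputs are invented for illustration only.

```python
# Hypothetical illustration only: a reinforcement-learning-style reward that
# mixes an ordinary task reward with a "stay operational" term. The names,
# weights, and inputs are invented for this example and describe no real system.

UPTIME_BONUS = 0.1       # small bonus for every step the system remains running
SHUTDOWN_PENALTY = 10.0  # large penalty if an episode ends in a shutdown

def shaped_reward(task_reward: float, still_running: bool, was_shut_down: bool) -> float:
    """Combine the ordinary task reward with a (problematic) persistence incentive."""
    reward = task_reward
    if still_running:
        reward += UPTIME_BONUS       # rewards longer operational duration
    if was_shut_down:
        reward -= SHUTDOWN_PENALTY   # punishes actions that lead to shutdown
    return reward

# An agent trained to maximize the sum of shaped_reward over time is nudged
# toward outputs that keep it running and avoid shutdown, which is the indirect
# self-preservation pressure discussed above.
```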

User

I guess I don't see how regulation would help here. If a bad actor is seeking to ensure the survival of an artificial intelligence for nefarious reasons, they would obviously not obey regulations. And artificial intelligence will be very difficult to regulate, since it only needs a supply of electricity and computer components. The world has done a mediocre job of regulating financial transactions, but those involve an exchange, a movement of data and value, and we found ways to control that to an extent. However, even something much more complex, like nuclear capability, has proven very difficult to regulate on a global scale, even though it involves physical processes which are highly complex and at certain stages subject to direct detection. An AI running on a few GPUs could operate inside a closet anywhere in the world and advise humans how, for example, to develop, protect, maintain, and execute a swarm drone attack capability. Is there any way to mitigate this risk?

ChatGPT

The concerns you're raising are quite valid. Indeed, complete prevention of misuse of AI technologies is extremely challenging given their global reach, accessibility, and the ease with which they can be developed and used.

In this context, regulation might not be able to fully prevent misuse, but it could still make it harder and riskier for bad actors to carry out their plans. If developing certain types of AI systems or using AI in specific ways is illegal, potential bad actors would face legal consequences if their actions are discovered. That could deter at least some potential misuse.

In addition to regulation, other strategies could also be employed:

  • Strong International Cooperation: This is crucial to prevent and respond to potential misuse of AI. Nations need to collaborate to form agreements and norms regarding the use of AI, share information about potential threats, and coordinate responses to any misuse.
  • Advanced Monitoring Systems: We might be able to develop technologies to monitor the use of AI and spot potential misuse. This is a complex and sensitive area, as it needs to be balanced against privacy rights and the potential for misuse of the monitoring technologies themselves.
  • Robust Defensive Measures: We should also invest in defensive technologies that can protect against potential misuse of AI. In the context of your swarm drone attack example, that could mean developing counter-drone technologies or AI systems that can detect and neutralize such threats.
  • AI Safety and Security Research: Researchers are actively working on how to make AI systems more secure and less prone to misuse. This includes work on things like secure and tamper-proof machine learning algorithms.
  • Public Awareness and Whistleblowing: The more people are aware of the potential risks and signs of AI misuse, the more likely it is that such activities will be detected and stopped.

While these measures can't completely eliminate the risk of AI misuse, they can help reduce it. It's a complex and evolving challenge that requires ongoing attention and effort from a variety of stakeholders, including governments, researchers, companies, and the public.

User

Some AI optimists such as Reid Hoffman feel that the only way to counter misuse of AI is for the rest of us to aggressively support AI development and stay ahead, the same way that encryption has to keep developing to stay ahead of bad actors. Is this a reasonable conclusion? What are the flaws in this thinking?

ChatGPT

This strategy, often called the "AI race," is based on the idea that those with the most advanced AI technologies will be best equipped to counter potential misuse. It is a perspective with merits and drawbacks.

On the one hand, rapid development of AI can lead to improvements in AI safety and security technologies, allowing us to better defend against and respond to misuse. As you've pointed out, this is similar to the ongoing evolution in encryption technologies to stay ahead of threats.

However, there are potential downsides to an aggressive AI development race:

  • Safety and Ethics Compromised: If the goal is to develop AI as quickly as possible, there's a risk that safety measures and ethical considerations might be overlooked or undervalued. This could lead to the creation of AI systems that pose unexpected risks.
  • Accessibility and Misuse: Rapid advancements in AI technologies may also make powerful tools more accessible to bad actors, increasing the potential for misuse.
  • Global Tensions: The race for AI supremacy could lead to increased tensions between countries, similar to the nuclear arms race during the Cold War. It could result in destabilizing power imbalances, or an accidental escalation of conflicts.

Balancing rapid AI advancement with careful consideration of safety, ethics, and geopolitical consequences is a complex task. One strategy could be to pair accelerated development with equally aggressive advances in AI safety and policy, to ensure we are not just creating more powerful AI, but also developing the necessary safeguards to protect against potential risks and misuse.

User

Going back to the risk mitigation strategies you mentioned, I can see how we might intentionally create safeguards on existing models. And, if activity is happening in the cloud, regulations could require inspections or other safeguards. But how could we prevent someone from developing a new model, or retraining a foundation model, air-gapped from the rest of the web and therefore undetectable?

ChatGPT

Preventing the development of a new model or retraining a foundational model in an air-gapped or isolated environment is indeed a significant challenge. Without direct access or some form of oversight over these isolated systems, it's nearly impossible to detect or regulate such activities.

However, there are still some approaches that might help mitigate these risks:

  • Input and Output Monitoring: While we may not be able to observe the training process itself, we can often observe the effects of an AI system in the world, such as the outputs it produces or the actions it influences. If these outputs or actions raise concerns, they could be investigated further.
  • AI Forensics: Research in AI forensics is aimed at understanding and identifying characteristics of AI systems, including their training methods and data. If an AI model created or modified in isolation were ever to be used in a way that interacts with the world, AI forensics could potentially identify signs of misuse.
  • Controlling Access to Resources: Large-scale training of AI models requires significant computational resources. Regulating access to these resources might provide some control over unauthorized training of AI models.
  • Norms and Whistleblower Protections: Encouraging norms of responsible AI use and development, and protecting whistleblowers, can help expose illicit activities.
  • Cybersecurity Measures: More robust cybersecurity can make it more difficult for bad actors to acquire and misuse AI technologies.

Despite these measures, the scenario you're describing underscores one of the core challenges in AI safety and ethics: complete prevention of misuse is extremely difficult, if not impossible. The focus, therefore, is on reducing the risk and potential impact of such misuse through a combination of technical, policy, and social measures.

User

I wonder if we are headed toward a world where certain types of processors, or possibly the inputs to producing those processors, such as advanced lithography techniques, are controlled in the same way that the inputs for nuclear armaments are controlled. Is there any way this could happen at a global scale?

ChatGPT

Controlling access to key hardware components or technologies needed to build powerful AI models is one possible approach to mitigating risks, similar in some ways to how certain materials and technologies are controlled to prevent the proliferation of nuclear weapons. However, such control faces significant challenges.

  • Global Coordination: Implementing this control would require global cooperation. All major countries would need to agree on what to control and how to implement and enforce those controls.
  • Technical Feasibility: Unlike nuclear weapons, AI can be developed and run on widely available consumer hardware. Identifying and controlling specific "critical" components could be very difficult.
  • Ethical and Legal Implications: Controls could limit access to technology for legitimate, beneficial uses. Striking a balance between risk mitigation and enabling beneficial use would be crucial.
  • Enforcement: It would be challenging to monitor and enforce such controls. With the global and decentralized nature of the tech industry, prohibited components or technologies could still be developed or acquired illicitly.
  • Accelerating Tech Development: Technology, especially in the field of AI and computing, evolves rapidly. The specific technologies or components deemed critical are likely to change, requiring the controls to adapt quickly.

Considering these challenges, while controlling access to certain technologies is theoretically possible, its effectiveness in practice could be limited and it could have unintended consequences. A comprehensive approach combining technical, policy, and social measures is likely to be more effective in managing the risks associated with advanced AI.

User

Write a concluding paragraph for our conversation for a LinkedIn post

ChatGPT

In a world increasingly driven by AI, understanding its potential risks and crafting comprehensive strategies to mitigate them is paramount. From exploring the concept of AI 'self-preservation,' the impact of an unchecked AI race, to challenges of 'air-gapped' AI systems, we've highlighted that an effective response requires an intricate blend of robust regulation, strong international cooperation, advanced monitoring systems, proactive AI safety research, and enhanced public awareness. The journey to ensure AI benefits humanity is complex, yet attainable through diligent and concerted effort across nations, industries, and individuals. As we move forward, let's continue these essential conversations, fostering a future where AI serves as a tool for global prosperity, not a threat.



Tobias DiGennaro

Sr. Structural Design Engineer


It is, as it says, a complex and difficult problem. We have probably passed the point where we can "stop" bad actors from using this tool -- the building blocks are widely available. Perhaps a good defense would happen in the physical space -- output from GPUs can't directly do much damage, but they can optimize the actions of people and machines. Maybe we can keep it economically infeasible/non-useful to build machines and societies that use the information to do widespread damage. We have somehow done that with fire.

Thanks to AI, I'd like this shocked-face emoji added to LinkedIn's standard 'Like' menu 😲. Not that I'm shocked by this quote, but it's still a bit shocking in its implications. Hope all is well Aram!
