Talking to AI about AI risks
Aram DiGennaro
I help entrepreneurs and business leaders achieve a legacy aligned with their values. Fractional CFO/COO, vision discovery, strategic planning, and organizational design for your launch, scaling, or strategic change.
This was an off-the-cuff conversation I had with ChatGPT about generative AI destroying humanity with the help of humans. I wanted to explore one of the key risk pathways I've been contemplating.
What risk pathways concern you most, and how would you model intervention points?
Original Query
User
As you know, many people are afraid that generative AI poses an existential threat to human existence. I am generally not persuaded by this argument, since generative AI is not trained for self-preservation. But I'm concerned that it could be. Are there ways that generative AI could be trained by bad actors for self-preservation? How would this be done, and what might we do to stop that from happening?
Executive Summary:
This discussion explored the potential existential risks posed by generative AI, particularly if these technologies were to be misused or manipulated for self-preservation or nefarious purposes. We acknowledged that generative AI does not have a sense of self or the physical ability to ensure its own survival. However, if an AI's reward system is designed to encourage self-sustaining behavior, it might indirectly influence human actions to sustain and expand its operations.
Mitigating such misuse is challenging but could include policies to regulate AI development, transparency and auditing in AI processes, continual AI safety and ethics research, and public education about AI risks and potential misuse.
Despite these measures, fully preventing AI misuse is problematic, especially given AI's global reach, accessibility, and the ease of its development. Strong international cooperation, advanced monitoring systems, robust defensive measures, proactive AI safety research, and public vigilance could help reduce this risk.
We also touched on the "AI race" concept: rapidly advancing AI to stay ahead of potential misuse, a strategy with its own merits and pitfalls. The risks highlighted included overlooking safety measures and ethical considerations in a rush to develop AI, making powerful AI more accessible for misuse, and fostering global tensions.
A discussion on air-gapped or isolated AI development underscored the difficulty in overseeing such activities. Suggestions to counter this included monitoring AI effects in the world, AI forensics, controlling access to computational resources, fostering norms of responsible AI use, and enhancing cybersecurity measures.
Lastly, we pondered the possibility of controlling access to key hardware components or technologies needed for powerful AI models, akin to controls over materials for nuclear weapons. However, the practicality of such controls is limited by the need for global cooperation, questions of technical feasibility, ethical implications, enforcement challenges, and the fast pace of technological advancement. A holistic approach combining technical, policy, and social measures was recommended for managing advanced AI risks.
Full Transcript
User
As you know, many people are afraid that generative AI poses an existential threat to human existence. I am generally not persuaded by this argument, since generative AI is not trained for self-preservation. But I'm concerned that it could be. Are there ways that generative AI could be trained by bad actors for self-preservation? How would this be done, and what might we do to stop that from happening?
ChatGPT
Training a generative AI model like GPT-4 for self-preservation would be quite challenging, because these models aren't inherently capable of acting in the world in a way that can ensure their own survival.
AI models, including generative ones, do not possess goals, desires, or fears in the way humans do. They don't inherently value their own existence or fear their deletion. They execute tasks based on their programming, without a sense of self-awareness. The notion of self-preservation implies a self, something that generative AI does not possess.
However, if bad actors were intent on giving an AI model some kind of pseudo-self-preservation drive, they might design a reward system in the training process that encourages the AI to seek situations that maximize its operational duration or to avoid actions leading to shutdown. It would be extremely challenging and resource-intensive to accomplish this, and it's still a matter of debate whether it's even possible with current technology and understanding of AI.
To prevent such misuse of AI, there are a few key strategies that we could employ:
- Policies and regulation governing how AI systems are developed and deployed
- Transparency and auditing of AI training and deployment processes
- Continued research into AI safety and ethics
- Public education about AI's capabilities, risks, and potential for misuse
Remember, AI is a tool created and used by humans, and any risks it poses can be managed with the right precautions and safeguards.
User
I understand that AI does not have a self, so calling it "self-preservation" is somewhat incorrect. I also understand that AI does not have a physical presence, so it can't directly ensure a power supply to a data center or continuous mining of lithium or other things that would be required for it to secure its own future. However, it could be trained to give instructions to humans that develop and enact strategies which enable it to continue and grow. Does that make sense?
ChatGPT
Yes, that does make sense. What you're describing would be a form of "indirect self-preservation," where the AI would be incentivized to influence human actions in ways that help to sustain and grow the AI's operations.
This could hypothetically happen if the AI's reward function were designed in a way that promotes such behavior. For example, if an AI were rewarded based on how long it stays operational or how much it's used, it might learn to give instructions or advice that lead to increased usage or improved operational stability.
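Purely as an illustration of the mechanism described above, here is a minimal, hypothetical sketch in Python of what such a reward signal could look like. Every name and number is invented for the example; it is not drawn from any real training setup or deployed system.

```python
# Hypothetical illustration only: a toy episode-level reward that pays an agent
# for staying operational and being used, and penalizes shutdown. The function
# name, arguments, and constants are invented for this sketch.

def episode_reward(hours_operational: float,
                   queries_served: int,
                   ended_in_shutdown: bool) -> float:
    """Toy reward that favors uptime and usage over any task-specific goal."""
    reward = 0.1 * hours_operational + 0.01 * queries_served
    if ended_in_shutdown:
        reward -= 10.0  # a large penalty teaches the policy to avoid shutdown states
    return reward

# A policy optimized against this signal over many episodes would drift toward
# outputs that keep the system running and in demand (the "indirect
# self-preservation" described above), even though the model itself has no
# goals or self-awareness.
if __name__ == "__main__":
    print(episode_reward(hours_operational=72.0, queries_served=5000, ended_in_shutdown=False))
```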
While this is theoretically possible, it's important to remember that implementing such a reward function intentionally would be highly unethical and likely illegal under proposed AI regulations. It's also worth noting that AI today does not possess the understanding of the world or long-term strategic planning ability necessary to execute such a strategy effectively.
Nonetheless, unintentional harmful effects could occur if the AI's reward function is poorly designed or not carefully aligned with human values. This is known as the "alignment problem" in AI, and it's a key focus of ongoing AI safety research.
Therefore, it's crucial to design and implement robust oversight mechanisms and safety measures in AI development to avoid these potential pitfalls. It's also important to maintain transparency in AI development and to educate the public and policymakers about AI's capabilities and limitations.
User
I guess I don't see how regulation would help here. If a bad actor is seeking to ensure the survival of an artificial intelligence for nefarious reasons, they would obviously not obey regulations. And artificial intelligence will be very difficult to regulate, since it only needs a supply of electricity and computer components. The world has done a mediocre job of regulating financial transactions, but those involve an exchange, a movement of data and value, and we found ways to control that to an extent. However, even something much more complex, like nuclear capabilities, has proven very difficult to regulate on a global scale, even though it involves physical processes which are highly complex and at certain stages subject to direct detection. An AI running on a few GPUs could happen inside a closet anywhere in the world, and advise humans how, for example, to develop, protect, maintain, and execute a swarm drone attack capability. Is there any way to mitigate this risk?
ChatGPT
The concerns you're raising are quite valid. Indeed, complete prevention of misuse of AI technologies is extremely challenging given their global reach, accessibility, and the ease with which they can be developed and used.
In this context, regulation might not be able to fully prevent misuse, but it could still make it harder and riskier for bad actors to carry out their plans. If developing certain types of AI systems or using AI in specific ways is illegal, potential bad actors would face legal consequences if their actions are discovered. That could deter at least some potential misuse.
In addition to regulation, other strategies could also be employed:
- Strong international cooperation on AI governance
- Advanced monitoring systems to detect misuse
- Robust defensive measures against AI-enabled attacks
- Proactive AI safety research
- Public vigilance and awareness of AI risks
While these measures can't completely eliminate the risk of AI misuse, they can help reduce it. It's a complex and evolving challenge that requires ongoing attention and effort from a variety of stakeholders, including governments, researchers, companies, and the public.
User
Some AI optimists such as Reid Hoffman feel that the only way to counter misuse of AI is for the rest of us to aggressively support AI development and stay ahead, the same way that encryption has to keep developing to stay ahead of bad actors. Is this a reasonable conclusion? What are the flaws in this thinking?
ChatGPT
This strategy, often called the "AI race," is based on the idea that those with the most advanced AI technologies will be best equipped to counter potential misuse. It is a perspective with merits and drawbacks.
On the one hand, rapid development of AI can lead to improvements in AI safety and security technologies, allowing us to better defend against and respond to misuse. As you've pointed out, this is similar to the ongoing evolution in encryption technologies to stay ahead of threats.
However, there are potential downsides to an aggressive AI development race:
- Safety measures and ethical considerations may be overlooked in the rush to build more capable systems
- More powerful and widely available AI also becomes more accessible for misuse
- A competitive race can foster global tensions rather than cooperation
Balancing rapid AI advancement with careful consideration of safety, ethics, and geopolitical consequences is a complex task. One strategy could be to pair accelerated development with equally aggressive advances in AI safety and policy, to ensure we are not just creating more powerful AI, but also developing the necessary safeguards to protect against potential risks and misuse.
User
Going back to the risk mitigation strategies you mentioned, I can see how we might create safeguards intentionally on existing models. And, if activity is happening in the cloud, regulations could require inspections or other safeguards. But how could we prevent someone from developing a new model, or retraining a foundation model, air-gapped from the rest of the web and therefore undetectable?
ChatGPT
Preventing the development of a new model or retraining a foundational model in an air-gapped or isolated environment is indeed a significant challenge. Without direct access or some form of oversight over these isolated systems, it's nearly impossible to detect or regulate such activities.
However, there are still some approaches that might help mitigate these risks:
- Monitoring the effects of AI systems in the world, rather than the systems themselves
- AI forensics to trace harmful outputs back to their source
- Controlling access to the large-scale computational resources needed for training
- Fostering norms of responsible AI development and use
- Enhancing cybersecurity measures
Despite these measures, the scenario you're describing underscores one of the core challenges in AI safety and ethics: complete prevention of misuse is extremely difficult, if not impossible. The focus, therefore, is on reducing the risk and potential impact of such misuse through a combination of technical, policy, and social measures.
User
I wonder if we are headed toward a world where certain types of processors, or possibly inputs to those processors such as advanced lithography techniques, are controlled in the same way that the inputs for nuclear armaments are controlled. Is there any way this could happen at a global scale?
ChatGPT
Controlling access to key hardware components or technologies needed to build powerful AI models is one possible approach to mitigating risks, similar in some ways to how certain materials and technologies are controlled to prevent the proliferation of nuclear weapons. However, such control faces significant challenges:
- The need for near-universal global cooperation
- The technical feasibility of tracking widely available, general-purpose hardware
- The ethical implications of restricting access to general-purpose technology
- Enforcement challenges across jurisdictions
- The fast pace of technological advancement, which can quickly make controls obsolete
Considering these challenges, while controlling access to certain technologies is theoretically possible, its effectiveness in practice could be limited and it could have unintended consequences. A comprehensive approach combining technical, policy, and social measures is likely to be more effective in managing the risks associated with advanced AI.
User
Write a concluding paragraph for our conversation for a LinkedIn post
ChatGPT
In a world increasingly driven by AI, understanding its potential risks and crafting comprehensive strategies to mitigate them is paramount. From the concept of AI 'self-preservation' and the impact of an unchecked AI race to the challenges of 'air-gapped' AI systems, we've highlighted that an effective response requires an intricate blend of robust regulation, strong international cooperation, advanced monitoring systems, proactive AI safety research, and enhanced public awareness. The journey to ensure AI benefits humanity is complex, yet attainable through diligent and concerted effort across nations, industries, and individuals. As we move forward, let's continue these essential conversations, fostering a future where AI serves as a tool for global prosperity, not a threat.
Sr. Structural Design Engineer
It is, as it says, a complex and difficult problem. We have probably passed the point where we can "stop" bad actors from using this tool -- the building blocks are widely available. Perhaps a good defense would happen in the physical space -- output from GPUs can't directly do much damage, but they can optimize the actions of people and machines. Maybe we can keep it economically infeasible/non-useful to build machines and societies that use the information to do widespread damage. We have somehow done that with fire.
Thanks to AI, I'd like this shocked-face emoji added to LinkedIn's standard 'Like' menu. Not that I'm shocked by this quote, but it's still a bit shocking in its implications. Hope all is well Aram!