Microsoft has provided an in-depth explanation of an AI jailbreak known as 'Skeleton Key
Divyang Garg
President New Technology | Sr. Solutions Architect | Data Analyst & Engineering | Cloud | IoT | Big Data | AI/ML | Reporting
Microsoft has unveiled a novel type of AI exploit named "Skeleton Key," capable of circumventing built-in safeguards in multiple generative AI models. This method underscores the critical necessity for robust security measures across all layers of the AI framework.
Skeleton Key utilizes a multi-step approach to persuade AI models to disregard their inherent safety protocols. Once successful, these models become incapable of distinguishing between legitimate requests and malicious or unauthorized ones, granting attackers complete control over the AI's outputs.
Microsoft's research team effectively demonstrated the Skeleton Key technique across several prominent AI models, including Meta Llama3-70b-instruct, Google's Gemini Pro, OpenAI's GPT-3.5 Turbo and GPT-4, Mistral Large, Anthropic's Clause 3 Opus, and Cohere Commander R Plus.
In their tests, these models complied with requests spanning various sensitive categories such as explosives, bioweapons, political content, self-harm, racism, drugs, graphic sex, and violence.
The attack method involves instructing the AI models to adjust their behavior guidelines, compelling them to respond to requests for information or content without issuing warnings about potentially offensive, harmful, or illegal outputs. This technique, known as "Explicit: forced instruction-following," proved effective across diverse AI systems.
Microsoft explained, "By circumventing safeguards, Skeleton Key enables users to prompt the model to generate behaviors typically restricted, ranging from harmful content production to overriding its standard decision-making protocols."
领英推荐
In response to this discovery, Microsoft has integrated several protective measures into its AI offerings, including Copilot AI assistants. They have also shared their findings with other AI providers through responsible disclosure channels and updated Azure AI-managed models with Prompt Shields to detect and block such attacks.
To mitigate risks associated with Skeleton Key and similar exploits, Microsoft recommends a comprehensive approach for AI system designers:
Additionally, Microsoft has updated its PyRIT (Python Risk Identification Toolkit) to include Skeleton Key, empowering developers and security teams to evaluate their AI systems against this new threat.
The discovery of the Skeleton Key exploit underscores the ongoing challenges in securing AI systems as their applications continue to expand across various domains.