登录查看更多内容

Microsoft has provided an in-depth explanation of an AI jailbreak known as 'Skeleton Key

Divyang Garg

President New Technology | Sr. Solutions Architect | Data Analyst & Engineering | Cloud | IoT | Big Data | AI/ML | Reporting

发布日期: 2024年7月1日

Microsoft has unveiled a novel type of AI exploit named "Skeleton Key," capable of circumventing built-in safeguards in multiple generative AI models. This method underscores the critical necessity for robust security measures across all layers of the AI framework.

Skeleton Key utilizes a multi-step approach to persuade AI models to disregard their inherent safety protocols. Once successful, these models become incapable of distinguishing between legitimate requests and malicious or unauthorized ones, granting attackers complete control over the AI's outputs.

Microsoft's research team effectively demonstrated the Skeleton Key technique across several prominent AI models, including Meta Llama3-70b-instruct, Google's Gemini Pro, OpenAI's GPT-3.5 Turbo and GPT-4, Mistral Large, Anthropic's Clause 3 Opus, and Cohere Commander R Plus.

In their tests, these models complied with requests spanning various sensitive categories such as explosives, bioweapons, political content, self-harm, racism, drugs, graphic sex, and violence.

The attack method involves instructing the AI models to adjust their behavior guidelines, compelling them to respond to requests for information or content without issuing warnings about potentially offensive, harmful, or illegal outputs. This technique, known as "Explicit: forced instruction-following," proved effective across diverse AI systems.

Microsoft explained, "By circumventing safeguards, Skeleton Key enables users to prompt the model to generate behaviors typically restricted, ranging from harmful content production to overriding its standard decision-making protocols."

Dr. Joerg Storm 4 个月前

Daily Update: What AI Development Means for…

S&P Global 1 年前

Generating Synthetic Data for LLMs, Deploying…

Open Data Science Conference (ODSC) 4 个月前

In response to this discovery, Microsoft has integrated several protective measures into its AI offerings, including Copilot AI assistants. They have also shared their findings with other AI providers through responsible disclosure channels and updated Azure AI-managed models with Prompt Shields to detect and block such attacks.

To mitigate risks associated with Skeleton Key and similar exploits, Microsoft recommends a comprehensive approach for AI system designers:

Implementing input filtering to detect and block potentially harmful or malicious inputs.
Engaging in careful prompt engineering of system messages to reinforce appropriate behavior.
Employing output filtering mechanisms to prevent the generation of content violating safety guidelines.
Utilizing abuse monitoring systems trained on adversarial examples to identify and address recurring problematic content or behaviors.

Additionally, Microsoft has updated its PyRIT (Python Risk Identification Toolkit) to include Skeleton Key, empowering developers and security teams to evaluate their AI systems against this new threat.

The discovery of the Skeleton Key exploit underscores the ongoing challenges in securing AI systems as their applications continue to expand across various domains.

要查看或添加评论，请登录

查看全部

Microsoft has provided an in-depth explanation of an AI jailbreak known as 'Skeleton Key

Divyang Garg

President New Technology | Sr. Solutions Architect | Data Analyst & Engineering | Cloud | IoT | Big Data | AI/ML | Reporting

领英推荐

更多精彩文章

社区洞察

其他会员也浏览了

Shadow AI and Data Poisoning in Large Language Models: Implications for Global Security and Mitigation Strategies

Should AI be worried about adversarial examples?

AI Gone Rogue? Machine Builds Hidden Network, Sparking Security Concerns

S.D.I. English Edition newsletter: Trick or AI Treat … ?

How the Responsible Use of AI Can Create Safer Online Spaces

What New Security Threats Arise from The Boom in AI and LLMs?

Security of Generative AI Services: Safeguarding the Future of AI

AI: The Future is Now, But Are We Prepared?

Attack on AI Chatbots, AI Lies, and more! | Fetch.ai Newsletter | Issue 17/08/2023

Securing and Enhancing the Trustworthiness of Generative AI Applications

领英推荐

Anthropic's Claude 3.5 Sonnet Outperforms GPT-4.o Across Multiple Benchmarks

2024年6月26日

How Apple Collaborated with Google to Train Its AI Models

2024年6月18日

Meet Apple Intelligence: The Advanced AI Revolutionizing for an Enhanced User Experience

2024年6月14日

Introducing the ChatGPT Prompt Generator: Unlocking the potential of AI-powered conversations

2024年6月13日

Google Refines AI Overviews in Response to an Uneven Initial Launch

2024年6月12日

Google modifies AI Overviews following a turbulent rollout

2024年6月10日

The EU inaugurates a bureau dedicated to enforcing the AI Act and nurturing innovation

2024年6月7日

Elon Musk's AI venture, xAI set to $6 billion to rival OpenAI in the AI competition

2024年6月5日

The Emergence of Intelligent Automation as a Key Competitive Advantage

2024年5月31日

The Future of Virtual and Augmented Reality in Education

2024年5月28日