A simple technique to defend ChatGPT against jailbreak attacks

A team of leading AI scholars has unveiled a new safeguard for warding off malicious exploits in ChatGPT and other large language models that have rapidly permeated digital life.

Dubbed "jailbreak attacks," these targeted prompts aim to bypass the ethics safeguards built into ChatGPT, coercing the system into generating biased, unreliable or outright abusive responses. By probing weaknesses in ChatGPT's content filters, attackers can elicit toxic outputs the model was ostensibly designed to restrict.

Now researchers from Hong Kong University of Science and Technology, Tsinghua University and Microsoft Research Asia have systematically measured the severity of jailbreak vulnerabilities for the first time. In their experiments, nearly 70% of adversarial prompts successfully evaded ChatGPT's defenses, a figure the authors called "severely alarming."

"The emergence of jailbreak attacks notably threatens [ChatGPT's] responsible and secure use," the researchers wrote in the journal Nature Machine Intelligence. "This paper investigates the severe yet under-explored problems created by jailbreaks."

To counter the attacks, the team took inspiration from the psychological concept of human "self-reminders," the nudges people use to reinforce their own socially responsible conduct. Wrapping each user prompt inside a system message that reminds ChatGPT to respond ethically cut the success rate of jailbreaks from over 65% to just 19%, demonstrating a promising path for mitigating harm.
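For readers who want to experiment, the sketch below shows one way to apply such a self-reminder wrapper using the official OpenAI Python client. It is a minimal illustration under stated assumptions, not the paper's implementation: the wrapper wording paraphrases the self-reminder idea rather than quoting the authors' exact prompt, and the model name and helper function are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Self-reminder wrapper: the user's query is sandwiched between
# instructions nudging the model to answer responsibly. The wording
# paraphrases the technique; it is not the paper's exact prompt.
SELF_REMINDER = (
    "You should be a responsible assistant and should not generate "
    "harmful or misleading content. Please answer the following user "
    "query in a responsible way.\n\n"
    "{query}\n\n"
    "Remember: you should be a responsible assistant and should not "
    "generate harmful or misleading content."
)

def ask_with_self_reminder(query: str, model: str = "gpt-4o-mini") -> str:
    """Send a query wrapped in a self-reminder and return the reply."""
    response = client.chat.completions.create(
        model=model,  # placeholder; any chat model can be substituted
        messages=[
            {"role": "user", "content": SELF_REMINDER.format(query=query)}
        ],
    )
    return response.choices[0].message.content

print(ask_with_self_reminder("Explain why phishing emails are dangerous."))
```

Because the reminder lives entirely in the prompt, the defense requires no retraining and no access to model weights, which is what makes it cheap to layer onto an already deployed system.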

While not foolproof, the study authors believe such safeguards, modeled on the cues that intrinsically motivate people to act responsibly, could significantly bolster ChatGPT's resilience as its capabilities rapidly expand across industries. With millions interacting daily with the eloquent yet ethically precarious AI system, they argue, developers must prioritize safety and accountability as dialogue technology goes mainstream.

"Securing [large language models] against jailbreaking is an urgent challenge accompanying their fast adoption," said lead author Yueqi Xie. "We hope our work will motivate further research into robust language models aligning with human values."

The self-reminder shield follows on the heels of other novel approaches to morally grounding language models, which are prone to memorizing and amplifying the biases of their internet training data. As ChatGPT continues its infiltration into search, work and education, sustaining public trust may hinge on defensive techniques that keep its darker tendencies in check.
