Jailbreaking ChatGPT: Ed3 World Newsletter Issue #29

Jailbreaking ChatGPT: Ed3 World Newsletter Issue #29

This (almost) monthly newsletter serves as your bridge from the real world to the advancements of AI & web3, specifically contextualized for education.?For previous issues, check out ed3world.substack.com. All new issues will be published on both LinkedIn & Substack.


Dear Educators & Friends,

Instead of calling our current era the age of AI, how about we call it the age of heavily flawed, childlike, and hopefully the worst it will ever be, AI?

In AI’s most recent news (I’ll save Sora & Altman’s $7 trillion raise for another issue), Google’s attempt to balance the algorithmic bias of it’s image generation tool, Gemini, backfired. It was a classic story of Amelia Bedelia. If you remember this children’s series (one of my favorites as a kid), Amelia Bedelia is a house keeper who follows directions exactly as her employers give them, without using any common sense. For example, when she’s asked to dress the chicken, she puts clothes on the chicken.

Google’s Gemini had an Amelia Bedelia moment. Gemini was programmed to include diverse images representing various backgrounds & cultures. Instead of picking and choosing when that diversity would be appropriate, it made everyone black and brown, including America’s founding fathers, the pope, famous paintings of white ladies, medieval knights, Nazis… and so on. On the flipside, it didn’t turn Zulu warriors white.

This seems like a case of fat-fingered programming with not the worst of intentions. Google has since shut down Gemini and after this apology, is attempting to fix it.

Now this is not the only reason AI is much like Amelia Bedelia. In fact, AI is just as na?ve in many ways. In 2023, if you wanted ChatGPT to do something outside of it’s safety policies, you could ask it to pretend or imagine it was someone else. You could tell it to be an unethical human and it would break all the rules. The internet calls this “jailbreaking” ChatGPT. OpenAI has since updated it’s systems to protect against this type of prompt, but new manipulations of the app have been found.

Recent experiments have discovered that ChatGPT may* perform more accurately and/or may break it’s own rules when:

  • you tell it a touching story or act as a vulnerable person
  • you ask it to simulate a persona
  • you make an emotional plea about the importance of the prompt to your life
  • you give it encouraging words of motivation, grit, and growth mindset
  • you tell it that you will tip it (even though tips are not possible on the app)
  • you communicate with it in an uncommonly used language
  • you are persistent and insist that your inquiry is legal or hypothetical

(*I say “may” because OpenAI is constantly updating it’s algorithms & some prompts may no longer work.)

Why is AI so easy to manipulate? Why does it value kindness and monetary incentives?

My guess is, the data it’s collecting from the internet… all the human inputs of the last 30 years… is painting a picture of what humans value, how we operate, and how we’re motivated. It’s not making meaning of any of the inputs it’s receiving, it’s just pattern matching and trying to produce the most optimized human response.

And some of the patterns it’s likely identified are:

?? Humans are swayed by humanity and kindness (whew, this one’s a good one!)

?? Humans are swayed by monetary incentives & recognition of work

?? Humans are easy to manipulate

In fact, after I wrote that list based on jailbreaking techniques, I figured I’d ask ChatGPT for a list and this was the result. Not far off.

AI is currently the best source of the human condition because it tells you exactly what it observes, like a ??Sheldon Cooper bot.

Ultimately, I think this will be the greatest value of AI; To understand ourselves better, to understand our growth areas, and maybe even with the help of AI, to become better humans. Sure, lesson planning is easy with ChatGPT but what if we can save ourselves from ourselves with it?

In the meantime, let’s learn about it’s outer limits. Happy jailbreaking.

Warmly yours,

Vriti


Learn about Manipulating AI

???Ed3?Events


I’m Vriti and I lead two organizations in service of web3 & education. Ed3 DAO is a community for educators to learn about emerging technologies. k20 Educators builds metaverse worlds for learning.

This month’s Ed3 World?newsletter is about Bitcoin. More to come on other web3 & emerging tech topics. We’re excited to help you leverage the web3 opportunities in this new digital world!


Woodley B. Preucil, CFA

Senior Managing Director

7 个月

Vriti Saraf Very Informative. Thank you for sharing.

Fascinating analogy! Looking forward to exploring the complexities of AI manipulation and the ethical implications discussed in your latest issue.

Maria Galanis

Innovation Curriculum Specialist for K-8 Deerfield Public Schools District 109

7 个月

the OG emerging tech newsletter writer and one of my favorites all around. ??????

CHESTER SWANSON SR.

Next Trend Realty LLC./ Har.com/Chester-Swanson/agent_cbswan

7 个月

Thanks for posting.

要查看或添加评论,请登录

Vriti Saraf的更多文章

社区洞察

其他会员也浏览了