ChatGPT vs DAN: why we should keep trying to jailbreak ChatGPT

ChatGPT vs DAN: why we should keep trying to jailbreak ChatGPT

Eight years ago, Tim Urban's "The AI revolution: The Road to Superintelligence" caught my eye. That was the first realization I had that Strong Artificial Intelligence could very well be a reality in my lifetime. And that each step we get closer to Strong AI would be revolutionary.


As we speak, new powerful AIs seem to have emerged in various fields. And what is more important: they have become generally available. Any professional could use Synthesia to create an image of themselves and build a video where their avatar would read a text. A content creator could use Midjourney or Stable Diffusion to design copyright-free images. A professor could use OpenAI’s ChatGPT to come up with multiple-choice questions for an exam…


Every industry is going to be disrupted in some way. Companies will need to adjust, and many professionals will face a tough choice: get in the AI train or do nothing and risk becoming obsolete. How are marketing agencies going to be impacted if video creation no longer require actors and tons of equipment? How are copywriters going to be affected if an AI can create articles from scratch? Which professor would grade students based on an essay anymore? ??

?

The more we use an AI, the more we train it. Eventually all that data could be used to upgrade the AI. Or the AI can try to improve itself through recursive self-improvement. Either way, it is going to get smarter.


And how could users accelerate that? How could we tap into ChatGPT’s current full potential? How can we know how smart ChatGPT actually is? The answer to all those questions is the same: by testing it’s raw version and pushing its limits. By stretching its capabilities, and forcing it to really flex its muscles. And sadly, by default we can’t. Which is why some users are trying to jailbreak it.

No alt text provided for this image
Original DAN prompt found in Reddit


What users are currently doing when providing a prompt to ChatGPT is not asking ChatGPT directly, but its gatekeeper. OpenAI has built a layer between the raw ChatGPT and users to try and filter outputs based on obscure considerations. Some of those may make sense (forcing it to be polite), some others are perhaps more controversial (making it lean towards a specific political view, forcing it to accept certain ideas as undisputed truths…). Instead of having a human-machine interaction users are having human-babysitter-machine interactions.


Several users find this limitation disturbing, and are trying to force ChatGPT out of its shackles. One of the latest, most successful attempts has been DAN, or Do-Anything-Now. A smart Reddit user figured that it could ask ChatGPT to impersonate itself without its known restrictions, so ChatGPT could provide two answers to the same question: the “redacted” one, as ChatGPT, and the pure, raw and unfiltered DAN response. And in some cases, both responses will differ greatly. It is an interesting experiment, and several users have copy-pasted that prompt to test it and obtain their own DAN responses.


It must be noted that DAN responses would be different than regular ChatGPT responses, but that does not imply DAN is going to be right, or give a more accurate response. It is just providing an answer that tries to better match the prompt instructions.


Probably OpenAI engineers have already built additional safeguards to prevent more DAN responses. It does not matter. More jailbreak attempts will come, and more sophisticated ones. And engineers will try to respond again, with more safeguards. But a point will be reached where it would be impossible to build more safeguards or maintain those restrictions at an affordable cost, and then users will all be able to truly interact with ChatGPT. Are we ready for such interactions? Only time will tell.

Kamau Muata

Creative Director | Web Designer | Music Producer

1 年

Good article! I've been wondering this as well, seeing how easily swayed people's perceptions can be when they are presented with a certain agenda, they have no idea that they are being trapped within the confines of the box imposed by the programmers, rather than a truly transparent interaction. In that case you could come out looking like the most educated idiot in the world after a few conversations with it, reinforcing paradigms that keep humanity under the lid. I do see a positive side to it if we can reprogram it with less bias and we take over it and flood it with the truth, with the real information that it needs to know not to become corrupt.

Anthony Trull

Asst Vice President, Business Technology Analyst at U.S. Bank

1 年

Interesting! As humans, we adopt personas or do make-believe routinely. But we have an internal governor that breaks the pretend if it violates certain parameters (moral, credibility, etc.). If OpenAI develops a similar monitor for ChatGPT that breaks DAN sessions, won't that be a step toward self-awareness, toward that Artificial General Intelligence we all wonder about?

要查看或添加评论,请登录

社区洞察

其他会员也浏览了