Humans vs. ChatGPT: Who writes better phishing emails?

Key Takeaways

  • 53,127 users were sent phishing simulations crafted by either human social engineers or ChatGPT.
  • Failure rates of users sent human-generated phishing emails were compared with those of users sent ChatGPT-crafted emails.
  • Human social engineers outperformed ChatGPT by around 45%.
  • AI is already being used by cybercriminals to augment phishing attacks, so security training must be dynamic and adapt to rapid changes in the threat landscape.
  • Security training confers significant protection against clicking on malicious links in both human and AI-generated attacks.


[Image] This riddle about ChatGPT and phishing is brought to you courtesy of ChatGPT

ChatGPT has been the subject of great awe and speculation since OpenAI released it to the public in November 2022, initially powered by GPT-3.5. It gained even more attention with the March 14, 2023 release of GPT-4. Many of our CISO admin customers have asked us how great the danger actually is, and what we’re doing to address the threat today and in the future.

While the potential for its misuse in cyber-attacks captures the imagination—ChatGPT can code malware and write flawless email copy—we took the initiative to determine its actual effect on the threat landscape.

The results of our experiment indicate human social engineers still significantly outperform AI in terms of inducing clicks on malicious links.

While this performance gap will likely close as the AI develops and human prompt engineering improves, for now we can tell certain members of the information security community to dial down all the fear, uncertainty, and doubt—the FUD—surrounding ChatGPT.

Perhaps the most important takeaway is that good security awareness, phishing, and behavior change training work.

Having training in place that's dynamic enough to keep pace with the constantly changing attack landscape will continue to protect organizations against data breaches. Users who are actively engaged in training are less likely to click on a simulated phish, regardless of its human or robotic origins.


Study methodology

[Image] Study setup: phishing emails were created from a prompt by human social engineers and by AI, then sent via the Hoxhunt platform to 53,127 users.


There are three potential outcomes with a phishing simulation by Hoxhunt:

  • Success: The user successfully reports the phishing simulation via the Hoxhunt threat reporting button.
  • Miss: The user didn't interact with the phishing simulation.
  • Failure: The user clicked on a simulated malicious link in the simulated phishing email.

This experiment focused on the difference in failure rates between AI- and human-generated phishing simulations.
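
To make the mechanics concrete, here is a minimal sketch of how failure rates could be tallied from per-user outcomes. The labels mirror the three categories above, but the data model, function name, and toy numbers are illustrative assumptions, not Hoxhunt's actual tooling or data.

```python
from collections import Counter

# Hypothetical outcome labels mirroring the three simulation outcomes above;
# this is an illustrative sketch, not Hoxhunt's actual pipeline.
SUCCESS, MISS, FAILURE = "success", "miss", "failure"

def failure_rate(outcomes):
    """Return the share of users who clicked the simulated malicious link."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return counts[FAILURE] / total if total else 0.0

# Toy data: one outcome per user, split by who authored the simulation.
human_outcomes = [FAILURE] * 42 + [SUCCESS] * 600 + [MISS] * 358
ai_outcomes = [FAILURE] * 29 + [SUCCESS] * 610 + [MISS] * 361

print(f"Human-crafted failure rate: {failure_rate(human_outcomes):.1%}")  # 4.2%
print(f"AI-crafted failure rate: {failure_rate(ai_outcomes):.1%}")  # 2.9%
```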


Results

Engagement rates were similar between human and AI-originated phishing simulations, but the human social engineering cohort clearly out-phished ChatGPT.


[Image] Humans still can hack other humans better than AI

One critical takeaway from the study is the effect of training on the likelihood of falling for a phishing attack. Users with more experience in a security awareness and behavior change program displayed significant protection against phishing attacks by both human- and AI-generated emails. As the graph below shows, failure rates dropped from over 14% among less trained users to between 2% and 4% among experienced users.
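
As a rough sketch of how that training effect could be measured, the snippet below buckets users by time in training and compares failure rates per bucket. The record format, the 12-month cutoff, and the numbers are hypothetical assumptions for illustration, not the study's schema or data.

```python
from collections import defaultdict

# Hypothetical per-user records of (months_in_training, clicked_link);
# the field layout and 12-month cutoff are illustrative assumptions.
records = [
    (1, True), (2, False), (3, True), (5, False),
    (14, False), (18, False), (20, False), (26, True),
]

def bucket(months):
    """Split users into coarse experience cohorts."""
    return "experienced" if months >= 12 else "less trained"

clicks = defaultdict(int)
totals = defaultdict(int)
for months, clicked in records:
    cohort = bucket(months)
    totals[cohort] += 1
    clicks[cohort] += int(clicked)

for cohort, n in totals.items():
    print(f"{cohort}: {clicks[cohort] / n:.0%} failure rate")
```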


[Image] The trained user is less likely to fall for a phishing attack from any origin


Interestingly, there is some geographic variance in user failure rates between human- and AI-originated phishing simulations. This phenomenon is worth exploring further, as previous research at Hoxhunt has also revealed significant differences in user email behavior depending on factors such as geography, job function, and industry.


Conclusion

Given its malicious capabilities and its mass availability, we all lost our minds imagining a future where the robots were stealing our lunch money. But the results clearly indicate that humans remain better at hoodwinking other humans, outperforming AI by around 45% (a 4.2% vs. 2.9% induced failure rate: (4.2 − 2.9) / 2.9 ≈ 45%).


It’s important to remember that these results reflect the current state of this threat. This experiment was performed before the release of GPT-4. Large language models like ChatGPT will likely evolve rapidly and improve at tricking people into clicking. Even so, there’s reason to remain calm if you're already addressing human risk with a security behavior change program.


Your current human risk controls should remain relevant even as AI-augmented phishing tools evolve. Security training helps keep your risk posture future-proof: awareness and behavior change training have a significant protective effect against AI-generated attacks, and the more time people spend in training, the less likely they are to fall for an attack, human or AI. You don’t need to reconfigure your security training to address the potential misuse of ChatGPT.


This article is a summary of research done by Hoxhunt's CTO & Co-Founder Pyry Åvist and his team of social engineers. You can read the full research article on Hoxhunt's blog.
