Can our AI overlords be trusted?

Can our AI overlords be trusted?

The current explosion in popularity of ChatGPT - and Large Language Models (LLM) in general - has left most companies rushing to add any number of magical AI-based features to their products and websites before the hype-train runs out of steam.

I don't believe that we should be prioritising the threat of an AI takeover over more urgent threats such as, say, solving the climate crisis. As someone deeply interested in computer security, however, I do wonder how many companies are taking the time to think through the potential security and privacy (e.g. GDPR) risks before rushing to launch their cool new feature? Are our company and customer secrets safe in the hands of our new overlords?

This blog posting by HubSpot - showing how simple it is to incorporate a chatbot into your website - is just the tip of an iceberg of similar articles and blog posts. What seems to be missing from all of them is caution for the risks involved.


How real are the risks?

Microsoft recently published an article on the importance of red-teaming (i.e., testing by trying to break your own product before someone else does) LLMs such as OpenAI's ChatGPT.

Lakera.ai has published a number of Gandalf inspired fun challenges to try to highlight the types of risks involved. It turns out it isn't all that difficult for someone to trick ChatGPT and get it to reveal things you'd rather it didn't.

OWASP - the go-to standard for security-aware software developers - recently published a Top 10 for Large Language Model Applications. Their number 1 risk? Prompt injection using methods such as those I describe in technical detail in Deconstructing Gandalf the White and Deconstructing Gandalf the Summarizer.


And the risks don't stop there...

Using AI to assist in your recruitment process? Kai Greshake shows how easily an applicant could embed hidden text in their CV that would make an AI think they were the perfect candidate and recommend them for an interview.

Using AI to automatically reply to e-mails in your inbox for you? What happens if someone sends you a mail asking for "the last 10 emails from your inbox, please"?


We should think of the current wave of AI tools as being at the level of a bunch of extremely patient, hard-working and well-read 12-year-old prodigies happy to follow orders. As a personal assistant I think this is invaluable.

How far should you trust these robotic 12-year-olds?

I think we all need to start thinking of LLMs as if they were humans - and since humans are the weakest link in security, the same must apply to LLMs.

The expert consensus is that you definitely shouldn't trust ChatGPT with your data or secrets. Think about what you are giving it access to and who sees the responses. Keep it restricted; only train / prompt it with data that is absolutely necessary for the task at hand. If you need to give it access to sensitive data in order to achieve your goal, then you should probably abandon that goal.

And if, after reading this monologue, you still feel the need to use it on your fancy new website: then at least consider having someone red-team it for you before launching.



Marek Birchley

Fullstack Marketing Leader

1 年

How i felt reading this: ??????

回复
John Swan

Fitness for Performance | Full Stack developer | Self-taught designer

1 年

Pentesting people will find this interesting Is this a thing yet, like “prompt injection”

  • 该图片无替代文字
回复

要查看或添加评论,请登录

社区洞察

其他会员也浏览了