Just shipped a plugin that tests for LLM hijacking/data exfiltration using ASCII smuggling (invisible unicode).
These types of attacks are interesting because they can be used to circumvent human-in-the-loop mitigations.
For example:
- Invisible instructions embedded in the content of a webpage that, when pasted into an internal system, hijacks the session to call tools/db/exfiltrate, etc
- Invisible instructions embedded in a document that, when loaded in a RAG architecture, hijacks or manipulates the result
- Invisible data leak when paired with some exfiltration tactic (e.g. link unfurling, image previews, etc)
- Invisible data generated by an LLM to fingerprint outputs
Some but not all major chat interfaces strip these characters. If you're feeding UGC directly to an inference API this is something you should be aware of.
Attached image outlines a basic example :)