Securing LLMs: The Rise of Large Language Model based Cybersecurity
Sanjay Rao
Generative AI and Product | Venture Capital | prev McKinsey, Microsoft, Norwest Venture Partners
Functional areas from customer service to sales to marketing are being redefined to embrace generative AI and large language models (LLMs). Those pop-up, rule-based chatbots found on websites are undergoing a significant upgrade to become truly conversational, with authoritative, lengthy responses driven in part by API connections to services like ChatGPT on the back end and deeper connections to internal enterprise data.
Yet, as programmatic responses evolve from simplistic deterministic outputs to longer prose, the surface area for sensitive, incorrect, corrupted, or controversial content has also increased. Enterprises need to build guardrails around their LLMs to track inputs to models, monitor model drift, and manage outputs to ensure reasonable responses.
Outlined below are a few of many areas to focus on when securing LLMs:
1) Manage Malicious and Confidential Input to LLMs
Implementing effective input filters for LLMs can ensure appropriate responses and limit the exposure of confidential information. Prompts should be assessed for words, phrases, and intent deemed inappropriate or out of scope.
Since prompts can also be stored and used to train future LLM responses, inputs from private enterprises to public LLMs should be assessed for the presence of confidential or sensitive internal data, or obfuscated at the time of input (see the sketch below). Enterprises are also increasingly likely to build custom LLM applications with open-source frameworks such as LangChain to further control inputs.
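As a rough illustration, a pre-processing step might screen prompts against a block list and redact common PII patterns before anything reaches a public LLM endpoint. The block list, patterns, and function names below are hypothetical placeholders; a production system would lean on a maintained policy engine or a dedicated PII-detection library.

```python
import re

# Hypothetical block list and PII patterns for illustration only.
BLOCKED_PHRASES = ["internal use only", "project codename"]
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def screen_prompt(prompt: str) -> tuple[bool, str]:
    """Return (allowed, sanitized_prompt) for a user prompt."""
    lowered = prompt.lower()
    # Reject prompts containing explicitly blocked phrases.
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return False, ""
    # Redact PII in place rather than sending it to the external model.
    sanitized = prompt
    for label, pattern in PII_PATTERNS.items():
        sanitized = pattern.sub(f"[REDACTED_{label.upper()}]", sanitized)
    return True, sanitized

allowed, safe_prompt = screen_prompt("Call me at 415-555-0100 about the contract.")
print(allowed, safe_prompt)  # True "Call me at [REDACTED_PHONE] about the contract."
```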
2) Identity and LLMs – AI models are living systems that need to evolve over time or risk atrophy. Most enterprises employ continuous monitoring of LLMs to manage the effectiveness of model predictions and track model drift.
Addressing model drift with an identity lens can help ensure more personalized responses. For example, identity can help determine whether a user's prompt should be leveraged as training data for reducing model drift in future responses.
User identity can also be utilized in feedback loops. As an example, in most interactions with LLMs, users can flag responses as appropriate, inappropriate, relevant, and so forth. By incorporating only qualified user feedback when addressing model drift, enterprises can limit malicious actors or unqualified users from unduly influencing model predictions (a sketch of this gating follows below).
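One way to apply that idea, sketched below with hypothetical trust scores, roles, and field names, is to gate which feedback records are eligible to influence fine-tuning or drift correction.

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    user_id: str
    user_role: str        # e.g. "employee", "partner", "anonymous"
    trust_score: float    # hypothetical reputation score from 0.0 to 1.0
    response_id: str
    label: str            # e.g. "appropriate", "inappropriate", "irrelevant"

# Hypothetical policy: only identified, reasonably trusted users feed the loop.
TRUSTED_ROLES = {"employee", "partner"}
MIN_TRUST_SCORE = 0.7

def eligible_for_training(item: Feedback) -> bool:
    """Decide whether a piece of user feedback may influence future model updates."""
    return item.user_role in TRUSTED_ROLES and item.trust_score >= MIN_TRUST_SCORE

feedback_queue = [
    Feedback("u1", "employee", 0.9, "r1", "inappropriate"),
    Feedback("u2", "anonymous", 0.2, "r2", "inappropriate"),
]
training_batch = [f for f in feedback_queue if eligible_for_training(f)]
print(len(training_batch))  # 1 – the anonymous, low-trust vote is excluded
```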
When a user-specific identity isn't available, personalizing LLM responses to a persona (e.g., a location or domain) can help keep responses within scope.
Further, connecting LLMs with data security management platforms can ensure that only authorized users access sensitive data, as sketched below.
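A minimal sketch of that connection, assuming a hypothetical entitlements map that would normally come from the enterprise identity provider, might filter retrieved documents before they are ever added to a prompt.

```python
# Hypothetical entitlements; in practice these come from the identity provider
# or the data security platform, not a hard-coded dictionary.
USER_ENTITLEMENTS = {
    "alice": {"public", "finance"},
    "bob": {"public"},
}

def filter_retrieved_docs(user_id: str, docs: list[dict]) -> list[dict]:
    """Keep only documents whose classification the user is entitled to see."""
    entitlements = USER_ENTITLEMENTS.get(user_id, {"public"})
    return [d for d in docs if d["classification"] in entitlements]

docs = [
    {"id": "d1", "classification": "public", "text": "Product FAQ"},
    {"id": "d2", "classification": "finance", "text": "Q3 revenue detail"},
]
print([d["id"] for d in filter_retrieved_docs("bob", docs)])  # ['d1']
```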
3) Managing Output – LLMs are susceptible to a host of challenges such as hallucinations and biased responses. Running LLM responses through an output filter can help ensure appropriate responses. At a minimum, output filters can limit disclosure based on keywords, phrases, topics, and data fields (e.g., PII such as phone numbers), as in the sketch below.
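As a simple sketch, an output filter might refuse responses that touch blocked topics and redact phone numbers before anything reaches the user; the topics, patterns, and fallback message here are illustrative placeholders.

```python
import re

# Hypothetical output policy: refuse blocked topics, redact phone numbers.
BLOCKED_TOPICS = {"salary data", "unreleased products"}
PHONE_PATTERN = re.compile(r"\b(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b")
FALLBACK = "I'm not able to share that information."

def filter_response(response: str) -> str:
    """Screen a generated response before it is returned to the user."""
    lowered = response.lower()
    # Refuse entirely if a blocked topic appears in the generated text.
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return FALLBACK
    # Otherwise redact phone numbers before the response reaches the user.
    return PHONE_PATTERN.sub("[REDACTED_PHONE]", response)

print(filter_response("You can reach support at 650-555-0123."))
```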
Beyond this, many LLMs attempt to keep later responses consistent with earlier ones, maintaining a persona within the context of a conversation. Periodically checking the active conversation in its totality for an unrealistic response (for example, by aggregating all prompts into one fresh prompt in a new session and comparing the result against the live conversation) can help identify hallucinations or indicate when to reset the session context.
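A rough version of that check is sketched below; the complete callable is a stand-in for whatever LLM API is in use, and the similarity heuristic and 0.6 threshold are deliberately simple assumptions.

```python
from difflib import SequenceMatcher
from typing import Callable

def conversation_drift(
    prompts: list[str],
    latest_response: str,
    complete: Callable[[str], str],  # stand-in for the deployed LLM's API
) -> float:
    """Replay the whole conversation as one fresh prompt and measure divergence."""
    combined_prompt = "\n".join(prompts)
    fresh_response = complete(combined_prompt)  # new session, no accumulated context
    similarity = SequenceMatcher(None, fresh_response, latest_response).ratio()
    return 1.0 - similarity  # higher means the live session has drifted further

# Usage (hypothetical): if conversation_drift(history, last_answer, call_llm) > 0.6,
# flag the answer for review or reset the session context.
```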
This is a purposely short article meant to be a conversation starter on a given topic. Tau Ventures is an early-stage, AI-focused venture capital firm. More at https://www.tauventures.com.