What if LLMs are GOOD for security?
I recently shared some thoughts on appropriate security access for LLMs on confidential data, but what if LLMs are not just a security risk, but also an opportunity to enhance our data security?
One thing that LLMs are increasingly good at is entity recognition. So, it is easy to imagine cases where the sensitive data falls into particular types of entities - personal names, zip codes, drug targets, and so on. By identifying these entities in text and replacing them with generic placeholders, it is possible to obfuscate some of the content. Think of all the scientific posters that have “compound X” or “kinase Y” on them. However, this method goes beyond simple find-and-replace, since it can work on whole categories of entities rather than a fixed list of terms.
As an example, I’ve pulled in a section of the Wikipedia article on the discovery of statins:
Given the following article: In the mid-19th century, a German pathologist named Rudolf Virchow discovered that cholesterol was to be found in the artery walls of people that died from occlusive vascular diseases, like myocardial infarction. The cholesterol was found to be responsible for the thickening of the arterial walls and thus decreasing the radius in the arteries which leads in most cases to hypertension and increased risk of occlusive vascular diseases.
In the 1950s the Framingham heart study led by Dawber revealed the correlation between high blood cholesterol levels and coronary heart diseases. Following up from that study the researchers explored a novel way to lower blood cholesterol levels without modifying the diet and lifestyle of subjects suffering with elevated blood cholesterol levels. The primary goal was to inhibit the cholesterol biosynthesis in the body. Hence HMG-CoA reductase (HMGR) became a natural target. HMGR was found to be the rate-limiting enzyme in the cholesterol biosynthetic pathway. There is no build-up of potentially toxic precursors when HMGR is inhibited, because hydroxymethylglutarate is water-soluble and there are alternative metabolic pathways for its breakdown.
In the 1970s the Japanese microbiologist Akira Endo first discovered natural products with a powerful inhibitory effect on HMGR in a fermentation broth of Penicillium citrinum, during his search for antimicrobial agents. The first product was named compactin (ML236B or mevastatin). Animal trials showed very good inhibitory effect as in clinical trials, however in a long term toxicity study in dogs it resulted in toxic effects at higher doses and as a result was believed to be too toxic to be given to humans. In 1978, Alfred Alberts and colleagues at Merck Research Laboratories discovered a new natural product in a fermentation broth of Aspergillus terreus, their product showed good HMGR inhibition and they named the product mevinolin, which later became known as lovastatin.
…now, applying the following prompt:
Starting from the initially provided text, replace all proper names with sequential names - "Person A", "Person B", "Person C". Also, replace any gene, enzyme, pathway, metabolite or protein names with "[redacted target]"
…here is the result:
In the mid-19th century, a German pathologist named Person A discovered that [redacted target] was to be found in the artery walls of people that died from occlusive vascular diseases, like myocardial infarction. The [redacted target] was found to be responsible for the thickening of the arterial walls and thus decreasing the radius in the arteries, which leads in most cases to hypertension and increased risk of occlusive vascular diseases.
In the 1950s, the Framingham heart study led by Person B revealed the correlation between high blood [redacted target] levels and coronary heart diseases. Following up from that study, the researchers explored a novel way to lower blood [redacted target] levels without modifying the diet and lifestyle of subjects suffering from elevated blood [redacted target] levels. The primary goal was to inhibit the [redacted target] biosynthesis in the body. Hence [redacted target] reductase ([redacted target]) became a natural target. [redacted target] was found to be the rate-limiting enzyme in the [redacted target] biosynthetic pathway. There is no build-up of potentially toxic precursors when [redacted target] is inhibited because [redacted target] is water-soluble, and there are alternative metabolic pathways for its breakdown.
In the 1970s, a Japanese microbiologist named Person C first discovered natural products with a powerful inhibitory effect on [redacted target] in a fermentation broth of Penicillium citrinum, during their search for antimicrobial agents. The first product was named compactin (ML236B or mevastatin). Animal trials showed a very good inhibitory effect as in clinical trials; however, in a long-term toxicity study in dogs, it resulted in toxic effects at higher doses and, as a result, was believed to be too toxic to be given to humans. In 1978, Person D and colleagues at Merck Research Laboratories discovered a new natural product in a fermentation broth of Aspergillus terreus. Their product showed good [redacted target] inhibition, and they named the product mevinolin, which later became known as lovastatin.
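For comparison, here is a minimal Python sketch of the substitution step on its own. It assumes the entity lists have already been produced by something else (an LLM, a local NER model, or a human reviewer); the function name `redact` and its arguments are hypothetical, not part of any library. It is not how the result above was generated, which came from the prompt alone. It only illustrates the placeholder logic.

```python
import re
from string import ascii_uppercase


def redact(text, people, targets):
    """Swap known people for 'Person A', 'Person B', ... and targets for a tag.

    `people` and `targets` are assumed inputs, e.g. entities that a model
    has already identified in `text`.
    """
    # Label people in order of first appearance so the letters follow the text.
    found = sorted(
        {p for p in people if p in text},
        key=lambda p: (text.find(p), p),
    )
    if len(found) > len(ascii_uppercase):
        raise ValueError("this sketch only labels up to 26 people")

    mapping = {name: f"Person {ascii_uppercase[i]}" for i, name in enumerate(found)}
    mapping.update({term: "[redacted target]" for term in targets})
    if not mapping:
        return text, {}

    # Longest terms first, so a longer name is matched before any shorter
    # term it contains. The lookarounds avoid matching inside other words.
    alternatives = "|".join(
        re.escape(term) for term in sorted(mapping, key=len, reverse=True)
    )
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)")
    return pattern.sub(lambda m: mapping[m.group(0)], text), mapping


sample = "Rudolf Virchow studied cholesterol. Dawber linked cholesterol to HMGR."
redacted, key = redact(
    sample,
    people=["Dawber", "Rudolf Virchow"],
    targets=["cholesterol", "HMGR"],
)
print(redacted)
```

The returned mapping doubles as a key that can stay in-house. The obvious limitation is the one the LLM removes: this only catches terms that are already on the list, spelled exactly as listed.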
Now, of course, for truly sensitive material, it may still make sense to use a locally-run LLM (like BioGPT or Dolly) - these are already pretty good at entity recognition and might be sufficient for this use case. And, of course, a manual review would still be helpful.
You could also imagine cases for LLMs speeding up external review - if a poster is to be published, the model could be asked to flag sensitive entities for additional review and possible redaction.
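A rough sketch of what that flagging step could look like once the model has returned its list of candidate terms. The function `flag_for_review`, its parameters, and the sample text are all hypothetical; the list of sensitive terms is assumed to come from the model or from an internal watchlist.

```python
import re


def flag_for_review(text, sensitive_terms, context=30):
    """List every occurrence of a sensitive term with some surrounding text."""
    flags = []
    for term in sensitive_terms:
        # Case-insensitive so "HMGR" and "hmgr" are both caught.
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            flags.append(
                {
                    "term": term,
                    "offset": match.start(),
                    "snippet": text[start:end],
                }
            )
    # Present flags in reading order for the human reviewer.
    return sorted(flags, key=lambda flag: flag["offset"])


poster_text = "Compound X inhibits HMGR. hmgr is the target."
flags = flag_for_review(poster_text, ["HMGR"])
for flag in flags:
    print(flag["offset"], repr(flag["snippet"]))
```

Nothing is redacted automatically here; the output is just a worklist, so the decision to redact stays with the reviewer.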