What if LLMs are GOOD for security?
Modified from: https://www.nature.com/articles/s41577-018-0094-3

I recently shared some thoughts on appropriate security access for LLMs working with confidential data. But what if LLMs are not just a security risk, but also an opportunity to enhance our data security?


One thing that LLMs are increasingly good at is entity recognition. It is easy to imagine cases where the sensitive data falls into particular types of entities: personal names, zip codes, drug targets, and so on. By identifying these entities in text and replacing them with generic placeholders, it is possible to obfuscate some of the content. Think of all the scientific posters that have “compound X” or “kinase Y” on them. However, this method goes beyond simple find-and-replace, since it can work on generic entity types rather than a fixed list of terms.
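To make the placeholder step concrete, here is a minimal Python sketch of the substitution half of the idea. It assumes the entities have already been identified, which is the part a model would do; the `redact` function and the hand-written `entities` mapping are hypothetical and only for illustration.

```python
import re

def redact(text, entities):
    """Replace known entity mentions with generic placeholders.

    `entities` maps each surface form to a category, "person" or "target".
    Assumption: in practice this mapping would come from a model's entity
    recognition; here it is written by hand.
    """
    if not entities:
        return text, {}

    person_labels = {}  # surface form -> "Person A", "Person B", ...

    def placeholder(match):
        surface = match.group(0)
        if entities[surface] == "person":
            if surface not in person_labels:
                # Sequential letters; this simple scheme only covers 26 people.
                letter = chr(ord("A") + len(person_labels))
                person_labels[surface] = "Person " + letter
            return person_labels[surface]
        return "[redacted target]"

    # Longest surface forms first, so "HMG-CoA reductase" wins over "HMGR".
    forms = sorted(entities, key=len, reverse=True)
    pattern = r"(?<!\w)(?:" + "|".join(re.escape(f) for f in forms) + r")(?!\w)"
    return re.sub(pattern, placeholder, text), person_labels


sample = ("Akira Endo studied HMG-CoA reductase (HMGR). "
          "Alfred Alberts later confirmed Akira Endo's result.")
found = {
    "Akira Endo": "person",
    "Alfred Alberts": "person",
    "HMG-CoA reductase": "target",
    "HMGR": "target",
}
redacted, mapping = redact(sample, found)
print(redacted)
# Person A studied [redacted target] ([redacted target]). Person B later confirmed Person A's result.
```

The returned mapping is what would let you restore the original names later, so it should be treated as sensitive in its own right.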


As an example, I’ve pulled in a section of the Wikipedia article on the discovery of statins:

Given the following article: In the mid-19th century, a German pathologist named Rudolf Virchow discovered that cholesterol was to be found in the artery walls of people that died from occlusive vascular diseases, like myocardial infarction. The cholesterol was found to be responsible for the thickening of the arterial walls and thus decreasing the radius in the arteries which leads in most cases to hypertension and increased risk of occlusive vascular diseases.
In the 1950s the Framingham heart study led by Dawber revealed the correlation between high blood cholesterol levels and coronary heart diseases. Following up from that study the researchers explored a novel way to lower blood cholesterol levels without modifying the diet and lifestyle of subjects suffering with elevated blood cholesterol levels. The primary goal was to inhibit the cholesterol biosynthesis in the body. Hence HMG-CoA reductase (HMGR) became a natural target. HMGR was found to be the rate-limiting enzyme in the cholesterol biosynthetic pathway. There is no build-up of potentially toxic precursors when HMGR is inhibited, because hydroxymethylglutarate is water-soluble and there are alternative metabolic pathways for its breakdown.
In the 1970s the Japanese microbiologist Akira Endo first discovered natural products with a powerful inhibitory effect on HMGR in a fermentation broth of Penicillium citrinum, during his search for antimicrobial agents. The first product was named compactin (ML236B or mevastatin). Animal trials showed very good inhibitory effect as in clinical trials, however in a long term toxicity study in dogs it resulted in toxic effects at higher doses and as a result was believed to be too toxic to be given to humans. In 1978, Alfred Alberts and colleagues at Merck Research Laboratories discovered a new natural product in a fermentation broth of Aspergillus terreus, their product showed good HMGR inhibition and they named the product mevinolin, which later became known as lovastatin.

…now, applying the following prompt:

Starting from the initially provided text, replace all proper names with sequential names - "Person A", "Person B", "Person C". Also, replace any gene, enzyme, pathway, metabolite or protein names with "[redacted target]"

…here is the result:

In the mid-19th century, a German pathologist named Person A discovered that [redacted target] was to be found in the artery walls of people that died from occlusive vascular diseases, like myocardial infarction. The [redacted target] was found to be responsible for the thickening of the arterial walls and thus decreasing the radius in the arteries, which leads in most cases to hypertension and increased risk of occlusive vascular diseases.
In the 1950s, the Framingham heart study led by Person B revealed the correlation between high blood [redacted target] levels and coronary heart diseases. Following up from that study, the researchers explored a novel way to lower blood [redacted target] levels without modifying the diet and lifestyle of subjects suffering from elevated blood [redacted target] levels. The primary goal was to inhibit the [redacted target] biosynthesis in the body. Hence [redacted target] reductase ([redacted target]) became a natural target. [redacted target] was found to be the rate-limiting enzyme in the [redacted target] biosynthetic pathway. There is no build-up of potentially toxic precursors when [redacted target] is inhibited because [redacted target] is water-soluble, and there are alternative metabolic pathways for its breakdown.
In the 1970s, a Japanese microbiologist named Person C first discovered natural products with a powerful inhibitory effect on [redacted target] in a fermentation broth of Penicillium citrinum, during their search for antimicrobial agents. The first product was named compactin (ML236B or mevastatin). Animal trials showed a very good inhibitory effect as in clinical trials; however, in a long-term toxicity study in dogs, it resulted in toxic effects at higher doses and, as a result, was believed to be too toxic to be given to humans. In 1978, Person D and colleagues at Merck Research Laboratories discovered a new natural product in a fermentation broth of Aspergillus terreus. Their product showed good [redacted target] inhibition, and they named the product mevinolin, which later became known as lovastatin.

Of course, for truly sensitive material, it may still make sense to use a locally run LLM (like BioGPT or Dolly), so that the text never leaves your own environment. These models are already pretty good at entity recognition and might be sufficient for this use case. A manual review would still be helpful as well.
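One way to support that manual review is a simple check for terms that survived redaction. The sketch below is only an illustration: `find_leaks` is a hypothetical helper, and the list of terms to check is an assumption you would have to supply yourself. In the statin example above, the compound names (compactin, mevinolin) were left in place because the prompt only asked about names and targets; whether those count as sensitive depends on your case.

```python
import re

def find_leaks(redacted_text, sensitive_terms):
    """Return the sensitive terms that still appear in the redacted text.

    Matching is case-insensitive and literal; `sensitive_terms` is an
    assumed, user-supplied list, not something the model produces here.
    """
    leaks = []
    for term in sensitive_terms:
        if re.search(re.escape(term), redacted_text, flags=re.IGNORECASE):
            leaks.append(term)
    return leaks


output = ("Person D and colleagues named the product mevinolin. "
          "The first product was named compactin (ML236B or mevastatin).")
print(find_leaks(output, ["Akira Endo", "HMGR", "compactin", "mevinolin"]))
# ['compactin', 'mevinolin']
```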

You could also imagine LLMs speeding up review ahead of external publication: if a poster is to be published, the model could be asked to flag sensitive entities for additional review and possible redaction.
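As a rough sketch of what "flagging for review" could look like, the Python below collects each flagged mention with some surrounding context so a human reviewer can decide on redaction. The `Flag` class and `flag_for_review` function are hypothetical, and the `watchlist` stands in for whatever entities a model would return.

```python
import re
from dataclasses import dataclass

@dataclass
class Flag:
    term: str      # the watchlist entry that matched
    start: int     # character offset of the match in the text
    context: str   # snippet around the match, for the reviewer

def flag_for_review(text, watchlist, window=30):
    """List every mention of a watchlist term, in order of appearance.

    Assumption: `watchlist` is a stand-in for model-identified entities.
    Nothing is redacted here; the output is meant for a human reviewer.
    """
    flags = []
    for term in watchlist:
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            lo = max(0, m.start() - window)
            hi = min(len(text), m.end() + window)
            flags.append(Flag(term, m.start(), text[lo:hi]))
    return sorted(flags, key=lambda f: f.start)


poster = "Poster: kinase Y inhibits compound X in vitro."
for flag in flag_for_review(poster, ["compound X", "kinase Y"]):
    print(flag.start, flag.term, "|", flag.context)
```

Keeping the flagging step separate from the redaction step leaves the final decision with the reviewer.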

