Using Chat-GPT to Generate Structured Biological Knowledge
After my previous post on using Chat-GPT to explain biological findings, I was interested in digging in a bit more. Specifically, I wanted to explore the reverse idea: rather than using Chat-GPT to interpret structured data, could I flip things around and use the tool to generate core lists for further analysis?
Anyone who has spent time in informatics is familiar with vendor-provided tools like Qiagen’s Ingenuity IPA or Clarivate's MetaCore. These are great tools for pathway analysis and interpretation, but they rely on a core database of discoveries curated from the literature: protein A binds protein B, protein B phosphorylates protein C, and so on. Text-mining approaches are sometimes used to supplement these databases (or, in some tools, replace them entirely), but I was wondering how Chat-GPT, a fairly general tool, would perform at these tasks.
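At their core, those curated databases are just large collections of subject–relation–object triples that can be queried in either direction. A minimal sketch of that idea (the triples below are illustrative placeholders, not real curated content):

```python
# (subject, relation, object) triples of the kind a curation team extracts
# from the literature. These examples are placeholders for illustration.
TRIPLES = [
    ("ProteinA", "binds", "ProteinB"),
    ("ProteinB", "phosphorylates", "ProteinC"),
    ("ProteinD", "binds", "ProteinB"),
]

def partners(protein, relation, triples=TRIPLES):
    """Return every protein linked to `protein` by `relation`, in either direction."""
    found = set()
    for subj, rel, obj in triples:
        if rel != relation:
            continue
        if subj == protein:
            found.add(obj)
        elif obj == protein:
            found.add(subj)
    return sorted(found)

print(partners("ProteinB", "binds"))  # -> ['ProteinA', 'ProteinD']
```

The question explored below is essentially whether Chat-GPT can stand in for the curated `TRIPLES` list.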
The results were… mixed. This was a case where the OpenAI API models did not do nearly as well as Chat-GPT itself. Sometimes they would recover ligands for a receptor; sometimes they would just dump out lists of similar receptors. Chat-GPT's own answers were more convincing (note that the original output was a table, which is not supported in this post format):
What are proteins that bind to the protein CCR1?
That is a nice result (leaving aside that the "entrez gene and gene symbol" headers are not 100% accurate)… but without an API, I’m not going to build my own interactome for all the protein-coding genes by hand. Worth revisiting once API access to Chat-GPT becomes available.
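If that API access does arrive, the interactome-building loop itself would be simple. A minimal sketch, where `ask_model` is a hypothetical stand-in for whatever chat endpoint becomes available, stubbed here with a canned reply so the surrounding logic can be shown:

```python
def ask_model(prompt):
    # Hypothetical placeholder: a real implementation would call the chat
    # API and return its text response. The canned reply below is fixed
    # purely so the parsing logic can be exercised.
    return "CCL3, CCL5, CCL7"

def binders_for(receptor):
    """Ask the model for binding partners and parse a comma-separated reply."""
    reply = ask_model(f"What are proteins that bind to the protein {receptor}?")
    return [token.strip() for token in reply.split(",") if token.strip()]

def build_interactome(receptors):
    """Map each receptor symbol to its model-reported binding partners."""
    return {receptor: binders_for(receptor) for receptor in receptors}

print(build_interactome(["CCR1"]))  # -> {'CCR1': ['CCL3', 'CCL5', 'CCL7']}
```

The hard part, of course, is not the loop but validating what the model returns for each gene.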
Emboldened by this, I attempted another query: what if I could build a directory of disease-relevant animal models?
What are specific mouse models for multiple sclerosis, displayed as a table with columns for the model name, phenotype, and the reference?
Incredible! A list of models, some entirely new to me (although I don’t work on MS as an indication, so maybe that isn’t a surprise…), along with a rough description of each phenotype that could guide model selection, and a reference.
But wait… what if we actually have a look at one of these references? Peschon et al. did write a paper in 1994 about IL-7Rα-deficient mice, but it is cataloged under PMID 7964471. The PMID given in the table points to a totally different 1994 paper, one with some connection to neurons, but not the reference that was presented. The ICOS model’s reference is even farther off: its PMID points to a paper analyzing the size of U.S. businesses.
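This kind of mismatch is mechanically checkable. NCBI's E-utilities `esummary` endpoint (`db=pubmed`, `retmode=json`) returns metadata for a PMID, and the record's title can be compared against what the citation claims to be. A sketch of the comparison step only, operating on an already-fetched payload (the sample below mimics the `esummary` JSON shape with a placeholder title, it is not a real PubMed record):

```python
def title_matches(esummary_json, pmid, expected_phrases):
    """Crude sanity check: does the PubMed record's title contain the expected phrases?"""
    record = esummary_json["result"][str(pmid)]
    title = record["title"].lower()
    return all(phrase.lower() in title for phrase in expected_phrases)

# Fabricated payload in the esummary JSON shape, for illustration only.
sample = {
    "result": {
        "7964471": {
            "title": "Placeholder: interleukin 7 receptor-deficient mice study"
        }
    }
}

print(title_matches(sample, 7964471, ["interleukin 7 receptor"]))  # -> True
print(title_matches(sample, 7964471, ["multiple sclerosis"]))      # -> False
```

A check this simple would have flagged both of the bad PMIDs above.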
Maybe it is not surprising that models trained largely on narrative text do better at extracting descriptions than at reproducing references, but it is definitely cause for caution: the errors that creep into Chat-GPT output look very convincing. For the immediate future, I think Chat-GPT will work more effectively for scientific work when primed directly with the relevant facts (which it can then reformat) rather than used for direct question answering, unless the asker is prepared to carefully confirm the results.
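The "prime with facts, then reformat" pattern is easy to sketch: the known facts travel inside the prompt, so the model only restructures them rather than recalling them. The prompt wording and the example facts below are illustrative:

```python
def priming_prompt(facts, instruction):
    """Build a prompt that supplies the facts up front and asks only for reformatting."""
    bullet_list = "\n".join(f"- {fact}" for fact in facts)
    return f"Using ONLY the facts below, {instruction}\n\nFacts:\n{bullet_list}"

facts = [
    "EAE is an induced mouse model of multiple sclerosis.",
    "The cuprizone model produces demyelination via toxin feeding.",
]
prompt = priming_prompt(
    facts, "produce a two-column table of model name and mechanism."
)
print(prompt)
```

Hallucinated references are much less likely this way, since the model is never asked to supply anything it was not given.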
Comments

AI and GenAI Evangelist | Startup Advisor: Helpful. Thanks!

Experienced Data Scientist Focused on Computational Biology: Addendum: trying the same mouse-model query with Perplexity.ai returned well-referenced hits, just fewer in number and without the option of tabular formatting.

Scientific Software - Drug Discovery, Machine Learning, Bioinformatics: I much agree. Using GPT-3 models directly for answering biomedical questions will likely be somewhat inaccurate and will lack traceability to the source. But notice that Microsoft is taking a slightly different approach with their Bing Chat integration: they retrieve external content, which is then used as context when priming GPT-3 (the Microsoft "Prometheus" integration model). This means their content is always up to date, and they can provide real links to the sources used when answering the question, lowering the risk of providing false information. See the example below, where I repeated your query using Bing Chat: the references are correct and based on the latest published papers, not a frozen GPT model.

Program Director @ Boehringer Ingelheim | Healthcare Data and Analytics: Fascinating, thanks for sharing this.