Gene-associated Disease Discovery using LLM

Manoj Kumar

Healthcare AI leader with deep expertise in genomics and precision medicine

Published: March 11, 2024

The paper [2401.09490] Gene-associated Disease Discovery Powered by Large Language Models (arxiv.org), published in January 2024, describes a framework that uses LLMs to discover diseases associated with gene alterations.

The framework employs Large Language Models (LLMs), in this case GPT-4, to discover diseases associated with specific genes. It aims to automate the labor-intensive process of sifting through medical literature for evidence linking genetic variations to diseases, making disease identification more efficient. The approach uses LLMs to conduct literature searches, summarize relevant findings, and pinpoint diseases related to specific genes.

Current process

The physician usually searches the medical literature for evidence relevant to the genetic variations of interest, analyzes the evidence for each variation, and then identifies the disease the patient may have.

Depiction of disease discovery process in clinical practice. It begins with the patient (A) visiting a clinic and undergoing genetic sequencing (B). The physician (C) then analyzes the sequencing results to pinpoint suspicious genetic variations. Subsequently, the physician searches databases or medical literature (D) for records pertinent to these specific genes (E). Finally, the potential disease related to these genes is identified (F). Our framework is designed to automate the labor-intensive steps from (D) to (F).

Current challenges

Sifting through the literature for evidence is exceedingly laborious, since thousands of papers may exist for a single gene. The researcher has to pinpoint the documents that actually demonstrate an association between the gene and a particular disease. This demands significant time and attention to detail, because the most relevant and informative studies must be picked out of a vast body of academic research.

Proposed solution in this paper

The paper proposes an LLM-powered framework for discovering diseases associated with specific genes. The framework conducts a literature search based on the specified genes, summarizes the retrieved literature, and identifies diseases related to the input genes. With it, the extensive and complex process of retrieving and summarizing literature to identify potential diseases from specific genes can be significantly streamlined and automated.
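The literature-search step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the NCBI E-utilities ESearch endpoint for PubMed, and the function name `build_pubmed_search_url`, the `[Title/Abstract]` field restriction, and the default of 10 results are my own assumptions. The sketch only builds the request URL and does not call the network.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed; the paper does not list its exact calls).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(gene: str, top_k: int = 10, sort: str = "relevance") -> str:
    """Build an ESearch URL asking PubMed for up to top_k records mentioning a gene."""
    if sort not in ("relevance", "pub_date"):
        raise ValueError("sort must be 'relevance' or 'pub_date'")
    params = {
        "db": "pubmed",                      # search the PubMed database
        "term": f"{gene}[Title/Abstract]",   # hypothetical query: gene symbol in title or abstract
        "retmax": top_k,                     # number of record IDs to return
        "sort": sort,                        # order by relevance or publication date
        "retmode": "json",                   # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_pubmed_search_url("BRCA1", top_k=5))
```

The returned IDs would then be used to fetch abstracts or full text, which is the material handed to the LLM for summarization.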

Framework of the proposed method. The framework starts from specific genes suspected of causing the patient's disease. The PubMed API is then used to search the literature on these genes by criteria such as relevance or time. The top K papers are selected and queried by LLMs (e.g., GPT-4) using crafted prompts. During this phase, the LLMs analyze the content of the literature, and relevant diseases are identified and ranked through the in-context learning capabilities of the LLMs. This process is iterated several times, with diseases being re-ranked based on the frequency of their occurrence in the outputs.
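The final re-ranking step, where diseases are ordered by how often they occur across repeated LLM runs, can be sketched like this. It is an illustration under assumptions: the function name `rerank_by_frequency`, the lowercase normalization, and the rule of counting a disease at most once per run are mine, and the paper may aggregate differently.

```python
from collections import Counter
from typing import Iterable


def rerank_by_frequency(runs: Iterable[Iterable[str]]) -> list[tuple[str, int]]:
    """Rank disease names by the number of runs in which they appear."""
    counts: Counter = Counter()
    for run in runs:
        # Normalize names and count each disease at most once per run (an assumption).
        counts.update({name.strip().lower() for name in run if name.strip()})
    # Most frequent first; ties are broken alphabetically so the order is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up outputs from three hypothetical LLM runs, for illustration only.
    runs = [
        ["Breast cancer", "Ovarian cancer"],
        ["breast cancer", "Fanconi anemia"],
        ["Breast Cancer", "ovarian cancer"],
    ]
    for disease, count in rerank_by_frequency(runs):
        print(disease, count)
```

Repeating the query and voting in this way is a simple guard against run-to-run variation in LLM output: a disease that appears in most runs is more likely to be supported by the retrieved papers than one that appears once.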

Paper review

Gen AI can reduce the time and burden of literature reviews in many research fields, and this paper is one example where it has proven useful.
The paper mentions RAG (Retrieval-Augmented Generation) techniques but does not go into much detail on how retrieval was performed. Document parsing, cracking, chunking, vectorization, and vector store updates are important retrieval concepts in RAG over unstructured data, and they deserve attention because they directly affect the accuracy of the results.
LangChain was used for orchestration, but the authors do not specify which version. Care is needed when using LangChain, because a security vulnerability was found in versions up to 0.0.131 - CVE-2023-29374: LangChain Code Injection Vulnerability (vulert.com)
Grounding is an important concept and highly relevant to healthcare. Grounding refers to the ability to connect model output to verifiable sources of information. By giving models access to specific data sources, grounding tethers their output to concrete data and reduces the chance of invented content. Accuracy and reliability are paramount in scenarios such as financial reporting and health reporting, and grounding makes the model's responses more trustworthy and applicable by anchoring them to specific information. Its main benefits are reducing model hallucinations (instances where the model generates content that isn't factual), anchoring model responses (tying them to specific, verifiable data), and enhancing trustworthiness (ensuring the generated content aligns with reliable sources). A good article on the importance of grounding is Why Grounding of Generative AI Matters - by Hadas Bitran (substack.com) - Credits Hadas Bitran
GPT-4, one of the most capable gen AI models available at the time, was used for this experiment. Similar results might be achievable with an advanced SLM (Small Language Model) such as Phi-2 (Phi-2: The surprising power of small language models - Microsoft Research), a 2.7 billion-parameter language model that, according to Microsoft Research, demonstrates outstanding reasoning and language understanding capabilities and state-of-the-art performance among base language models with fewer than 13 billion parameters.
Another candidate is BioGPT: generative pre-trained transformer for biomedical text generation and mining | Briefings in Bioinformatics | Oxford Academic (oup.com), a domain-specific generative pre-trained Transformer language model for biomedical text generation and mining.
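To make the chunking step from the retrieval discussion above concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not taken from the paper, which does not describe its chunking. The function name `chunk_text`, the word-based splitting, and the default sizes are assumptions chosen for illustration; production pipelines often split on tokens or sentences instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to chunk_size words, sharing `overlap` words between neighbors."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    print(chunk_text(sample, chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighboring chunks, which tends to help retrieval when the gene-disease evidence spans the boundary.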

Conclusion

The framework in the paper automates the labor-intensive process of discovering diseases associated with specific genes. The retrieval techniques would benefit from a second look, and plug-and-play tools are available for orchestrating gen AI solutions. Any proposed gen AI solution should also address security, responsible AI measures, modern LLM evaluation techniques, and grounding. It is both encouraging and fascinating to see more and more research going into solutions for genomics and drug discovery.
