
Navigating Data Scarcity: AI's Emerging Role in Biotech

Andrii Buvailo, Ph.D.

Science & Tech Communicator | AI & Digital | Life Sciences | Chemistry

Published: October 20, 2023

‘Garbage in, garbage out’ is a well-known principle in the machine learning (ML) community, and it is certainly true when it comes to adopting ML-based methods in biotech and drug discovery.

According to a recent McKinsey report, ‘lack of high-quality data sources and data integration’ was named as one of the three key factors slowing down digitalization and data analytics in life sciences (the other two being lack of cross-disciplinary talent, and lack of tech adoption at scale).

In my own small poll here on LinkedIn, 52% of respondents named ‘lack of domain-specific data’ as the biggest challenge facing AI adoption in the biotech industry (based on my brief review, a good share of the respondents are subject matter experts).

Tackling the problem of data scarcity

San Francisco-based ‘techbio’ company Atomic AI developed a tool to tackle the lack of data about RNA structures.

Atomic AI’s proprietary AI-driven 3D RNA structure engine, known as PARSE, generates RNA structural datasets, integrating machine learning foundation models with large-scale, in-house experimental wet-lab biology to unveil functional binders to RNA targets.

The company’s technology can predict structured, ligandable RNA motifs at unprecedented speed and accuracy, addressing a key barrier in current approaches to RNA drug discovery.

Atomic AI plans to use its database of discovered and designed 3D RNA structures to develop a pipeline of rationally designed small-molecule drug candidates.

Interestingly, Atomic AI uses so-called geometric deep learning, which can learn from very small RNA datasets.

Geometric deep learning is a subfield of machine learning that generalizes traditional neural network methodologies to data on non-Euclidean domains, such as graphs, manifolds, and complex networks. It seeks to understand data through its inherent geometric structures and relationships.

The method, called the Atomic Rotationally Equivariant Scorer (ARES), surpasses existing techniques in performance—even with training on just 18 known RNA structures. ARES's capacity to learn from minimal data addresses a significant challenge faced by typical deep neural networks. With its reliance solely on atomic coordinates and no RNA-specific details, this method has potential applications in various fields including structural biology, chemistry, and materials science, among others.

According to this Science paper, ARES operates without any predetermined ideas regarding the essential features of a structural model's accuracy. It doesn't come with any inherent understanding of double helices, base pairs, nucleotides, or hydrogen bonds. ARES's methodology isn't exclusive to RNA; it can be applied to any molecular system.

Instead of pre-defined specifications, the initial stages of the ARES network are tailored to detect structural patterns, learning their identities during training. Every layer calculates various characteristics for each atom, considering the spatial arrangement of adjacent atoms and the outcomes from the preceding layer. The only inputs for the initial layer are the 3D coordinates and the chemical element classification of every atom.
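To make the layer-by-layer idea above more concrete, here is a minimal NumPy sketch. It is not the ARES architecture (ARES uses rotationally equivariant layers with richer geometric features); it only illustrates the general pattern described in the paragraph: the first layer sees nothing but 3D coordinates and element types, and every layer updates each atom's features from its spatial neighbors and the previous layer's output. The element vocabulary, the cutoff, the Gaussian weighting, and all function names are assumptions made for illustration.

```python
import numpy as np

ELEMENTS = ["C", "N", "O", "P"]  # hypothetical element vocabulary


def one_hot_elements(elements):
    """Encode each atom's chemical element as a one-hot vector."""
    idx = [ELEMENTS.index(e) for e in elements]
    return np.eye(len(ELEMENTS))[idx]


def neighbor_layer(coords, feats, cutoff=4.0, width=1.0):
    """Update per-atom features from neighbors within `cutoff`.

    Neighbor contributions are weighted by a Gaussian of the interatomic
    distance, so the result depends only on distances and does not change
    when the whole structure is rotated or translated.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    mask = (dist < cutoff) & (dist > 0)          # exclude the atom itself
    weights = np.exp(-(dist / width) ** 2) * mask
    neighbor_sum = weights @ feats               # aggregate previous-layer features
    return np.concatenate([feats, neighbor_sum], axis=1)


def random_rotation(rng):
    """Draw a random proper 3D rotation matrix via QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]                       # flip one axis to get det = +1
    return q


rng = np.random.default_rng(0)
coords = rng.normal(scale=3.0, size=(6, 3))      # toy coordinates, not a real RNA
elements = ["C", "N", "O", "P", "C", "O"]

h0 = one_hot_elements(elements)                  # layer 0: element types only
h2 = neighbor_layer(coords, neighbor_layer(coords, h0))

rotated = coords @ random_rotation(rng).T
h2_rot = neighbor_layer(rotated, neighbor_layer(rotated, h0))

print("features unchanged by rotation:", np.allclose(h2, h2_rot))
```

Because the features are built from interatomic distances only, rotating the structure leaves them unchanged. This is a simpler property (invariance) than the equivariance ARES relies on, but it shows why such models need no RNA-specific inputs.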

Zero-shot Learning

Another interesting example of tackling the data problem in biology comes from Absci, a US company headquartered in Vancouver, Washington, which focuses on designing antibodies using AI.


Absci has reached a milestone in generative AI for drug development by being the first (as they claim) to design and validate therapeutic antibodies using zero-shot machine learning.

What's zero-shot?

It's a machine learning approach where a model is trained on certain categories of data and is then able to make predictions or classifications on entirely new, unseen categories, often leveraging the relationships between known and unknown categories. For example, if trained on images of horses, the model might be able to recognize zebras, even if it hasn't been explicitly trained on zebra images.
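The horse-and-zebra idea can be sketched with a toy attribute-based classifier. This is a generic textbook-style illustration, not Absci's method: the attribute vocabulary, the simulated "image features", and all function names are hypothetical. The model learns to predict attributes from features using seen classes only, then recognizes an unseen class from its attribute description alone.

```python
import numpy as np

# Hypothetical attribute vocabulary: [has_hooves, has_stripes]
CLASS_ATTRIBUTES = {
    "horse": np.array([1.0, 0.0]),
    "tiger": np.array([0.0, 1.0]),
    "cat": np.array([0.0, 0.0]),
    "zebra": np.array([1.0, 1.0]),  # never seen during training
}
SEEN = ["horse", "tiger", "cat"]


def make_features(attrs, n, mixing, rng, noise=0.05):
    """Simulate n feature vectors for one class (toy stand-in for real data)."""
    return attrs @ mixing + rng.normal(scale=noise, size=(n, mixing.shape[1]))


def fit_attribute_predictor(X, A, ridge=1.0):
    """Ridge regression from feature vectors X to attribute vectors A."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ A)


def predict_class(X, W, class_attributes):
    """Assign each sample to the class with the nearest attribute vector."""
    names = list(class_attributes)
    protos = np.stack([class_attributes[n] for n in names])
    pred_attrs = X @ W
    dists = np.linalg.norm(pred_attrs[:, None, :] - protos[None, :, :], axis=-1)
    return [names[i] for i in dists.argmin(axis=1)]


rng = np.random.default_rng(0)
mixing = rng.normal(size=(2, 8))  # hypothetical attribute-to-feature map

# Train on seen classes only.
X_train = np.vstack([make_features(CLASS_ATTRIBUTES[c], 50, mixing, rng) for c in SEEN])
A_train = np.vstack([np.tile(CLASS_ATTRIBUTES[c], (50, 1)) for c in SEEN])
W = fit_attribute_predictor(X_train, A_train)

# Evaluate on a class that was absent from training.
X_zebra = make_features(CLASS_ATTRIBUTES["zebra"], 20, mixing, rng)
predictions = predict_class(X_zebra, W, CLASS_ATTRIBUTES)
zebra_accuracy = np.mean([p == "zebra" for p in predictions])
print(f"zero-shot accuracy on unseen class: {zebra_accuracy:.2f}")
```

The key design choice is that classes are linked through shared attributes: the model never sees a zebra, but it has learned "hooves" from horses and "stripes" from tigers, and the combination identifies the new class.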

In Absci’s case, antibodies are designed to latch onto certain targets without any prior training data from known antibodies for those targets.

Why is this significant? The zero-shot model by Absci produces antibody configurations distinct from existing antibody databases, encompassing de novo versions of all three heavy chain CDRs (HCDR123), the antibody regions most critical to target binding.

How efficient is this approach? In tests against over 100,000 antibodies, Absci’s success rate proved to be between five and 30 times higher than established biological benchmarks.

Synthetic data

A quite innovative concept is the application of synthetic data to close data gaps in areas where real data is scarce. What is synthetic data?

Synthetic data is information that is artificially manufactured rather than generated by real-world events, but that has a probability distribution similar to that of the real data. It can therefore be used for training machine learning models in the same way as real data.
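As a deliberately simple illustration of "similar probability distribution", the sketch below fits a multivariate Gaussian to a toy "real" dataset and samples new rows from it. Real synthetic data tools typically use far more expressive generative models; the Gaussian, the toy data, and the function names here are assumptions chosen only to show the principle.

```python
import numpy as np


def fit_gaussian(real):
    """Estimate the mean vector and covariance matrix of the real data."""
    return real.mean(axis=0), np.cov(real, rowvar=False)


def sample_synthetic(mean, cov, n, rng):
    """Draw n artificial rows from the fitted distribution."""
    return rng.multivariate_normal(mean, cov, size=n)


rng = np.random.default_rng(42)

# Toy "real" dataset: 2,000 samples of three correlated measurements.
true_mean = np.array([1.0, -2.0, 0.5])
true_cov = np.array([
    [1.0, 0.6, 0.0],
    [0.6, 2.0, 0.3],
    [0.0, 0.3, 0.5],
])
real = rng.multivariate_normal(true_mean, true_cov, size=2000)

mean, cov = fit_gaussian(real)
synthetic = sample_synthetic(mean, cov, 2000, rng)

# Compare summary statistics of the real and synthetic datasets.
mean_gap = np.abs(synthetic.mean(axis=0) - real.mean(axis=0)).max()
cov_gap = np.abs(np.cov(synthetic, rowvar=False) - np.cov(real, rowvar=False)).max()
print(f"largest difference in means: {mean_gap:.3f}")
print(f"largest difference in covariances: {cov_gap:.3f}")
```

None of the synthetic rows corresponds to a real sample, yet the means and correlations are closely reproduced, which is why a model trained on such data can behave similarly to one trained on the original. Matching summary statistics does not by itself guarantee privacy; that has to be assessed separately.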

For instance, there is promising evidence that state-of-the-art synthetic data models can produce artificial versions of even high-dimensional and complex genomic and phenotypic data.

Researchers from Gretel.ai, in collaboration with Illumina’s Emerging Solutions, are investigating the possibility of generating synthetic versions of real-world genomic datasets. The synthetic data crafted by Gretel preserves the structure of the original dataset while ensuring increased privacy, allowing researchers open access without jeopardizing patient confidentiality. Initial studies on a sample of 1,220 mice have shown promising results, suggesting that synthetic data can potentially revolutionize data sharing in genomics. Gretel and its collaborators aim to further refine the scalability, accuracy, and privacy of synthetic genomics data in the future.

---

Welcome to my newsletter, "Where Technology Meets Biology." I am sharing noteworthy news, trends, biotech startup picks, industry analyses, and interviews with pharma KOLs. Contact me for consulting or sponsorship opportunities here or at www.BiopharmaTrend.com.

Enjoying the newsletter? Subscribe to become part of 10K+ readers here on LinkedIn. Please help us spread the word by sharing it with your colleagues and friends.

Also, consider joining my Substack community, where we are exploring a lot more (4.3K+ industry professionals are reading it via email).

-- Andrii

Where Technology Meets Biology

22,528 followers

Prof Frederic Cadet

Co-founder & Chairman of the Board at PEACCEL

1 yr

Thank You for sharing Andrii Buvailo


Dr. Daniel Muln?s

Senior Data Scientist in Machine Intelligence | Biomolecular A.I. software developer | Bio/Chem informatician | Data Scientist

1 yr

Zero-shot learning is hardly what I would consider innovative. Any properly trained supervised learning model should be able to perform this task. If your model is not able to perform well on test data that is different from your training data, then you have trained it poorly and have a bad model. Claiming that zero-shot learning is a novel and innovative thing is mostly a marketing gimmick from companies that wants to throw out fancy-sounding buzz-words to describe what everyone doing proper training in the ML community are already achieving. If you train a model like Alphafold without the ability to generalize to unseen data (i.e. sequences that are very different from the ones the method was trained on), you'd be facing a flood of criticism from the ML community that your model is memorizing the training data and has not properly learned the task of translating sequences to structures. But Alphafold was trained properly, and so it does generalize and you don't see people praising it for its zero-shot learning capabilities (They praise it for many other things, and that's well deserved).
