
Not all machine learning needs to be deep

Diogo Camacho

Biotech Executive | Comp Bio | AI/ML | Computing Biology

Published: July 16, 2024

It has been all the rage for a while now. Add up better computer chips, access to almost unlimited compute power (if you're willing to pony up the cost), re-imagined algorithms, and far more accessible data than even 20 years ago, and machine learning took off. For those of us practitioners who agonized over data set availability 20 years ago, it was less a takeoff than a "see? told you we could do it!" But I digress.

Then came a wave of new words, acronyms, and concepts: LLMs! RAG! Deep Learning! Generative AI! DALL-E! ChatGPT! Now they are commonplace and, even though I'm not a betting man, I wonder when "generative AI" will be Merriam-Webster's word of the year. These are exciting times indeed. It's no wonder that graduate schools all over the world are pushing out more and more newly minted data scientists (even if they go by different names).

But I am going to be controversial here (though not that controversial): sometimes, deep learning and generative AI are NOT what you need. Example: can a computer design a new molecule? Yes. Is that molecule going to be a sure shot for treating, say, glaucoma? No. Let's take a moment to unravel that.

For those who have followed my other ruminations on machine learning in the biological sciences (under the guise of computational biology, bioinformatics, biostatistics, biometrics, and many other names), you have read about how hard biology is. Yes, we have so much data. But when we think about how complex biology is, we are always blown away: roughly 20k protein-coding genes, even more proteins once isoforms are counted, and thousands of small molecules and their derivatives. Over the years we have developed technologies that let us measure all of these molecules, and many other aspects of a biological system, with ever more precision and resolution. Yet we are often surprised when an orca plays with a seal before eating it, we get flummoxed when a crow uses a stick as a tool, and we still can't figure out how a cell is made.

When it comes to machine learning in biology, there's still a lot to explore, especially given the excitement you see in the field day in, day out. Not all machine learning applied to a biological system needs to follow a RAG or LLM architecture, and not every model needs to be a deep learning model. As any machine learning scientist will corroborate, it's all about context. There are a few research papers out there on the topic (the one by Travers Ching a few years back is great), and I thought it would be worth recasting the challenges and opportunities succinctly:

Language models and biological sequences: Yes, 100% we should do this! If AlphaFold has shown anything, it is that with enough "language" data we can solve problems that had been hard to crack for years. (I no longer have the computer with my Folding@Home installation, but that was a thing: distributed computing at its finest, rawest, geekiest!)
Graph-based machine learning for knowledge graphs: Yes. Biology is, inherently, a graph. With tons of data we can make very obvious connections, and work like that spearheaded by Marinka Zitnik is a perfect example of how you can mine graph data to connect the dots.
LLMs for medical records: Yes, but. Here is where we start to run into challenges. Once we solve the data privacy concerns that will likely arise for a truly all-encompassing model that takes in data from any EHR system, we will likely see leaps and bounds in patient care predictions, from medication routines to the best course of action when a patient presents with condition X/Y/Z.
Deep learning models for omics data: Well... maybe. What many have shown is that deep learning models, when compared to the "traditional" and more boring stuff you learn in statistical learning courses, like boosted trees and logistic regression, tend to perform on par or worse. I'm not going to go into the math here, but this should not surprise a machine learning scientist.
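The omics point is easy to check on your own data: score a plain, cross-validated baseline before reaching for anything deep. What follows is a minimal sketch of that habit, not a benchmark and not a reproduction of any published comparison. It assumes hypothetical synthetic data (120 samples, 20 features, only 3 of them informative, loosely mimicking the many-features, few-samples shape of omics data), and all function names are illustrative. Logistic regression is written in plain Python so the example runs without dependencies; in practice you would more likely use scikit-learn's `LogisticRegression`, `GradientBoostingClassifier`, or `MLPClassifier` together with `cross_val_score`.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=100, l2=0.01):
    # L2-regularized logistic regression fitted with batch gradient descent.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_logistic(model, X):
    # Threshold the predicted probability at 0.5 to get a 0/1 label.
    w, b = model
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) for xi in X]


def fit_majority(X, y):
    # Dumbest possible baseline: always predict the most common training label.
    return int(2 * sum(y) >= len(y))


def predict_majority(model, X):
    return [model] * len(X)


def kfold_accuracy(X, y, fit, predict, k=5, seed=0):
    # Mean held-out accuracy over k folds; every model sees the same splits.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        scores.append(sum(p == y[i] for p, i in zip(preds, test_idx)) / len(test_idx))
    return sum(scores) / k


def make_toy_omics(n_samples=120, n_features=20, n_informative=3, seed=1):
    # Hypothetical "omics-like" data: many features, only a few carry signal.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        shift = 1.0 if label else -1.0
        X.append([rng.gauss(shift if j < n_informative else 0.0, 1.0) for j in range(n_features)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_omics()
    for name, fit, predict in [
        ("majority class", fit_majority, predict_majority),
        ("logistic regression", fit_logistic, predict_logistic),
    ]:
        print(f"{name}: 5-fold CV accuracy = {kfold_accuracy(X, y, fit, predict):.2f}")
```

Any candidate model, deep or otherwise, can be dropped into `kfold_accuracy` as another `fit`/`predict` pair, so it is scored on exactly the same folds as the baselines. If it cannot clearly beat the logistic regression there, the extra complexity is hard to justify.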

Overall, the biological sciences have a lot to gain from machine learning. Of course we want to use all of the data to find the cause behind the emergence of a disease, or why a molecule acted the way it did (cueing Justin Bélair and the Causal AI group here on LinkedIn). But while scraping the internet for text to recommend a workout routine (hey, I get bored sometimes...) is fairly straightforward, and no ill consequences will come of me doing squats today versus tomorrow (other than when my legs will be sore), we need to be mindful that a biological system works the way it does for a reason (good or bad). Choosing the wrong target, or the wrong patient population to enroll in a trial, will have far more dire consequences than walking funny for a day. When approaching your next machine learning project to solve a biological problem, ask yourself: what is the context? If you do, you will build the model you need for the problem you have.


Books of the month

I already made a post about "Make Your Bed" by William (Bill) McRaven. It's a great book for those of you interested in leadership, with very cool lessons from his time with the SEALs. Coming up are "Supercommunicators" by Charles Duhigg (I'm very excited about this one) and "Digital Minimalism" by Cal Newport. Maybe I will pull a sci-fi book out of the pile to read at the beach ("To Sleep in a Sea of Stars" by Christopher Paolini has been staring at me for a while). If you have other recommendations, especially in leadership, science, biology, math, ML, or sci-fi, please send them my way! Always looking to add more to the library!

News of the month

There is a lot to pick from (and I haven't yet found a good way to summarize the ones I find interesting or exciting), but this time around we can't overlook the $3.6B fund that Flagship Pioneering recently announced. Plenty of cool new ideas will come out of there, for sure!

Coming up

Next time: data strategy. If you are a computational scientist or a data leader in your organization, this is something you obsess over. So let's deconstruct it. Until then, happy computing!

Computing Biology

1,517 followers

Renée Deehan

Senior Vice President, Science and Artificial Intelligence at InsideTracker

8 months ago

“Sometimes, deep learning and generative AI are NOT what you need” Great article!


Nicholas T.

Agent of benign yet disruptive change in the field of IT Services.

8 months ago

Diogo - I really appreciate this post. I’m not a Computational Biologist and don’t pretend to be but I can appreciate the ideas of using patterns and knowledge to train a model and get results as well as the complexities therein. What specifically resonated with me was your point on the use of EHR/EMR data to help further treat folks. That reminds me of a use-case that is similar to IT and Cybersecurity…using different companies’ security telemetry to build a knowledge graph of threats and false positives. That data itself definitely has some identifiable marks (hostnames, IP Addressing or even application names) but if those companies are using the same security toolsets, odds are they may see similar threats and false positives. If the tool provider could aggregate the field findings, it would cut down on noise and help teams identify actionable threats rather than get bogged down. Similar to patient care, we could effectively cut down on time diagnosing by taking these trends and proffering up paths forward and taking treatment outcomes to help align the best with the diagnoses. Could we possibly see a dynamic routine within the learning model that ID’s and removes PHI from a dataset before ingesting?


