Not all machine learning needs to be deep

Machine learning has been all the rage for a while now. Better computer chips, access to almost unlimited compute power (if you're willing to pony up the cost), the re-imagining of some algorithms, and far more accessible data than even just 20 years ago all came together, and machine learning took off. For those of us practitioners of the trade who anguished over data set availability 20 years ago, it was less a taking off than a "see? told you we could do it!" moment. But I digress.

Then all these new words, acronyms, and concepts followed suit. LLMs! RAG! Deep Learning! Generative AI! DALL-E! ChatGPT! Now they are commonplace and, even though I'm not a betting man, I wonder when "Generative AI" is going to be Merriam-Webster's word of the year. These are exciting times indeed. It's no wonder that graduate schools all over the world are pushing out more and more newly minted data scientists (even if they go by different names).

But I am going to be controversial here (though not that controversial). Sometimes, deep learning and generative AI are NOT what you need. Example: can a computer design a new molecule? Yes. Is this molecule going to be a sure shot in the treatment of, say, glaucoma? No. Let's take a moment to unravel that.

If you have followed my other ruminations on machine learning in the biological sciences (masked as computational biology, bioinformatics, biostatistics, biometrics, and many other names), you have read about how hard biology is. Yes, we have so much data. But when we think about how complex biology is, we are always blown away: 20k genes, a few thousand proteins, thousands of small molecules and their derivatives. Over the years we have developed technologies that allow us to measure, with more and more precision and resolution, all of these molecules and many other aspects of a biological system. Yet we are often surprised when an orca plays with a seal before eating it, we get flummoxed when a crow uses a stick as a tool, and we still can't figure out how a cell is made.

When it comes to machine learning in biology, there's still a lot to explore, especially given the excitement that you see day in, day out in the field. Not all machine learning that is applied to a biological system needs to follow a RAG or an LLM architecture, and not every model needs to be a deep learning model. As any machine learning scientist will corroborate, it's all in the context. There are a few research papers out there on the topic (the one by Travers Ching a few years back is great), and I thought it would be useful to recast the challenges and opportunities succinctly:

  • Language models and biological sequences: Yes, 100% we should do this! If AlphaFold has shown us anything, it is that with significant "language" data we can solve problems that have been challenging to broach for years. (I don't have my computer with the installation of Folding@Home anymore, but that was a thing: distributed computing at its finest, rawest, geekiest!)
  • Graph-based machine learning for Knowledge Graphs: Yes. Biology is, inherently, a graph. With tons of data we can make very obvious connections, and the work spearheaded by Marinka Zitnik is a perfect example of how you can mine graph data to make those connections.
  • LLMs for medical records: Yes, but. Here is where we start to have some challenges. Once we can solve the data privacy concerns that will likely arise for a truly all-encompassing model that takes in data from any EHR system, we will likely see leaps and bounds in patient care predictions, from medication routines to the best course of action when a patient presents with condition X/Y/Z.
  • Deep learning models for omics data: Well... maybe. What many have shown is that deep learning models, when compared with the "traditional" and more boring methods you learn in statistical learning courses, like boosted trees and logistic regression, tend to perform on par or worse. I'm not going to go into the details of the math here, but this should not surprise a machine learning scientist.
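
To make the omics point a bit more concrete, here is a minimal sketch of the "start with the boring stuff" habit: measure a majority-class baseline and a plain logistic regression with cross-validation before reaching for anything deeper. Everything in it is an assumption for illustration. The data is synthetic (few samples, many features, only a handful carrying signal), the function names (`make_omics_like_data`, `fit_logistic`, `fit_majority`, `cross_val_accuracy`) are made up, and the settings are arbitrary. It is not taken from any of the papers mentioned above, and it says nothing about how a given model will do on real omics data.

```python
import math
import random


def make_omics_like_data(n_samples=60, n_features=100, n_informative=5,
                         shift=1.0, seed=0):
    """Synthetic 'few samples, many features' data (illustrative only)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # perfectly balanced classes
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        # Only the first few features carry signal: shifted up in class 1,
        # shifted down in class 0. The rest is pure noise.
        for j in range(n_informative):
            row[j] += shift if label == 1 else -shift
        X.append(row)
        y.append(label)
    return X, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=100, l2=0.01):
    """L2-regularized logistic regression via batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    # Return a predict function that closes over the fitted parameters
    return lambda row: int(sum(wj * xj for wj, xj in zip(w, row)) + b > 0)


def fit_majority(X, y):
    """Baseline that always predicts the most common training label."""
    majority = int(sum(y) * 2 > len(y))
    return lambda row: majority


def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / k


X, y = make_omics_like_data()
baseline_acc = cross_val_accuracy(fit_majority, X, y)
logistic_acc = cross_val_accuracy(fit_logistic, X, y)
print(f"majority-class baseline: {baseline_acc:.2f}")
print(f"logistic regression:     {logistic_acc:.2f}")
```

A deeper model can be dropped into the same `cross_val_accuracy` harness as just another `fit` function. If it cannot clearly beat the logistic regression on the held-out folds, the extra complexity is hard to justify for that problem.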

Overall, the biological sciences have a lot to gain from the implementation of machine learning models. Of course we want to use all of the data to find the cause behind the emergence of a disease or why a molecule acted the way it did (cueing Justin Bélair and the Causal AI group here on LinkedIn). Scraping the internet for text to recommend a workout routine (hey, I get bored sometimes...) is fairly straightforward, and no ill consequences will come of me doing squats today versus tomorrow (other than when my legs will be sore). A biological system, however, works the way it does for a reason (good or bad), and we need to be mindful of that. Choosing the wrong target or the wrong patient population to enroll in a trial will have more dire consequences than walking funny for a day. When approaching your next machine learning project to solve a biological problem, ask yourself: what is the context? If you do, you will build the model that you need for the problem you have.


Books of the month

I already made a post about "Make Your Bed" by William (Bill) McRaven, a great book for those of you interested in leadership, with very cool learnings from his time with the SEALs. Coming up are "Supercommunicators" by Charles Duhigg (I'm very excited about this one) and "Digital Minimalism" by Cal Newport. Maybe I will pull a sci-fi book out of the pile to read at the beach ("To Sleep in a Sea of Stars" by Christopher Paolini has been staring at me for a while). If you have other recommendations, especially in leadership, science, biology, math, ML, or sci-fi, please send them my way! Always looking to add more to the library!

News of the month

A lot to pick from (and I haven't found a good way to summarize those that I think are interesting/exciting), but for this time around we can't overlook the $3.6B fund that Flagship Pioneering announced recently. A lot of cool and exciting new ideas to come from there, for sure!


Coming up

For next time, data strategy. If you are a computational scientist or a data leader in your organization, this is something that you obsess about. So let's deconstruct it. Until then, happy computing!

Renée Deehan

Senior Vice President, Science and Artificial Intelligence at InsideTracker

8 months ago

“Sometimes, deep learning and generative AI are NOT what you need.” Great article!

Nicholas T.

Agent of benign yet disruptive change in the field of IT Services.

8 months ago

Diogo - I really appreciate this post. I’m not a Computational Biologist and don’t pretend to be, but I can appreciate the ideas of using patterns and knowledge to train a model and get results, as well as the complexities therein. What specifically resonated with me was your point on the use of EHR/EMR data to help further treat folks. That reminds me of a similar use case in IT and Cybersecurity: using different companies’ security telemetry to build a knowledge graph of threats and false positives. That data itself definitely has some identifiable marks (hostnames, IP addressing, or even application names), but if those companies are using the same security toolsets, odds are they may see similar threats and false positives. If the tool provider could aggregate the field findings, it would cut down on noise and help teams identify actionable threats rather than get bogged down. Similar to patient care, we could effectively cut down on time diagnosing by taking these trends and proffering up paths forward, and by using treatment outcomes to help align treatments with the diagnoses. Could we possibly see a dynamic routine within the learning model that identifies and removes PHI from a dataset before ingesting?

More articles by Diogo Camacho

  • Genetics and drug discovery

    What a crazy summer! 3 major sporting events (yay España!!), giving people psychedelics may not be the right way to treat…

  • Data Strategy: What is it and how do we think about it in biotech?

    As you put together your newest idea into a company, following all of the recommendations that authors like Stephanie…

  • Computing Biology redux

    If you have followed my blog over the past couple of months (here) you know what I like to write about: Computational…
