Revisiting CoMFA with Neural Networks
This article is a preview of my presentation at the CINF session of the ACS National Meeting in Chicago in a few weeks.
When CoMFA was published in 1988, it was a momentous innovation in relating 3D chemical structures to their bioactivities. Before then, binding poses were modeled from the protein structure, but few protein structures were available. Without requiring knowledge of the protein structure, the binding pose could instead be modeled using the "Active Analog Approach" pioneered by Garland Marshall and Richard Dammkoehler at Washington University. In this approach the pharmacophore is identified and a conformational search is performed jointly on all analogs to identify conformations that superimpose the pharmacophore points. Compounds superimposed this way can be presumed to be in their binding poses.
Today we have many times more protein structures, and in principle AlphaFold can model all of them. Proposing binding poses is therefore much easier. However, it is still valuable to use statistical analysis to understand and quantify which parts of molecules improve activity and which reduce it.
Comparative Molecular Field Analysis, "CoMFA", does exactly that. First one superimposes the molecules in the putative binding mode, then steric (van der Waals) and electrostatic fields are computed on a 3D grid around them. The variations in these fields are then related to the variations in bioactivity to create a model that can predict the activity of novel compounds based on 3D shape. The explanatory diagram from the original 1988 paper is shown above. The publication used partial least squares (PLS) to build models from the field data, which contains far more grid-point variables than compounds. The number of papers using this method is too large to give a bibliography here, but ScienceDirect has a dedicated CoMFA topic heading that covers many of them. Many useful derivatives have been developed in recent years, including template-based CoMFA, CoMSIA, Topomer CoMFA, HQSAR, and others.
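As a rough illustration of the field step described above, the following NumPy sketch evaluates a Lennard-Jones (steric) and a Coulomb (electrostatic) energy for a +1 probe at each point of a regular lattice. The function names, the two-atom toy molecule, and every parameter value (`r_min`, `eps`, the 2 Å spacing, the 30 kcal/mol truncation) are assumptions made for this example. This is not the code used in this work, and it is not a faithful reproduction of the original force field.

```python
import numpy as np

def make_grid(lower, upper, spacing=2.0):
    """Return an (nx, ny, nz, 3) array of probe positions on a regular lattice."""
    axes = [np.arange(lo, hi + 1e-9, spacing) for lo, hi in zip(lower, upper)]
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)

def comfa_like_fields(coords, charges, grid, r_min=3.4, eps=0.1, cutoff=30.0):
    """Steric (Lennard-Jones 12-6) and electrostatic (Coulomb) energies of a
    +1 probe at every grid point. All parameter values are illustrative."""
    # Distance from every grid point to every atom: shape (nx, ny, nz, n_atoms)
    d = np.linalg.norm(grid[..., None, :] - coords, axis=-1)
    d = np.maximum(d, 1e-6)  # avoid division by zero on top of an atom
    ratio6 = (r_min / d) ** 6
    steric = (eps * (ratio6 ** 2 - 2.0 * ratio6)).sum(axis=-1)
    electro = (332.0 * charges / d).sum(axis=-1)  # kcal/mol for charges in e
    # Truncate the very large values that occur close to or inside atoms
    return np.clip(steric, -cutoff, cutoff), np.clip(electro, -cutoff, cutoff)

# Hypothetical two-atom "molecule", only to exercise the functions
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
charges = np.array([-0.3, 0.3])
grid = make_grid(lower=(-4, -4, -4), upper=(6, 4, 4))
steric, electro = comfa_like_fields(coords, charges, grid)
print(steric.shape, electro.shape)
```

Each aligned molecule yields one such pair of arrays. Stacking them over the data set gives the table of field values that is then related to activity.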
In recent work we have reimplemented the CoMFA concept in Python with Keras, using Keras's support for 3D fields to implement the method and to enable 3D convolutions, filters, and other neural network operations for studying how the relationships generalize. Just as 2D convolutions are used to color photographs by recognizing hands and eyes and learning to color them wherever they are, 3D convolutions may learn to recognize pharmacophore features and reduce the need to align the molecules. In this implementation the Keras training process is run for an appropriate number of epochs so that the system learns which molecular features affect bioactivity.
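The "wherever they are" idea can be illustrated with a small, untrained Keras model. This is a sketch under assumptions: the grid size, the filter count, and the toy `blob` feature are invented for the example and do not come from the implementation described here. A `Conv3D` layer followed by `GlobalMaxPooling3D` responds identically to the same local pattern at two different positions, as long as the pattern stays away from the grid border. It shows translation only; rotated features would not give the same response.

```python
import numpy as np
import keras
from keras import layers

# Hypothetical lattice: 12 x 12 x 12 points, one channel (e.g. a steric field)
GRID = (12, 12, 12, 1)

model = keras.Sequential([
    keras.Input(shape=GRID),
    layers.Conv3D(4, kernel_size=3, padding="same", activation="relu"),
    layers.GlobalMaxPooling3D(),  # keeps the strongest response, drops its position
])

def blob(corner):
    """Toy 'feature': a 2x2x2 block of high field values starting at `corner`."""
    field = np.zeros((1,) + GRID, dtype="float32")
    x, y, z = corner
    field[0, x:x + 2, y:y + 2, z:z + 2, 0] = 1.0
    return field

a = model.predict(blob((3, 3, 3)), verbose=0)
b = model.predict(blob((6, 5, 7)), verbose=0)
print(np.allclose(a, b, atol=1e-5))  # same feature, different place
```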
As the first test we re-ran the original data set, the superimposed steroids. The steric and electrostatic fields were computed, then a 3D neural network was created to analyze each field and combine them to learn how to associate the 3D field points with the activities. One can then extract the neural network weights to understand which points are contributing to the bioactivity.
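One way such a two-field network and the weight extraction could look is sketched below. It is deliberately simplified and makes several assumptions: each branch is a single linear layer over the flattened grid rather than the architecture used in this work, the lattice size is arbitrary, the layer names are hypothetical, and the data are random numbers standing in for the steroid fields and activities.

```python
import numpy as np
import keras
from keras import layers

GRID = (6, 5, 5)  # hypothetical lattice size

def field_branch(name):
    """One input field -> one scalar contribution, linear in the grid points."""
    inp = keras.Input(shape=GRID, name=f"{name}_field")
    flat = layers.Flatten()(inp)
    out = layers.Dense(1, use_bias=False, name=f"{name}_weights")(flat)
    return inp, out

steric_in, steric_out = field_branch("steric")
electro_in, electro_out = field_branch("electro")
combined = layers.Concatenate()([steric_out, electro_out])
activity = layers.Dense(1, name="activity")(combined)

model = keras.Model([steric_in, electro_in], activity)
model.compile(optimizer="adam", loss="mse")

# Random stand-in data for 20 "molecules" (NOT the steroid set)
rng = np.random.default_rng(0)
X_s = rng.normal(size=(20,) + GRID).astype("float32")
X_e = rng.normal(size=(20,) + GRID).astype("float32")
y = rng.normal(size=(20, 1)).astype("float32")
model.fit([X_s, X_e], y, epochs=5, verbose=0)

# Put the learned per-point weights back onto the lattice for visualization
steric_map = model.get_layer("steric_weights").get_weights()[0].reshape(GRID)
electro_map = model.get_layer("electro_weights").get_weights()[0].reshape(GRID)
print(steric_map.shape, electro_map.shape)
```

Because each branch is linear in this sketch, every entry of `steric_map` or `electro_map` corresponds to exactly one lattice point, so the arrays can be colored and overlaid on the aligned molecules. With convolutional layers the mapping from weights to grid points is less direct.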
The original CoMFA steroids in the weighted van der Waals field. The color scale runs from red to blue: redder points are more conducive to corticosteroid binding than bluer points.
This is the field that has "learned" which points are associated with changes in activity.
The same molecule set in the electrostatic field, using the same color scale.
The results of the AI method on this test set are comparable to those in the original report.
On the left the training set for the neural network method is red, and the test set is blue. The predicted vs actual from the original publication is on the right.
The neural network approaches are much less prone to overtraining than the partial least squares formulation, which makes a number of novel experiments possible. If you find this interesting, plan to attend my talk at the upcoming ACS National Meeting.
Charles River Laboratories is always looking for ways to improve our services and speed drug discovery by using data analytics. If you would like to join these efforts, check our web pages for open positions.
Consulting Scientist, Author and Artist
2 years ago: Fascinating. How does this differ from HASL?
Manager & Staff Engineer @ Chemify | Former Endogena Therapeutics, Roche, IDSIA, UniGE | Ph.D. in Chemistry | Drug Hunter | Integrating Data Science Everywhere | Open Source Enthusiast | High Quality Software Development
2 years ago: Interesting. What do you mean by neural networks being less prone to overtraining than PLS? Thank you.
Computational chemical biologist
2 years ago: Looks cool… very cool.
Retired data scientist
2 years ago: Matt, the following link is a little off topic but interesting; it begins "CoMFA is a folk religion..." which is how I fondly remember it. https://en.wikipedia.org/wiki/Comfa