Revisiting CoMFA with Neural Networks
31 Steroids superimposed in the Van der Waals field weights


This article is a preview of my presentation at the CINF session of the ACS National Meeting in Chicago in a few weeks.

When CoMFA was published in 1988, it was a momentous innovation in relating 3D chemical structures to their bioactivities. Before then, protein structures were used to model binding poses, but few protein structures were available. Without requiring knowledge of the protein structure, the binding pose could instead be modeled using the "Active Analog Approach" pioneered by Garland Marshall and Richard Dammkoehler at Washington University. In this approach the pharmacophore is identified and a conformational search is performed jointly on all analogs to identify conformations that superimpose the pharmacophore points. Compounds superimposed this way can be inferred to be in their binding poses.

Today we have many times more protein structures, and in principle AlphaFold can model all of them. Proposing binding poses is therefore much easier. However, it is still valuable to use statistical analysis to understand and quantify which parts of molecules improve activity and which reduce it.

[Figure: explanatory diagram of the CoMFA method from the original 1988 paper]

Comparative Molecular Field Analysis (CoMFA) does exactly that. First, one superimposes the molecules in the putative binding mode; then 3D van der Waals (steric) and electrostatic fields are computed on a grid around them. The variations in these fields are then related to the variations in bioactivity to create a model that can predict the activity of novel compounds from their 3D shape. The explanatory diagram from the original 1988 paper is shown above. The publication used partial least squares (PLS) to build models because the fields supply far more descriptors than there are compounds. The number of papers using this method is too large to give a bibliography here; however, ScienceDirect has a dedicated CoMFA topic heading that covers many of them. Many useful derivatives have since been developed, including template-based CoMFA, CoMSIA, Topomer CoMFA, HQSAR, and others.
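The field calculation described above can be sketched in a few lines of NumPy. This is only an illustration, not the original CoMFA code: the function names, the toy three-atom "molecule", the Lennard-Jones parameters (epsilon, r_min), the 2 Å spacing, the +1 probe charge, the unit dielectric, and the 30 kcal/mol cutoff are all assumptions chosen for the example.

```python
import numpy as np

COULOMB_KCAL = 332.0636  # Coulomb constant in kcal*Angstrom/(mol*e^2)

def make_grid(lo, hi, spacing):
    """Return an (N, 3) array of lattice points and the lattice shape."""
    axes = [np.arange(lo[i], hi[i] + 1e-9, spacing) for i in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    points = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    return points, gx.shape

def probe_fields(coords, charges, points, epsilon=0.1, r_min=3.4, cutoff=30.0):
    """Steric (Lennard-Jones 12-6) and electrostatic (Coulomb) energies
    felt by a +1 charged probe at every lattice point. All parameter
    values are illustrative assumptions."""
    # Distances between every lattice point and every atom: (n_points, n_atoms)
    d = np.linalg.norm(points[:, None, :] - coords[None, :, :], axis=-1)
    d = np.maximum(d, 1e-3)  # avoid division by zero on top of an atom
    ratio6 = (r_min / d) ** 6
    steric = (epsilon * (ratio6 ** 2 - 2.0 * ratio6)).sum(axis=1)
    electro = (COULOMB_KCAL * charges[None, :] / d).sum(axis=1)
    # Truncate extreme values so points inside atoms do not dominate
    return np.clip(steric, -cutoff, cutoff), np.clip(electro, -cutoff, cutoff)

# Toy "molecule": three atoms on the x axis with made-up partial charges
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [-1.5, 0.0, 0.0]])
charges = np.array([-0.4, 0.2, 0.2])

points, grid_shape = make_grid((-4, -4, -4), (4, 4, 4), spacing=2.0)
steric, electro = probe_fields(coords, charges, points)
steric_grid = steric.reshape(grid_shape)    # 3D array, one value per lattice point
electro_grid = electro.reshape(grid_shape)
```

Each aligned molecule would yield one such pair of 3D arrays, and stacking them over the data set gives the descriptor matrix that is then related to the activities.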

In recent work we have reimplemented the CoMFA concept in Python with Keras, using the Keras support for 3D fields to implement the method. This also makes it possible to use 3D convolutions, filters, and other neural network operations to study how well the relationships generalize. Just as 2D convolutions are used to colorize photographs by recognizing hands and eyes and learning to color them wherever they appear, 3D convolutions may learn to recognize pharmacophore features and reduce the need to align the molecules. In this implementation the Keras training process is run for an appropriate number of epochs to allow the network to learn which molecular features affect bioactivity.
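To make the idea of 3D convolutions over molecular fields concrete, here is a minimal Keras sketch. It is not the architecture used in this work: the layer sizes, the 8x8x8 grid, the two-channel layout (steric and electrostatic), and the random placeholder data are all assumptions made for illustration.

```python
import numpy as np
from tensorflow import keras

def build_field_cnn(grid_shape=(8, 8, 8), n_channels=2):
    """Small 3D CNN mapping a gridded field to a single activity value.
    Layer sizes are illustrative assumptions."""
    inputs = keras.Input(shape=(*grid_shape, n_channels))
    # The same 3x3x3 filters are applied at every lattice position
    x = keras.layers.Conv3D(8, kernel_size=3, padding="same", activation="relu")(inputs)
    x = keras.layers.MaxPooling3D(pool_size=2)(x)
    x = keras.layers.Conv3D(16, kernel_size=3, padding="same", activation="relu")(x)
    # Pooling over all positions discards where a feature was found
    x = keras.layers.GlobalAveragePooling3D()(x)
    outputs = keras.layers.Dense(1)(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

# Random placeholder data standing in for gridded fields and activities
rng = np.random.default_rng(0)
fields = rng.normal(size=(16, 8, 8, 8, 2)).astype("float32")
activities = rng.normal(size=(16, 1)).astype("float32")

model = build_field_cnn()
history = model.fit(fields, activities, epochs=2, batch_size=4, verbose=0)
predictions = model.predict(fields, verbose=0)
```

Because the convolution filters are shared across the grid and the final pooling is global, a feature can in principle be detected wherever it sits in the lattice, which is the property that might reduce the dependence on alignment.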

As a first test we re-ran the original data set, the superimposed steroids. The steric and electrostatic fields were computed, then a 3D neural network was created to analyze each field and combine them to learn how to associate the 3D field points with the activities. One can then extract the neural network weights to understand which points contribute to the bioactivity.
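One simple way to picture the "one network per field, then combine, then read out the weights" step is the two-input Keras model below. It is a deliberately simplified assumption, not the model used here: each branch is a single linear layer with one weight per lattice point, the layer names and the 5x5x5 grid are hypothetical, and the data are random placeholders rather than the steroid set.

```python
import numpy as np
from tensorflow import keras

GRID_SHAPE = (5, 5, 5)  # hypothetical lattice size

def build_two_field_model(grid_shape=GRID_SHAPE):
    """One linear branch per field, merged into a single activity output."""
    steric_in = keras.Input(shape=(*grid_shape, 1), name="steric")
    electro_in = keras.Input(shape=(*grid_shape, 1), name="electrostatic")
    # One weight per lattice point, so weights map back onto the grid
    s = keras.layers.Dense(1, use_bias=False, name="steric_weights")(
        keras.layers.Flatten()(steric_in))
    e = keras.layers.Dense(1, use_bias=False, name="electro_weights")(
        keras.layers.Flatten()(electro_in))
    merged = keras.layers.Concatenate()([s, e])
    outputs = keras.layers.Dense(1, name="combine")(merged)
    model = keras.Model([steric_in, electro_in], outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

def weight_map(model, layer_name, grid_shape=GRID_SHAPE):
    """Reshape a branch's kernel, shape (n_points, 1), back onto the lattice."""
    kernel = model.get_layer(layer_name).get_weights()[0]
    return kernel.reshape(grid_shape)

# Random placeholder fields and activities (not real steroid data)
rng = np.random.default_rng(1)
steric = rng.normal(size=(31, *GRID_SHAPE, 1)).astype("float32")
electro = rng.normal(size=(31, *GRID_SHAPE, 1)).astype("float32")
activity = rng.normal(size=(31, 1)).astype("float32")

model = build_two_field_model()
model.fit([steric, electro], activity, epochs=5, batch_size=8, verbose=0)

steric_map = weight_map(model, "steric_weights")
electro_map = weight_map(model, "electro_weights")
```

Each map has the same shape as the lattice, so it can be colored and drawn around the superimposed molecules. In this sketch the effective contribution of a point also depends on the sign and size of the corresponding weight in the "combine" layer.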


The original CoMFA steroids in the weighted, computed van der Waals field. The color scale runs from red to blue; the redder points are more conducive to corticosteroid binding than the blue points.

This is the field that has "learned" which points are associated with changes in activity.


The same molecule set in the electrostatic field, using the same color scale.



The results of the AI method on this test set are comparable to those in the original report.

On the left, the training set for the neural network method is red, and the test set is blue. The predicted versus actual plot from the original publication is on the right.

[Figure: predicted versus actual activities for the neural network method (left) and the original publication (right)]

The neural network approaches are much less prone to overtraining than the partial least squares formulation, which makes a number of novel experiments possible. If you find this interesting, plan to attend my talk at the upcoming ACS National Meeting.


Charles River Laboratories is always looking for ways to improve our services and speed drug discovery by using data analytics. If you would like to join these efforts, check our web pages for open positions.

Arthur M. Doweyko

Consulting Scientist, Author and Artist

2 years ago

Fascinating. How does this differ from HASL?

Giuseppe Marco Randazzo

Manager & Staff Engineer @ Chemify | Former Endogena Therapeutics, Roche, IDSIA, UniGE | Ph.D. in Chemistry | Drug Hunter | Integrating Data Science Everywhere | Open Source Enthusiast | High Quality Software Development

2 years ago

Interesting. What do you mean by neural networks being less prone to overtraining than PLS? Thank you.

Bob Clark

Computational chemical biologist

2 years ago

Looks cool…very cool.

David Patterson

Retired data scientist

2 years ago

Matt, the following link is a little off topic but interesting; it begins "CoMFA is a folk religion..." which is how I fondly remember it. https://en.wikipedia.org/wiki/Comfa
