AI Deep Learning Accelerates Drug Development
Deep learning is a subset of artificial intelligence (AI) that uses multi-layer artificial neural networks, loosely modeled on the human brain, to learn from large amounts of data and solve complex problems. It has made significant progress in the biomedical field, where researchers have developed a series of deep learning applications for disease diagnosis, protein design, and medical image recognition. The pharmaceutical industry is also beginning to recognize its importance and hopes to leverage it to accelerate drug development and reduce costs.
Application of Deep Learning in Drug Development
Previous studies have demonstrated that deep learning offers significant advantages in several key areas of drug development, including optimization of chemical synthesis routes, ADME-Tox prediction, target identification and validation, and the generation of novel molecules.
Virtual Screening: Protein-Ligand Affinity
Deep learning models can learn potential binding patterns from known examples of protein-small molecule binding. During training, a model continuously adjusts its parameters to improve the accuracy and reliability of its predictions.
Yelena Guttman et al. developed a CYP3A4 inhibitor prediction model based on the DeepChem framework. They created a KNIME workflow for data curation and used the DeepChem module in Maestro to build a categorical classifier. This classifier was then used to virtually screen approximately 68,900 compounds from the FooDB database, which led to the identification of two new CYP3A4 inhibitors[2].
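The screening step described above, scoring every compound in a library with a trained classifier and keeping the most confident predictions, can be sketched in a few lines of Python. This is only an illustration: `screen_library`, the threshold, and the toy scores are hypothetical, and the lookup table stands in for a trained model rather than reproducing the DeepChem or Maestro API used in [2].

```python
from typing import Callable, Iterable, List, Tuple


def screen_library(
    compounds: Iterable[str],
    predict_inhibitor_prob: Callable[[str], float],
    threshold: float = 0.5,
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Score each compound, keep those at or above the threshold,
    and return the top_n hits sorted by predicted probability."""
    scored = [(smiles, predict_inhibitor_prob(smiles)) for smiles in compounds]
    hits = [(smiles, prob) for smiles, prob in scored if prob >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:top_n]


# Made-up probabilities standing in for a trained classifier (assumption).
toy_scores = {
    "CCO": 0.12,
    "c1ccccc1O": 0.81,
    "CC(=O)O": 0.05,
    "c1ccc2ccccc2c1": 0.93,
}

hits = screen_library(toy_scores, toy_scores.get, threshold=0.5, top_n=2)
print(hits)
```

In practice the shortlisted compounds would still need experimental confirmation, as was done for the two inhibitors reported in [2].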
ADME-Tox Prediction
Poor pharmacokinetic properties and toxicity issues are considered the main reasons for terminating the development of drug candidates. There is therefore an increasing need for robust screening methods that provide early information on the absorption, distribution, metabolism, excretion, and toxicity (ADME-Tox) properties of compounds. Many studies have shown that, by leveraging extensive ADME-Tox datasets, deep learning models can automatically identify and extract complex relationships between compound features and the corresponding ADME-Tox properties. The trained models can then be used to predict the ADME-Tox properties of new compounds, thereby accelerating drug discovery and development.
Liu et al. used directed message passing neural networks (D-MPNN, Chemprop) to predict dietary-derived Nrf2 agonists and the safety of compounds in the FooDB database. They identified Nicotiflorin, a compound that exhibits both Nrf2 agonistic activity and a favorable safety profile, which was validated in vitro and in vivo[3].
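To give a rough idea of what "directed message passing" means, the sketch below runs a heavily simplified version on a toy molecular graph. It assumes scalar features and fixed, untrained weights (`w_in`, `w_msg`), both chosen only for illustration. Real D-MPNN implementations such as Chemprop use learned weight matrices and richer atom and bond features, so this is not their API. The key idea shown is that the state of a directed bond v->w is updated from the bonds entering v, excluding the reverse bond w->v.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def directed_message_passing(atom_feats, bonds, steps=1, w_in=0.1, w_msg=0.5):
    """Toy directed message passing with scalar states.

    atom_feats: one number per atom (here, the atomic number).
    bonds: undirected bonds as (atom_index, atom_index) pairs.
    Returns a single number that plays the role of a molecule embedding.
    """
    neighbors = {v: [] for v in range(len(atom_feats))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Initial hidden state of each directed bond v->w, from the source atom.
    h0 = {(v, w): relu(w_in * atom_feats[v]) for v in neighbors for w in neighbors[v]}
    h = dict(h0)

    for _ in range(steps):
        # Bond v->w collects the states of bonds k->v, skipping k == w.
        h = {
            (v, w): relu(h0[(v, w)] + w_msg * sum(h[(k, v)] for k in neighbors[v] if k != w))
            for (v, w) in h
        }

    # Atom readout: own feature plus all incoming bond states, then sum over atoms.
    atom_states = [
        relu(w_in * atom_feats[v] + sum(h[(k, v)] for k in neighbors[v]))
        for v in neighbors
    ]
    return sum(atom_states)


# Heavy atoms of ethanol (C-C-O), described only by atomic number.
embedding = directed_message_passing([6.0, 6.0, 8.0], [(0, 1), (1, 2)], steps=1)
print(round(embedding, 3))
```

In a trained model, an embedding of this kind would be passed to a feed-forward network that outputs the predicted property, for example agonist activity or a toxicity label.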
Optimization of Chemical Synthesis Routes
In recent years, artificial intelligence (AI) has begun to bring revolutionary changes to chemical synthesis. However, the lack of suitable ways to represent chemical reactions and the scarcity of reaction data have limited the wider application of AI to reaction prediction. Deep learning is increasingly being applied to chemical synthesis because it can automatically identify and extract features and patterns from large datasets. This capability improves predictions of the efficiency and selectivity of new synthesis routes, which can accelerate drug development and production.
Li et al. introduced a novel reaction representation, GraphRXN, for reaction prediction. GraphRXN takes the 2D molecular structures of the organic components directly as input and learns task-related representations of the chemical reaction automatically during training. It achieves on-par or slightly better performance than the baseline models[4]. Segler et al. combined Monte Carlo tree search with an expansion policy network that guides the search and a filter network that pre-selects the most promising retrosynthetic steps[5]. These studies demonstrate that deep learning models can yield moderate to good accuracy in reaction prediction despite the limited size of the datasets and the many complex variables involved.
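One common way for a policy network to guide Monte Carlo tree search is a PUCT-style selection rule, in which the network's prior probability for a step boosts exploration of that step until it has been visited often enough. The sketch below illustrates that general idea only. The formula, the constant `c_explore`, and the candidate names are assumptions for illustration and are not necessarily what was used in [5].

```python
import math


def puct_score(mean_value, prior, parent_visits, child_visits, c_explore=1.0):
    """Average value of a step plus an exploration bonus weighted by the policy prior."""
    exploration = c_explore * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return mean_value + exploration


def select_step(children, parent_visits, c_explore=1.0):
    """Return the name of the candidate retrosynthetic step with the highest score."""
    best = max(
        children,
        key=lambda ch: puct_score(
            ch["value"], ch["prior"], parent_visits, ch["visits"], c_explore
        ),
    )
    return best["name"]


# Hypothetical candidate disconnections for one target molecule.
candidates = [
    {"name": "amide_coupling", "value": 0.5, "prior": 0.2, "visits": 10},
    {"name": "suzuki_coupling", "value": 0.1, "prior": 0.7, "visits": 0},
]

print(select_step(candidates, parent_visits=16))
```

An unvisited step with a high prior is tried first. Once it has been visited many times without good results, its exploration bonus shrinks and the search returns to steps with a better average value.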
Drug Screening Based on Deep Learning
In virtual screening, deep learning is used primarily to predict the activity or properties of compounds with neural networks, so that potential candidate drugs or materials can be identified in silico. Commonly used deep learning models include convolutional neural networks (CNNs), graph neural networks (GNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and Transformer models.
In summary, deep learning is revolutionizing drug development by enhancing efficiency, accuracy, and cost-effectiveness across multiple stages of the process. As technology continues to evolve, its integration into pharmaceutical research is likely to deepen, paving the way for innovative therapeutic solutions.
References:
Products:
MedChemExpress (MCE) provides high-quality virtual screening services that enable researchers to identify the most promising candidates. Based on the laws of quantum and molecular physics, our virtual screening services can achieve highly accurate results. Our optimized virtual screening protocol can reduce the size of the chemical library that must be screened experimentally, increase the likelihood of finding innovative hits faster and at lower cost, and mitigate the risk of failure in lead optimization.
MCE 50K Diversity Library consists of 50,000 lead-like compounds with favorable calculated properties, such as good solubility (-3.2 < logP < 5), oral bioavailability (RotB <= 10), and drug transportability (PSA < 120). The compounds were selected by dissimilarity search and have an average Tanimoto coefficient of 0.52. The library contains 36,857 unique scaffolds, each represented by 1 to 7 compounds. In addition, compounds that share a scaffold carry as many different functional groups as possible, which broadens the chemical space covered.
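For readers unfamiliar with the Tanimoto coefficient, the sketch below computes it for fingerprints represented as sets of "on" bit positions and averages it over all pairs in a small collection. The fingerprints are made up, and averaging over all pairs is an assumption, since the text does not say how the reported average of 0.52 was calculated.

```python
from itertools import combinations


def tanimoto(fp_a: set, fp_b: set) -> float:
    """Shared on-bits divided by the total number of distinct on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def mean_pairwise_tanimoto(fingerprints) -> float:
    """Average Tanimoto coefficient over every pair of fingerprints."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Toy fingerprints: each set lists the bit positions that are switched on.
toy_fps = [{1, 2, 3}, {2, 3, 4}, {1, 2, 3}]
print(tanimoto(toy_fps[0], toy_fps[1]))
print(mean_pairwise_tanimoto(toy_fps))
```

A lower average indicates that the compounds are, on the whole, less similar to one another, which is the goal of a diversity library.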
Using MCE's 40,662 building blocks (BBs), which cover around 273 reaction types, more than 40 million molecules were generated. Compounds that comply with the rule of five (Ro5) were selected, and inappropriate structures, such as those that contain PAINS motifs or are difficult to synthesize, were removed. The molecules were then clustered on the basis of Morgan fingerprints, and those close to each cluster center were extracted to form this drug-like and synthesizable diversity library. The selected molecules cover 805,822 unique Bemis-Murcko scaffolds (BMS) and a diverse chemical space. This library is highly recommended for AI-based lead discovery, ultra-large virtual screening, and novel lead discovery.
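A rule-of-five filter of the kind mentioned above can be sketched as follows. The thresholds are the standard Lipinski limits. The descriptor values and compound names are hypothetical, and requiring zero violations is an assumption, since many workflows tolerate one violation and the text does not say which convention was applied.

```python
# Standard Lipinski rule-of-five upper limits.
RO5_LIMITS = {
    "mol_weight": 500.0,  # molecular weight in Da
    "logp": 5.0,          # calculated octanol-water partition coefficient
    "h_donors": 5,        # hydrogen-bond donors
    "h_acceptors": 10,    # hydrogen-bond acceptors
}


def ro5_violations(descriptors: dict) -> list:
    """Return the names of the Ro5 criteria that a compound exceeds."""
    return [name for name, limit in RO5_LIMITS.items() if descriptors[name] > limit]


def passes_ro5(descriptors: dict, max_violations: int = 0) -> bool:
    """True if the compound exceeds at most max_violations criteria."""
    return len(ro5_violations(descriptors)) <= max_violations


# Hypothetical precomputed descriptors for two virtual compounds.
library = {
    "cmpd_A": {"mol_weight": 342.4, "logp": 2.8, "h_donors": 2, "h_acceptors": 5},
    "cmpd_B": {"mol_weight": 612.7, "logp": 6.1, "h_donors": 3, "h_acceptors": 9},
}

kept = [name for name, desc in library.items() if passes_ro5(desc)]
print(kept)
```

In a real pipeline the descriptors would be calculated from the structures with a cheminformatics toolkit before the filter is applied.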
MegaUni 50K Virtual Diversity Library consists of 50,000 novel, synthetically accessible, lead-like compounds. Using MCE's 40,662 building blocks, which cover around 273 reaction types, more than 40 million molecules were generated. The molecules were clustered on the basis of Morgan fingerprints and the Tanimoto coefficient, and those closest to each cluster center were extracted to form a drug-like and synthesizable diversity library. The 50,000 selected drug-like molecules cover 46,744 unique Bemis-Murcko scaffolds (BMS), each represented by only 1 to 3 compounds. This diverse library is highly recommended for virtual screening and novel lead discovery.
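The idea of extracting the molecule closest to each cluster center can be illustrated with a small sketch. It assumes that cluster assignments are already available and treats the "center" as the member with the highest total Tanimoto similarity to the rest of its cluster (a medoid). The fingerprints, names, and this definition of the center are assumptions, since the clustering algorithm actually used is not described.

```python
from collections import defaultdict


def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def pick_cluster_representatives(fingerprints: dict, cluster_of: dict) -> dict:
    """For each cluster, return the member most similar to its fellow members."""
    clusters = defaultdict(list)
    for name, cluster_id in cluster_of.items():
        clusters[cluster_id].append(name)

    representatives = {}
    for cluster_id, members in clusters.items():
        def total_similarity(member):
            return sum(
                tanimoto(fingerprints[member], fingerprints[other])
                for other in members
                if other != member
            )
        # Sorting makes the choice deterministic when similarities tie.
        representatives[cluster_id] = max(sorted(members), key=total_similarity)
    return representatives


# Toy fingerprints and hypothetical cluster assignments.
toy_fps = {"m1": {1, 2, 3}, "m2": {1, 2, 3, 4}, "m3": {1, 2, 3, 4, 5}, "m4": {9, 10}}
toy_clusters = {"m1": 0, "m2": 0, "m3": 0, "m4": 1}

reps = pick_cluster_representatives(toy_fps, toy_clusters)
print(reps)
```

Keeping one or a few molecules per cluster is what keeps the number of compounds per scaffold low while preserving coverage of the generated chemical space.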