YerevaNN's Posts


1,654 followers

This year was monumental in AI research globally. Over the final 10 days of 2024, we will summarize our team's efforts from this year to advance modern AI, sharing one highlight per day.

3/10: Small Molecule Optimization with Large Language Models [Chemlactica / Chemma]

Language models are remarkable. But what happens when they are combined with a proper search algorithm? What if there is an external oracle function providing feedback on the search? This project demonstrates the powerful synergy of all these components.

We trained the Galactica and Gemma models on a massive corpus of small molecules (40 billion tokens!). The corpus was constructed to enable the resulting language models to understand complex prompts, such as similarity to given molecules or basic molecular properties. We integrated these models into a genetic algorithm that receives supervision signals from an external oracle function. As a cherry on top, we periodically fine-tuned the language model using the scores provided by the oracle to guide the model along the optimization trajectory in molecular space.

The results are impressive: state-of-the-art performance on drug-likeness (QED) optimization (popularized by NVIDIA’s RetMol), the Practical Molecular Optimization benchmark, and several benchmarks involving protein docking simulations.

Various aspects of this work were presented at the ICML ML for Life and Material Sciences Workshop and recently at the NeurIPS workshop on Foundation Models for Science (although without our physical presence, as the Canadian embassy is still processing the visa application).

The preprint is available on arXiv: https://lnkd.in/eREAM5af
The pretraining code and optimization algorithm are available on GitHub: https://lnkd.in/egspkaT3
The models have been downloaded more than 17,500 times on HuggingFace: https://lnkd.in/edTVFzXV

Model development and pretraining of the smaller models were conducted on A100 GPUs at Yerevan State University, while the larger models were trained on H100 Cloud GPUs generously provided by Nebius AI. Philipp’s work was supported by a Yandex Armenia fellowship.

Philipp Guevorguian, Menua Bedrosian, Tigran Fahradyan, Gayane Chilingaryan, Hrant Khachatrian, Armen Aghajanyan
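
For readers who want a feel for the loop described in the post (a generator proposing candidates from a pool, an external oracle scoring them, and a periodic fine-tuning step), here is a minimal toy sketch. It is not the code from the repository: every name in it (`optimize`, `toy_propose`, `toy_oracle`, the `finetune` hook, and all parameter defaults) is hypothetical. The proposer is a random string crossover standing in for a language model, and the oracle just counts characters rather than computing QED or a docking score.

```python
import random
from typing import Callable, List, Tuple

Scored = Tuple[float, str]

def optimize(
    propose: Callable[[List[str], random.Random], str],  # stand-in for the language model
    oracle: Callable[[str], float],                      # external scoring function
    seeds: List[str],
    pool_size: int = 8,
    steps: int = 30,
    batch: int = 16,
    finetune_every: int = 10,
    finetune: Callable[[List[Scored]], None] = lambda pool: None,  # hypothetical hook
    seed: int = 0,
) -> List[Scored]:
    """Toy genetic loop: keep the top-scoring candidates, propose new ones from them."""
    rng = random.Random(seed)
    pool = sorted(((oracle(s), s) for s in set(seeds)), reverse=True)[:pool_size]
    for step in range(1, steps + 1):
        seen = {s for _, s in pool}
        candidates = set()
        for _ in range(batch):
            # Pick up to two parents from the current pool and ask the generator for a child.
            parents = rng.sample([s for _, s in pool], k=min(2, len(pool)))
            child = propose(parents, rng)
            if child not in seen:
                candidates.add(child)
        # Score only the new candidates, then keep the best pool_size overall.
        scored = [(oracle(c), c) for c in sorted(candidates)]
        pool = sorted(pool + scored, reverse=True)[:pool_size]
        if step % finetune_every == 0:
            # In the setup the post describes, this is where the model would be
            # fine-tuned on oracle-scored samples; here it is a no-op by default.
            finetune(pool)
    return pool

ALPHABET = "CNO"  # toy alphabet, not real chemistry

def toy_propose(parents: List[str], rng: random.Random) -> str:
    """Assumed stand-in generator: one-point crossover plus a single random mutation."""
    a, b = parents[0], parents[-1]
    cut = rng.randrange(1, len(a))
    child = list(a[:cut] + b[cut:])
    child[rng.randrange(len(child))] = rng.choice(ALPHABET)
    return "".join(child)

def toy_oracle(s: str) -> float:
    """Assumed stand-in oracle: fraction of 'C' characters in the string."""
    return s.count("C") / len(s)

best = optimize(toy_propose, toy_oracle, ["NONONO", "ONONON"])
print(best[0])
```

In a real setting the `propose` callable would wrap a prompted language model and `oracle` would wrap a property calculator or docking simulation; the sketch only shows how the three pieces fit together.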

Arin Avanoosyan

Chemist-Data Scientist

2 months ago

Congratulations on presenting at NeurIPS
