This year was monumental for AI research globally. Over the final 10 days of 2024, we will summarize our team's efforts from this year to advance modern AI, sharing one highlight per day.

3/10: Small Molecule Optimization with Large Language Models [Chemlactica / Chemma]

Language models are remarkable. But what happens when they are combined with a proper search algorithm? What if there is an external oracle function providing feedback on the search? This project demonstrates the powerful synergy of all these components.

We trained the Galactica and Gemma models on a massive corpus of small molecules (40 billion tokens!). The corpus was constructed so that the resulting language models understand complex prompts, such as similarity to given molecules or basic molecular properties. We integrated these models into a genetic algorithm that receives supervision signals from an external oracle function. As a cherry on top, we periodically fine-tuned the language model on the scores provided by the oracle to guide the model along the optimization trajectory in molecular space.

The results are impressive: state-of-the-art performance on drug-likeness (QED) optimization (popularized by NVIDIA's RetMol), on the Practical Molecular Optimization benchmark, and on several benchmarks involving protein docking simulations.

Various aspects of this work were presented at the ICML ML for Life and Material Sciences Workshop and, recently, at the NeurIPS workshop on Foundation Models for Science (although without our physical presence, as the Canadian embassy is still processing the visa application).

The preprint is available on arXiv: https://lnkd.in/eREAM5af
The pretraining code and optimization algorithm are available on GitHub: https://lnkd.in/egspkaT3
The models have been downloaded more than 17,500 times on HuggingFace: https://lnkd.in/edTVFzXV

Model development and pretraining of the smaller models were conducted on A100 GPUs at Yerevan State University, while the larger models were trained on H100 cloud GPUs generously provided by Nebius AI. Philipp's work was supported by a Yandex Armenia fellowship.

Philipp Guevorguian, Menua Bedrosian, Tigran Fahradyan, Gayane Chilingaryan, Hrant Khachatrian, Armen Aghajanyan
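For readers curious how the pieces fit together, here is a minimal, hypothetical sketch of the loop described above: the language model proposes candidate molecules, an external oracle scores them, a genetic-style selection keeps the best, and the model is periodically fine-tuned on the scored results. All function names and the toy scoring are placeholders; this is not the released Chemlactica/Chemma code (see the GitHub link above for that).

```python
# Hypothetical sketch of the oracle-guided loop described above. The language
# model, oracle, and fine-tuning step are replaced by toy stand-ins; none of the
# function names or strings below come from the released Chemlactica/Chemma code.
import random

random.seed(0)

def lm_propose(parents, n_candidates=20):
    """Stand-in for the language model: mutate parent 'molecules'.
    In the real system, prompts encode similarity and property constraints."""
    alphabet = "CNOPS"
    children = []
    for _ in range(n_candidates):
        parent = random.choice(parents)
        pos = random.randrange(len(parent))
        children.append(parent[:pos] + random.choice(alphabet) + parent[pos + 1:])
    return children

def oracle_score(molecule):
    """Stand-in for the external oracle (e.g. QED or a docking score)."""
    return sum(c in "CNO" for c in molecule) / len(molecule)

def fine_tune(scored_pool):
    """Stand-in for periodically fine-tuning the LM on high-scoring molecules."""
    top = sorted(scored_pool, key=lambda x: x[1], reverse=True)[:5]
    # In the real system these (molecule, score) pairs would be formatted into
    # training prompts and used to update the model weights.
    return top

population = ["CCONCN", "CCCCOSP", "CCNOCC"]      # toy strings, not valid SMILES
for generation in range(10):
    candidates = lm_propose(population)
    scored = sorted(((m, oracle_score(m)) for m in candidates),
                    key=lambda x: x[1], reverse=True)
    population = [m for m, _ in scored[:3]]       # genetic selection of the fittest
    if generation % 3 == 2:
        fine_tune(scored)                         # periodic model update on oracle scores
    print(f"gen {generation}: best score {scored[0][1]:.3f}")
```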
YerevaNN's Activity
-
As we celebrate #AIAppreciationDay, I can't help but feel immense gratitude to the eight scientists behind the groundbreaking paper "Attention Is All You Need." Published back in 2017, it was the catalyst for the development of large language models.

Their seminal work introduced the Transformer architecture (the "T" in GPT), which is the foundation for modern language models and has given us a pathway to artificial general intelligence (AGI). Their work, which stands on decades of research in deep learning and neural networks, exemplifies the collaborative and iterative nature of scientific progress, especially in an evolving field like AI.

The paper was authored by a team of researchers from Google: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser and Illia Polosukhin. Their diverse backgrounds and expertise came together serendipitously to create something truly revolutionary. The fascinating backstory is chronicled well in the Financial Times article linked in the comments.

At the heart of the Transformer architecture is the self-attention mechanism, a novel approach that allows models to weigh the importance of different words in a sentence in parallel rather than sequentially, which was the prevailing method in the past. This shortens training time, requires less computational power and allows models to train on vast amounts of data so they can generate output that is human-like in its properties.

Different models use this approach in unique ways to enable understanding of context, nuance and the inherent ambiguity in language.

For the era of Generative AI to truly dawn, three things needed to happen:

1. Digitization of all our data
2. Access to massive computational power
3. A mathematical breakthrough in connecting these together efficiently

The first two happened over decades with the evolution of the world wide web and the development of powerful GPUs by companies like NVIDIA. But the third ingredient was necessary for LLMs to scale and advance. The Transformer architecture gave us that important piece.

We owe a huge debt of gratitude to the scientists behind this revolutionary concept. My team and I are inspired by both the technical breakthrough and the power of people truly working together.

We are doing our part to innovate on GenAI, building on our 30-year legacy of pioneering AI for our clients and ecosystem (with over 40 patents and a dozen products using transformer technology). On all our behalf, thank you Vaswani et al. for your work; we wish you even more success in your current endeavors.

Now, tell me in the comments what you're thankful for this AI Appreciation Day!
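As a concrete illustration of the mechanism the post describes, here is a minimal scaled dot-product self-attention sketch over a toy "sentence" (an assumed, illustrative example, not any particular model's code): every word scores every other word in parallel, and the softmax weights determine how much each word contributes to each context-aware output.

```python
# Minimal scaled dot-product self-attention over a toy "sentence" (an illustrative
# sketch of the mechanism described above, not any particular model's code).
import numpy as np

rng = np.random.default_rng(0)
n_words, d = 5, 8                        # 5 token embeddings of dimension 8
X = rng.normal(size=(n_words, d))

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv         # queries, keys, values for every word at once

scores = Q @ K.T / np.sqrt(d)            # every word scores every other word in parallel
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax: relative importance of each word
output = weights @ V                     # context-aware representation of each word
print(weights.round(2))                  # rows sum to 1: per-word attention distribution
```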
-
Day 1467: When was the term "artificial intelligence" coined, and how did it all start?

Groundwork for AI: 1900-1950
In the early 1900s, a lot of media was created around the idea of artificial humans, so much so that scientists of all sorts started asking: is it possible to create an artificial brain? Some creators even built early versions of what we now call "robots" (a word coined in a Czech play in 1921), though most of them were relatively simple. These were steam-powered for the most part, and some could make facial expressions and even walk.

1921: Czech playwright Karel Čapek released the science fiction play "Rossum's Universal Robots," which introduced the idea of "artificial people," which he named robots. This was the first known use of the word.
1929: Japanese professor Makoto Nishimura built the first Japanese robot, named Gakutensoku.
1949: Computer scientist Edmund Callis Berkeley published the book "Giant Brains, or Machines That Think," which compared the newer models of computers to human brains.

Birth of AI: 1950-1956
This was the period when interest in AI really came to a head. Alan Turing published "Computing Machinery and Intelligence," which proposed what eventually became known as the Turing Test, used to measure machine intelligence. The term "artificial intelligence" was coined and came into popular use.

1950: Alan Turing published "Computing Machinery and Intelligence," which proposed a test of machine intelligence called the Imitation Game.
1952: Computer scientist Arthur Samuel developed a checkers-playing program, the first program ever to learn a game independently.
1956: John McCarthy held a workshop at Dartmouth on "artificial intelligence," the first use of the term, which is how it came into popular usage.

The Dartmouth Summer Research Project on Artificial Intelligence was a seminal event for artificial intelligence as a field. In 1956, a small group of scientists gathered for the project, which marked the birth of this field of research. The initial meeting was organized by John McCarthy, then a mathematics professor at the College. In his proposal, he stated that the conference was "to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it."

Since 1956, research and development in AI has increased, albeit with two roughly six-year periods in which little or no progress was made, referred to as AI winters. In the history of artificial intelligence, an AI winter is a period of reduced funding and interest in artificial intelligence research. The field has experienced several hype cycles, followed by disappointment and criticism, followed by funding cuts, followed by renewed interest years or even decades later.

#learnonethingaday
-
Yesterday evening, I listened to an outstanding lecture on the future of AI by Prof. Stuart Russell at my grad school alma mater, #universityofchicago. He gave a very solid argument on how to regulate the evolution of AI tools that are on the path to general AI. Specifically, he argues that government and industry must shift focus from a "guardrails" approach to requiring the use of two design principles: machines must be designed to act in the best interest of humans, and machines must be designed with explicit uncertainty about what those interests are. While this may seem simplistic at a prima facie level, these principles reflect very thoughtful technical insights regarding the math and coding structures of large language models (LLMs), neural networks, and other AI technologies. Professor Russell's presentation was one of the most well-thought-out discussions I have encountered on this topic, and hopefully it will be posted soon on YouTube. https://lnkd.in/eWf8_fmF
-
HEMANTH LINGAMGUNTA

Integrating top equations of physics into the training of Large Language Models (LLMs), Vision-Language Models (VLMs), and APIs represents a cutting-edge approach in artificial intelligence. This integration can enhance the models' understanding and application of complex scientific principles, leading to more accurate and efficient AI systems.

The Intersection of Physics and AI
Recent advancements in AI have shown the potential of combining physics with machine learning. This approach, often referred to as Physics-Informed Machine Learning (PIML), uses physical laws and equations to inform and guide the training of AI models. By embedding conservation principles and differential equations into the learning process, these models can achieve higher accuracy and reliability, especially in fields where physical laws are paramount [4].

Applications in LLMs and VLMs
1. Enhanced model training: Incorporating physics equations can improve the training of LLMs and VLMs by providing additional context and constraints, leading to models that better understand and predict real-world phenomena.
2. Improved performance: Models like Code Llama and Google's Gemini have demonstrated the effectiveness of specialized training datasets and infrastructure, which can be further enhanced by integrating physics-based data and principles [2][3].
3. Broader applications: This integration opens up new possibilities in domains such as scientific research, engineering, and environmental modeling, where understanding complex systems is crucial.

Cutting-Edge Technologies
The development of these advanced models involves state-of-the-art technologies, such as Tensor Processing Units (TPUs) and sophisticated training algorithms, to handle the computational demands of integrating large datasets and complex equations [3]. These technologies enable the efficient scaling and deployment of models across different platforms and applications.

Conclusion
Integrating physics into AI model training is a promising frontier that combines the strengths of traditional scientific methods with modern AI capabilities. This approach not only enhances the performance of LLMs and VLMs but also expands their applicability in solving complex, real-world problems.

Share this idea: Join the conversation on integrating physics with AI by sharing this post with your network.

#PhysicsInAI #MachineLearning #AIInnovation #LLMs #VLMs #APIs #TechRevolution

Citations:
[1] Understanding LLMs: A Comprehensive Overview from Training to ... https://lnkd.in/gBd8Ue55
[2] 5 Recent AI Research Papers - Encord https://lnkd.in/gvgq83v4
[3] Google Launches Gemini, Its New Multimodal AI Model - Encord https://lnkd.in/ghvd8y2b
[4] Integrating Physics with Machine learning: A promising frontier in AI https://lnkd.in/gc8mM47V
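To make the physics-informed idea concrete, here is a minimal sketch (my own illustrative example, not taken from the cited sources): a small network is trained so that its output satisfies the differential equation du/dx = -u with u(0) = 1, by adding the equation's residual to the training loss alongside the boundary condition.

```python
# Minimal physics-informed training sketch (assumed example, not from the cited papers):
# a small network is trained so that its output satisfies du/dx = -u with u(0) = 1.
# The physics residual is added to the loss alongside the boundary condition.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(2000):
    x = torch.rand(64, 1, requires_grad=True)            # collocation points in [0, 1]
    u = model(x)
    du_dx = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u),
                                create_graph=True)[0]
    physics_residual = ((du_dx + u) ** 2).mean()          # enforce du/dx = -u
    boundary = (model(torch.zeros(1, 1)) - 1.0).pow(2).mean()  # enforce u(0) = 1
    loss = physics_residual + boundary
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The trained network should approximate u(x) = exp(-x) on [0, 1].
print(model(torch.tensor([[0.5]])).item())  # roughly exp(-0.5) ~ 0.607
```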
-
New algorithm from NVIDIA speeds up LLMs by up to 11x.

Transformer-based Large Language Models (LLMs) have revolutionized natural language processing, but their quadratic complexity in self-attention poses significant challenges for inference on long sequences. Star Attention addresses this issue by introducing a two-phase block-sparse approximation that shards attention across multiple hosts while minimizing communication overhead.

Here's how it works:
1. The context is split into blocks, each prefixed with an anchor block.
2. Context tokens are processed using blockwise-local attention across hosts, in parallel.
3. Query and response tokens attend to all prior cached tokens through sequence-global attention.

The technique integrates seamlessly with most transformer-based LLMs trained with global attention, significantly reducing memory requirements and inference time (by up to 11x). As Star Attention is scaled to even longer sequences (up to 1M tokens) and larger models, the speedups become even more impressive.

There are still open questions around the optimal size of anchor blocks and performance on more complex long-context tasks, but the trend is clear: faster models that require less memory.

Liked this post? Join my newsletter with 50k+ readers that breaks down all you need to know about the latest LLM research: llmwatch.com
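As a toy illustration of the two-phase scheme (a drastic single-process simplification of my own, not NVIDIA's implementation), the sketch below encodes each context block with local attention over an anchor prefix plus the block itself, pools the per-block results as a cache, and then lets query tokens attend globally over the pooled cache.

```python
# Toy single-process sketch of the two-phase Star Attention idea (a simplification,
# not NVIDIA's implementation): phase 1 builds per-block caches with local attention
# over (anchor block + own block); phase 2 lets the query attend globally over the
# concatenated cache.
import numpy as np

rng = np.random.default_rng(0)
d, block_len, n_blocks, q_len = 16, 8, 4, 2

def attention(q, k, v):
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Context split into blocks; the first block serves as the shared anchor.
blocks = [rng.normal(size=(block_len, d)) for _ in range(n_blocks)]
anchor = blocks[0]

# Phase 1: blockwise-local context encoding (parallelizable across hosts).
cache = []
for block in blocks:
    local = np.concatenate([anchor, block], axis=0)   # anchor prefix + own block
    encoded = attention(block, local, local)          # block attends only locally
    cache.append(encoded)                             # keep only this block's states
cache = np.concatenate(cache, axis=0)

# Phase 2: query tokens attend globally over all cached context tokens.
query = rng.normal(size=(q_len, d))
output = attention(query, cache, cache)
print(output.shape)   # (2, 16): one context-aware vector per query token
```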
-
NVIDIA's Star Attention is a breakthrough in optimizing Large Language Models (LLMs)! The ability to address the quadratic complexity of self-attention while achieving up to 11x speedups is a game-changer, especially for long-context tasks.

The two-phase block-sparse approximation is a brilliant innovation. By splitting the context into blocks with anchor prefixes and leveraging blockwise-local attention across hosts, it effectively reduces computational overhead while maintaining accuracy. The inclusion of sequence-global attention ensures that query and response tokens still capture the full context.

The scaling potential is enormous. With support for sequences up to 1M tokens, Star Attention not only reduces memory requirements but also makes inference significantly faster. For transformer-based LLMs, this opens doors to new possibilities for handling large datasets and real-time applications.

Key questions remain about optimizing anchor block sizes and performance on complex tasks, but the trend toward faster, memory-efficient LLMs is clear. Innovations like Star Attention are setting the stage for the next generation of transformers, making them more practical for real-world use cases.

Thanks for sharing this exciting development! The pace of innovation in LLM research is incredible, and NVIDIA's contributions are taking efficiency to a whole new level. Can't wait to see how Star Attention evolves and impacts future models!

#LLMs #StarAttention #AI #NLP #TransformerModels #Innovation
-
Titan Transformer: The LSTM Moment for Transformers

In 1997, the introduction of Long Short-Term Memory (LSTM) networks addressed critical limitations of Recurrent Neural Networks (RNNs), enabling them to capture long-range dependencies and effectively process sequential data. This innovation marked a pivotal moment in deep learning, expanding the horizons of what RNNs could achieve.

Fast forward to January 2025: Google unveiled the "Titans" architecture, representing a similar leap forward for Transformer-based models. Traditional Transformers, while powerful, face challenges in handling extremely long sequences due to their fixed context windows and quadratic computational complexity. Titans overcome these limitations by integrating a neural long-term memory module that learns to memorize and store historical data during inference. This allows the model to manage both short-term and long-term dependencies, processing sequences with millions of tokens efficiently.

Key features of Titans:

Neural long-term memory module: Inspired by human memory systems, this component captures surprising or unexpected events, determining the memorability of inputs based on a "surprise" metric. It incorporates a decaying mechanism to manage memory capacity, allowing the model to forget less relevant information over time.

Memory management: Titans handle large sequences by adaptively forgetting information that is no longer needed, achieved through a weight decay mechanism similar to a forgetting gate in modern recurrent models. The memory update is formulated as gradient descent with momentum, enabling the model to retain information about past surprises and manage memory effectively.

Efficiency and scalability: Designed to handle context windows larger than 2 million tokens, Titans are optimized for both training and inference, making them suitable for large-scale tasks such as language modeling, time series forecasting, and genomics.

By addressing the limitations of traditional Transformers, Titans represent a transformative step in AI architecture, much like LSTMs did for RNNs. This advancement opens new possibilities for processing extensive and complex data sequences, paving the way for more sophisticated and context-aware AI applications.
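A rough sketch of the update rule described above might look as follows (my own reading of the description, not the Titans code): the memory is a linear map trained online to associate keys with values, the gradient of that associative loss serves as the "surprise" signal, and the update combines momentum over past surprises with a decay term that acts as the forgetting gate.

```python
# Rough sketch of a surprise-driven memory update with momentum and decay,
# following the description above (my own reading, not the Titans code).
import torch

torch.manual_seed(0)
d = 16
memory = torch.zeros(d, d, requires_grad=True)    # long-term memory parameters M
momentum = torch.zeros(d, d)                      # running "past surprise" state S
eta, theta, alpha = 0.9, 0.1, 0.01                # momentum, step size, forgetting rate

for t in range(100):
    k, v = torch.randn(d), torch.randn(d)         # key/value for the current token
    loss = ((memory @ k - v) ** 2).sum()          # how badly the memory recalls v from k
    grad, = torch.autograd.grad(loss, memory)     # "surprise": gradient of the recall loss
    with torch.no_grad():
        momentum = eta * momentum - theta * grad  # accumulate surprise with momentum
        memory *= (1.0 - alpha)                   # decay term: the forgetting gate
        memory += momentum                        # gradient-descent-style update

# Reading the memory at inference time is just applying the learned map to a query key.
print((memory @ torch.randn(d)).shape)            # torch.Size([16])
```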
-
Reflecting on insightful talks from the #NLPSummit (now available to watch on demand!), here are some key lessons we have learned:

- David Talby from John Snow Labs shared insights into a solution that automatically converts large amounts of raw, multi-format, multi-modal, untrusted medical data into coherent longitudinal patient stories, and presented it in action
- Loubna Ben Allal from HuggingFace shared the methodologies behind the creation of the Cosmopedia and FineWeb-Edu datasets
- Maziyar P. from John Snow Labs introduced scaling and accelerating LLM inference on Intel, NVIDIA, and Apple Silicon
- Prashanth Rao from Kùzu, Inc. explained the process of deconstructing Graph RAG
- Robert Nishihara from Anyscale described how to navigate the new AI infrastructure challenges
- Veysel Kocaman, PhD from John Snow Labs presented the results of a double-blind case study in which medical doctors compared John Snow Labs' medical LLMs with OpenAI's GPT-4o
- Abha Godse from West Virginia University explored how WVU Medicine has harnessed unstructured patient data within their EMR system to accurately assess and assign HCC codes, leveraging NLP models from John Snow Labs
- Vishakha Sharma, PhD from Roche explained how, by analyzing comprehensive patient data, including genetic, epigenetic, and phenotypic information, the LLM accurately aligns individual patient profiles with the most relevant clinical guidelines
- Sanjay Basu, PhD from Oracle discussed the deployment process of a multi-agent-based agentic workflow in healthcare
- Alina Dia Trambitas-Miron, PhD from John Snow Labs introduced an advanced tool designed to automate key aspects of the literature review process
- Supriya Raman from JPMorgan Chase & Co. explored different quantization methods and techniques and the common libraries used, and discussed evaluating the performance and quality of quantized #LLMs using standard metrics
- Chang She from LanceDB explained how the Lance format works and the value it delivers to AI teams training models or putting applications into production
- Petros Zerfos from IBM Research discussed #LLM app development with the open-source Data Prep Kit
- Alain Biem from New York Live explained the importance of context in language models

Watch now: https://lnkd.in/gbdNEqEC

#GenerativeAI #LargeLanguageModels #LLM #ResponsibleAI #DigitalTransformation #RAG #RetrievalAugmentedGeneration #HealthcareAI #ClinicalLLM #DigitalHealth #PromptEngineering #MedicalChatbot #MedicalGPT
-
In the realm of AI, where algorithms reign,
A curious conundrum arises: a math pain.
AI systems, built on logic and code,
Struggle with math, a fundamental roadblock of olde.

The irony is rich; the irony is bright,
AI, the epitome of computational might,
Falters on math, an essential skill, so bright,
It's a paradox that's both amusing and quite a sight.

Research suggests that AI's math woes
Stem from limitations in algorithmic flows:
Roundoff errors, numerical instability,
And biases in training data, a math tragedy.

https://lnkd.in/gNSdhYFW
Mathematical paradox demonstrates the limits of AI
cam.ac.uk
-
Chemist-Data Scientist
2 months ago: Congratulations on presenting at NeurIPS!