The LSTMpire Strikes Back

In this issue:

  1. AlphaFold: third time’s the charm
  2. xLSTM: bringing back a classic
  3. Evaluating LLMs for evaluating LLMs


Subscribe here


1. Accurate structure prediction of biomolecular interactions with AlphaFold 3

Watching: AlphaFold 3 (paper)

What problem does it solve? AlphaFold 2 revolutionized the field of protein structure prediction, enabling highly accurate modeling of individual proteins and protein complexes. However, proteins often interact with other types of molecules, such as nucleic acids (DNA and RNA), small molecules (ligands), ions, and modified residues. Accurately predicting these interactions is crucial for understanding biological processes and designing new drugs. AlphaFold 3 addresses this challenge by extending the capabilities of the model to jointly predict the structure of complexes involving proteins and these other types of molecules.

How does it solve the problem? AlphaFold 3 introduces a substantially updated architecture based on diffusion models. Diffusion models have shown impressive results in generating high-quality images and have recently been applied to protein structure prediction. By leveraging this powerful framework, AlphaFold 3 can model the interactions between proteins and various other molecules within a single unified deep learning system. The model demonstrates significantly improved accuracy compared to previous specialized tools, such as state-of-the-art docking tools for protein-ligand interactions, nucleic-acid-specific predictors for protein-nucleic acid interactions, and AlphaFold-Multimer v2.3 for antibody-antigen prediction.
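To make the diffusion framing concrete, here is a toy sketch of the kind of iterative denoising loop a diffusion model runs at inference time. Everything in it is an illustrative assumption on my part (the `denoiser` network, the noise schedule, the update rule), not AlphaFold 3's actual implementation:

```python
import numpy as np

def diffusion_sample(denoiser, n_atoms, n_steps=200, seed=0):
    """Toy reverse-diffusion loop over 3D atom coordinates.

    `denoiser(x, t)` stands in for a trained network that predicts
    denoised coordinates at noise level t. This is a conceptual
    sketch of the general technique, not AlphaFold 3's code.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_atoms, 3))   # start from pure Gaussian noise
    for step in range(n_steps, 0, -1):
        t = step / n_steps              # current noise level in (0, 1]
        x_hat = denoiser(x, t)          # network's estimate of the clean structure
        noise = rng.normal(size=x.shape)
        t_next = (step - 1) / n_steps   # noise level for the next iteration
        x = x_hat + t_next * noise      # move toward the estimate, re-adding less noise
    return x                            # t_next == 0 on the last step: pure estimate
```

A notable design choice in AlphaFold 3 is that denoising operates directly on raw atom coordinates, which is part of what lets a single network handle proteins, nucleic acids, ligands, and ions without molecule-specific machinery.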

What's next? AlphaFold 3 could be used to identify new drug targets, design more effective drugs, and gain a deeper understanding of complex biological processes. As the model continues to improve and incorporate additional types of molecules, it may become an essential tool for researchers across various fields, from structural biology to pharmacology. Furthermore, the success of AlphaFold 3 demonstrates the potential of diffusion-based models in solving complex scientific problems, which may inspire further advancements in machine learning and its applications in the life sciences.


2. xLSTM: Extended Long Short-Term Memory

Watching: xLSTM (paper)

What problem does it solve? While the Transformer architecture has become the de facto standard for Large Language Models (LLMs) in recent years, it's important to remember that LSTMs were the original building blocks of early LLMs. The main advantage of Transformers over LSTMs is their ability to parallelize computations, particularly through the self-attention mechanism. However, LSTMs have certain desirable properties, such as their ability to capture long-term dependencies, which raises the question: can we scale LSTMs to billions of parameters while leveraging modern techniques to mitigate their limitations and make them competitive with Transformers?

How does it solve the problem? The researchers introduce several modifications to the standard LSTM architecture to create xLSTM. First, they introduce exponential gating with normalization and stabilization techniques to improve the flow of information through the network. Second, they modify the LSTM memory structure in two ways: (i) sLSTM, which has a scalar memory and scalar update with new memory mixing, and (ii) mLSTM, which is fully parallelizable thanks to a matrix memory and a covariance update rule. These LSTM extensions are then integrated into residual block backbones and stacked to form the xLSTM architecture. The combination of exponential gating and modified memory structures allows xLSTMs to perform favorably compared to state-of-the-art Transformers and State Space Models.
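To give a feel for the mLSTM variant, here is a minimal NumPy sketch of one recurrence step with a matrix memory, covariance update, and exponential gating. It follows the spirit of the paper's equations but omits the stabilization tricks and the surrounding residual backbone; all shapes and names are chosen for illustration:

```python
import numpy as np

def mlstm_step(C, n, q, k, v, i_pre, f_pre):
    """One simplified mLSTM recurrence step.

    C : (d, d) matrix memory, n : (d,) normalizer state,
    q/k/v : (d,) query, key, and value vectors,
    i_pre/f_pre : pre-activation input/forget gates (scalars).
    A sketch in the spirit of the xLSTM paper's mLSTM equations,
    without its numerical stabilization.
    """
    i = np.exp(i_pre)                 # exponential input gate
    f = np.exp(f_pre)                 # forget gate (the paper also allows sigmoid)
    C = f * C + i * np.outer(v, k)    # covariance update rule: gated rank-one write
    n = f * n + i * k                 # normalizer accumulates gated keys
    h = C @ q / max(abs(n @ q), 1.0)  # normalized readout from the matrix memory
    return C, n, h
```

Because each update is a gated rank-one outer product with no nonlinear mixing of previous hidden states, all timesteps can in principle be computed in parallel during training, which is what makes the mLSTM competitive with attention in throughput.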

What's next? The xLSTM architecture demonstrates that with appropriate modifications and scaling, LSTMs can still be competitive with modern Transformer-based models. This opens up new avenues for research into the potential of LSTMs and other recurrent architectures in the context of LLMs. It will be interesting to see if xLSTMs can be further improved and if they can be applied to a wider range of tasks beyond language modeling. Additionally, the techniques introduced in this paper, such as exponential gating and modified memory structures, could potentially be adapted to other architectures to enhance their performance and scaling properties.


3. Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Watching: Prometheus 2 (paper/code)

What problem does it solve? Evaluating the quality of outputs from Large Language Models (LLMs) is a challenging task. While proprietary models like GPT-4 are often used as a benchmark, they come with limitations in terms of transparency, control, and cost. Existing open-source evaluation models, on the other hand, have their own shortcomings. They often produce scores that differ significantly from human judgments and lack the flexibility to perform both direct assessment and pairwise ranking. Moreover, they are limited to evaluating general attributes and cannot handle custom evaluation criteria.

How does it solve the problem? Prometheus 2 is designed to address the limitations of existing open-source evaluator LMs. It achieves a closer alignment with human and GPT-4 judgments, providing more reliable and consistent evaluation scores. Additionally, Prometheus 2 offers greater flexibility by supporting both direct assessment and pairwise ranking formats. This allows users to choose the evaluation approach that best suits their needs. Furthermore, Prometheus 2 introduces the ability to evaluate based on user-defined criteria, enabling customized assessments beyond general attributes like helpfulness and harmlessness.
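As a rough illustration of how such an evaluator might be used for direct assessment with a custom rubric, here is a sketch using Hugging Face Transformers. The model ID and prompt template are assumptions on my part; check the project's repository for the exact evaluation format it expects:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID and prompt layout are illustrative assumptions;
# consult the Prometheus 2 repo for the exact format.
MODEL_ID = "prometheus-eval/prometheus-7b-v2.0"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# Direct assessment: score a single response against a user-defined rubric.
prompt = """###Task Description:
Write feedback on the response below, then assign a score from 1 to 5
based strictly on the given rubric.

###Instruction:
Explain overfitting to a non-technical manager.

###Response:
Overfitting is when a model memorizes its training data instead of
learning general patterns, so it shines in testing but fails on new cases.

###Score Rubric:
Is the explanation accurate, concise, and free of jargon?

###Feedback:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

Pairwise ranking would work the same way, with the prompt presenting two candidate responses and asking the model to pick the better one.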

What's next? With its improved alignment with human judgments and enhanced flexibility, Prometheus 2 has the potential to become a valuable tool for researchers and practitioners in the field. Future work could focus on expanding its capabilities to handle a wider range of evaluation tasks, and exploring its application in real-world scenarios. Additionally, the open-source nature of Prometheus 2 encourages collaboration and contributions from the community, fostering the development of even more advanced and reliable evaluation models.


Papers of the Week:

Ferenc József Rab

Freelance at Moody's Corporation

8 months ago

Really cool, super!

Sean Edward Flanigan, M.S.

Resourceful data analyst with 15+ years of track record in data engineering, analytics, and marketing. Proficient in ETL processes, data modeling, and statistical analysis.

9 months ago

Beautiful post!

Jaap Oosterbroek

Artificial Intelligence, Datascience, Healthcare

9 months ago

Great post. Keep these coming, please.

Fethi Filali, PhD

Director of Technology & Applied Research | Expertise in Data Analytics, AI/ML, Generative AI, IoT | Driving Innovation from R&D to Market | 15+ Years in Industry, 8 Years as Professor | Lifelong Learner & Public Speaker

9 months ago

Check this GitHub repo, which has a collection of xLSTM-related resources: https://github.com/AI-Guru/xlstm-resources

Sebastian Illing

Fixing Email with GenAI | CTO @ MailMaestro | Serial Entrepreneur | Building with AI for over 6 years

9 months ago

Interesting! Do you know if the model is publicly available somewhere? How does it compare in terms of latency to the LLMs?
