LLM Research Roundup: Tuesday Highlights
Hyun Ho Park
Quantum Algorithm Developer | Data Scientist | Computer Vision and Gen AI Professional
The Top LLM Papers (17 February - 23 February)
Explore the latest and most intriguing research papers in the world of Large Language Models. Whether you’re a researcher, enthusiast, or just curious, these papers offer fresh insights and developments in the field.
(1) Reasoning on a Spectrum: Aligning LLMs to System 1 and System 2 Thinking - Investigates LLM reasoning flexibility by aligning models to intuitive (System 1) and analytical (System 2) thinking. Constructs a dataset with dual reasoning answers and evaluates LLMs across benchmarks, revealing an accuracy-efficiency trade-off. Demonstrates that interpolating between reasoning styles enhances adaptability, challenging the assumption that step-by-step reasoning is always optimal.
Read more: https://arxiv.org/abs/2502.12470
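The interpolation idea can be pictured as a dial between the two reasoning styles. Below is a toy prompt-level sketch; the paper's actual approach aligns the model itself, and these prompt templates and thresholds are my own illustration, not the authors' method:

```python
def build_prompt(question: str, system2_weight: float) -> str:
    """Blend an intuitive (System 1) and an analytical (System 2)
    instruction based on a weight in [0, 1]. Purely illustrative:
    the paper aligns models, not prompts."""
    if not 0.0 <= system2_weight <= 1.0:
        raise ValueError("system2_weight must be in [0, 1]")
    if system2_weight < 0.33:
        # Mostly System 1: fast, intuitive answering.
        style = "Answer immediately with your first intuition; no explanation."
    elif system2_weight < 0.67:
        # Middle of the spectrum: brief justification.
        style = "Give a brief answer with one short supporting reason."
    else:
        # Mostly System 2: deliberate, step-by-step reasoning.
        style = "Reason step by step, then state the final answer."
    return f"{style}\nQuestion: {question}"
```

Sliding the weight trades answer latency against accuracy, mirroring the accuracy-efficiency trade-off the paper reports.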
(2) How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs - Introduces How2Bench, a 55-criteria checklist for evaluating code-related benchmarks. Analyzes 274 existing benchmarks, exposing widespread data quality issues, lack of open sourcing, and methodological flaws. Conducts a human study revealing gaps in awareness regarding data reliability and transparency, advocating for rigorous benchmarking standards.
Read more: https://arxiv.org/abs/2501.10711
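Checklist-style auditing of the kind How2Bench proposes can be sketched as a set of predicates run against a benchmark's metadata. The real checklist has 55 criteria; the three criteria and field names below are illustrative assumptions, not the paper's schema:

```python
from dataclasses import dataclass

@dataclass
class Benchmark:
    name: str
    data_publicly_available: bool
    deduplicated: bool
    evaluation_script_released: bool

# Toy subset of checklist criteria, each a predicate on the metadata.
CRITERIA = {
    "open data": lambda b: b.data_publicly_available,
    "deduplication reported": lambda b: b.deduplicated,
    "evaluation script released": lambda b: b.evaluation_script_released,
}

def audit(bench: Benchmark) -> list[str]:
    """Return the names of the criteria the benchmark fails."""
    return [name for name, check in CRITERIA.items() if not check(bench)]
```

Running such an audit over many benchmarks is how one would surface the systematic gaps (closed data, missing evaluation scripts) the paper found across its 274 subjects.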
(3) Baichuan-M1: Pushing the Medical Capability of Large Language Models - Introduces Baichuan-M1, a domain-specific LLM optimized for medical applications, trained from scratch on 20 trillion tokens. Balances general and medical expertise, outperforming general-purpose models in specialized medical tasks. Open-sources Baichuan-M1-14B, providing an advanced medical AI model for research and development.
Read more: https://arxiv.org/abs/2502.12671
(4) Aspect-Guided Multi-Level Perturbation Analysis of Large Language Models in Automated Peer Review - Proposes an aspect-guided perturbation framework to assess LLM robustness in automated peer review. Analyzes biases in paper, review, and rebuttal manipulation, revealing vulnerabilities such as misleading reviews influencing meta-reviews. Highlights the need for more reliable automated reviewing systems and critical evaluation methods.
Read more: https://arxiv.org/abs/2502.12510
(5) Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration - Develops DPT-Agent, a language agent framework integrating System 1 (fast, intuitive) and System 2 (deliberative, reasoning-based) decision-making. Uses a finite-state machine for real-time AI collaboration and Theory of Mind for human intent inference. Demonstrates superior performance in real-time tasks, enabling autonomous human-AI interaction.
Read more: https://arxiv.org/abs/2502.11882
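The dual-process control loop can be pictured as a small state machine that routes familiar observations to a fast System 1 policy and escalates novel ones to slower System 2 deliberation. The states, lookup table, and fallback rule below are my own minimal sketch, not DPT-Agent's actual architecture:

```python
def system1(observation: str) -> str:
    # Fast, reactive lookup of a cached response (System 1).
    reflexes = {"ball incoming": "dodge", "teammate waves": "wave back"}
    return reflexes.get(observation, "unknown")

def system2(observation: str) -> str:
    # Stand-in for slow, deliberative reasoning (e.g. an LLM call).
    return f"plan a response to '{observation}'"

def step(observation: str) -> tuple[str, str]:
    """Return (state, action): REACT when System 1 has a cached
    response, otherwise DELIBERATE via System 2."""
    action = system1(observation)
    if action != "unknown":
        return "REACT", action
    return "DELIBERATE", system2(observation)
```

Keeping the fast path outside the language model is what lets such an agent stay responsive in real time while reserving expensive reasoning for genuinely ambiguous situations.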
That’s a wrap for this week’s edition of LLM Insights!
I hope you found these papers as fascinating and insightful as I did. Stay tuned for next week’s roundup of the latest advancements in Large Language Models. Until then, happy reading and exploring the world of LLMs!
If you have any feedback or suggestions for future editions, feel free to reach out to me.
Best regards,
Hyunho