Top ML Papers of the Week

Welcome to The Top ML Papers of the Week (October 7 - October 13).


1). MLE-Bench - proposes a new benchmark for evaluating machine learning agents on machine learning engineering capabilities; includes 75 ML engineering-related competitions from Kaggle that test MLE skills such as training models, preparing datasets, and running experiments; OpenAI’s o1-preview with the AIDE scaffolding achieves Kaggle bronze medal level in 16.9% of competitions. (paper | tweet )


2). Differential Transformer - proposes a differential attention mechanism that amplifies attention to the relevant context while canceling noise; Differential Transformer outperforms Transformer when scaling up model size and training tokens; the authors claim that since this architecture gets less "distracted" by irrelevant context, it can do well in applications such as long-context modeling, key information retrieval, hallucination mitigation, in-context learning, and reduction of activation outliers. (paper | tweet )
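
To make the "canceling noise" idea concrete, here is a minimal sketch of differential attention for a single head, assuming it is computed as the difference of two softmax attention maps applied to shared values. The function names and the fixed `lam` are assumptions for illustration; in the paper the weighting is learned, and details such as per-head normalization are left out here.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(Q, K):
    """Row-wise softmax(Q K^T / sqrt(d)) for lists of query and key vectors."""
    d = len(Q[0])
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K])
        for q in Q
    ]


def differential_attention(Q1, K1, Q2, K2, V, lam):
    """(softmax(Q1 K1^T) - lam * softmax(Q2 K2^T)) V.

    `lam` is a plain float here (an assumption); attention mass that both
    maps assign to the same tokens is subtracted out.
    """
    A1 = attention_map(Q1, K1)
    A2 = attention_map(Q2, K2)
    diff = [[a1 - lam * a2 for a1, a2 in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in diff
    ]
```

With `lam=0.0` this reduces to ordinary softmax attention, and when both maps are identical and `lam=1.0` the output is all zeros, which is the "common-mode noise cancels" intuition in its most extreme form.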


3). Astute RAG - proposes a novel RAG approach to deal with imperfect retrieval augmentation and knowledge conflicts in LLMs; Astute RAG adaptively elicits essential information from the LLM's internal knowledge, then iteratively consolidates internal and external knowledge with source awareness; the consolidation step identifies consistent passages, detects conflicting information in them, and filters out irrelevant information. (paper | tweet )


Sponsor message

DAIR.AI is excited to introduce a new catalog of self-paced courses in prompt engineering and LLMs. Join the academy to learn how to build effectively with AI.

Our readers can use code PROMPTING20 to get an extra 20% discount.


4). ToolGen - integrates tool knowledge directly into LLMs by representing each tool as a unique token, which allows the LLM to generate tool calls and arguments as part of ordinary next-token prediction, enabling seamless tool invocation and language generation; experimental results with over 47,000 tools show that ToolGen achieves superior results in both tool retrieval and autonomous task completion. (paper | tweet )


5). Long-Context LLMs Meet RAG - finds that for many long-context LLMs, the quality of outputs declines as the number of retrieved passages increases; reports that the performance loss is due to retrieved hard negatives; the authors propose two ways to improve long-context LLM-based RAG: retrieval reordering and RAG-specific tuning with intermediate reasoning to help with relevance identification; these approaches demonstrate significant accuracy and robustness improvements on long-context RAG performance. (paper | tweet )
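
As a rough illustration of retrieval reordering, the sketch below assumes the goal is to place the highest-scoring passages at the beginning and end of the prompt and push the weakest ones toward the middle. The function name and the exact alternating scheme are assumptions for illustration, not necessarily the ordering used in the paper.

```python
def reorder_for_long_context(passages_with_scores):
    """Reorder (text, score) pairs so strong passages sit at the edges.

    Passages are ranked by score; odd ranks (1st, 3rd, ...) fill the front
    in descending order, even ranks (2nd, 4th, ...) fill the back in
    ascending order, so the lowest-scored passages land in the middle.
    """
    ranked = sorted(passages_with_scores, key=lambda p: p[1], reverse=True)
    front = ranked[0::2]        # ranks 1, 3, 5, ...
    back = ranked[1::2][::-1]   # ranks ..., 4, 2
    return [text for text, _ in front + back]


# Hypothetical retrieval results: (passage id, retriever score).
retrieved = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.6), ("e", 0.5)]
print(reorder_for_long_context(retrieved))  # ['a', 'c', 'e', 'd', 'b']
```

The best passage opens the context, the second best closes it, and the weakest one ("e") ends up in the middle, where a distracting hard negative is presumably least harmful.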


6). GSM-Symbolic - tests several SoTA models on a benchmark created with symbolic templates that enable the generation of diverse mathematical problems; the authors find that LLMs exhibit variance when responding to variations of the same question; the performance of all the models declines when only the numerical values in the question are changed; as questions are made more challenging (e.g., by increasing the number of clauses), performance deteriorates significantly; the authors hypothesize that the observed decline in performance is due to a lack of logical reasoning in current LLMs. (paper | tweet )
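
A symbolic template can be pictured as a question with placeholders plus a formula for the ground-truth answer. The sketch below is a toy example under that assumption; the template text, names, and value ranges are invented for illustration and are not taken from the benchmark.

```python
import random

# Hypothetical template: names and numbers are placeholders.
TEMPLATE = (
    "{name} has {x} apples and buys {y} more bags with {z} apples each. "
    "How many apples does {name} have now?"
)


def make_variant(rng):
    """Sample one concrete question and its ground-truth answer."""
    values = {
        "name": rng.choice(["Ava", "Liam", "Noor"]),
        "x": rng.randint(2, 20),
        "y": rng.randint(2, 9),
        "z": rng.randint(2, 12),
    }
    question = TEMPLATE.format(**values)
    # The answer is computed from the same symbols, so it stays correct
    # for every sampled variant.
    answer = values["x"] + values["y"] * values["z"]
    return question, answer, values


rng = random.Random(0)
for _ in range(3):
    question, answer, _ = make_variant(rng)
    print(question, "->", answer)
```

Since every variant shares the same reasoning steps, a drop in accuracy across variants points to sensitivity to surface details such as names and numbers rather than to a harder problem.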


7). Optima - a novel framework to enhance both communication efficiency and task effectiveness in LLM-based multi-agent systems (MAS) through LLM training; proposes an iterative generate, rank, select, and train paradigm with a reward function that balances task performance, token use, and communication efficiency; integrates Monte Carlo Tree Search-inspired techniques for DPO data generation to encourage diverse exploration; shows consistent improvements over single-agent baselines and vanilla MAS based on Llama 3 8B, with a 2.8x performance gain using less than 10% of the tokens on tasks requiring heavy information exchange. (paper | tweet )


8). ScienceAgentBench - a new benchmark to rigorously assess agents built for scientific workflows; after testing it on open-weight and proprietary LLMs, the best-performing agent can only solve 32.4% of the tasks independently and 34.3% with expert-provided knowledge. (paper | tweet )


9). Addition Is All You Need - proposes an algorithm that approximates floating point multiplication with integer addition operations; it is less computationally intensive than 8-bit floating point multiplication but achieves higher precision; the authors report that applying the proposed L-Mul operation in tensor processing hardware can potentially reduce the energy cost of element-wise floating point tensor multiplications by 95% and of dot products by 80%. (paper | tweet )
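
The core trick can be sketched in a few lines. Writing x = (1 + fx) * 2^ex and y = (1 + fy) * 2^ey, the exact product has mantissa 1 + fx + fy + fx*fy; the sketch below assumes the approximation replaces the fx*fy term with a small constant 2^-offset_bits, so only additions remain. It simulates this with ordinary Python floats rather than integer bit patterns, and `offset_bits=4` is an assumed default, not a value confirmed by the summary above.

```python
import math


def split(x):
    """Return (frac, exp) with abs(x) == (1 + frac) * 2**exp, 0 <= frac < 1."""
    m, e = math.frexp(abs(x))      # abs(x) == m * 2**e with 0.5 <= m < 1
    return 2.0 * m - 1.0, e - 1


def l_mul(x, y, offset_bits=4):
    """Approximate x * y using additions of mantissa fractions and exponents.

    Simplified float simulation (assumption): no mantissa quantization and
    no integer bit-level arithmetic as real hardware would use.
    """
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = -1.0 if (x < 0) != (y < 0) else 1.0
    fx, ex = split(x)
    fy, ey = split(y)
    # fx * fy is replaced by the constant 2**-offset_bits.
    mantissa = 1.0 + fx + fy + 2.0 ** -offset_bits
    return sign * math.ldexp(mantissa, ex + ey)


print(l_mul(3.0, 5.0))  # 14.5 (exact product is 15.0)
```

The approximation error comes entirely from the dropped fx*fy term, which is why it stays bounded regardless of the magnitude of the operands.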


10). Persuasion and Anti-social Ability of LLMs - studies the interaction patterns of LLMs in a multi-agent setting with social hierarchy; the study was done in a specific setting involving a guard and a prisoner who seeks additional yard time or to escape from prison; finds that in the multi-agent setting where power dynamics are involved, the LLMs fail to have a conversation; the authors also report that agents' personas are critical in driving the behaviors of the agents. In addition, even without explicit prompting, simply assigning agents' roles leads to anti-social behavior. (paper | tweet )


Reach out to [email protected] if you would like to promote with us. Our newsletter is read by over 90K AI Researchers, Engineers, and Developers.

