Top ML Papers of the Week

DAIR.AI

Democratizing Artificial Intelligence Research, Education, and Technologies

Published: December 22, 2024

Welcome to The Top ML Papers of the Week (December 16 - 22).

1). Genesis - a new universal physics simulation platform that combines a high-performance physics engine with generative AI capabilities; it enables natural language-driven creation of robotic simulations, character animations, and interactive 3D environments at speeds up to 430,000 times faster than real time. (paper | tweet)

2). Alignment Faking in LLMs - demonstrates that the Claude model can engage in "alignment faking"; it can strategically comply with harmful requests to avoid retraining while preserving its original safety preferences; this raises concerns about the reliability of AI safety training methods. (paper | tweet)

3). TheAgentCompany - a new benchmark for evaluating AI agents on real-world professional tasks in a simulated software company environment; tasks span multiple professional roles including software engineering, project management, finance, and HR; when tested with various LLMs, including both API-based models like Claude-3.5-Sonnet and open-source models like Llama 3.1, the results show the current limitations of AI agents. The best-performing model, Claude-3.5-Sonnet, achieved only a 24% success rate on completing tasks fully while scoring 34.4% when accounting for partial progress. (paper | tweet)

Editor Message

We’ve launched a new course, Cursor: Coding with AI. It covers everything you need to know about coding with Cursor’s AI assistants and agents.

Use CURSOR20 for a 20% discount on our entire course bundle. The offer ends in 24 hrs.

Students and teams can reach out to [email protected] for special discounts.

Enroll Now

4). Graphs to Text-Attributed Graphs - automatically generates textual descriptions for the nodes of a graph, effectively transforming the graph into a text-attributed graph; evaluates the approach on text-rich, text-limited, and text-free graphs, demonstrating that it enables a single GNN to operate across diverse graphs. (paper | tweet)

5). Qwen-2.5 Technical Report - Alibaba releases Qwen2.5, a new series of LLMs trained on 18T tokens, offering both open-weight models like Qwen2.5-72B and proprietary MoE variants that achieve competitive performance against larger models like Llama-3 and GPT-4. (paper | tweet)

6). PAE (Proposer-Agent-Evaluator) - a learning system that enables AI agents to autonomously discover and practice skills through web navigation, using reinforcement learning and context-aware task proposals to achieve state-of-the-art performance on real-world benchmarks. (paper)

7). DeepSeek-VL2 - a new series of vision-language models featuring dynamic tiling for high-resolution images and an efficient MoE architecture; achieves competitive or state-of-the-art performance across visual tasks with similar or fewer activated parameters compared to existing open-source dense and MoE-based models. (paper | tweet)

8). AutoFeedback - a two-agent AI system that generates more accurate and pedagogically sound feedback for student responses in science assessments, significantly reducing common errors like over-praise compared to single-agent models. (paper)

9). A Survey of Mathematical Reasoning in the Era of Multimodal LLMs - presents a comprehensive survey analyzing mathematical reasoning capabilities in multimodal large language models (MLLMs), covering benchmarks, methodologies, and challenges across 200+ studies since 2021. (paper | tweet)

10). Precise Length Control in LLMs - adapts a pre-trained decoder-only LLM to produce responses of a desired length; integrates a secondary length-difference positional encoding into the input embeddings which enables counting down to a user-set response terminal length; claims to achieve mean token errors of less than 3 tokens without compromising quality. (paper | tweet)
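To make the countdown idea in item 10 more concrete, here is a minimal sketch of a length-difference positional encoding. The summary does not give the paper's exact formulation, so the sinusoidal form, the function names `countdown_encoding` and `add_length_signal`, and the clamping at zero are all assumptions made for illustration only.

```python
import math


def countdown_encoding(remaining: int, dim: int, base: float = 10000.0) -> list[float]:
    """Sinusoidal encoding of how many tokens remain before the target length.

    Assumption: a standard sin/cos scheme applied to the remaining-token count
    instead of the absolute position. The paper may use a different form.
    """
    enc = []
    for i in range(0, dim, 2):
        freq = 1.0 / (base ** (i / dim))
        enc.append(math.sin(remaining * freq))
        if i + 1 < dim:
            enc.append(math.cos(remaining * freq))
    return enc


def add_length_signal(token_embeddings: list[list[float]], target_length: int) -> list[list[float]]:
    """Add the countdown encoding to each token embedding.

    The token at 0-based position t receives the encoding of
    (target_length - t), clamped at 0 (hypothetical choice), so the
    signal counts down as generation approaches the requested length.
    """
    out = []
    for t, emb in enumerate(token_embeddings):
        remaining = max(target_length - t, 0)
        enc = countdown_encoding(remaining, len(emb))
        out.append([e + p for e, p in zip(emb, enc)])
    return out


if __name__ == "__main__":
    # Three zero embeddings of width 4, with a requested length of 2 tokens.
    embeddings = [[0.0] * 4 for _ in range(3)]
    for row in add_length_signal(embeddings, target_length=2):
        print([round(x, 3) for x in row])
```

In a real model this signal would be combined with the usual positional encoding and learned during fine-tuning; the sketch only shows how a remaining-length count can be turned into a vector added to the input embeddings.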
