登录查看更多内容

??Top ML Papers of the Week

DAIR.AI

Democratizing Artificial Intelligence Research, Education, and Technologies

发布日期: 2024年10月20日

Welcome to the Top ML Papers of the Week (October 14 - 20).

1). Thinking LLMs - proposes a training method to equip LLMs with thinking abilities for general instruction-following without human-annotated data; uses an iterative search and optimization procedure to explore thought generation which enables the model to learn without direct supervision; thought candidates for each user instruction are scored with a judge model; only responses are evaluated by the Judge which determines the best and worst ones; then the corresponding full outputs are used as chosen and rejected pairs for DPO (referred to as Thought Preference Optimization in this paper). reports superior performance on AlpacaEval and Arena-Hard. (paper | tweet )

2). Model Swarms - propose a new collaborative search algorithm to adapt LLM via swarm intelligence; a pool of LLM experts collaboratively move in the weight space and optimize a utility function representing various adaptation objectives; experiments demonstrate that Model Swarms could flexibly adapt LLM experts to a single task, multi-task domains, reward models, as well as diverse human interests; improves over 12 model composition baselines by up to 21.0% across tasks and contexts. (paper | tweet )

3). First-Person Fairness in Chatbots - studies first-person fairness which involves fairness towards users interacting with ChatGPT; specifically, it measures the biases, if any, towards the users’ names; it leverages a model powered by GPT-4o to analyze patterns and name-sensitivity in the chatbot’s responses for different user names; claims that, overall, post-training significantly mitigate harmful stereotypes; also reports that in domains like entertainment and art, with open-ended tasks, demonstrate the highest level of bias (i.e., tendency to write stories with protagonists whose gender matches gender inferred from the user’s name). (paper | tweet )

Editor Message

We’ve launched the first issue of our new series, AI Agents Weekly, to help AI researchers, developers, and enthusiasts keep track of all the top developments in AI Agents. The paid weekly series will compliment our Top ML Papers of the Week newsletter and dive deeper into AI Agents research, industry tips, and developments.

Subscribe now

4). Introspection in LLMs - reports that LLMs can acquire knowledge through introspection that cannot be inferred from their training data; suggests that LLMs contain privileged information about themselves that can potentially lead to more interpretable and controllable systems; they report that this introspection ability is limited and models struggle to predict their behavior on tasks requiring reasoning over long outputs. (paper | tweet )

简柏特 11 个月前

All In On AI: How Smart Companies Win Big With…

Bernard Marr 1 年前

Overcoming the AI plateau

VentureBeat 6 个月前

5). Janus - proposes a unified autoregressive framework for multimodal understanding and generation; it decouples visual encoding into independent pathways and leverages a single transformer architecture to improve flexibility and performance on both visual understanding and generation; claims to alleviate trade-offs related to performing the vision tasks, something common in methods that rely on a single visual encoder; surpasses previous unified models and matches or exceeds the performance of task-specific models. (paper | tweet )

6). Inference Scaling for Long-Context RAG - uses two strategies to investigate scaling laws for RAG: in-context learning (DRAG) and iterative prompting (IterRAG); finds that RAG performance consistently improves with the expansion of the effective context length under optimal configurations; when optimally allocated, increasing inference computation can lead to linear gains in long-context RAG performance; this leads to the development of a computation allocation model that can provide practical guidance for optimal computation allocation in long-context RAG scenarios. (paper | tweet )

7). Agent S - a new open agentic framework that enables autonomous interaction with computers through a GUI; Agent S tackles challenges such as acquiring knowledge, planning over long-task horizons, and handling dynamic interfaces; it introduces experience-augmented hierarchical planning which leverages both search and retrieval; leverages an agent-computer interface to perform reasoning and control GUI agents; evaluation on the OSWorld benchmark shows that Agent S outperforms the baseline by 9.37% in success rate (an 83.6% relative improvement) and achieves a new state-of-the-art. (paper | tweet )

8). Model Kinship for Merging LLMs - proposes model kinship to measure the degree of similarity between LLMs; model kinship is used to build a model merging strategy (Top-k Greedy Merging with Model Kinship) which yields better performance; the authors find that this new criterion can be used to effectively and continuously perform model merging. (paper | tweet )

9). On the Planning Abilities of OpenAI’s o1 Models - reports that o1-preview is particularly strong in self-evaluation and constraint-following; also mentions that these o1 models demonstrate bottlenecks in decision-making and memory management, which are more pronounced in spatial reasoning; in particular, the models produce redundant action and struggle to generalize in spatially complex tasks. (paper | tweet )

10). CoTracker3 - proposes a new point tracking model and a new semi-supervised training recipe; enables usage of real videos without annotations during training by generating pseudo-labels using off-the-shelf teachers; the approach is simpler in architecture and training scheme leading to better results while using 1000x less data. (paper | tweet )

Reach out to [email protected] if you would like to promote with us. Our newsletter is read by over 90K+ AI Researchers, Engineers, and Developers.