登录查看更多内容

??Top ML Papers of the Week

DAIR.AI

Democratizing Artificial Intelligence Research, Education, and Technologies

发布日期: 2024年10月6日

Welcome to The Top ML Papers of the Week (September 30 - October 6).

1). Movie Gen - a set of foundation models to generate high-quality, 1080p HD videos, including different aspect ratios and synchronized audio; the 30B parameter model supports a context length of 73K video tokens, which enables generation of 16-second videos at 16fps; it also presents a 13B parameter video-to-audio generation model and a novel video editing model that’s attained via post-training; achieves state-of-the-art performance on tasks such as text-to-video synthesis, video personalization, video-to-audio generation and more. (paper | tweet )

2). Were RNNs All We Needed? - revisits RNNs and shows that by removing the hidden states from input, forget, and update gates RNNs can be efficiently trained in parallel; this is possible because with this change architectures like LSTMs and GRUs no longer require backpropagate through time (BPTT); they introduce minLSTMs and minGRUs that are 175x faster for a 512 sequence length. (paper | tweet )

3). LLMs Know More Than They Show - finds that the "truthfulness" information in LLMs is concentrated in specific tokens; this insight can help enhance error detection performance and further mitigate some of these issues; they also claim that internal representations can be used to predict the types of errors the LLMs are likely to make. (paper | tweet )

4). Architecture Search Framework for Inference-Time Techniques - introduces a modular framework for building and optimizing LLMs by combining multiple inference-time techniques; this approach reframes the challenge of LLM system design as a hyperparameter optimization problem; tested on benchmarks including MT-Bench and CodeContests, Archon surpasses leading models such as GPT-4o and Claude 3.5 Sonnet, achieving a 15.1% average accuracy improvement. (paper | tweet )

Sponsor message

DAIR.AI is excited to introduce a new catalog of self-paced courses in prompt engineering and LLMs. Join the academy to learn how to build effectively with AI.

Use code PROMPTING20 to get an extra 20% discount. Only valid to the first 500 enrollments.

Join Now!

Towards AI 2 个月前

Celebrating a crazy month of Open Multimodal LLM…

Thomas Wolf 1 个月前

Artificial Intelligence #210

Andriy Burkov 9 个月前

5). RATIONALYST - a model for process-supervision of reasoning that enables generalization across diverse reasoning tasks; this process is achieved with pre-training on a collection of 79k rationales from the Pile and a combination of reasoning datasets with minimal human intervention; fine-tuned from LLaMa-3-8B, the proposed model improves the accuracy of reasoning by an average of 3.9% on 7 reasoning benchmarks. (paper )

6). An Analysis of o1-preview - reports that large reasoning models like o1-preview, while improving on more difficult tasks, display similar qualitative trends as previous LLMs; o1 is sensitive to the probability of examples and tasks, performing better and requiring fewer “thinking tokens” in high-probability settings than in low-probability ones. (paper | tweet )

7). FRAMES - a unified framework to evaluate an LLM’s ability to provide factual responses, assess retrieval capabilities, and the reasoning required to generate final responses; includes multi-hop questions that require the integration of information from multiple sources; reports that state-of-the-art LLMs struggle on the task and only achieve 40% accuracy with no retrieval; the proposed multi-step retrieval approach improves performance to 66% accuracy. (paper | tweet )

8). Not All LLM Reasoners Are Created Equal - investigates in depth the grade-school math problem-solving capabilities of LLMs; reports that LLMs show a significant gap in reasoning; finds that LLMs display a huge performance difference when solving compositional pairs and solving questions independently. (paper | tweet )

9). Evaluation of o1 - provides a comprehensive evaluation of OpenAI's o1-preview LLM; shows strong performance across many tasks such as competitive programming, generating coherent and accurate radiology reports, high school-level mathematical reasoning tasks, chip design tasks, anthropology and geology, quantitative investing, social media analysis, and many other domains and problems. (paper | tweet )

10). Designing Priors for Better Few-Shot Image Synthesis - training generative models like GAN with limited data is difficult; current Implicit Maximum Likelihood Estimation approaches (IMLE) have an inadequate correspondence between latent code selected for training and those selected during inference; the proposed approach, RS-IMLE, changes the prior distribution for training which improves test-time performance and leads to higher quality image generation. (paper | tweet )

Reach out to [email protected] if you would like to promote with us. Our newsletter is read by over 90K AI Researchers, Engineers, and Developers.