Top ML Papers of the Week

Welcome to The Top ML Papers of the Week (September 23 - September 29).

1). Llama 3.2 - presents small and medium-sized vision LLMs (11B and 90B parameters) and lightweight, text-only models (1B and 3B); the text-only models support a context length of 128K tokens and outperform other models in their class on a range of tasks; the vision models exceed models such as Claude 3 Haiku on image understanding tasks. (paper | tweet )


2). Molmo - presents a family of open, state-of-the-art multimodal AI models; the 72B model in the Molmo family outperforms other models in the open-weight, open-data class and compares favorably against proprietary models like GPT-4o, Claude 3.5, and Gemini 1.5 on several benchmarks. (paper | tweet )



Sponsor message

DAIR.AI is excited to introduce a new catalog of self-paced courses in prompt engineering and LLMs. Join the academy to learn how to build effectively with AI.

Use code PROMPTING20 to get an extra 20% discount. Only valid for the first 500 enrollments.

Join Now!



3). AlphaChip - a reinforcement learning-based method trained to design the physical layout of chips; AlphaChip has reportedly been used in three additional generations of Google’s TPU; the release includes an open-source implementation of the method that can be pre-trained on a variety of chip blocks and then applied to new blocks, along with a model checkpoint pre-trained on 20 TPU blocks. (paper | tweet )


4). LLMs Still Can’t Plan - evaluates whether large reasoning models such as o1 can plan; finds that a domain-independent planner can solve all instances of Mystery Blocksworld while LLMs struggle even on small instances; o1-preview is effective on the task but tends to degrade in performance as plan length increases; concludes that while o1 shows progress on more challenging planning problems, the accuracy gains cannot be considered general or robust. (paper | tweet )
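The contrast the paper draws is easy to see concretely: a domain-independent planner is just exhaustive search over states, with no reliance on pattern matching, so it cannot be fooled by obfuscated predicate names the way an LLM can. Below is a minimal sketch of such a planner for a Blocksworld-style domain (plain breadth-first search; the state encoding and move names are my own illustration, not the paper's setup):

```python
from collections import deque

def successors(state):
    """Yield (move, next_state) pairs. A state is a frozenset of stacks
    (tuples of blocks, bottom to top); a move lifts the top block of one
    stack onto another stack or onto the table."""
    stacks = sorted(state)
    for i, src in enumerate(stacks):
        rest = stacks[:i] + stacks[i + 1:]
        block, reduced = src[-1], src[:-1]
        base = list(rest) + ([reduced] if reduced else [])
        # move the top block onto the table as a new one-block stack
        yield (f"put {block} on table", frozenset(base + [(block,)]))
        # move the top block onto the top of every other stack
        for j, dst in enumerate(base):
            others = base[:j] + base[j + 1:]
            yield (f"stack {block} on {dst[-1]}", frozenset(others + [dst + (block,)]))

def bfs_plan(start, goal):
    """Domain-independent breadth-first search: complete, and always
    returns a shortest plan, regardless of how blocks are named."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        for move, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [move]))
    return None  # goal unreachable
```

Renaming the blocks (the "Mystery" obfuscation) changes nothing for this search, which is precisely why the classical planner solves all instances while models that rely on surface patterns degrade.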


5). Scaled-up Instructable Models Become Less Reliable - suggests that larger and more instructable LLMs may become less reliable; investigates LLMs along three dimensions: difficulty concordance, task avoidance, and prompting stability; finds that early models often avoid user questions, whereas scaled-up, shaped-up models much more often give an apparently sensible yet wrong answer, including errors on difficult questions that human supervisors frequently overlook. (paper | tweet )


6). Logic-of-Thought - proposes a new prompting technique called Logic-of-Thought (LoT), which uses propositional logic to generate and inject expanded logical information from the input context; it enhances CoT performance on the ReClor dataset by +4.35%, improves CoT + Self-Consistency's performance on LogiQA by +5%, and boosts ToT's performance on the ProofWriter dataset by +8%. (paper | tweet )


7). RAG and Beyond - presents a survey introducing a RAG task categorization method that classifies user queries into four levels according to the type of external data required and the focus of the task; summarizes key challenges in building robust data-augmented LLM applications and the most effective techniques for addressing them. (paper | tweet )


8). A Preliminary Study of o1 in Medicine - provides a preliminary exploration of the o1-preview model in medical scenarios; shows that o1 surpasses GPT-4 in accuracy by an average of 6.2% and 6.6% across 19 datasets and two newly created complex QA scenarios; identifies weaknesses including hallucination, inconsistent multilingual ability, and discrepant evaluation metrics. (paper | tweet )


9). Small Language Models Survey - a comprehensive survey of small language models (SLMs) covering architectures, training datasets, and training algorithms; analyzes 59 state-of-the-art open-source SLMs and capabilities such as reasoning, in-context learning, math, and coding; also discusses on-device runtime costs, latency, and memory footprint, and distills valuable insights. (paper | tweet )


10). Minstrel - a multi-generative agent system with reflection capabilities that automates structural prompt generation; the paper also presents LangGPT, an extensible framework for designing structural prompts, on top of which Minstrel is built; experiments demonstrate that structural prompts, whether generated by Minstrel or written manually, perform better at guiding LLMs through tasks. (paper | tweet )
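For a sense of what a "structural prompt" looks like, here is a minimal LangGPT-style skeleton: instead of one free-form instruction, the prompt is organized into named sections. The section names follow LangGPT's Role/Profile/Rules/Workflow convention, but the rendering function itself is a hypothetical sketch, not code from the paper.

```python
def structural_prompt(role, description, rules, workflow):
    """Render a LangGPT-style structural prompt as a fixed markdown
    skeleton with named sections rather than a free-form instruction."""
    lines = [f"# Role: {role}", "", "## Profile",
             f"- Description: {description}", "", "## Rules"]
    lines += [f"- {rule}" for rule in rules]          # behavioral constraints
    lines += ["", "## Workflow"]
    lines += [f"{i}. {step}" for i, step in enumerate(workflow, 1)]  # ordered steps
    return "\n".join(lines)

prompt = structural_prompt(
    "Proofreader",
    "Fixes grammar without changing meaning.",
    ["Preserve the author's voice."],
    ["Read the text.", "Return corrections."],
)
```

Minstrel's contribution is then to have cooperating agents draft, critique, and refine the content of each section automatically, rather than requiring a human to fill in the template.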


Reach out to [email protected] if you would like to promote with us. Our newsletter is read by over 90K AI Researchers, Engineers, and Developers.

Anzhou Zhang

Principal Data Scientist @ Genentech | LLM, Generative AI

1 mo
Peter Bellen

Blog for AI Articles

1 mo

A more technical article: "Self-correcting LLM". Read it and leave a comment or question on the article site; any interaction there is welcome. If you have an idea for a new article, tell me. Thanks. English: https://aifornoobsandexperts.com/self-correcting-llm/ Dutch: https://aivoorjanenalleman.nl/zelfcorrigerende-llm/

Habiba Zaman

Sales And Marketing Specialist at Amazon virtual assistant and freelancer

1 mo

Interesting

Leandro Cunha

Senior Consultant @ PSC Consulting | Data & AI Engineer | Advanced Analytics | ex-McKinsey QuantumBlack

1 mo

Incredible week for ML breakthroughs! Big thanks to DAIR.AI for the roundup

Godwin Josh

Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer

1 mo

Llama 3.2's fine-tuning approach on diverse datasets is notable. AlphaChip's hardware design for AI inference presents a compelling path toward efficiency gains. The "Logic-of-Thought" paper delves into the intriguing realm of reasoning within LLMs, raising questions about symbolic representation integration. How might we bridge the gap between symbolic reasoning and the statistical nature of current LLMs?
