Top LLM Papers of the Week (September Week 4, 2024)

[1] Performance of OpenAI’s o1 in Medical Domain

OpenAI's o1 is the first LLM with an internalized chain-of-thought technique trained using reinforcement learning strategies. This paper evaluates o1 in the medical domain, examining three key aspects: understanding, reasoning, and multilinguality. Experimental results show that o1 surpasses GPT-4 in accuracy by an average of 6.2% across 19 datasets. [Tweet] and [Paper]


[2] Distilling Better CoT Capabilities into Small Language Models

This paper introduces SKIntern, a novel approach to distill better CoT capabilities into small language models. SKIntern reduces computational overhead and speeds up the reasoning process by focusing solely on the question during inference. It outperforms SOTA baselines by over 5% while reducing inference costs by up to 4× across a wide range of SLMs. [Tweet] and [Paper]


[3] Stress Prompting Enhances LLM Performance

The paper introduces a novel set of prompts called StressPrompt and evaluates them on multiple LLMs across several tasks. The findings suggest that LLMs, like humans, perform optimally under moderate stress, and that their performance declines under both low- and high-stress conditions. [Tweet] and [Paper]


[4] State-of-the-Art Financial LLMs

This paper introduces KodeXv0.1, a family of SOTA LLMs for the finance domain. KodeX-8Bv0.1 and KodeX-70Bv0.1, built on top of Llama 3.1 8B and 70B, outperform GPT-4 in financial question answering. These LLMs are developed by performing RAG-aware 4-bit LoRA instruction tuning over a high-quality synthetic dataset consisting of Context-Question-Answer triplets generated from publicly available financial documents such as earnings calls and business reports. A minimal sketch of this style of tuning is shown below. [Tweet] and [Paper]
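
The paper's training code is not reproduced here, but the following sketch shows what 4-bit LoRA instruction tuning typically looks like with the Hugging Face transformers, peft, and bitsandbytes libraries. The model id, LoRA hyperparameters, and prompt format are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch of 4-bit LoRA instruction tuning (not KodeXv0.1's actual recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Meta-Llama-3.1-8B"  # base model family mentioned in the paper

# Load the base model in 4-bit (NF4) precision to cut memory usage.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Attach low-rank adapters; only these small matrices are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Training would then run over Context-Question-Answer triplets formatted as
# instruction prompts (retrieved context + question -> answer), which is what
# makes the tuning "RAG-aware" in the sense described in the summary above.
```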


[5] Small Language Models in the LLMs Era (Survey)

This paper provides a comprehensive survey of small language models, discussing their capabilities, runtime costs, and innovations in recent years. By analyzing 59 state-of-the-art open-source SLMs across three axes (architectures, training datasets, and training algorithms), the paper presents valuable insights and potential future research directions. [Tweet] and [Paper]


[6] FlexRAG

RAG systems need to encode lengthy retrieved contexts before responding to the input task, which imposes substantial computational overhead. As a solution, this paper presents FlexRAG, which compresses retrieved contexts into compact embeddings before they are encoded by the LLM. FlexRAG achieves superior generation quality while significantly reducing running costs. A toy sketch of the general idea follows. [Tweet] and [Paper]
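
FlexRAG's actual compressor is not reproduced here; the toy sketch below only illustrates the general idea of squeezing a retrieved passage into a handful of compact embeddings that stand in for the full token sequence. The module name, pooling scheme, and dimensions are all hypothetical.

```python
# Toy illustration of context compression for RAG: a retrieved passage is
# pooled into a few "summary" embeddings, which replace the full token
# sequence in the LLM input. This is not FlexRAG's implementation.
import torch
import torch.nn as nn

class ContextCompressor(nn.Module):
    def __init__(self, hidden_dim: int, num_summary_tokens: int = 8):
        super().__init__()
        # Learnable query vectors that attend over the passage tokens.
        self.summary_queries = nn.Parameter(torch.randn(num_summary_tokens, hidden_dim))
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=8, batch_first=True)

    def forward(self, passage_embeddings: torch.Tensor) -> torch.Tensor:
        # passage_embeddings: (batch, passage_len, hidden_dim)
        batch = passage_embeddings.size(0)
        queries = self.summary_queries.unsqueeze(0).expand(batch, -1, -1)
        compressed, _ = self.attn(queries, passage_embeddings, passage_embeddings)
        return compressed  # (batch, num_summary_tokens, hidden_dim)

# Usage: prepend `compressed_ctx` to the question's token embeddings so the LLM
# sees a short soft prompt instead of re-encoding the entire retrieved text.
compressor = ContextCompressor(hidden_dim=4096)
passage = torch.randn(1, 512, 4096)   # embeddings of a 512-token retrieved passage
compressed_ctx = compressor(passage)  # (1, 8, 4096)
```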


[7] Contextual Compression in RAG Systems (Survey)

RAG addresses LLM limitations such as hallucination and outdated knowledge. However, RAG has its own limitations, including a limited context window, irrelevant retrieved information, and the high processing overhead of extensive contextual data. This paper presents a comprehensive survey of contextual compression in RAG systems and concludes with some interesting future research directions. [Tweet] and [Paper]


[8] Low-bit Large Language Models (Survey)

Low-bit quantization helps address LLM deployment challenges by reducing the bit-width of model parameters, activations, and gradients, which in turn decreases memory usage and computational demands. This paper presents a comprehensive survey of low-bit quantization methods for LLMs, covering the fundamental principles, system implementations, and algorithmic strategies; a toy example of the core idea is shown below. [Tweet] and [Paper]
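
To make the bit-width idea concrete, here is a toy example of symmetric 4-bit weight quantization. Real low-bit methods of the kind the survey covers are considerably more sophisticated (per-group scales, calibration, error compensation, and so on).

```python
# Toy symmetric 4-bit quantization of a weight tensor: map floats to the
# signed integer range [-8, 7] with a single scale, then dequantize back.
import torch

def quantize_int4_symmetric(w: torch.Tensor):
    # Scale chosen so the largest |weight| maps to the edge of the 4-bit range.
    scale = w.abs().max() / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)  # stored in an int8 container
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4, 4)
q, scale = quantize_int4_symmetric(w)
w_hat = dequantize(q, scale)
print("max abs reconstruction error:", (w - w_hat).abs().max().item())
```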


[9] EMMA-500: Multilingual LLM for 500+ Languages

This paper introduces EMMA-500, a large-scale multilingual language model based on the Llama 2 7B model and continually trained on texts covering 546 languages. EMMA-500 exhibits robust performance across a wide collection of benchmarks, including a comprehensive set of multilingual tasks and PolyWrite. [Tweet] and [Paper]


[10] Enhancing Pre-training Data Quality at Scale

This paper introduces Programming Every Example (ProX), a novel framework that treats data refinement as a programming task. Experimental results show that models pre-trained on ProX-curated data outperform those trained on the original data, or on data filtered by other selection methods, by more than 2% across various downstream benchmarks. A toy sketch of the refinement-as-programming idea follows. [Tweet] and [Paper]
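
ProX's actual operation set and generated programs are not reproduced here; the toy sketch below only illustrates the idea of a language model emitting a small cleaning program per document, which is then executed over the corpus. The operation names and the hard-coded "generated" program are hypothetical.

```python
# Toy illustration of "data refinement as programming". In a real pipeline
# the per-document program would be produced by an LLM; here it is hard-coded.
# Operation names are hypothetical and do not reproduce ProX's operation set.

def remove_lines(doc: str, line_ids: list[int]) -> str:
    """Drop the given line indices (e.g. boilerplate or ads) from the document."""
    drop = set(line_ids)
    return "\n".join(l for i, l in enumerate(doc.splitlines()) if i not in drop)

def drop_doc(doc: str) -> str:
    """Discard the whole document."""
    return ""

def refine(doc: str, program: list[tuple]) -> str:
    """Execute a small refinement program (a list of operations) on one document."""
    ops = {"remove_lines": remove_lines, "drop_doc": drop_doc}
    for name, *args in program:
        doc = ops[name](doc, *args)
    return doc

doc = "Great article about transformers.\nBUY CHEAP FOLLOWERS NOW!!!\nMore useful text."
generated_program = [("remove_lines", [1])]  # pretend the LLM flagged line 1 as noise
print(refine(doc, generated_program))
```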


Do subscribe to the newsletter so that you won't miss reading interesting LLM papers.

