Top LLM Papers of the Week (August Week 1, 2024)
[1] ShieldGemma (Content moderation LLM)
This paper introduces ShieldGemma, a suite of safety content moderation models built on Gemma 2, designed to detect several types of harmful content (sexually explicit material, dangerous content, harassment, and hate speech) in both user inputs and LLM-generated outputs. The models outperform existing baselines such as Llama Guard and WildGuard across public and internal benchmarks. [Paper]
[2] Modular RAG
This paper introduces a modular Retrieval-Augmented Generation (RAG) framework to address the increasing complexity of RAG systems for Large Language Models. The framework decomposes RAG systems into independent modules and operators, allowing for a highly reconfigurable system beyond the traditional linear "retrieve-then-generate" process. [Paper]
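To make the decomposition concrete, here is a minimal sketch of a pipeline built from independent, swappable operators. All operator names, the toy keyword retriever, and the state-dictionary layout are illustrative assumptions, not taken from the paper:

```python
from typing import Callable, Dict, List

def rewrite_query(state: Dict) -> Dict:
    # Query-transformation operator: e.g. normalize the raw query.
    state["query"] = state["query"].lower().strip()
    return state

def retrieve(state: Dict) -> Dict:
    # Retrieval operator: toy keyword match over an in-memory corpus.
    hits = [doc for doc in state["corpus"]
            if any(w in doc.lower() for w in state["query"].split())]
    state["context"] = hits
    return state

def generate(state: Dict) -> Dict:
    # Generation operator: placeholder that conditions on the context.
    state["answer"] = (f"Based on {len(state['context'])} documents: "
                       + " | ".join(state["context"]))
    return state

def run_pipeline(operators: List[Callable[[Dict], Dict]], state: Dict) -> Dict:
    # The pipeline is just an ordered list of operators, so stages can be
    # reordered, removed, or replaced without touching the others.
    for op in operators:
        state = op(state)
    return state

state = {"query": "  Transformer attention  ",
         "corpus": ["Attention is all you need.",
                    "RAG combines retrieval and generation."]}
result = run_pipeline([rewrite_query, retrieve, generate], state)
print(result["answer"])
```

Because each stage shares only the state dictionary, adding a reranking or routing operator is a one-line change to the operator list rather than a rewrite of the chain.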
[3] Lightweight Gemma 2 LLMs
Gemma 2 models range from 2 billion to 27 billion parameters and include several architectural improvements to the Transformer, including interleaved local-global attentions and group-query attention. For the smaller 2B and 9B models, knowledge distillation was used instead of traditional next token prediction during training. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. [Paper]
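The distillation idea can be illustrated with a toy objective: rather than cross-entropy against a one-hot next-token label, the small model minimizes the KL divergence to the teacher's full output distribution. The logits below are made-up numbers, and this is a sketch of the general technique, not the paper's training code:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def one_hot_loss(student_logits, target_index):
    # Standard next-token prediction: cross-entropy against a hard label.
    probs = softmax(student_logits)
    return -math.log(probs[target_index])

def distillation_loss(student_logits, teacher_logits):
    # KL(teacher || student): the student is rewarded for matching the
    # teacher's whole distribution, not just the argmax token.
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

student = [2.0, 1.0, 0.1]   # student logits over a 3-token vocabulary
teacher = [2.5, 1.2, 0.0]   # teacher logits for the same position

print(round(one_hot_loss(student, 0), 4))
print(round(distillation_loss(student, teacher), 4))
```

The soft targets carry more signal per token than a hard label, which is one intuition for why a small student can exceed what it would learn from next-token prediction alone.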
[4] Legal Domain SOTA LLMs
This paper introduces SaulLM-54B and SaulLM-141B, two legal domain-specific LLMs based on the Mixtral architecture. The models were developed using a three-stage approach: continued pretraining on a vast corpus of legal texts, implementation of a specialized legal instruction-following protocol, and alignment with human preferences in legal interpretations. The authors released base, instruct, and aligned versions under the MIT License. [Paper]
[5] Efficient Continual Pre-training of Large Language Models
This paper demonstrates the effectiveness of continual pretraining in enhancing the Chinese language ability and scientific reasoning ability of the Llama-3 model while maintaining general performance. The resulting model, named Llama-3-SynE, showed significant improvements in both general abilities and scientific reasoning on various benchmarks (such as C-Eval, CMMLU, MATH, and SciEval) without compromising its original capabilities. [Paper]
[6] Iterative RAG System
The authors introduce i-MedRAG, an iterative Retrieval-Augmented Generation (RAG) framework that addresses the limitations of traditional RAG systems by allowing LLMs to ask follow-up queries based on previous information-seeking attempts, enabling more complex reasoning. [Paper]
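The follow-up loop can be sketched as alternating between proposing a new query (conditioned on evidence gathered so far) and retrieving for it. The keyword-based retriever and the heuristic stand-in for the LLM's query generation below are invented for illustration and are not from the i-MedRAG paper:

```python
def retrieve(query, corpus):
    # Toy retriever: returns documents sharing a word with the query.
    return [d for d in corpus
            if any(w in d.lower() for w in query.lower().split())]

def propose_follow_up(question, evidence, asked):
    # Stand-in for the LLM's follow-up query generation: pick a keyword
    # from the question not yet covered by the evidence or already asked.
    covered = " ".join(evidence).lower()
    for word in question.lower().split():
        if len(word) > 4 and word not in covered and word not in asked:
            return word
    return None  # nothing left to ask

def iterative_rag(question, corpus, max_steps=3):
    evidence, asked = [], set()
    for _ in range(max_steps):
        follow_up = propose_follow_up(question, evidence, asked)
        if follow_up is None:
            break
        asked.add(follow_up)
        evidence.extend(retrieve(follow_up, corpus))
    return evidence

corpus = [
    "Metformin is a first-line treatment for type 2 diabetes.",
    "Hypoglycemia is a possible side effect of insulin therapy.",
]
evidence = iterative_rag("metformin interaction with insulin", corpus)
print(evidence)
```

Each iteration conditions the next query on what has already been retrieved, so the system can chase evidence that a single one-shot retrieval would miss.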
[7] RAGate
This paper introduces RAGate, a gating model designed to determine whether Retrieval Augmented Generation (RAG) is necessary for each turn in a conversational system. Results demonstrate that RAGate effectively identifies when RAG should be applied in conversational systems, resulting in high-quality responses and improved generation confidence. [Paper]
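A minimal sketch of such a per-turn gate is shown below. The hand-written heuristic is an invented placeholder for illustration; in the paper this decision is learned by a model rather than hard-coded:

```python
KNOWLEDGE_CUES = {"who", "what", "when", "where", "why", "how", "which"}

def needs_retrieval(turn: str) -> bool:
    # Gate: retrieve only when the turn looks knowledge-seeking.
    words = set(turn.lower().replace("?", "").split())
    return bool(words & KNOWLEDGE_CUES) or turn.strip().endswith("?")

def respond(turn, retriever, generator):
    # RAG is applied only when the gate fires; chitchat turns skip it.
    context = retriever(turn) if needs_retrieval(turn) else []
    return generator(turn, context)

# Toy retriever and generator to exercise the gate.
retriever = lambda q: ["doc about " + q]
generator = lambda q, ctx: ("grounded" if ctx else "chitchat") + " reply"

print(respond("Hello there!", retriever, generator))                # gate off
print(respond("When was Gemma 2 released?", retriever, generator))  # gate on
```

Skipping retrieval on turns that do not need it avoids injecting irrelevant context, which is the failure mode the gating is meant to prevent.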
[8] LLM Safety Survey
With the wide adoption of LLMs in various applications, there is a growing concern regarding the safety risks associated with LLMs. This survey paper presents a comprehensive overview of AI safety research in the context of LLMs. It explores the risks, challenges, and safety concerns associated with these models. [Paper]
[9] Apple Intelligence Foundation Language Models
This report introduces foundation language models developed by Apple for their Intelligence features. It describes two key models: a compact ~3 billion parameter model designed for efficient on-device performance, and a larger server-based model for Private Cloud Compute. [Paper]
[10] Medical Domain-Specific RAG
This paper introduces the Bailicai framework, which integrates Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs) optimized for medical applications. It outperforms existing medical domain LLMs and GPT-3.5 on various medical benchmarks and also effectively mitigates noise-related challenges in processing medical information. [Paper]
If you like this, do subscribe to the newsletter so that you won't miss any of the interesting LLM papers.