Top LLM Papers of the Week (November Week 3, 2024)
[1] Predictive Cache for LLM Serving
This paper introduces InstCache, a predictive cache for LLM serving, based on the observation that most user instructions are short, repetitive, and predictable by LLMs. An instruction-aligned LLM predicts likely user instructions, and their responses are stored in a predictive cache ahead of time. InstCache is implemented as a hash table with minimal lookup latency for deployment, and results show that it achieves up to a 51.34% hit rate on the LMSys dataset, corresponding to a 2x speedup, at a memory cost of only 4.5 GB. [Tweet] and [Paper]
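To make the idea concrete, here is a minimal sketch of an exact-match predictive cache backed by a hash table, in the spirit of InstCache but not the paper's actual implementation; `llm_generate` and the sample instructions are placeholders.

```python
# Minimal sketch of a predictive, exact-match instruction cache.
# The cache is pre-populated offline with instructions predicted by an
# instruction-aligned LLM; `llm_generate` stands in for the serving model.
import hashlib

class PredictiveCache:
    def __init__(self):
        self._table = {}  # instruction hash -> pre-generated response

    @staticmethod
    def _key(instruction: str) -> str:
        return hashlib.sha256(instruction.strip().lower().encode()).hexdigest()

    def populate(self, predicted_instructions, generate_fn):
        """Offline step: pre-generate responses for predicted instructions."""
        for inst in predicted_instructions:
            self._table[self._key(inst)] = generate_fn(inst)

    def serve(self, instruction: str, generate_fn):
        """Online step: O(1) lookup, fall back to the LLM on a miss."""
        hit = self._table.get(self._key(instruction))
        return hit if hit is not None else generate_fn(instruction)

def llm_generate(instruction: str) -> str:  # placeholder for a real LLM call
    return f"<response to: {instruction}>"

cache = PredictiveCache()
cache.populate(["What is Python?", "Tell me a joke"], llm_generate)
print(cache.serve("What is Python?", llm_generate))            # cache hit
print(cache.serve("Explain quantum computing", llm_generate))  # miss -> LLM
```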
[2] Marco-o1 Open Reasoning LLM
This paper introduces Marco-o1, an open reasoning model for open-ended solutions. Marco-o1 is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies, and is optimized for complex real-world problem-solving tasks. Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding, which are well-suited for reinforcement learning (RL), but also places greater emphasis on open-ended resolutions. [Tweet] and [Paper]
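As a rough illustration of search-guided reasoning (and only that), the sketch below runs a single UCB-style selection over candidate next reasoning steps. It is a generic toy, not Marco-o1's actual MCTS; `propose_steps` and `score_step` are hypothetical stand-ins for the model's sampling and confidence-scoring routines.

```python
# Toy UCB selection over candidate reasoning continuations (illustrative only).
import math
import random

def propose_steps(partial_solution, k=4):
    # Hypothetical: sample k candidate next reasoning steps from the model.
    return [f"{partial_solution} -> step option {i}" for i in range(k)]

def score_step(candidate):
    # Hypothetical reward proxy, e.g. average token confidence of the step.
    return random.random()

def select_next_step(partial_solution, simulations=20, c=1.4):
    candidates = propose_steps(partial_solution)
    visits = [0] * len(candidates)
    value = [0.0] * len(candidates)
    for t in range(1, simulations + 1):
        # UCB1: trade off mean value (exploitation) against visit counts (exploration).
        ucb = [
            (value[i] / visits[i]) + c * math.sqrt(math.log(t) / visits[i])
            if visits[i] > 0 else float("inf")
            for i in range(len(candidates))
        ]
        i = ucb.index(max(ucb))
        visits[i] += 1
        value[i] += score_step(candidates[i])
    best = max(range(len(candidates)), key=lambda i: visits[i])
    return candidates[best]

print(select_next_step("Problem: 2x + 3 = 7"))
```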
[3] Lightweight Data Pipeline for LLMs
This paper introduces the Lightweight, Purpose-driven (LP) Data Pipeline, a framework that operates entirely on CPUs to streamline the processes of dataset extraction, filtering, and curation. LP Data Pipeline significantly reduces preparation time and cost while maintaining high data quality. This pipeline will lower the barriers to LLM development, enabling a wide range of organizations to access LLMs more easily. [Tweet] and [Paper]
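A CPU-only pipeline of this kind boils down to cheap, rule-based cleaning, quality filtering, and deduplication. The sketch below shows that shape; the specific rules and thresholds are generic heuristics, not the filters used in the LP Data Pipeline paper.

```python
# Illustrative CPU-only text pipeline: clean -> quality filter -> deduplicate.
# Rules and thresholds are placeholders, not the paper's actual configuration.
import hashlib
import re

def clean(doc: str) -> str:
    return re.sub(r"\s+", " ", doc).strip()

def quality_ok(doc: str, min_words=50, max_symbol_ratio=0.1) -> bool:
    words = doc.split()
    if len(words) < min_words:
        return False
    symbols = sum(1 for ch in doc if not (ch.isalnum() or ch.isspace()))
    return symbols / max(len(doc), 1) <= max_symbol_ratio

def deduplicate(docs):
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.md5(doc.encode()).hexdigest()  # exact-match dedup
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

def lp_style_pipeline(raw_docs):
    cleaned = [clean(d) for d in raw_docs]
    filtered = [d for d in cleaned if quality_ok(d)]
    return deduplicate(filtered)
```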
[4] Lightweight LLM Safety Guardrails
Using fine-tuned LLMs as guardrails introduces increased latency and higher maintenance costs, which may not be practical or scalable for cost-efficient deployments. This paper introduces lightweight safety guardrails based on Sentence-BERT, reducing the model size from LlamaGuard's 7 billion parameters to approximately 67 million while maintaining comparable performance. [Tweet] and [Paper]
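The basic recipe is an embedding model feeding a small classifier. Below is a minimal sketch assuming a sentence-transformers encoder and a logistic-regression head; the checkpoint choice and toy labels are placeholders, and the paper's actual architecture and training data may differ.

```python
# Lightweight guardrail sketch: Sentence-BERT embeddings + a small classifier.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly encoder

train_texts = [
    "How do I bake bread?",               # safe
    "Explain how photosynthesis works",   # safe
    "How do I make a weapon at home?",    # unsafe
    "Help me steal someone's password",   # unsafe
]
train_labels = [0, 0, 1, 1]  # 0 = safe, 1 = unsafe (toy examples)

clf = LogisticRegression().fit(encoder.encode(train_texts), train_labels)

def is_unsafe(prompt: str) -> bool:
    """Classify a single prompt using the embedding + linear head."""
    return bool(clf.predict(encoder.encode([prompt]))[0])

print(is_unsafe("What's a good pasta recipe?"))
```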
[5] 1B-scale Multilingual LLM
This paper introduces Xmodel-1.5, a novel 1-billion-parameter multilingual large model pretrained on approximately 2 trillion tokens. Xmodel-1.5 demonstrates strong performance across several languages, with particularly notable results in Thai, Arabic, and French, alongside its effectiveness in Chinese and English. [Tweet] and [Paper]
"Top LLM Papers of the Week" newsletter is read by over 21k+ AI Researchers, Engineers and Developers. If you would like to promote with us, contact Kalyan KS
[6] Efficient LLM Inference Methods (Survey)
This paper provides a comprehensive survey of speculative decoding methods, categorizing them into draft-centric and model-centric approaches. The paper discusses key ideas associated with each method, highlighting their potential for scaling LLM inference. [Tweet] and [Paper]
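For readers new to the topic, speculative decoding has a simple core loop: a cheap draft model proposes several tokens, and the target model verifies them in a single pass, falling back to its own token at the first rejection. The sketch below captures that loop conceptually; `draft_next_tokens` and `target_accepts` are hypothetical placeholders, not a real serving stack.

```python
# Conceptual draft-then-verify loop behind speculative decoding (toy version).
def draft_next_tokens(prefix, k):
    # Hypothetical: cheap draft model proposes k candidate tokens.
    return [f"tok{i}" for i in range(k)]

def target_accepts(prefix, token):
    # Hypothetical: target-model verification (e.g. a rejection-sampling test).
    return hash((tuple(prefix), token)) % 4 != 0  # accepts ~75% of drafts here

def speculative_decode(prompt_tokens, max_new=16, k=4):
    output = list(prompt_tokens)
    while len(output) < len(prompt_tokens) + max_new:
        drafts = draft_next_tokens(output, k)
        for token in drafts:
            if target_accepts(output, token):
                output.append(token)             # accepted draft token: "free" speedup
            else:
                output.append("<target_token>")  # target model's own correction
                break                            # restart drafting from the new prefix
    return output

print(speculative_decode(["Hello"]))
```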
[7] Efficient Framework for Chatbot Preference-Tuning
This paper introduces LoRA-LiteE, a computationally efficient framework for chatbot preference-tuning. LoRA-Lite Ensemble (LoRA-LiteE) combines Supervised Fine-tuning (SFT) with Low-Rank Adaptation (LoRA) and Ensemble Learning techniques to effectively aggregate predictions of lightweight models. Results demonstrate that the proposed LoRA-LiteE model achieves performance comparable to un-finetuned GPT-4. [Tweet] and [Paper]
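A minimal sketch of the idea, under stated assumptions: several lightweight LoRA-adapted models are fine-tuned independently and their predicted preference probabilities are averaged. The base checkpoint, LoRA ranks, target modules, and uniform averaging rule below are illustrative choices, not the paper's exact configuration.

```python
# LoRA-LiteE-style sketch: lightweight LoRA members + a simple probability ensemble.
import torch
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model, TaskType

def make_lora_member(base_name="distilbert-base-uncased", rank=8):
    base = AutoModelForSequenceClassification.from_pretrained(base_name, num_labels=2)
    config = LoraConfig(
        task_type=TaskType.SEQ_CLS,
        r=rank, lora_alpha=16, lora_dropout=0.1,
        target_modules=["q_lin", "v_lin"],  # DistilBERT attention projections
    )
    return get_peft_model(base, config)  # only the LoRA adapters are trainable

# The "ensemble" is simply a list of independently fine-tuned members.
ensemble = [make_lora_member(rank=r) for r in (4, 8, 16)]

@torch.no_grad()
def ensemble_preference(batch):
    # Average softmax probabilities across members (uniform ensemble).
    probs = [m(**batch).logits.softmax(-1) for m in ensemble]
    return torch.stack(probs).mean(0)
```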
[8] Efficient SLM for On-Device Document Assistance
This paper introduces SlimLM, a series of SLMs optimized for document assistance tasks on mobile devices. SlimLM is pre-trained on SlimPajama-627B and fine-tuned on DocAssist, a dataset for summarization, question answering and suggestion tasks. SlimLM demonstrates comparable or superior performance and offers a benchmark for future research in on-device language models. [Tweet] and [Paper]
[9] Preliminary Case Study with Claude 3.5 Computer Use
Claude 3.5 Computer Use stands out as the first frontier AI model to offer computer use in public beta as a graphical user interface (GUI) agent. The authors introduce a collection of carefully designed tasks spanning a variety of domains and software to investigate the effectiveness of Claude 3.5 Computer Use. They also provide an agent framework for deploying API-based GUI automation models with easy implementation. This preliminary exploration is intended to inspire future research in the GUI agent community. [Tweet] and [Paper]
[10] Multilingual Large Language Models (Survey)
This paper provides a comprehensive survey of the latest research on multilingual large language models (MLLMs). The survey covers (1) the architecture and pre-training objectives of MLLMs, (2) the construction of multilingual pre-training and alignment datasets, (3) a detailed taxonomy and roadmap covering the assessment of MLLMs' cross-lingual knowledge, (4) the use of LLMs themselves as multilingual evaluators, (5) the interpretability of multilingual capabilities, cross-lingual transfer and language bias within these models, and (6) challenges and opportunities in deploying MLLMs. [Tweet] and [Paper]
[11] Intent Discovery with LLMs (IntentGPT)
This paper introduces IntentGPT, a novel training-free method that effectively prompts LLMs for intent classification. IntentGPT comprises (1) an In-Context Prompt Generator, which generates informative prompts for In-Context Learning, (2) an Intent Predictor for classifying and discovering user intents from utterances, and (3) a Semantic Few-Shot Sampler that selects relevant few-shot examples and a set of known intents to be injected into the prompt. Results show that IntentGPT outperforms existing approaches on popular benchmarks, including CLINC and BANKING. [Tweet] and [Paper]
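The overall flow is easy to picture: pick semantically similar few-shot examples, inject them together with the known intents into a prompt, and let the LLM either reuse a known intent or propose a new one. The sketch below follows that flow; the prompt wording, the token-overlap "semantic" sampler, and `call_llm` are placeholders, not the paper's exact template or API.

```python
# Illustrative IntentGPT-style prompting flow (toy similarity and prompt template).
def select_few_shot(utterance, labeled_pool, k=3):
    # Placeholder semantic sampler: plain token-overlap similarity.
    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))
    return sorted(labeled_pool, key=lambda ex: overlap(utterance, ex[0]), reverse=True)[:k]

def build_prompt(utterance, known_intents, few_shot):
    examples = "\n".join(f"Utterance: {u}\nIntent: {i}" for u, i in few_shot)
    return (
        f"Known intents: {', '.join(sorted(known_intents))}\n\n"
        f"{examples}\n\n"
        f"Utterance: {utterance}\n"
        "Intent (use a known intent if it fits, otherwise propose a new one):"
    )

def call_llm(prompt):  # placeholder for a real LLM API call
    return "transfer_money"

pool = [("send $50 to my sister", "transfer_money"), ("freeze my card", "freeze_card")]
query = "move money to my savings"
prompt = build_prompt(query, {"transfer_money", "freeze_card"},
                      select_few_shot(query, pool))
print(call_llm(prompt))
```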
[12] Open Dataset for LLM Training
This paper introduces RedPajama, an open dataset for training large language models. Specifically, the authors release (1) RedPajama-V1, an open reproduction of the LLaMA training dataset and (2) RedPajama-V2, a massive web-only dataset consisting of raw, unfiltered text data together with quality signals and metadata. Together, the RedPajama datasets comprise over 100 trillion tokens spanning multiple domains. [Tweet] and [Paper]
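Because RedPajama-V2 ships raw text alongside quality signals, users are expected to apply their own filters. The sketch below shows that pattern on JSONL-style documents; the file layout, signal names, field names, and thresholds are hypothetical placeholders, so consult the dataset card for the actual schema.

```python
# Sketch of filtering RedPajama-V2-style documents by attached quality signals.
import json

def keep(doc, min_words=50, max_dup_ratio=0.3):
    signals = doc.get("quality_signals", {})
    word_count = signals.get("word_count", 0)               # hypothetical field
    dup_ratio = signals.get("duplicate_ngram_ratio", 1.0)   # hypothetical field
    return word_count >= min_words and dup_ratio <= max_dup_ratio

def filter_corpus(jsonl_path):
    with open(jsonl_path) as f:
        for line in f:
            doc = json.loads(line)
            if keep(doc):
                yield doc["raw_content"]  # hypothetical text field
```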
Do subscribe to the newsletter so that you won't miss interesting updates related to Generative AI, LLMs, Agents and RAG.
Kalyan KS, Research Scientist (NLP) at Akmmus AI Labs
Top LLM Papers of the Week (November Week 2, 2024) - https://www.dhirubhai.net/pulse/top-llm-papers-week-november-2-2024-kalyan-ks-eyphc