Top LLM Papers of the Week (November Week 2, 2024)

[1] Practical Guide to Fine-tuning with Limited Data

This paper presents a practical guide to fine-tuning models with limited data. The paper covers (1) initial and continued pre-training strategies to better leverage prior knowledge in unseen domains and languages, (2) how to maximize the utility of limited data during fine-tuning and few-shot learning, and (3) models and methods suited for different levels of data scarcity. [Tweet] and [Paper]
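
As a minimal illustration of one strategy such guides cover for low-data regimes (adapting only a small number of parameters to avoid overfitting scarce labels), the sketch below freezes a pretrained encoder and trains only a lightweight classification head. The model and data are toy placeholders, not taken from the paper.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained encoder; in practice this would be a
# pretrained transformer whose weights we do NOT want to overfit on tiny data.
encoder = nn.Sequential(nn.Embedding(1000, 64), nn.Flatten(), nn.Linear(64 * 16, 128))
head = nn.Linear(128, 2)  # small task-specific head, the only trainable part

for p in encoder.parameters():
    p.requires_grad = False  # freeze prior knowledge to reduce overfitting risk

opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# A "limited data" regime: a handful of labelled examples (random toy data here).
x = torch.randint(0, 1000, (8, 16))   # 8 examples, 16 tokens each
y = torch.randint(0, 2, (8,))

for epoch in range(20):
    opt.zero_grad()
    with torch.no_grad():
        feats = encoder(x)            # frozen features
    loss = loss_fn(head(feats), y)
    loss.backward()
    opt.step()
```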


[2] Efficient inference for Long-Context LLMs

This paper presents Recycled Attention, an efficient inference method for long-context LLMs. Recycled Attention alternates between full-context attention and attention over a subset of input tokens, choosing the tokens that are most relevant to the current decoding step. It achieves speedups comparable to baselines that only consider local context while improving performance by 2x. [Tweet] and [Paper]
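
A minimal sketch of the alternating pattern described above (not the authors' implementation): every few decoding steps the loop performs full attention over the KV cache and "recycles" the indices of the highest-scoring keys, and the in-between steps attend only over that subset. The stride, top-k rule, and single-head shapes are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def recycled_attention_decode(Q, K, V, stride=4, top_k=8):
    """Toy single-head decode loop. Q: (T, d) query per step,
    K, V: (N, d) cached keys/values for a long prompt."""
    N, d = K.shape
    recycled = torch.arange(N)            # start with the full context
    outputs = []
    for t, q_t in enumerate(Q):
        if t % stride == 0:
            # Periodic full-attention step over the entire KV cache.
            scores = (K @ q_t) / d**0.5
            # Recycle the most relevant keys/values for the next steps.
            recycled = scores.topk(min(top_k, N)).indices
            attn = F.softmax(scores, dim=-1)
            outputs.append(attn @ V)
        else:
            # Cheap step: attend only over the recycled subset.
            scores = (K[recycled] @ q_t) / d**0.5
            attn = F.softmax(scores, dim=-1)
            outputs.append(attn @ V[recycled])
    return torch.stack(outputs)           # (T, d) attended outputs

out = recycled_attention_decode(torch.randn(6, 16), torch.randn(32, 16), torch.randn(32, 16))
print(out.shape)  # torch.Size([6, 16])
```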


[3] Fox-1 (series of SLMs)

This paper introduces Fox-1, a series of small language models (SLMs) consisting of Fox-1-1.6B and Fox-1-1.6B-Instruct-v0.1. These models are pre-trained on 3 trillion tokens of web-scraped document data and fine-tuned with 5 billion tokens of instruction-following and multi-turn conversation data. Fox-1 achieves better or on-par performance on various benchmarks compared to StableLM-2-1.6B, Gemma-2B, Qwen1.5-1.8B, and OpenELM-1.1B, with competitive inference speed and throughput. [Tweet] and [Paper]


[4] Do commercial fine-tuning APIs infuse knowledge into LLMs?

This paper introduces FineTuneBench, an evaluation framework and dataset for assessing how well commercial fine-tuning APIs infuse knowledge into LLMs. The paper investigates five frontier LLMs with commercially available fine-tuning APIs. Results show that (1) fine-tuning GPT-4o mini is the most effective for infusing new knowledge and updating existing knowledge, followed by GPT-3.5 Turbo and GPT-4o, and (2) the fine-tuning APIs for Gemini 1.5 Flash and Gemini 1.5 Pro are unable to learn new knowledge or update existing knowledge. These results highlight a major shortcoming of current commercial fine-tuning services for achieving reliable knowledge infusion in common scenarios. [Tweet] and [Paper]
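
The core measurement is simple to sketch: query the fine-tuned model on the facts it was trained on (and on paraphrases of them) and check the answers against gold. The `query_model` callable below is a hypothetical placeholder for a vendor's inference API; FineTuneBench's actual datasets and prompting are more extensive.

```python
def knowledge_infusion_score(questions, query_model):
    """questions: list of dicts with a 'prompt', optional 'paraphrase',
    and the gold 'answer' the fine-tuned model should now know.
    query_model: hypothetical callable wrapping a fine-tuned model's API."""
    learned, generalized = 0, 0
    for q in questions:
        if q["answer"].lower() in query_model(q["prompt"]).lower():
            learned += 1                      # memorized the trained phrasing
        if "paraphrase" in q:
            if q["answer"].lower() in query_model(q["paraphrase"]).lower():
                generalized += 1              # answers a rephrased question too
    n_para = sum("paraphrase" in q for q in questions)
    return learned / len(questions), generalized / max(n_para, 1)

# Toy usage with a stub "model" that only knows the new fact verbatim.
facts = [{"prompt": "Who is the CEO of ExampleCorp?",
          "paraphrase": "ExampleCorp is led by whom?", "answer": "Jane Doe"}]
stub = lambda p: "Jane Doe" if p.startswith("Who is") else "I don't know"
print(knowledge_infusion_score(facts, stub))   # (1.0, 0.0)
```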


"Top LLM Papers of the Week" newsletter is read by over 20k+ AI Researchers, Engineers and Developers. If you would like to promote with us, contact Kalyan KS


[5] LLM Knowledge Distillation for Text Classification

This paper introduces Performance-Guided Knowledge Distillation (PGKD), a cost-effective and high-throughput solution for production text classification applications. PGKD uses teacher-student knowledge distillation to distill the knowledge of LLMs into smaller, task-specific models. Results reveal that models fine-tuned with PGKD are up to 130x faster and 25x less expensive than LLMs for inference on the same classification task. PGKD is a versatile framework that can be extended to any LLM distillation task, including language generation. [Tweet] and [Paper]
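
A minimal sketch of the teacher-student loss such distillation builds on: soft targets from the (LLM) teacher combined with hard cross-entropy on gold labels. PGKD's performance-guided feedback loop is not shown here; this is only the generic classification-distillation objective, with toy tensors standing in for real logits.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic teacher-student loss for text classification:
    soft targets from the (LLM) teacher + hard cross-entropy on gold labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                       # standard temperature scaling
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy batch: 4 examples, 3 classes. In PGKD the teacher logits would come
# from an LLM labelling/scoring the same examples.
student = torch.randn(4, 3, requires_grad=True)
teacher = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
loss = distillation_loss(student, teacher, labels)
loss.backward()
print(loss.item())
```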


[6] Analyzing RAG Systems based on Sufficient Context

This paper analyzes RAG systems through a new lens called “sufficient context”. Results reveal that proprietary LLMs (Gemini, GPT, Claude) excel at answering queries when the context is sufficient, but often output incorrect answers instead of abstaining when it is not. Open-source LLMs (Llama, Mistral, Gemma), on the other hand, frequently hallucinate or abstain even when the context is sufficient. [Tweet] and [Paper]
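
The analysis boils down to stratifying each RAG response along two axes: whether the retrieved context was sufficient to answer, and whether the model answered correctly, answered incorrectly, or abstained. A minimal tally of that stratification is sketched below; in the paper the sufficiency labels come from an LLM-based autorater, whereas here they are assumed given.

```python
from collections import Counter

def stratify_rag_outcomes(records):
    """records: iterable of (context_sufficient: bool, outcome: str),
    where outcome is one of 'correct', 'incorrect', 'abstain'."""
    table = Counter()
    for sufficient, outcome in records:
        table[("sufficient" if sufficient else "insufficient", outcome)] += 1
    return table

# Toy example mirroring the reported failure mode: with insufficient context,
# models often answer incorrectly instead of abstaining.
records = [(True, "correct"), (True, "correct"), (True, "incorrect"),
           (False, "incorrect"), (False, "incorrect"), (False, "abstain")]
for key, count in sorted(stratify_rag_outcomes(records).items()):
    print(key, count)
```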


[7] Parameter Efficient Knowledge Distillation for LLMs

This work introduces LLM-Neo, a parameter-efficient knowledge distillation framework for LLMs. LLM-Neo combines LoRA with knowledge distillation (KD) to improve the efficiency of knowledge transfer. Experimental results on compressing Llama 2 and Llama 3 show that LLM-Neo outperforms various baselines, and further analysis demonstrates its robustness across LoRA variants. [Tweet] and [Paper]
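
A minimal sketch of the LoRA-plus-KD combination (a generic illustration, not the paper's exact recipe): the student is trained against the teacher's logits with a KD loss, but gradients only flow into low-rank adapters, so the number of trainable parameters stays small.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (A @ B)."""
    def __init__(self, base: nn.Linear, r=4, alpha=8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze base weights
        self.A = nn.Parameter(torch.randn(base.in_features, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, base.out_features))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A @ self.B)

# Student "LM head" with LoRA; the teacher logits would come from the larger model.
student_head = LoRALinear(nn.Linear(64, 100))
opt = torch.optim.AdamW([student_head.A, student_head.B], lr=1e-3)

x = torch.randn(8, 64)
teacher_logits = torch.randn(8, 100)
student_logits = student_head(x)
kd = F.kl_div(F.log_softmax(student_logits, -1),
              F.softmax(teacher_logits, -1), reduction="batchmean")
kd.backward()
opt.step()   # only the low-rank adapters are updated
```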


[8] Evaluating LLMs on Enterprise Text-to-SQL Workflows

This paper introduces Spider 2.0, an evaluation framework comprising 632 real-world text-to-SQL workflow problems derived from enterprise-level database use cases. Solving problems in Spider 2.0 frequently requires understanding and searching through database metadata, dialect documentation, and even project-level codebases. Results show that LLMs require significant improvement to achieve adequate performance for real-world enterprise usage. [Tweet] and [Paper]
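
Text-to-SQL benchmarks are typically scored by execution accuracy: run the predicted and gold queries against the database and compare result sets. The toy sqlite3 check below illustrates that kind of scoring only; Spider 2.0's problems run against enterprise warehouses, dialect-specific SQL, and far longer workflows.

```python
import sqlite3

def execution_match(db_path, predicted_sql, gold_sql):
    """Return True if both queries produce the same result set (order-insensitive)."""
    conn = sqlite3.connect(db_path)
    try:
        pred = conn.execute(predicted_sql).fetchall()
        gold = conn.execute(gold_sql).fetchall()
    except sqlite3.Error:
        return False          # invalid predicted SQL counts as a failure
    finally:
        conn.close()
    return sorted(pred) == sorted(gold)

# Toy database for illustration.
conn = sqlite3.connect("toy.db")
conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
conn.execute("DELETE FROM sales")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("EU", 10.0), ("US", 25.0)])
conn.commit(); conn.close()

print(execution_match("toy.db",
                      "SELECT region FROM sales WHERE amount > 20",
                      "SELECT region FROM sales ORDER BY amount DESC LIMIT 1"))  # True
```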


[9] Automating Machine Learning Tasks with Multi-Agents

This paper introduces BudgetMLAgent, a cost-effective LLM multi-agent system for automating machine learning tasks. Analysis shows that (1) no-cost and low-cost models such as Gemini-Pro, Mixtral, and CodeLlama perform far worse than GPT-4 in a single-agent setting, and (2) the proposed system, while reducing cost by 94.2%, achieves a higher average success rate of 32.95% compared to 22.72% for the GPT-4 single-agent system. [Tweet] and [Paper]
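
One generic pattern behind this kind of cost saving is a cascade: route calls to a free or low-cost model first and escalate to GPT-4 only when the cheaper model's output fails validation. The sketch below shows that pattern with hypothetical stub callables; the paper's actual multi-agent roles and escalation rules are more involved.

```python
def cascade_call(prompt, cheap_llm, expert_llm, is_good_enough, max_cheap_tries=2):
    """Try the low-cost model first; escalate to the expensive expert only on failure.
    cheap_llm / expert_llm are hypothetical callables; is_good_enough is a validator
    (e.g. 'does the generated code run without errors?')."""
    for _ in range(max_cheap_tries):
        answer = cheap_llm(prompt)
        if is_good_enough(answer):
            return answer, "cheap"
    return expert_llm(prompt), "expert"   # rare, expensive fallback

# Toy usage with stub models and a trivial validator.
cheap = lambda p: "def solve(): pass"           # stand-in for Gemini-Pro/Mixtral/CodeLlama
expert = lambda p: "def solve(): return 42"     # stand-in for GPT-4
ok = lambda ans: "return" in ans                # stand-in for actually running the code
print(cascade_call("write solve()", cheap, expert, ok))   # ('def solve(): return 42', 'expert')
```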


[10] Accelerating Long Context LLM Inference

Inference costs increase linearly with the length of the input prompt. To address this, the paper introduces Squeezed Attention, which accelerates LLM inference for lengthy inputs. Squeezed Attention achieves a 3x reduction in the KV cache budget without accuracy loss, and up to an 8x reduction with less than a 0.5-point accuracy gap, across various models. [Tweet] and [Paper]
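
A rough sketch of retrieving only the relevant portion of a long fixed context at decode time: cluster the cached keys offline, compare the incoming query against the cluster centroids, and attend only over keys from the best-matching clusters. The cluster count, scoring rule, and single-head shapes below are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def squeezed_lookup(q, K, V, n_clusters=4, clusters_to_keep=1, iters=10):
    """q: (d,) current query; K, V: (N, d) KV cache for a long fixed prompt.
    Offline: k-means over the keys. Online: score centroids against the query,
    then attend only over keys belonging to the top-scoring clusters."""
    N, d = K.shape
    # --- offline: simple k-means over the cached keys ---
    centroids = K[torch.randperm(N)[:n_clusters]].clone()
    for _ in range(iters):
        assign = torch.cdist(K, centroids).argmin(dim=-1)
        for c in range(n_clusters):
            if (assign == c).any():
                centroids[c] = K[assign == c].mean(dim=0)
    assign = torch.cdist(K, centroids).argmin(dim=-1)
    # --- online: pick clusters whose centroids best match the query ---
    keep = (centroids @ q).topk(clusters_to_keep).indices
    mask = torch.isin(assign, keep)
    K_small, V_small = K[mask], V[mask]          # reduced KV budget
    attn = F.softmax((K_small @ q) / d**0.5, dim=-1)
    return attn @ V_small, int(mask.sum())       # output and #keys actually used

out, used = squeezed_lookup(torch.randn(16), torch.randn(64, 16), torch.randn(64, 16))
print(out.shape, used)
```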

Do subscribe to the newsletter so that you don't miss interesting updates related to Generative AI, LLMs, and RAG.

Kalyan KS, Research Scientist (NLP) at Akmmus AI Labs

