Moving beyond RAG

In this issue:

  1. 2.9x Lower Latency with Prompt Compression
  2. Unified Structure Learning
  3. Is it RAG? Is it FT? No, it’s RAFT!


Meet your new AI-powered data analyst!

Telescope Labs makes quality insights and Data Science more accessible by simplifying the "data to action" journey for everyone.

Want to empower your teams to develop better products and services with the help of AI? Click on the button below and try it out for free.


1. LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

Watching: LLMLingua-2 (paper)

What problem does it solve? Prompts are a crucial component in interacting with Large Language Models (LLMs). However, as prompts become more complex and detailed to guide the model effectively, they also become longer, which introduces redundancy and inefficiency. Existing approaches to prompt compression often rely on information entropy scores obtained from a causal language model, but such scores can miss essential information and are not directly aligned with the prompt compression objective.

How does it solve the problem? The proposed approach addresses the limitations of existing prompt compression methods by introducing a data distillation procedure. This procedure derives knowledge from an LLM to compress prompts without losing crucial information. Additionally, the authors introduce an extractive text compression dataset to support the compression task. By formulating prompt compression as a token classification problem and using a Transformer encoder architecture, the model captures essential information from the full bidirectional context, ensuring the faithfulness of the compressed prompt to the original one.
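To make the token-classification framing concrete, here is a minimal sketch (not the official LLMLingua-2 implementation or API) of how a bidirectional encoder can score each token as "keep" or "drop" and reconstruct a shorter prompt. The checkpoint name and the keep ratio are placeholders.

```python
# Illustrative sketch: prompt compression framed as binary token classification
# with a bidirectional encoder. The checkpoint name is a placeholder.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "your-org/token-classification-compressor"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=2)

def compress_prompt(prompt: str, keep_ratio: float = 0.5) -> str:
    """Keep the tokens the encoder scores as most informative."""
    enc = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits              # [1, seq_len, 2]
    keep_scores = logits.softmax(-1)[0, :, 1]     # probability of the "keep" label
    k = max(1, int(keep_ratio * keep_scores.numel()))
    keep_idx = keep_scores.topk(k).indices.sort().values  # preserve original order
    kept_ids = enc["input_ids"][0, keep_idx]
    return tokenizer.decode(kept_ids, skip_special_tokens=True)

print(compress_prompt("Summarize the following meeting transcript ...", keep_ratio=0.4))
```

Because the encoder sees the full bidirectional context, the keep/drop decision for each token can account for information that appears later in the prompt, which is exactly what entropy scores from a causal LM cannot do.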

What's next? As prompt-based interaction with LLMs becomes increasingly prevalent, efficient and effective prompt compression techniques will be essential for maintaining performance while minimizing computational costs. Further research could apply this approach to a wider range of tasks and LLMs, and investigate integrating prompt compression into the LLM training process itself.


2. mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

Watching: DocOwl 1.5 (paper/code)

What problem does it solve? Multimodal Large Language Models (MLLMs) have shown impressive capabilities in understanding and reasoning about visual documents like forms, receipts, charts, and webpages. However, current MLLMs often struggle with fully capturing the rich structural information present in these documents. Understanding the layout, spatial relationships, and hierarchical organization of elements is crucial for accurately interpreting the semantics of text-rich images.

How does it solve the problem? The researchers propose Unified Structure Learning, which combines structure-aware parsing tasks and multi-grained text localization tasks across various domains. They introduce H-Reducer, a vision-to-text module that preserves layout information while efficiently reducing the length of visual features. This enables the LLM to process high-resolution images more effectively. Additionally, they construct DocStruct4M, a comprehensive training set with structure-aware text sequences and multi-grained text-bounding box pairs, and DocReason25K, a high-quality reasoning tuning dataset for detailed explanations in the document domain.
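The following is a minimal PyTorch sketch of an H-Reducer-style vision-to-text connector, assuming a ViT patch grid as input. The 1x4 horizontal merge ratio and the hidden dimensions are illustrative choices, not the exact DocOwl 1.5 configuration.

```python
# Sketch of an H-Reducer-style connector: merge horizontally adjacent visual
# features to shorten the sequence while keeping row/column layout intact.
import torch
import torch.nn as nn

class HReducerSketch(nn.Module):
    def __init__(self, vit_dim=1024, llm_dim=4096, merge=4):
        super().__init__()
        # Convolution over the width dimension merges `merge` horizontally
        # adjacent patches into one token.
        self.reduce = nn.Conv2d(vit_dim, vit_dim, kernel_size=(1, merge), stride=(1, merge))
        self.proj = nn.Linear(vit_dim, llm_dim)

    def forward(self, patches, grid_hw):
        h, w = grid_hw
        b, n, d = patches.shape                  # n == h * w
        x = patches.transpose(1, 2).reshape(b, d, h, w)
        x = self.reduce(x)                       # [b, d, h, w // merge]
        x = x.flatten(2).transpose(1, 2)         # [b, h * (w // merge), d]
        return self.proj(x)                      # tokens fed to the LLM

feats = torch.randn(1, 32 * 32, 1024)            # e.g. a 32x32 ViT patch grid
tokens = HReducerSketch()(feats, (32, 32))
print(tokens.shape)                              # torch.Size([1, 256, 4096])
```

Merging along the width dimension rather than pooling arbitrary patches keeps the resulting tokens in reading order, which is how layout information survives the reduction.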

What's next? The proposed DocOwl 1.5 model achieves state-of-the-art performance on 10 visual document understanding benchmarks, significantly outperforming previous MLLMs built on similarly sized (7B) LLMs. This demonstrates the importance of incorporating structure learning in MLLMs for text-rich image understanding. Future research could explore extending this approach to other domains, such as scientific literature, medical records, or legal documents, where structure plays a vital role in comprehension. Additionally, investigating more efficient architectures and training strategies for structure-aware MLLMs could further enhance their practicality and scalability.


3. RAFT: Adapting Language Model to Domain Specific RAG

Watching: RAFT (paper)

What problem does it solve? Large Language Models (LLMs) are typically pretrained on vast amounts of general-domain data. However, when applying these models to specific domains or tasks, it is often necessary to incorporate additional knowledge that is not present in the pretraining data. This can be achieved through techniques like Retrieval-Augmented Generation (RAG) or fine-tuning. The challenge lies in finding the most effective way to integrate this new knowledge into the pretrained model to improve its performance on the target task.

How does it solve the problem? Retrieval Augmented FineTuning (RAFT) is a training approach that enhances the model's ability to answer questions in an "open-book", in-domain setting. Given a question and a set of retrieved documents, RAFT trains the model to disregard documents that are not relevant to answering the question, referred to as "distractor documents". It does so by training the model to quote verbatim the sequence from the relevant document that helps answer the question. Additionally, RAFT uses chain-of-thought-style responses, which further improves the model's reasoning capabilities.
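A hedged sketch of how such training data could be assembled is shown below. The field names, the distractor count, and the oracle-retention probability are illustrative assumptions rather than the paper's exact recipe.

```python
# Sketch of assembling a RAFT-style training example: mix the oracle document
# with sampled distractors, and pair the question with a chain-of-thought
# answer that quotes the supporting span verbatim.
import random

def build_raft_example(question, oracle_doc, distractor_pool,
                       cot_answer, num_distractors=3, p_oracle=0.8):
    docs = random.sample(distractor_pool, num_distractors)
    # With probability (1 - p_oracle), drop the oracle document so the model
    # also learns to fall back on memorized domain knowledge.
    if random.random() < p_oracle:
        docs.append(oracle_doc)
    random.shuffle(docs)
    context = "\n\n".join(f"[Doc {i+1}] {d}" for i, d in enumerate(docs))
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
    return {"prompt": prompt, "completion": cot_answer}

example = build_raft_example(
    question="Which enzyme does the passage say drives reaction X?",
    oracle_doc="...enzyme Y catalyzes reaction X under acidic conditions...",
    distractor_pool=["Unrelated passage A.", "Unrelated passage B.",
                     "Unrelated passage C.", "Unrelated passage D."],
    cot_answer='The relevant passage states: "enzyme Y catalyzes reaction X'
               ' under acidic conditions". Therefore, the answer is enzyme Y.',
)
print(example["prompt"][:200])
```

Training on a mix in which the oracle document is sometimes absent encourages the model both to extract answers from relevant context and to ignore distractors, rather than copying from whatever happens to be retrieved.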

What's next? The effectiveness of RAFT in improving the performance of pretrained LLMs in domain-specific RAG tasks has been consistently demonstrated across various datasets, including PubMed, HotpotQA, and Gorilla. This suggests that RAFT could serve as a valuable post-training recipe for adapting pretrained LLMs to in-domain RAG tasks. Future research could explore the applicability of RAFT to a wider range of domains and investigate potential improvements to the technique, such as incorporating more sophisticated retrieval methods or exploring alternative ways of guiding the model's attention to relevant information within the retrieved documents.


Papers of the Week:
