An offline coprocessor enhances an LLM's KV-cache by adding extra "latent embeddings"
TuringPost
Google DeepMind proposed a method that enhances LLMs with an offline coprocessor that works with the model's internal memory (KV-cache).
What's the coprocessor's role?
It enhances the model's KV-cache by adding extra "latent embeddings" (compressed representations) for more accurate outputs.
What is good about it?
- The coprocessor operates independently, and the base LLM remains frozen.
- It operates offline and asynchronously, meaning it can improve the model’s memory in the background.
- If the coprocessor isn’t available or extra computation isn’t needed, the model still functions as usual.
- The model achieves lower perplexity.
- This method works across various tasks without additional fine-tuning.
Here are the details:
The interaction between the LLM and the coprocessor happens in 3 main steps:
1. The frozen LLM processes the input and produces its KV-cache as usual.
2. The coprocessor reads that KV-cache and generates latent embeddings, which are appended to the cache.
3. The LLM continues decoding from the augmented cache, conditioning on both the original context and the added embeddings.
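To make the data flow concrete, here is a minimal PyTorch sketch of these steps. The ToyCoprocessor class, the tensor shapes, and the generate helper are illustrative assumptions for this write-up, not the paper's implementation (in the paper the coprocessor is itself a trained model, not a toy MLP):

```python
# Conceptual sketch of the KV-cache augmentation loop described above.
# All names and shapes are hypothetical stand-ins, not the paper's code or any library API.
from typing import Optional

import torch
import torch.nn as nn


class ToyCoprocessor(nn.Module):
    """Maps a KV-cache to a fixed number of latent embeddings (assumption: a simple MLP)."""

    def __init__(self, d_model: int, num_latents: int):
        super().__init__()
        self.num_latents = num_latents
        self.proj = nn.Linear(d_model, d_model * num_latents)

    def forward(self, kv_cache: torch.Tensor) -> torch.Tensor:
        # kv_cache: [seq_len, d_model]; pool it and expand into latent embeddings.
        pooled = kv_cache.mean(dim=0)                            # [d_model]
        return self.proj(pooled).view(self.num_latents, -1)      # [num_latents, d_model]


def generate(kv_cache: torch.Tensor, coprocessor: Optional[ToyCoprocessor]) -> torch.Tensor:
    # Step 1: the frozen LLM has already encoded the prompt into kv_cache (a stand-in tensor here).
    # Step 2: if a coprocessor is available, it augments the cache with latent embeddings.
    if coprocessor is not None:
        latents = coprocessor(kv_cache)
        kv_cache = torch.cat([kv_cache, latents], dim=0)         # augmented cache
    # Step 3: the LLM would decode conditioned on the (possibly augmented) cache.
    # Decoding is out of scope for this sketch; we return the cache that would be used.
    return kv_cache


# Usage: with the coprocessor the cache grows by num_latents entries;
# without it, the base model's cache is returned unchanged (graceful fallback).
cache = torch.randn(10, 64)                                       # pretend KV-cache: 10 tokens, d_model=64
augmented = generate(cache, ToyCoprocessor(d_model=64, num_latents=4))
fallback = generate(cache, None)
print(augmented.shape, fallback.shape)                            # torch.Size([14, 64]) torch.Size([10, 64])
```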
Results of using the coprocessor:
Testing on reasoning-heavy tasks showed:
- 10.05% improvement on math reasoning (GSM8K).
- 4.70% improvement on MMLU (multitask language understanding).
These gains were achieved without any fine-tuning for specific tasks, highlighting the versatility of the method.
This method also showed significantly lower perplexity compared to baseline models.
Paper: https://arxiv.org/pdf/2412.17747