The Secret Behind Large Language Models: Memorization Over Understanding?
In this article, SCRT Labs CEO Guy Zyskind offers a provocative take on Large Language Models (LLMs) like GPT-4: it's their remarkable memorization ability, not genuine understanding, that makes them seem so advanced. Let's explore this perspective.
During training, LLMs are exposed to enormous amounts of text. They learn patterns, context, and relationships, but part of what they acquire is simply memorized information. The hypothesis here is that this memorization is the key to their impressive performance.
The intriguing take: as LLMs like GPT-4 get bigger, they memorize more, making them seem even more impressive. On this view, it is the extra memorization, not better generalization, that sets GPT-4 apart from earlier models and from open-source models like Llama, GPT4All, and StableLM.
For a concrete example, I asked both GPT-4 and StableLM to write a bio for me. The first gets it right; the second hallucinates, inventing some entertaining fiction.
Yet both models understand and reason reasonably well, so what gives? GPT-4 simply has far more memorized training data to lean on in a zero-shot prompt like this, where the model has no connection to the internet.
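If you want to try the experiment yourself, here is a minimal sketch, assuming an OpenAI API key and the Hugging Face transformers library; the model IDs and prompt wording are my own illustrative choices, not the exact ones used above:

```python
# A sketch of the bio experiment: send the same zero-shot prompt to GPT-4
# (via the OpenAI API) and to an open StableLM checkpoint, then compare
# the two outputs by hand for hallucinated "facts".
from openai import OpenAI
from transformers import pipeline

prompt = "Write a short professional bio of Guy Zyskind, CEO of SCRT Labs."

# Closed model: answers from whatever it memorized at training time.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
gpt4_bio = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

# Open model: far smaller, so far less long-tail factual coverage.
stablelm = pipeline("text-generation",
                    model="stabilityai/stablelm-tuned-alpha-7b")
stablelm_bio = stablelm(prompt, max_new_tokens=200)[0]["generated_text"]

print("GPT-4:\n", gpt4_bio)
print("\nStableLM:\n", stablelm_bio)
```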
A common counterargument: sure, LLMs memorize facts and phrases, but their real strength is generalizing from training data. This hypothesis challenges that view, proposing that memorization plays a much bigger role for models like GPT-4.
Researchers are actively developing ways to measure how much LLMs memorize. This matters both for understanding a model's limits and for making sure sensitive information from the training data isn't leaked. Such leaks are not theoretical: https://usenix.org/system/files/sec21-carlini-extracting.pdf
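To make this concrete, here is a minimal sketch of one heuristic from the Carlini et al. paper linked above: rank a model's outputs by their perplexity relative to their zlib-compressed size. Text that a generic compressor finds hard to compress but the model finds easy to predict is a candidate for memorized training data. The model choice (GPT-2) and the example string are illustrative assumptions:

```python
# A sketch of the perplexity/zlib heuristic from the extraction paper:
# text with high entropy to a generic compressor but low perplexity under
# the model is suspicious, i.e. possibly memorized verbatim.
import zlib
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Per-token perplexity of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return torch.exp(loss).item()

def zlib_bytes(text: str) -> int:
    """Compressed size in bytes: a crude, model-free redundancy measure."""
    return len(zlib.compress(text.encode("utf-8")))

def memorization_score(text: str) -> float:
    # Higher score = low model perplexity relative to compressed size,
    # i.e. more suspicious.
    return zlib_bytes(text) / perplexity(text)

sample = "The quick brown fox jumps over the lazy dog."
print(f"ppl={perplexity(sample):.1f} "
      f"zlib={zlib_bytes(sample)} "
      f"score={memorization_score(sample):.2f}")
```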
Remember, too, that LLMs don't "think" like humans; they use statistical patterns and relationships to generate text. If larger models really do "just" memorize more, the result could be a model that looks more impressive while being no more context-aware.
The future: research is ongoing to improve LLMs by reducing their reliance on memorization while boosting their ability to reason and understand context.
The future (cont): this also has major consequences for privacy. Smaller models that memorize less but generalize well can run locally and are more resistant to membership and extraction attacks, attacks that determine whether, or recover what, sensitive data appeared in the training set.
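As a sketch of what a membership attack looks like in its simplest form, here is a loss-threshold test in the spirit of Yeom et al.: guess "member" when the model's loss on a candidate string falls below a threshold calibrated on text the model has definitely never seen. The model, threshold, and candidate string are all illustrative assumptions:

```python
# A sketch of a loss-threshold membership-inference test: guess that a
# string was in the training set when the model's loss on it is below a
# threshold calibrated on known-unseen text.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def token_loss(text: str) -> float:
    """Mean per-token cross-entropy of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

THRESHOLD = 3.0  # assumed; calibrate on held-out, definitely-unseen text

candidate = "To be, or not to be, that is the question."
loss = token_loss(candidate)
verdict = "likely in training data" if loss < THRESHOLD else "likely unseen"
print(f"loss={loss:.2f} -> {verdict}")
```

The privacy win for smaller, generalizing models is exactly that tests like this become far less reliable for an attacker.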
I think that improving real-time inference based on browsing would be the way to level the playing field: retrieval can stand in for memorization.
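Here is a minimal sketch of that idea, assuming a small open model plus a fetched web page as context; the URL, model ID, and prompt template are all illustrative:

```python
# A sketch of retrieval-augmented inference: instead of relying on
# memorized facts, fetch a live source at query time and let a smaller
# open model answer from the retrieved text.
import requests
from transformers import pipeline

url = "https://en.wikipedia.org/wiki/Secret_Network"  # assumed source page
# Crude: a real pipeline would strip HTML and chunk/rank the text first.
context = requests.get(url, timeout=10).text[:4000]

generator = pipeline("text-generation",
                     model="stabilityai/stablelm-tuned-alpha-7b")
prompt = ("Using only the context below, write a short bio of Guy Zyskind."
          f"\n\nContext:\n{context}\n\nBio:")
print(generator(prompt, max_new_tokens=150)[0]["generated_text"])
```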