The Secret Behind Large Language Models: Memorization Over Understanding?
In this article, SCRT Labs CEO Guy Zyskind offers a provocative take on Large Language Models (LLMs) like GPT-4: it's their remarkable memorization ability, not genuine understanding, that makes them seem so advanced. Let's explore this perspective.
During training, LLMs are exposed to enormous amounts of text. They learn patterns, context, and relationships, but part of what they acquire is simply memorized information. The hypothesis here is that this memorization is the key to their impressive performance.
The intriguing take: as LLMs like GPT-4 get bigger, they memorize more, making them seem even more impressive. On this view, it is the extra memorization, not better generalization, that sets GPT-4 apart from earlier models and from open-source models like Llama, GPT4All, and StableLM.
For a concrete example, I asked both GPT-4 and StableLM to write a bio for me. The first gets it right; the second hallucinates, inventing some entertaining fiction.
Yet both models understand and reason reasonably well, so what gives? GPT-4 simply has far more memorized training data to lean on in a zero-shot prompt like this, where the model has no connection to the internet.
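If you want to try the experiment yourself, here is a minimal sketch, assuming an OpenAI API key and the Hugging Face transformers library; the model IDs and prompt wording are my own illustrative choices, not the exact ones used above:

```python
# A sketch of the bio experiment: send the same zero-shot prompt to GPT-4
# (via the OpenAI API) and to an open StableLM checkpoint, then compare
# the two outputs by hand for hallucinated "facts".
from openai import OpenAI
from transformers import pipeline

prompt = "Write a short professional bio of Guy Zyskind, CEO of SCRT Labs."

# Closed model: answers from whatever it memorized at training time.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
gpt4_bio = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

# Open model: far smaller, so far less long-tail factual coverage.
stablelm = pipeline("text-generation",
                    model="stabilityai/stablelm-tuned-alpha-7b")
stablelm_bio = stablelm(prompt, max_new_tokens=200)[0]["generated_text"]

print("GPT-4:\n", gpt4_bio)
print("\nStableLM:\n", stablelm_bio)
```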
A common counterargument: sure, LLMs memorize facts and phrases, but their real strength is generalizing from training data. This hypothesis challenges that view, proposing that memorization plays a much bigger role for models like GPT-4.
Researchers are actively developing ways to measure how much LLMs memorize. This matters both for understanding a model's limits and for making sure sensitive information from the training data isn't leaked. Such leaks are not theoretical: https://usenix.org/system/files/sec21-carlini-extracting.pdf
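To make this concrete, here is a minimal sketch of one heuristic from the Carlini et al. paper linked above: rank a model's outputs by their perplexity relative to their zlib-compressed size. Text that a generic compressor finds hard to compress but the model finds easy to predict is a candidate for memorized training data. The model choice (GPT-2) and the example string are illustrative assumptions:

```python
# A sketch of the perplexity/zlib heuristic from the extraction paper:
# text with high entropy to a generic compressor but low perplexity under
# the model is suspicious, i.e. possibly memorized verbatim.
import zlib
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Per-token perplexity of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return torch.exp(loss).item()

def zlib_bytes(text: str) -> int:
    """Compressed size in bytes: a crude, model-free redundancy measure."""
    return len(zlib.compress(text.encode("utf-8")))

def memorization_score(text: str) -> float:
    # Higher score = low model perplexity relative to compressed size,
    # i.e. more suspicious.
    return zlib_bytes(text) / perplexity(text)

sample = "The quick brown fox jumps over the lazy dog."
print(f"ppl={perplexity(sample):.1f} "
      f"zlib={zlib_bytes(sample)} "
      f"score={memorization_score(sample):.2f}")
```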
Remember, too, that LLMs don't "think" like humans; they use statistical patterns and relationships to generate text. If larger models really do "just" memorize more, the result could be a model that looks more impressive while being no more context-aware.
The future: research is ongoing to improve LLMs by reducing their reliance on memorization while boosting their ability to reason and understand context.
The future (cont): this also has major consequences for privacy. Smaller models that memorize less but generalize well can run locally and are more resistant to membership and extraction attacks, attacks that determine whether, or recover what, sensitive data appeared in the training set.
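As a sketch of what a membership attack looks like in its simplest form, here is a loss-threshold test in the spirit of Yeom et al.: guess "member" when the model's loss on a candidate string falls below a threshold calibrated on text the model has definitely never seen. The model, threshold, and candidate string are all illustrative assumptions:

```python
# A sketch of a loss-threshold membership-inference test: guess that a
# string was in the training set when the model's loss on it is below a
# threshold calibrated on known-unseen text.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def token_loss(text: str) -> float:
    """Mean per-token cross-entropy of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

THRESHOLD = 3.0  # assumed; calibrate on held-out, definitely-unseen text

candidate = "To be, or not to be, that is the question."
loss = token_loss(candidate)
verdict = "likely in training data" if loss < THRESHOLD else "likely unseen"
print(f"loss={loss:.2f} -> {verdict}")
```

The privacy win for smaller, generalizing models is exactly that tests like this become far less reliable for an attacker.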
I think that improving real-time inference based on browsing would be the way to level the playing field: retrieval can stand in for memorization.
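Here is a minimal sketch of that idea, assuming a small open model plus a fetched web page as context; the URL, model ID, and prompt template are all illustrative:

```python
# A sketch of retrieval-augmented inference: instead of relying on
# memorized facts, fetch a live source at query time and let a smaller
# open model answer from the retrieved text.
import requests
from transformers import pipeline

url = "https://en.wikipedia.org/wiki/Secret_Network"  # assumed source page
# Crude: a real pipeline would strip HTML and chunk/rank the text first.
context = requests.get(url, timeout=10).text[:4000]

generator = pipeline("text-generation",
                     model="stabilityai/stablelm-tuned-alpha-7b")
prompt = ("Using only the context below, write a short bio of Guy Zyskind."
          f"\n\nContext:\n{context}\n\nBio:")
print(generator(prompt, max_new_tokens=150)[0]["generated_text"])
```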