Mass-Editing Memory In A Transformer
Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, and David Bau. "Mass-Editing Memory in a Transformer." The Eleventh International Conference on Learning Representations (ICLR), 2023.
The paper introduces Mass-Editing Memory In A Transformer (MEMIT), a method to directly edit factual memories stored within the parameters of large transformer language models like GPT-J (6B parameters; Wang & Komatsuzaki 2021) and GPT-NeoX (20B parameters; Black et al. 2022). Unlike prior work focused on updating just a few facts, MEMIT can robustly scale to thousands of edits simultaneously.
All experiments were run on workstations with NVIDIA A6000 GPUs. The language models were loaded using HuggingFace Transformers (Wolf et al., 2019), and PyTorch (Paszke et al., 2019) was used for executing the model editing algorithms on GPUs. GPT-J experiments fit into one 48GB A6000, but GPT-NeoX runs require at least two: one 48GB GPU for running the model in float16, and another slightly smaller GPU for executing the editing method.
Large neural language models exhibit factual knowledge, answering natural language queries about real-world concepts. However, they lack specialized or up-to-date information. The ability to rapidly customize models with new memories is vital for applications like question answering, search, and content generation.
Prior approaches for knowledge editing in language models include constrained fine-tuning, hypernetwork editors such as KE (De Cao et al., 2021) and MEND (Mitchell et al., 2021), and direct rank-one model editing through ROME (Meng et al., 2022). However, these methods degrade beyond a few dozen edits at most. MEMIT pushes this boundary by over an order of magnitude, editing thousands of facts concurrently.
The key insight behind MEMIT lies in recognizing feed-forward layers in transformers as neural key-value memories without normalization. The input representations get multiplied by parameter matrices of memory keys to produce coefficients over each memory. These coefficients then weigh the corresponding memory values to output weighted sums.
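This key-value view can be sketched in a few lines. The snippet below is a toy illustration with made-up dimensions and random weights, not the paper's implementation: the rows of the input projection act as memory keys, the rows of the output projection act as memory values, and the activation coefficients weight the values in the output sum.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_mem = 16, 64  # hidden size and number of memory cells (toy sizes)

K = rng.standard_normal((d_mem, d_model))  # memory keys: rows of the FFN input matrix
V = rng.standard_normal((d_mem, d_model))  # memory values: rows of the FFN output matrix

def ffn_as_memory(x):
    # Coefficients: how strongly each key's pattern matches the input.
    # Note there is no softmax -- the coefficients are unnormalized.
    coeffs = np.maximum(K @ x, 0.0)  # ReLU activation
    # Output: coefficient-weighted sum of the corresponding memory values.
    return coeffs @ V

x = rng.standard_normal(d_model)
out = ffn_as_memory(x)
print(out.shape)  # same dimensionality as the input hidden state
```

Because the coefficients are unnormalized, many cells can contribute at once, which is what makes the "weighted sum over memories" reading possible.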
The paper investigates three questions: What input patterns do the memory keys capture? What probability distributions over next tokens are stored in the values? And how does the full model aggregate information distributed across hundreds of these memories in each layer?
The keys correlate with human-interpretable input patterns, ranging from n-gram surface features in lower layers to more semantic topics in upper layers. This aligns with and explicates prior findings on hierarchical representations. The values store relevant output predictions, with clearer correlations in upper layers between stored continuation distributions and actual next tokens in triggering examples.
Causal mediation analysis spotlights a range of mid-layer MLPs as most influential for factual recall, and MEMIT spreads its edits over these critical layers. For each fact, optimization finds a target vector that fully represents the new memory; the residual toward that target is then inserted incrementally, layer by layer, across the critical range.
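The residual-spreading schedule can be sketched as follows. This is a simplified illustration with hypothetical layer indices and random vectors: at each critical layer, only a fraction of the remaining gap between the current hidden state and the optimized target is inserted, so later layers absorb progressively less. (The actual method converts these per-layer deltas into weight updates via a least-squares solve over memory keys, which is omitted here.)

```python
import numpy as np

critical_layers = [3, 4, 5, 6, 7]  # illustrative mid-layer MLP indices, not the paper's

def spread_residual(h, z, layers):
    """Distribute the residual (z - h) over the critical layers,
    recomputing the remaining gap after each partial insertion."""
    edits = {}
    for i, layer in enumerate(layers):
        remaining = len(layers) - i
        delta = (z - h) / remaining  # insert only this layer's share of the gap
        edits[layer] = delta
        h = h + delta                # hidden state after this layer's edit
    return edits, h

rng = np.random.default_rng(1)
h0 = rng.standard_normal(8)  # current hidden state at the edit site (toy)
z = rng.standard_normal(8)   # optimized target vector for the new memory (toy)
edits, h_final = spread_residual(h0, z, critical_layers)
print(np.allclose(h_final, z))  # the full residual is absorbed across the layers
```

Splitting the residual this way, rather than dumping it into a single layer, is what keeps any one layer's weight change small as the number of edits grows.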
Experiments demonstrate MEMIT scaling to 10,000 edits on both GPT-J and GPT-NeoX with over 96% efficacy and an overall score above 80. Fine-tuning fails catastrophically beyond 1,000 edits due to specificity collapse, and prior editors like MEND and ROME peak at 10-100 edits before degrading. MEMIT also performs consistently when editing different categories of facts.
The transparent intervention approach sets MEMIT apart from typical black box tuning methods. This interpretability paradigm allows explicitly editing associations stored in relevant parameters - proving essential for high scalability. Meanwhile, optimization constraints minimize embedding space interference, preserving specificity and fluency even with thousands of inserted memories.
When aggregating memories, hundreds of cells activate per layer, yet the output composes a distinct compromise distribution rather than directly adopting any single memory's prediction. Across layers, the residual stream enables gradual refinement of these compositions: most "hard" decisions lock in by the early layers, with later layers fine-tuning the probabilities.
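The "compromise distribution" behavior can be shown with toy numbers (purely illustrative, not taken from the paper): several memory values each store a peaked next-token distribution, and the coefficient-weighted mixture matches none of them exactly.

```python
import numpy as np

# Toy next-token distributions stored in three memory values (3-token vocabulary)
vals = np.array([
    [0.7, 0.2, 0.1],  # memory A strongly predicts token 0
    [0.1, 0.8, 0.1],  # memory B strongly predicts token 1
    [0.2, 0.2, 0.6],  # memory C strongly predicts token 2
])
coeffs = np.array([0.5, 0.4, 0.3])  # many cells active simultaneously

# Normalized coefficient-weighted composition of the stored distributions
out = coeffs @ vals / coeffs.sum()
print(out)  # a compromise distribution, distinct from every individual memory
```

Even though memory A has the largest coefficient, the composed output need not follow A's prediction: the mixture reflects every active memory at once, which is the aggregation behavior the analysis describes.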
By demonstrating extreme scale editing capacity, MEMIT opens promising directions like rapidly customizing models to user domains or current events. However, the editable associations still exhibit limitations - lacking complex inference chains or symmetric relations. Future work may address such broader reasoning within this transparent editing approach.
The paper delivers three key contributions:
By identifying relevant parameters corresponding to explicit factual knowledge, MEMIT attains extremely high efficacy for thousands of updates together. Meanwhile, optimization constraints ensure minimal interference - preserving specificity and fluency that otherwise collapse with baseline editing methods. Adopting this interpretable editing paradigm proves pivotal for successful scaling.
Additional analyses investigate performance trade-offs with different hyperparameter configurations and content mixes. In all cases, MEMIT exhibits predictable behavior allowing principled extension. Such transparency allows avoiding failure modes that likely affect opaque black box tuning.
Case studies demonstrate practically injecting fresh real-world information or domain-specialized content into an existing general model. Such customized editing greatly expands the usefulness of large but static language models. However, this remains an early step with much scope for advancement.
Future work may build on the connections illuminated between feed-forward memory patterns, their next token values, and compositional aggregation. Several promising research threads emerge around interpreting embedding transformations, generalizing beyond language models, and developing real-world applications.
By scaling knowledge editing over 100x more than prior approaches, MEMIT pushes the boundaries of what is possible with editable neural networks. But the enhanced transparency and understanding of model internals proves equally valuable for characterizing edits, avoiding failures, and inspiring advances. Adopting this interpretable editing paradigm likely remains key to further progress.
In conclusion, this paper presents MEMIT, an interpretable method for directly editing factual memories stored within transformer parameters. By spreading changes over critical layers hosting relevant knowledge, MEMIT can robustly incorporate thousands of updates simultaneously - vastly outperforming prior state-of-the-art.
The insight of transparently treating substructures like feed-forward layers as key-value memories opens promising possibilities for customized editing. With knowledge representation still constrained, ample scope exists for progress. Nonetheless, MEMIT represents an important advance towards directly moldable neural knowledge bases with up-to-date factual recall.
By combining empirical analysis illuminating how models store and aggregate facts with an intervention approach leveraging those insights for scalable updating, this paper delivers an exciting step towards customizable, reliable, and accountable language models. The transparent editing paradigm pioneered here may prove indispensable.
The code and data are available at memit.baulab.info.