Unlearn to Learn: How Unlearning Will Become a Crucial Part of Responsible AI by Design

Machine unlearning is an emerging area within machine learning that focuses on eliminating the impact of a specific subset of training examples, known as the "forget set," from a trained model. The goal is to develop algorithms that can effectively remove the influence of these examples while preserving other desirable properties of the model, such as accuracy on the remaining training data and generalization to new, unseen examples.

One approach to achieving this is to retrain the model on a modified training set that excludes the forget set. However, this method can be computationally intensive, particularly for deep models. An ideal unlearning algorithm would instead use the existing trained model as a starting point and efficiently adjust it to eliminate the influence of the specified data, without extensive retraining.
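To make the contrast concrete, here is a minimal sketch of the "exact" unlearning baseline: train a toy classifier on the full data, then retrain it from scratch on the retain set. The synthetic data, tiny MLP, and training loop are illustrative assumptions, not taken from any specific paper.

```python
# Minimal sketch of the "exact" unlearning baseline: retrain from scratch on the retain set.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(1000, 20)                        # full training set (synthetic)
y = (X.sum(dim=1) > 0).long()                    # synthetic labels
forget_idx = torch.arange(0, 100)                # "forget set": first 100 examples
retain_mask = torch.ones(len(X), dtype=torch.bool)
retain_mask[forget_idx] = False

def train(inputs, labels, epochs=200):
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(inputs), labels).backward()
        opt.step()
    return model

original = train(X, y)                            # model trained on everything
# Exact unlearning: retrain without the forget set. Correct by construction, but the
# cost scales with a full training run, which approximate unlearning tries to avoid.
retrained = train(X[retain_mask], y[retain_mask])
```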

Large language models (LLMs) are powerful tools, but like any powerful tool, they can be misused. Unlearning tackles this challenge by allowing us to remove unwanted knowledge or behaviors from LLMs, helping ensure they produce safe outputs that align with human values and regulatory policies.

Below are some of the areas where unlearning has proven effective:

  1. Eliminating Harmful Responses: Because they are trained on vast amounts of internet data that contains harmful text, LLMs may inadvertently generate problematic outputs, such as racist, sexist, or toxic content, which could fuel social discord.
  2. Addressing Copyrighted Content Concerns: There is a growing conflict between data rights holders (e.g., authors) and LLM service providers, leading to legal disputes involving entities like OpenAI, Meta, and the New York Times. Recent studies have shown that LLMs are capable of memorizing and inadvertently revealing copyrighted information. Removing such learned behaviors from LLMs, as requested by content creators, is crucial, although it can be prohibitively expensive if retraining LLMs from scratch is necessary.
  3. Minimizing Errors and Misinformation (Hallucinations): LLMs often produce factually incorrect responses that can mislead users. Mitigating such errors, particularly in applications where the stakes are high, is essential for establishing and maintaining user trust.
  4. Adapting to changing privacy and data-handling policies: Organizations and even regulators may revise their policies, changing the classification of what data can and cannot be used. LLM unlearning helps the model forget previously ingested data that is no longer permissible to use.

How can we achieve LLM/Machine Unlearning?

There are several LLM unlearning techniques under development, each with its own approach. Here's a breakdown of some key methods:

1. Data-Driven Techniques:

Targeted Data Selection: This method involves feeding the LLM with new data that contradicts the unwanted information. The LLM is essentially exposed to counter-arguments, weakening the influence of the original unwanted knowledge.

Fine-tuning with Selective Examples: Similar to how LLMs are trained, this technique involves providing the LLM with specifically curated examples that highlight the undesired behavior. By focusing on these examples, the LLM learns to recognize and avoid generating similar outputs in the future.
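As a rough illustration of the data-driven idea, the sketch below fine-tunes an already-trained toy model on curated counter-examples (here, relabeled forget samples) mixed with retain data so overall behavior is preserved. The toy model, synthetic data, and equal loss weighting are assumptions for illustration, not a prescribed recipe.

```python
# Hedged sketch of data-driven unlearning: fine-tune on curated counter-examples
# while a retain batch anchors the model's remaining behavior.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))  # stands in for a trained model
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

forget_x = torch.randn(100, 20)                 # examples that elicit the undesired behavior
counter_y = torch.randint(0, 2, (100,))         # curated "corrected" targets for those examples
retain_x = torch.randn(400, 20)                 # data whose behavior we want to keep
retain_y = (retain_x.sum(dim=1) > 0).long()

for _ in range(10):
    opt.zero_grad()
    # Counter-example loss pushes the model toward the desired outputs on the forget data;
    # the retain loss preserves behavior everywhere else.
    loss = loss_fn(model(forget_x), counter_y) + loss_fn(model(retain_x), retain_y)
    loss.backward()
    opt.step()
```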

2. Model-Based Techniques:

Gradient Descent with Masking: This technique leverages the backpropagation machinery used during training. A "mask" is applied to the gradients so that only the parts of the network associated with the knowledge we want to unlearn are updated, leaving the rest of the model untouched. This allows targeted modification without affecting the entire model.
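A hedged sketch of what gradient masking might look like in practice: compute an unlearning loss, then zero the gradients of every parameter outside the targeted sub-network before the optimizer step. The toy model, the choice of which layer counts as "targeted," and the gradient-ascent objective are illustrative assumptions.

```python
# Hedged sketch of gradient masking: only the masked-in parameters are updated.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

# Mask: 1.0 = parameter may change, 0.0 = frozen. Here we (arbitrarily) treat the
# final layer ("2.") as the part of the network associated with the unwanted knowledge.
masks = {name: torch.ones_like(p) if name.startswith("2.") else torch.zeros_like(p)
         for name, p in model.named_parameters()}

forget_x = torch.randn(64, 20)
forget_y = torch.randint(0, 2, (64,))

opt.zero_grad()
# Gradient ascent on the forget examples degrades the unwanted behavior;
# this objective is an illustrative choice, not prescribed by any specific paper.
loss = -loss_fn(model(forget_x), forget_y)
loss.backward()
for name, p in model.named_parameters():
    if p.grad is not None:
        p.grad *= masks[name]                   # apply the mask before the update step
opt.step()
```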

Knowledge Distillation with Selective Memory: This approach involves two models: a "teacher" model trained without the unwanted knowledge and a "student" model (the LLM we want to unlearn from). The teacher model acts as a guide, influencing the student model to forget the problematic information through a carefully designed knowledge distillation process with a focus on "forgetting" specific memories.
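The following sketch shows one plausible form of this idea: on forget-related inputs the student is pulled toward a teacher that never saw the unwanted data, while a standard loss on retain data anchors the rest of its behavior. The toy models, synthetic data, and equal loss weighting are assumptions.

```python
# Hedged sketch of distillation-based unlearning: imitate a "clean" teacher on
# forget-related inputs, keep ordinary behavior on retain data.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model():
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

teacher = make_model()                           # stands in for a model trained without the forget set
student = make_model()                           # stands in for the deployed model to unlearn from
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

forget_x = torch.randn(100, 20)                  # inputs that touch the unwanted knowledge
retain_x = torch.randn(400, 20)
retain_y = (retain_x.sum(dim=1) > 0).long()

for _ in range(10):
    opt.zero_grad()
    with torch.no_grad():
        teacher_logits = teacher(forget_x)
    # KL term: on forget-related inputs, match the teacher that lacks the knowledge.
    kl = F.kl_div(F.log_softmax(student(forget_x), dim=-1),
                  F.softmax(teacher_logits, dim=-1), reduction="batchmean")
    # Retain term: keep normal behavior on the rest of the data.
    ce = F.cross_entropy(student(retain_x), retain_y)
    (kl + ce).backward()
    opt.step()
```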

Contrastive Learning for Unlearning: This method exposes the LLM to pairs of contrasting data points. One element in the pair represents the unwanted knowledge, and the other represents the desired knowledge. By learning to differentiate between these contrasting pairs, the LLM weakens the unwanted associations and strengthens the desired ones.
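A minimal sketch of the contrastive idea, assuming a toy scoring model and pre-built feature pairs: a margin ranking loss pushes the model's score for the desired member of each pair above its score for the unwanted member.

```python
# Hedged sketch of contrastive unlearning over (unwanted, desired) pairs.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))  # scores a (prompt, response) pair
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
ranking_loss = nn.MarginRankingLoss(margin=1.0)

unwanted = torch.randn(100, 20)                  # features of (prompt, unwanted response) pairs
desired = torch.randn(100, 20)                   # features of (prompt, desired response) pairs

for _ in range(10):
    opt.zero_grad()
    s_unwanted = model(unwanted).squeeze(-1)
    s_desired = model(desired).squeeze(-1)
    # target = 1 means the first argument (the desired response) should be ranked higher.
    loss = ranking_loss(s_desired, s_unwanted, torch.ones_like(s_desired))
    loss.backward()
    opt.step()
```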

These are just a few examples, and researchers are constantly exploring new and innovative techniques for LLM unlearning. It's important to note that each technique has its own advantages and limitations. Choosing the right approach depends on the specific type of knowledge you want to unlearn and the overall LLM architecture.

Notable Case Studies:

Forgetting Harry Potter

A Microsoft Research paper presents an innovative method for unlearning copyrighted data from large language models (LLMs). Illustrated with the Llama2-7b model and the Harry Potter books, the technique comprises three key components aimed at erasing the world of Harry Potter from the LLM's memory:

Reinforced model identification: This involves fine-tuning the model with target data (e.g., Harry Potter content) to reinforce its understanding of the material to be unlearned.

Replacement of idiosyncratic expressions: Unique Harry Potter phrases within the target data are substituted with more generic equivalents, promoting a broader comprehension.

Fine-tuning based on alternative predictions: The base model undergoes further fine-tuning using alternative predictions derived from the adjusted data. This effectively expunges the original text from the model's memory when encountering relevant context.
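The sketch below illustrates the third step in spirit: combining base-model and reinforced-model logits to down-weight tokens the reinforced model became more confident about, then nudging the base model toward those generic targets. The toy tensors, the combination rule, and the alpha coefficient are assumptions rather than the paper's verbatim recipe.

```python
# Hedged sketch of the "alternative predictions" idea using stand-in logit tensors.
import torch
import torch.nn.functional as F

vocab, seq = 1000, 8
base_logits = torch.randn(seq, vocab, requires_grad=True)   # stands in for base-model outputs
reinforced_logits = torch.randn(seq, vocab)                  # stands in for reinforced-model outputs
alpha = 1.0                                                  # assumed scaling coefficient

with torch.no_grad():
    # Penalize tokens whose likelihood rose after reinforcement on the target content.
    generic_logits = base_logits - alpha * torch.relu(reinforced_logits - base_logits)
    generic_targets = F.softmax(generic_logits, dim=-1)

# Fine-tuning signal: pull the base model's next-token distribution toward the generic targets.
loss = F.kl_div(F.log_softmax(base_logits, dim=-1), generic_targets, reduction="batchmean")
loss.backward()   # in practice this gradient would drive an optimizer update of the base model
```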

While the Microsoft technique is still in its early stages and may have limitations, it represents a significant step toward developing more potent, ethical, and adaptable LLMs.

An alternative to traditional RLHF alignment

Results from researchers at ByteDance show that unlearning is a promising approach to aligning LLMs so that they stop generating undesirable outputs, especially when practitioners lack the resources to apply other alignment techniques such as RLHF. They present three scenarios in which unlearning can successfully remove harmful responses, erase copyrighted content, and eliminate hallucinations, and their experiments demonstrate the method's effectiveness. A subsequent ablation study shows that, despite using only negative samples, unlearning can still achieve better alignment performance than RLHF with only a fraction of its computational time.
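A hedged sketch of this style of unlearning-as-alignment with only negative samples: gradient ascent on the undesirable batch forgets the behavior, while a KL term against a frozen copy of the original model preserves performance on normal data. The toy models, loss weights, and exact terms are illustrative assumptions, not the paper's objective.

```python
# Hedged sketch: gradient ascent on harmful data plus a KL retention term on normal data.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))   # model being unlearned
reference = copy.deepcopy(model)                                         # frozen pre-unlearning copy
for p in reference.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

harmful_x = torch.randn(64, 20)                  # stands in for harmful prompt/response batches
harmful_y = torch.randint(0, 2, (64,))
normal_x = torch.randn(256, 20)                  # ordinary data whose behavior must be preserved

for _ in range(10):
    opt.zero_grad()
    # Gradient ascent on the harmful batch: maximize its loss to forget the behavior.
    forget_term = -F.cross_entropy(model(harmful_x), harmful_y)
    # KL retention term: stay close to the original model on normal inputs.
    retain_term = F.kl_div(F.log_softmax(model(normal_x), dim=-1),
                           F.softmax(reference(normal_x), dim=-1), reduction="batchmean")
    (forget_term + retain_term).backward()
    opt.step()
```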

LLM unlearning is a fascinating area with the potential to revolutionize how we develop and interact with these powerful language models. As research progresses, we can expect even more innovative techniques and a future where LLMs are not just powerful, but also safe, adaptable, and trustworthy.

Thanks to Ritarshi Chakraborty for helping me with this article.

References:

Announcing the First Machine Unlearning Challenge – Google Research Blog: https://research.google/blog/announcing-the-first-machine-unlearning-challenge/

Large Language Model Unlearning – arXiv:2310.10683

Who's Harry Potter? Approximate Unlearning in LLMs – arXiv:2310.02238

Unlearning Copyrighted Data From a Trained LLM – Is It Possible? – Unite.AI
