How to Merge LLMs?

The landscape of open-source LLMs is evolving rapidly, with models now trained on trillions of tokens and packing billions of parameters. These models are easily accessible on platforms such as Hugging Face and Ollama, which democratizes AI and is helping transform a wide range of sectors.

However, fine-tuning separate models for each task presents challenges:

  1. Storage and Deployment: Each fine-tuned model must be stored and deployed separately, multiplying infrastructure cost and operational complexity.
  2. Catastrophic Forgetting: Training models from scratch requires substantial investment, while fine-tuning can lead to catastrophic forgetting, degrading a model's general capabilities and its performance on previously learned tasks.
  3. Resource Constraints: Aligning models to respond favorably requires extensive human preference data, which is unattainable for most teams.
  4. Limited Knowledge Transfer: Independently trained models cannot leverage insights from related tasks.

Model merging addresses these challenges by consolidating the parameters of multiple models into a single model. Done well, the merged model retains the quality of its parents while combining their task-specific skills, without any additional training. A minimal sketch of two common merge strategies follows.
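To make the idea concrete, here is a small, illustrative sketch of two ways to combine raw weight tensors: plain linear interpolation and SLERP (spherical linear interpolation), the method used later in this article. The tensors below are toy values; mergekit's actual implementation additionally handles model loading, tokenizers, and per-layer parameters.

import numpy as np

def lerp(a, b, t):
    """Plain linear interpolation between two weight tensors."""
    return (1 - t) * a + t * b

def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation: move along the arc between the two
    (flattened) weight vectors instead of the straight line, which better
    preserves their geometric properties than naive averaging."""
    a_flat, b_flat = a.ravel(), b.ravel()
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    theta = np.arccos(dot)  # angle between the two weight vectors
    if theta < eps:  # nearly parallel: fall back to linear interpolation
        return lerp(a, b, t)
    coeff_a = np.sin((1 - t) * theta) / np.sin(theta)
    coeff_b = np.sin(t * theta) / np.sin(theta)
    return (coeff_a * a_flat + coeff_b * b_flat).reshape(a.shape)

# Toy example: merge two random "weight matrices" halfway (t=0.5)
w1, w2 = np.random.randn(4, 4), np.random.randn(4, 4)
merged = slerp(w1, w2, t=0.5)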

MergeKit: Quick Overview

MergeKit is an open-source library designed to simplify and standardize merging LLMs. MergeKit features:

  1. Efficiency: The library is designed to execute merging operations efficiently, regardless of the hardware used.
  2. Extensibility: MergeKit provides a flexible framework that can accommodate new merging strategies and adapt to future advancements in the field.
  3. Community-Driven: The library encourages community involvement in developing and refining merging techniques.

Let's Merge

Step 1: Clone the repo and install the dependencies

!git clone https://github.com/cg123/mergekit.git
!cd mergekit && pip install -q -e .        

Step 2: Write the merge configuration to a YAML file

import yaml

yaml_config = """
slices:
  - sources:
      - model: Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp
        layer_range: [0, 32]
      - model: HuggingFaceH4/zephyr-7b-beta
        layer_range: [0, 32]
merge_method: slerp
base_model: Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 # fallback for rest of tensors
dtype: bfloat16
"""

# Save config as yaml file
with open('config.yaml', 'w', encoding="utf-8") as f:
    f.write(yaml_config)        
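A note on the t parameter above: for SLERP, t controls the interpolation factor, with t=0 keeping the base model's weights and t=1 taking the other model's. When a list of values is supplied, mergekit spreads it across the layer range as a gradient, so self-attention and MLP tensors blend the two models differently at different depths. The snippet below is an illustrative sketch (not mergekit's actual code) of how such a schedule could be expanded to per-layer values with piecewise-linear interpolation:

import numpy as np

def layer_schedule(anchors, num_layers):
    # Spread the anchor values evenly across the layer range and
    # linearly interpolate between them (illustrative only)
    anchor_pos = np.linspace(0, num_layers - 1, num=len(anchors))
    return np.interp(np.arange(num_layers), anchor_pos, anchors)

# Per-layer t for the self_attn filter over the 32-layer range
print(layer_schedule([0, 0.5, 0.3, 0.7, 1], num_layers=32))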

Step 3: Run the merge

!mergekit-yaml config.yaml merge --copy-tokenizer --allow-crimes --out-shard-size 1B --lazy-unpickle        
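Once the merge finishes, the merged weights are written to the merge/ output directory. Before publishing, it's worth a quick smoke test. Here is a minimal sketch using the transformers library (it assumes a GPU with enough memory to hold a 7B model in bfloat16):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model from the local output directory
tokenizer = AutoTokenizer.from_pretrained("merge")
model = AutoModelForCausalLM.from_pretrained(
    "merge", torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Explain model merging in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))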

Step 4: Create the model card

!pip install -qU huggingface_hub

from huggingface_hub import ModelCard, ModelCardData
from jinja2 import Template

MODEL_NAME = "mistralai-7B-slerp-v0.1"
username = "irahulpandey"

template_text = """
---
license: apache-2.0
tags:
- merge
{%- for model in models %}
- {{ model }}
{%- endfor %}
---

# {{ model_name }}

{{ model_name }} is a merge of the following models using [mergekit](https://github.com/cg123/mergekit):

{%- for model in models %}
* [{{ model }}](https://huggingface.co/{{ model }})
{%- endfor %}

## Configuration

```yaml
{{- yaml_config -}}
```
"""

# Create a Jinja template object
jinja_template = Template(template_text.strip())

# Get the list of parent models from the config. Different merge methods
# use different top-level keys, so check each layout in turn.
data = yaml.safe_load(yaml_config)
if "models" in data:
    # "models"-style configs (e.g. ties/dare): keep entries that carry
    # merge parameters, which excludes the base model
    models = [m["model"] for m in data["models"] if "parameters" in m]
elif "parameters" in data:
    # slerp-style configs: every source in the first slice is a parent model
    models = [s["model"] for s in data["slices"][0]["sources"]]
elif "slices" in data:
    # passthrough-style configs: take the first source of each slice
    models = [s["sources"][0]["model"] for s in data["slices"]]
else:
    raise Exception("No models or slices found in yaml config")

# Fill the template
content = jinja_template.render(
    model_name=MODEL_NAME,
    models=models,
    yaml_config=yaml_config,
    username=username,
)

# Save the model card
card = ModelCard(content)
card.save('merge/README.md')        

Step 5: Push to Hugging Face

from google.colab import userdata
from huggingface_hub import HfApi

# Defined in the secrets tab in Google Colab
api = HfApi(token=userdata.get("HF_TOKEN"))

api.create_repo(
    repo_id=f"{username}/{MODEL_NAME}",
    repo_type="model"
)
api.upload_folder(
    repo_id=f"{username}/{MODEL_NAME}",
    folder_path="merge",
)        

The merged model is available at https://huggingface.co/irahulpandey/mistralai-7B-slerp-v0.1

Let's compare the performance of the new model to HuggingFaceH4/zephyr-7b-beta, which is one of the models used for merging.

The comparison shows a slight performance improvement. We can similarly try other merging techniques to see how they affect performance on various benchmarks.
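For a reproducible comparison, a benchmark harness such as EleutherAI's lm-evaluation-harness can score both models on identical tasks. The invocation below is a sketch; the task list and batch size are illustrative, and you should check the harness documentation for the exact flags supported by your installed version:

!pip install -qU lm-eval

!lm_eval --model hf \
    --model_args pretrained=irahulpandey/mistralai-7B-slerp-v0.1,dtype=bfloat16 \
    --tasks hellaswag,arc_easy \
    --batch_size 8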

In this article, we've explored a simple model merging technique. I believe this method will soon be used to create many more models, as it is cost-effective and allows functional skills to be combined without additional fine-tuning.


