How to Merge LLMs?

The landscape of open-source LLMs is evolving rapidly, with models now trained on trillions of tokens and packing billions of parameters. These models are easily accessible on platforms such as Hugging Face and Ollama, which democratizes AI and is helping transform a wide range of sectors.

However, fine-tuning separate models for each task presents challenges:

  1. Storage and Deployment: Each fine-tuned model must be stored and deployed separately, multiplying infrastructure cost and operational complexity.
  2. Catastrophic Forgetting: Training models from scratch requires substantial investment, while fine-tuning can lead to catastrophic forgetting, degrading a model's general capabilities and its performance on previously learned tasks.
  3. Resource Constraints: Aligning models to respond favorably requires extensive human preference data, which is unattainable for most teams.
  4. Limited Knowledge Transfer: Independently trained models cannot leverage insights from related tasks.

Model merging addresses these challenges by consolidating the parameters of multiple models into a single model. Done well, the merged model retains the quality of its parents while combining their task-specific skills, without any additional training. A minimal sketch of two common merge strategies follows.
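To make the idea concrete, here is a small, illustrative sketch of two ways to combine raw weight tensors: plain linear interpolation and SLERP (spherical linear interpolation), the method used later in this article. The tensors below are toy values; mergekit's actual implementation additionally handles model loading, tokenizers, and per-layer parameters.

import numpy as np

def lerp(a, b, t):
    """Plain linear interpolation between two weight tensors."""
    return (1 - t) * a + t * b

def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation: move along the arc between the two
    (flattened) weight vectors instead of the straight line, which better
    preserves their geometric properties than naive averaging."""
    a_flat, b_flat = a.ravel(), b.ravel()
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    theta = np.arccos(dot)  # angle between the two weight vectors
    if theta < eps:  # nearly parallel: fall back to linear interpolation
        return lerp(a, b, t)
    coeff_a = np.sin((1 - t) * theta) / np.sin(theta)
    coeff_b = np.sin(t * theta) / np.sin(theta)
    return (coeff_a * a_flat + coeff_b * b_flat).reshape(a.shape)

# Toy example: merge two random "weight matrices" halfway (t=0.5)
w1, w2 = np.random.randn(4, 4), np.random.randn(4, 4)
merged = slerp(w1, w2, t=0.5)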

MergeKit: Quick Overview

MergeKit is an open-source library designed to simplify and standardize merging LLMs. MergeKit features:

  1. Efficiency: The library is designed to execute merging operations efficiently, regardless of the hardware used.
  2. Extensibility: MergeKit provides a flexible framework that can accommodate new merging strategies and adapt to future advancements in the field.
  3. Community-Driven: The library encourages community involvement in developing and refining merging techniques.

Let's Merge

Step 1: Clone the repo and install the dependencies

!git clone https://github.com/cg123/mergekit.git
!cd mergekit && pip install -q -e .        

Step 2: Write the merge configuration to a YAML file

import yaml

yaml_config = """
slices:
  - sources:
      - model: Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp
        layer_range: [0, 32]
      - model: HuggingFaceH4/zephyr-7b-beta
        layer_range: [0, 32]
merge_method: slerp
base_model: Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 # fallback for rest of tensors
dtype: bfloat16
"""

# Save config as yaml file
with open('config.yaml', 'w', encoding="utf-8") as f:
    f.write(yaml_config)        
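A note on the t parameter above: for SLERP, t controls the interpolation factor, with t=0 keeping the base model's weights and t=1 taking the other model's. When a list of values is supplied, mergekit spreads it across the layer range as a gradient, so self-attention and MLP tensors blend the two models differently at different depths. The snippet below is an illustrative sketch (not mergekit's actual code) of how such a schedule could be expanded to per-layer values with piecewise-linear interpolation:

import numpy as np

def layer_schedule(anchors, num_layers):
    # Spread the anchor values evenly across the layer range and
    # linearly interpolate between them (illustrative only)
    anchor_pos = np.linspace(0, num_layers - 1, num=len(anchors))
    return np.interp(np.arange(num_layers), anchor_pos, anchors)

# Per-layer t for the self_attn filter over the 32-layer range
print(layer_schedule([0, 0.5, 0.3, 0.7, 1], num_layers=32))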

Step 3: Run the merge

!mergekit-yaml config.yaml merge --copy-tokenizer --allow-crimes --out-shard-size 1B --lazy-unpickle        
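Once the merge finishes, the merged weights are written to the merge/ output directory. Before publishing, it's worth a quick smoke test. Here is a minimal sketch using the transformers library (it assumes a GPU with enough memory to hold a 7B model in bfloat16):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model from the local output directory
tokenizer = AutoTokenizer.from_pretrained("merge")
model = AutoModelForCausalLM.from_pretrained(
    "merge", torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Explain model merging in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))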

Step 4: Create the model card

!pip install -qU huggingface_hub

from huggingface_hub import ModelCard, ModelCardData
from jinja2 import Template

MODEL_NAME = "mistralai-7B-slerp-v0.1"
username = "irahulpandey"

template_text = """
---
license: apache-2.0
tags:
- merge
{%- for model in models %}
- {{ model }}
{%- endfor %}
---

# {{ model_name }}

{{ model_name }} is a merge of the following models using [mergekit](https://github.com/cg123/mergekit):

{%- for model in models %}
* [{{ model }}](https://huggingface.co/{{ model }})
{%- endfor %}

## Configuration

```yaml
{{- yaml_config -}}
```
"""

# Create a Jinja template object
jinja_template = Template(template_text.strip())

# Get the list of parent models from the config. Different merge methods
# use different top-level keys, so check each layout in turn.
data = yaml.safe_load(yaml_config)
if "models" in data:
    # "models"-style configs (e.g. ties/dare): keep entries that carry
    # merge parameters, which excludes the base model
    models = [m["model"] for m in data["models"] if "parameters" in m]
elif "parameters" in data:
    # slerp-style configs: every source in the first slice is a parent model
    models = [s["model"] for s in data["slices"][0]["sources"]]
elif "slices" in data:
    # passthrough-style configs: take the first source of each slice
    models = [s["sources"][0]["model"] for s in data["slices"]]
else:
    raise Exception("No models or slices found in yaml config")

# Fill the template
content = jinja_template.render(
    model_name=MODEL_NAME,
    models=models,
    yaml_config=yaml_config,
    username=username,
)

# Save the model card
card = ModelCard(content)
card.save('merge/README.md')        

Step 5: Push to Hugging Face

from google.colab import userdata
from huggingface_hub import HfApi

# Defined in the secrets tab in Google Colab
api = HfApi(token=userdata.get("HF_TOKEN"))

api.create_repo(
    repo_id=f"{username}/{MODEL_NAME}",
    repo_type="model"
)
api.upload_folder(
    repo_id=f"{username}/{MODEL_NAME}",
    folder_path="merge",
)        

The merged model is available at https://huggingface.co/irahulpandey/mistralai-7B-slerp-v0.1

Let's compare the performance of the new model to HuggingFaceH4/zephyr-7b-beta, which is one of the models used for merging.

The comparison shows a slight performance improvement. We can similarly try other merging techniques to see how they affect performance on various benchmarks.
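For a reproducible comparison, a benchmark harness such as EleutherAI's lm-evaluation-harness can score both models on identical tasks. The invocation below is a sketch; the task list and batch size are illustrative, and you should check the harness documentation for the exact flags supported by your installed version:

!pip install -qU lm-eval

!lm_eval --model hf \
    --model_args pretrained=irahulpandey/mistralai-7B-slerp-v0.1,dtype=bfloat16 \
    --tasks hellaswag,arc_easy \
    --batch_size 8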

In this article, we've explored a simple model merging technique. I believe this method will soon be used to create many more models, as it is cost-effective and allows functional skills to be combined without additional fine-tuning.


