Domain-Specific SLMs: A Strategic Alternative to Fine-Tuning Large Language Models

Article 3 in the small language model series. Previous article: Comparing Small Language Models (SLMs) and Large Language Models (LLMs) (LinkedIn)

Introduction: Should Organizations Create a Private Small Language Model (SLM)?

As organizations explore Generative AI, the idea of creating a private Small Language Model (SLM) has gained traction. However, this approach requires careful consideration because of its high costs and significant challenges. Unlike general-purpose Large Language Models (LLMs) such as ChatGPT, SLMs are not suited to general-purpose use; they should instead target domain-specific or subdomain-specific applications.

This article discusses the key factors, challenges, and considerations for creating an SLM, helping organizations determine whether this approach aligns with their needs.


Key Points to Consider for SLM Creation

1. The Idea of a Private SLM for Organizations:

Private SLMs enable organizations to leverage AI while maintaining control over sensitive data. Creating an SLM is complex and costly, making it suitable only in specific scenarios.


2. Costs and Challenges:

Building an SLM involves substantial expenses, including dataset acquisition, hiring experts, and infrastructure setup. Even an SLM, while smaller than an LLM, requires significant investment to achieve meaningful results.


3. Why Generic SLMs are Not Recommended:

Generic SLMs cannot compete with general-purpose LLMs in handling diverse prompts and broad functionality. Creating a generic SLM is impractical and inefficient, as it lacks the scale and dataset diversity of LLMs.


4. The Focus of SLMs:

SLMs are most effective when tailored to specific domains or subdomains. Examples of suitable use cases include:

  • Mission-critical organizations (e.g., military, police)
  • Financial institutions (e.g., banks seeking custom fintech solutions)
  • Organizations handling sensitive, proprietary data


5. Key Questions for Organizations to Evaluate:

  • What are the goals and purpose of the SLM?
  • How does the SLM align with the organization’s broader AI strategy?
  • What architectural considerations and data privacy measures are required?
  • What are the cost implications compared to fine-tuning an LLM?


6. The Role of an AI Center of Excellence:

An AI Center of Excellence can guide strategic decisions, ensuring alignment with organizational objectives. Such a framework can evaluate whether fine-tuning an LLM or creating an SLM is the optimal approach.

Note: we will cover this topic in future articles in a separate series on AI strategy for organizations.


7. Important Considerations:

The decision to create an SLM or fine-tune an LLM is not one-size-fits-all. Organizations must evaluate their goals, resources, and data privacy needs to ensure the chosen approach aligns with their long-term strategy.


Detailed Considerations for Creating or Adopting an SLM

1. Cost Implications of Building an SLM from Scratch

  • Dataset Acquisition Challenges: Acquiring high-quality, domain-specific datasets is expensive and time-intensive. Requires partnerships with data providers or significant in-house data generation efforts.
  • Hiring Expertise: Building an SLM requires specialized AI/ML engineers, domain experts, and data scientists. Costs for assembling and maintaining such a team can exceed initial estimates.
  • High Initial Costs: Training infrastructure (GPUs/TPUs), algorithm development, and ongoing iteration make building an SLM from scratch significantly costlier than leveraging existing solutions.
  • Recommendation: Building an SLM from scratch should only be considered when there’s a highly specific scenario that cannot be met by existing models.


2. Domain and Subdomain Specificity

  • Narrow Focus is Essential: Building an effective SLM requires targeting specific domains or even subdomains. General-purpose SLMs are inefficient and dilute the advantages of specialization.
  • Requirement for Domain Expertise: Deciding the domain or subdomain and curating relevant datasets require collaboration with subject matter experts (SMEs). SMEs must also help validate the performance of the SLM in practical use cases.
  • Impact on Costs and Feasibility: Without clarity on the domain, development costs and timelines can spiral out of control. Misaligned datasets or domain definitions can render the SLM ineffective.


3. Challenges with Fine-Tuning on LLMs

  • Transparency Issues: Companies cannot easily monitor or control how fine-tuned knowledge interacts with the LLM's base parameters. Providers often claim no access to fine-tuning data, but there’s little visibility into what happens during fine-tuning or model updates.
  • Knowledge Exchange Risks: While fine-tuning claims to isolate proprietary data, organizations can’t fully audit what the major LLM “learns” from the fine-tuning.
  • Fine-Tuning Suitability: While fine-tuning works well for some scenarios, it may not align with organizational requirements for privacy, control, or specificity.


4. Importance of Starting with Pre-Existing SLMs

  • Leverage Existing Models: Instead of building from scratch, organizations can start with pre-existing SLMs designed for specific domains (e.g., financial services, healthcare). These models often come with curated datasets and pre-tuned capabilities (see the loading sketch after this list).
  • Transparent Datasets: The transparency of datasets used to train pre-existing SLMs is critical. Organizations must have access to and control over the training datasets to ensure alignment with their use cases and data governance policies.
  • Customization Potential: Ready-made SLMs should allow easy integration of organizational datasets to fine-tune or expand capabilities without retraining the entire model.
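
As a quick illustration, here is a minimal sketch, assuming the Hugging Face transformers library and TinyLlama (one of the open models mentioned later in this article) as the pre-existing SLM; the model ID and prompt are illustrative assumptions, not a recommendation.

```python
# Minimal sketch: loading and querying an open, pre-existing SLM with
# Hugging Face transformers. The model ID below is one example of an
# openly licensed SLM; substitute whichever domain model you evaluate.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed example model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# A hypothetical domain prompt, e.g., for a financial-services use case.
prompt = "Summarize the key risks in this loan application:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In practice, this kind of quick trial should be scored by subject matter experts against domain criteria before committing to any customization investment.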


5. Compatibility of Algorithms

  • Generic vs. Domain-Specific Algorithms: Most pre-trained SLMs rely on generic algorithms optimized for broad applications. Organizations may require specific algorithms tailored to niche requirements or compliance needs.
  • Flexibility for Algorithmic Changes: The SLM must support adding or modifying algorithms, including integrating public or custom algorithms for domain-specific tasks (one common path, adapter-based customization, is sketched after this list).
  • Technical Expertise Required: Modifying algorithms in pre-existing SLMs requires skilled developers who understand both the model’s architecture and the targeted domain.
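
This article does not prescribe a specific mechanism for adding algorithms; as one hedged example, adapter-based methods such as LoRA (here via the peft library) let teams attach trainable, domain-specific components to a pre-existing SLM without retraining the base model. The hyperparameters and module names below are illustrative assumptions for a Llama-style architecture.

```python
# Sketch: attaching LoRA adapters to a pre-existing SLM so that only a
# small set of new, domain-specific weights is trained. All settings
# here are illustrative assumptions, not tuned recommendations.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

lora_config = LoraConfig(
    r=8,                                  # adapter rank (assumed)
    lora_alpha=16,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections in Llama-style models
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

The design appeal is exactly the flexibility described above: the base model stays intact and auditable, while the domain-specific behavior lives in a small, separately versioned set of weights.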


6. Integration of Organizational Datasets

  • Ease of Adding Data: The ease with which organizational datasets can be added to the SLM is a critical factor for success. Organizations should prioritize models that offer intuitive interfaces for integrating proprietary datasets (a minimal data-preparation sketch follows this list).
  • Data and Model Alignment: Not all datasets are inherently compatible with the existing model’s training data or algorithms. Mismatched datasets can degrade model performance or require additional preprocessing efforts.
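
As a minimal sketch, assuming proprietary records have been exported as JSON Lines with a "text" field (the file name below is hypothetical), the Hugging Face datasets library covers the basic integration steps; real pipelines will still need the domain-specific cleaning and alignment checks noted above.

```python
# Sketch: preparing an organizational dataset for an SLM. The file path
# and field name are hypothetical; substitute your own export format.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Assumed export: one JSON object per line with a "text" field.
dataset = load_dataset("json", data_files="internal_records.jsonl")

def preprocess(batch):
    # Tokenize and truncate to the model's context window (assumed 2048).
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(preprocess, batched=True, remove_columns=["text"])
print(tokenized)
```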


Fine-Tuning vs. Private SLM

The following table summarizes the top-level considerations for fine-tuning an LLM versus building a private SLM.

[Table: Fine-tuning versus private SLM]



Ready-Made Small Language Models (SLMs) Available in the Market

[Table: Ready-made Small Language Models]

Note:

  • Based on a quick market search
  • The list does not cover specific domain focuses


Understanding the Costs of Building or Adopting a Small Language Model (SLM)

Estimating the costs of building or adopting an SLM provides a general sense of the financial commitment required. However, the figures presented here are based on quick searches and general market research, not detailed consultation with experts.

Actual costs may vary significantly depending on factors such as domain specificity, dataset availability, and infrastructure requirements.


1. Dataset Acquisition and Curation Costs

  • Acquiring High-Quality Data: For domain-specific or subdomain-specific SLMs, obtaining relevant datasets is essential. Costs can range from $50,000 to $200,000, depending on the domain, size, and exclusivity of the dataset.
  • Data Cleaning and Preprocessing: Ensures that datasets are usable and high-quality. Requires data engineers and domain experts, costing an additional $10,000 to $50,000.
  • Synthetic Data Generation: In cases where existing data is insufficient, synthetic data generation might be necessary, adding $50,000 to $150,000.


2. Expert Hiring Costs

  • Domain Experts: Specialists are needed to define the domain or subdomain and validate the SLM’s performance. Salaries for subject matter experts (SMEs) range from $80,000 to $150,000 per year.
  • AI/ML Engineers and Data Scientists: Experienced professionals to design, train, and maintain the model. Salaries for AI/ML engineers can range from $100,000 to $200,000 annually.


3. Infrastructure and Training Costs

  • On-Premise Infrastructure: Setting up GPUs or TPUs for training can cost $50,000 to $200,000, depending on model size.
  • Cloud-Based Infrastructure: Using cloud services for training and hosting can cost $10,000 to $30,000 per month, depending on usage.
  • Training Costs: Training an SLM, even at a smaller scale (e.g., 1 billion parameters), requires significant compute power. Training costs alone can range from $20,000 to $100,000, depending on the complexity of the model (a back-of-envelope estimate follows this list).
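
To show where training figures of this magnitude can come from, here is a back-of-envelope estimate using the common rule of thumb that training a transformer costs roughly 6 × parameters × tokens FLOPs. The corpus size, GPU throughput, utilization, and hourly rate are all assumptions for illustration, not vendor quotes.

```python
# Back-of-envelope training cost for a 1B-parameter SLM. Every input
# below is a rough assumption; actual costs vary widely.
params = 1e9          # 1B-parameter model
tokens = 1e12         # assumed 1T-token training corpus
flops = 6 * params * tokens   # ~6*N*D rule of thumb

gpu_flops = 312e12    # approx. A100 peak bf16 throughput
utilization = 0.40    # assumed sustained fraction of peak
gpu_hours = flops / (gpu_flops * utilization) / 3600

rate = 3.00           # assumed cloud price per GPU-hour, in USD
print(f"~{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * rate:,.0f}")
# -> roughly 13,000 GPU-hours and ~$40,000, inside the range above
```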


4. Fine-Tuning and Iteration Costs

  • Periodic Fine-Tuning: Adjustments to the model to incorporate new data or improve performance can cost $20,000 to $50,000 per iteration.
  • Ongoing Updates and Monitoring: Maintenance and monitoring require dedicated resources, costing $50,000 to $100,000 annually.


5. Pre-Existing SLM Customization Costs

Using pre-existing SLMs reduces initial costs but still involves customization:

  • Model Licensing: Open-source models (e.g., Mistral, TinyLlama) may be free, but some require licensing fees, which can range from $10,000 to $100,000.
  • Data Integration: Integrating organizational data into pre-existing SLMs requires preprocessing and adaptation, costing $10,000 to $50,000.
  • Algorithm Customization: Modifying or adding domain-specific algorithms can cost $20,000 to $100,000, depending on complexity.


Total Estimated Cost
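
As a rough illustration, summing the (low, high) ranges quoted in the sections above yields a first-year cost envelope for each path. The totals below are simple aggregates of this article's own figures, not independent estimates, and items such as synthetic data or licensing will not apply to every project.

```python
# First-year totals obtained by summing the per-item ranges quoted
# above. Illustrative only; drop items that do not apply to you.
from_scratch = {
    "dataset_acquisition":    (50_000, 200_000),
    "cleaning_preprocessing": (10_000,  50_000),
    "synthetic_data":         (50_000, 150_000),
    "domain_experts":         (80_000, 150_000),
    "ai_ml_engineers":       (100_000, 200_000),
    "infrastructure":         (50_000, 200_000),
    "training":               (20_000, 100_000),
    "fine_tuning_iteration":  (20_000,  50_000),
    "updates_monitoring":     (50_000, 100_000),
}
pre_existing = {
    "licensing":                   (0, 100_000),  # open-source models may be free
    "data_integration":       (10_000,  50_000),
    "algorithm_customization": (20_000, 100_000),
}

for name, items in [("From scratch", from_scratch), ("Pre-existing SLM", pre_existing)]:
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    print(f"{name}: ${low:,} - ${high:,} (first year)")
# From scratch:    $430,000 - $1,200,000
# Pre-existing SLM: $30,000 -   $250,000
```

Even at the low end, building from scratch exceeds the high end of the pre-existing-SLM path, which reinforces the recommendation below to start from an existing model.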

Factors Influencing Costs

  1. Domain Complexity: Specialized domains like healthcare or defense increase data acquisition and expertise costs.
  2. Scale of the Model: Larger models (more parameters) increase training and deployment expenses.
  3. Customization Needs: Adding or modifying algorithms and datasets impacts development and maintenance costs.
  4. Infrastructure Choice: Cloud-based systems offer scalability but incur recurring costs, while on-premise setups require significant upfront investment.


Recommendations to Optimize Costs

  1. Leverage Pre-Existing SLMs: Start with open-source or commercially available SLMs to minimize initial investment.
  2. Focus on Domain-Specificity: Narrowing the scope to a specific domain or subdomain reduces dataset and algorithm customization costs.
  3. Prioritize Dataset Transparency: Use models with accessible and validated training datasets to avoid compatibility issues.
  4. Evaluate Long-Term ROI: Ensure that the investment aligns with the organization’s strategic goals and offers measurable benefits.



