Fine-tuning models

Fine-tuning models is a powerful technique in machine learning that involves adapting a pre-trained model to perform a specific task or improve its performance on a particular dataset. This process is especially useful because it leverages the knowledge the model has already acquired, saving time and computational resources compared to training a model from scratch.

Analogy

  1. Foundation Model: This is like a general medical education (MBBS), where doctors learn the basics and become generalists.
  2. Fine-tuned Model: This is like advanced specialized training (MS/MD), where doctors focus on specific fields like Cardiology, Dermatology, or Neurology to become specialists.

The diagram below uses the analogy to explain how a basic model can be fine-tuned to become specialized, similar to how doctors undergo further training to specialize in certain medical fields.

Neural Network

“A neural network model predicting the next word in a sentence”

The above diagram shows how a neural network model predicts the next word in a sentence. Here’s a simple breakdown:

  1. Input Layer: The model takes four words as input (in this case, “cat,” “sat,” “on,” “a”).
  2. Hidden Layer: These words are processed through interconnected units (represented by circles and lines) that analyze the context.
  3. Output Layer: The model predicts the next word based on the given context. In this example, it predicts “mat” with a 97% probability.

So, the model is essentially trying to guess the next word in a sentence by understanding the context provided by the previous words.
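
To make this concrete, here is a minimal sketch (not the actual model from the diagram) showing how an output layer turns raw scores into next-word probabilities; the toy vocabulary and logit values are invented for illustration:

```python
import numpy as np

# Toy vocabulary and raw scores (logits) a model might produce for the
# context "cat sat on a ...". The values are invented for illustration.
vocab = ["mat", "roof", "chair", "dog"]
logits = np.array([5.2, 1.1, 0.8, -0.5])

# Softmax turns the logits into a probability distribution over the vocabulary.
probs = np.exp(logits) / np.sum(np.exp(logits))

for word, p in zip(vocab, probs):
    print(f"{word}: {p:.1%}")

# The predicted next word is the one with the highest probability.
print("Prediction:", vocab[int(np.argmax(probs))])
```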

Training a new large language model

Training a large language model like Llama 2 70B is a significant investment. This diagram illustrates the scale and resources involved in training such a large AI model:

  1. Data: “Chunk of the internet, ~10TB of text,” representing the vast amount of text data used for training the model.
  2. Computational Power: Highlights the use of “6,000 GPUs for 12 days, ~$2M,” indicating the significant computational resources and cost required for the training process.
  3. Model Size: Shows a “ZIP” file labeled “~140GB file,” which represents the size of the trained AI model.

This emphasizes the extensive data, computational power, and cost involved in training modern AI systems.
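
A quick back-of-envelope check of the model-size figure, assuming the weights are stored in 16-bit precision (roughly 2 bytes per parameter, which is common but not stated in the diagram):

```python
# Llama 2 70B has roughly 70 billion parameters.
params = 70e9

# Assume 2 bytes per parameter (16-bit floats such as fp16/bf16).
bytes_per_param = 2

size_gb = params * bytes_per_param / 1e9
print(f"~{size_gb:.0f} GB")  # ~140 GB, which matches the "~140GB file" above
```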

When to fine-tune a language model

  1. Prompt engineering is a quick and easy way to shape how the model behaves and to supply the knowledge it needs.
  2. When you want to improve the quality of the model's output even further, there are two common techniques:

  • Retrieval Augmented Generation (RAG): Ground your data by first retrieving context from a data source before generating a response.
  • Fine-tuning: Train a base language model on a dataset before integrating it into your application.

Prompt Engineering, RAG, and Fine-tuning

  • Prompt Engineering is a technique that involves designing prompts for natural language processing models. This process improves accuracy and relevance in responses, optimizing the performance of the model.
  • Retrieval Augmented Generation (RAG) improves Small or Large Language Model (SLM/LLM) performance by retrieving data from external sources and incorporating it into a prompt. RAG allows businesses to achieve customized solutions while maintaining data relevance and optimizing costs.
  • Fine-tuning retrains an existing Small or Large Language Model using example data, resulting in a new "custom" model that has been optimized using the provided examples.
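
The sketch below contrasts the three approaches at the code level. `generate` and `retrieve` are stand-in stubs for whatever LLM API and data or vector store you actually use, so treat this as the shape of each workflow rather than a recipe for a specific service:

```python
# Illustrative sketch only: `generate` and `retrieve` are stand-in stubs
# for a real LLM API call and a real retrieval / vector-store lookup.

def generate(prompt: str) -> str:
    """Stub for a call to a language model (SLM/LLM)."""
    return f"<model answer for: {prompt[:40]}...>"

def retrieve(query: str) -> str:
    """Stub for fetching relevant context from your own data source."""
    return "Example policy text: returns are accepted within 30 days."

question = "What is the return policy?"

# 1. Prompt engineering: shape the instructions; no external data, no retraining.
prompt = f"You are a concise support agent. Answer briefly.\n\nQuestion: {question}"
print(generate(prompt))

# 2. RAG: retrieve grounding context first, then include it in the prompt.
context = retrieve(question)
rag_prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(generate(rag_prompt))

# 3. Fine-tuning: instead of changing the prompt, the base model is retrained on
#    example data; the application then calls the resulting custom model directly.
print(generate(question))  # imagine `generate` now backed by your fine-tuned model
```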

Fine-tuning benefits

  • Improved Performance: Fine-tuning can help improve the performance of the model on a specific task. This is especially useful when the pre-trained model was trained on a different but related task.
  • Efficiency: Fine-tuning is often faster and less resource-intensive than training a model from scratch. This is because the pre-trained model has already learned useful features from a large dataset.
  • Data Limitations: If you have a small dataset, fine-tuning a pre-trained model can lead to better performance than training a model from scratch. The pre-trained model acts as a good initialization point and can generalize well to the new task.
  • Domain Adaptation: Fine-tuning can help adapt a model trained on one domain (e.g., general English text) to a different domain (e.g., medical text or legal text).
  • Customization: Fine-tuning allows you to customize the model to your specific needs, making it more powerful and context-aware.

Using Microsoft Olive for fine-tuning

  • Olive is an easy-to-use, open-source model optimization tool that covers both fine-tuning and inference for generative AI.
  • It requires only simple configuration, combined with the use of open-source small language models and related runtime environments (AzureML, local GPU, CPU, DirectML).
  • It completes fine-tuning or inference of the model through automatic optimization and finds the best model to deploy to the cloud or to edge devices.
  • It allows enterprises to build their own industry vertical models on-premises and in the cloud.
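
As a rough illustration of that "simple configuration", here is a sketch of an Olive fine-tuning workflow expressed as a Python dictionary. The key names and values only approximate Olive's JSON config format, and the model name is just an example reused from later in this article; check the Olive documentation for the real schema.

```python
# Illustrative only: approximates the shape of an Olive workflow config
# (normally written as JSON). Key names/values may differ from the real schema.
olive_config = {
    "input_model": {
        # A small open-source language model, e.g. from Hugging Face
        "type": "PyTorchModel",
        "model_path": "microsoft/Phi-3-mini-4k-instruct",
    },
    "passes": {
        # Fine-tune with QLoRA, then convert for efficient inference
        "finetune": {"type": "QLoRA", "train_data_config": "my_dataset"},
        "convert": {"type": "OnnxConversion"},
    },
    "engine": {
        # Target runtime: AzureML, local GPU/CPU, or DirectML
        "output_dir": "models/finetuned",
    },
}
```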

QLoRA

QLoRA, or Quantized Low-Rank Adapters, is an innovative method for fine-tuning large language models (LLMs) efficiently. It combines quantization and low-rank adapters to reduce memory usage significantly, allowing the fine-tuning of models with billions of parameters on a single GPU.


Here are some key features of QLoRA:

  • 4-bit Quantization: Uses a new data type called 4-bit NormalFloat (NF4) to optimize memory usage.
  • Double Quantization: Further reduces memory footprint by quantizing the quantization constants.
  • Paged Optimizers: Manages memory spikes during training.

QLoRA has been shown to achieve state-of-the-art results with smaller models and less computational resources. It’s a promising approach for democratizing access to large-scale model fine-tuning.

QLoRA improves over LoRA by quantizing the transformer model to 4-bit precision and using paged optimizers to handle memory spikes.
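
To make these ingredients concrete, here is a minimal sketch of a QLoRA setup using the Hugging Face `transformers`, `peft`, and `bitsandbytes` libraries (one common way to apply the technique; the model name and hyperparameters are illustrative, not a recommendation):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative choice of base model

# 4-bit NF4 quantization with double quantization, as described above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # 4-bit NormalFloat
    bnb_4bit_use_double_quant=True,     # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach small low-rank adapters; only these are trained, the 4-bit base stays frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative; depends on the architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```

Training would then typically use a paged optimizer (for example `optim="paged_adamw_8bit"` in the Hugging Face `TrainingArguments`) to absorb the memory spikes mentioned above.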

Model Fine-Tuning in the Cloud

Fine-tuning in Azure involves customizing a pre-trained model to better suit your specific needs. Here’s a general overview of the process:

  1. Select a Pre-trained Model: Choose a model that closely aligns with your task. Azure offers various models, including language models like Phi-3-mini-4k-instruct.
  2. Prepare Your Data: Gather and format the data you’ll use for fine-tuning. This data should be relevant to the tasks you want the model to perform.
  3. Upload Data to Azure: Use Azure Storage to upload your dataset. Ensure your data is properly organized and accessible.
  4. Configure Fine-Tuning: Use Azure Machine Learning or other Azure services to configure the fine-tuning process. This includes setting parameters like learning rate, batch size, and the number of epochs.
  5. Run Fine-Tuning: Execute the fine-tuning process. Azure will use your data to adjust the model’s weights, optimizing it for your specific tasks.
  6. Evaluate the Model: After fine-tuning, evaluate the model’s performance using a validation dataset. Make adjustments as needed to improve accuracy.
  7. Deploy the Model: Once satisfied with the model’s performance, deploy it using Azure’s deployment services. This makes the model available for use in your applications.
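
As a hedged sketch of steps 4 and 5 using the Azure Machine Learning Python SDK (v2): the workspace identifiers, training script, environment, and compute names below are placeholders you would replace with your own, and the hyperparameters are only examples.

```python
from azure.ai.ml import MLClient, Input, command
from azure.identity import DefaultAzureCredential

# Step 4: connect to your workspace and configure the job (values are placeholders).
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

job = command(
    code="./src",  # folder containing your own finetune.py training script
    command=(
        "python finetune.py "
        "--train_data ${{inputs.train_data}} "
        "--learning_rate 2e-5 --batch_size 8 --epochs 3"  # example hyperparameters
    ),
    inputs={
        "train_data": Input(
            type="uri_file",
            path="azureml://datastores/workspaceblobstore/paths/data/train.jsonl",
        )
    },
    environment="<your-training-environment>@latest",  # curated or custom environment
    compute="<your-gpu-cluster>",
)

# Step 5: submit the job; Azure ML adjusts the model's weights using your data
# and tracks metrics you can use for the evaluation step.
returned_job = ml_client.jobs.create_or_update(job)
print(returned_job.studio_url)
```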

Model Fine-Tuning Locally Using the AI Toolkit

AI Toolkit for VS Code simplifies generative AI app development by bringing together cutting-edge AI development tools and models from the Azure AI Studio catalog and other catalogs like Hugging Face. You will be able to browse the AI model catalog powered by Azure ML and Hugging Face, download models locally, fine-tune, test, and use them in your application. You can also fine-tune and deploy models to the cloud (preview).


The AI Toolkit for Visual Studio Code (VS Code) is a powerful extension that enables developers to explore, fine-tune, and integrate AI models into their applications. Here are some key features and capabilities of the AI Toolkit:

  1. Model Catalog: The AI Toolkit includes a catalog of curated models from Azure AI Studio and Hugging Face, optimized to run locally on Windows and Linux, on both CPU and GPU.
  2. Model Playground: This feature allows developers to experiment with small language models locally or in the cloud. It provides a controlled environment to explore and understand the capabilities of various models.
  3. Fine-Tuning: The AI Toolkit supports advanced fine-tuning techniques, including Parameter Efficient Fine-Tuning (PEFT), Quantized Low-Rank Adaptation (QLoRA), and Flash Attention 2. These techniques help optimize model performance for specific use cases.
  4. Deployment: Developers can deploy fine-tuned models seamlessly to Azure Container Apps (ACA), Azure Kubernetes Service (AKS), or Azure AI Studio. The toolkit also supports high-performance inferencing on Windows using ONNX Runtime and DirectML.
  5. Integration: The AI Toolkit integrates with Azure AI Studio and prompt flow SDKs, allowing developers to create custom evaluations and measure model performance on various metrics.
  6. Cross-Platform Support: The AI Toolkit is designed to work across platforms, making it easier for developers to experiment with new models in their applications.
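
To illustrate the ONNX Runtime + DirectML path mentioned in item 4, here is a minimal inference sketch; the model path is a placeholder for a model you have fine-tuned and exported, and the input names and shapes depend on how that model was exported.

```python
import numpy as np
import onnxruntime as ort

# Placeholder path to a model you fine-tuned and exported to ONNX.
model_path = "models/finetuned/model.onnx"

# Prefer DirectML (GPU acceleration on Windows); fall back to CPU if unavailable.
session = ort.InferenceSession(
    model_path,
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)

# Input names and shapes depend on how the model was exported; this assumes a
# single int64 token-ID input purely for illustration.
input_name = session.get_inputs()[0].name
dummy_input = np.array([[101, 2023, 2003, 1037, 3231, 102]], dtype=np.int64)

outputs = session.run(None, {input_name: dummy_input})
print([o.shape for o in outputs])
```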
