Simplifying the Complexity of GenAI Model Customization


Enterprises are embracing generative AI as a differentiator for their core businesses. Rather than simply using a generic, jack-of-all-trades generative AI model, there is a growing trend of customizing a pre-trained model with additional enterprise-specific data so that it excels in a particular domain or task within the enterprise. This involves curating and processing high-quality data, choosing a good pre-trained model that is right-sized for the use case (larger models are more expensive to train and serve, and may not justify the return on investment of using GenAI), applying one of several fine-tuning techniques, and building a robust evaluation methodology to ensure the trained model will perform well in the domain and tasks of interest. This is complex work that requires highly skilled scientists and engineers.

The model customization team at IBM Research is building a tuning platform based on leading open-source community projects from the PyTorch and Hugging Face (HF) ecosystems, and we actively contribute back to them. We are not only building the best ingredients and tools for model customization in the open, but we are also on a mission to curate the best recipes that simplify model customization and enable domain experts with little knowledge of LLM internals to customize models to suit their needs. This tuning platform forms the basis of the tuning capabilities in Red Hat OpenShift AI and IBM watsonx.ai.

We support a wide range of model architectures, including transformer-based models such as Granite and Llama as well as structured state space models such as Mamba, without the user having to know anything about their internals. We support multiple tuning techniques, including full supervised fine-tuning (SFT), PEFT LoRA, and quantized LoRA, with support for other techniques such as preference tuning and model distillation planned for the coming months. We recently open-sourced our model optimizer framework to develop reduced-precision models, with support for GPTQ and FP8 quantization, quantization-aware training (QAT), and post-training quantization (PTQ). A minimal LoRA example is shown below.
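As a concrete illustration of one of these techniques, the sketch below shows how a LoRA adapter is typically attached to a causal language model using the Hugging Face PEFT library. The model name and hyperparameters are placeholders chosen for illustration, not the defaults of our platform.

```python
# Minimal LoRA fine-tuning setup with Hugging Face PEFT (illustrative values only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "ibm-granite/granite-3.0-2b-instruct"  # placeholder; any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# LoRA trains small low-rank matrices added to selected projection layers,
# leaving the base weights frozen.
lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # which linear layers get adapters
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

The resulting model can be passed to a standard HF Trainer or TRL trainer; only the adapter weights are updated and saved.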


Model Customization is Efficient

We have made several advancements over the past year to make model customization more resource efficient. The usual method to collate multiple sequences during supervised fine-tuning is to pad them to the same length. These padding tokens introduce inefficiencies, as they result in unproductive computation. By packing sequences together without padding, using token position information, and making this work with FlashAttention, we obtained roughly a 2x throughput improvement across models and tuning techniques. This was contributed to HF Transformers and TRL (HF blog, PR, PR). A minimal sketch of the packing idea is shown below.
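The sketch below is a conceptual illustration of padding-free packing using plain PyTorch tensors: position ids restart at each sequence boundary, and cumulative sequence lengths tell variable-length attention kernels where one sequence ends and the next begins. It is not the actual collator contributed to HF Transformers/TRL.

```python
import torch

def pack_without_padding(sequences):
    """Concatenate variable-length token sequences into one row with no pad tokens.

    Returns a flat token stream plus (a) position ids that restart at 0 for each
    sequence and (b) cumulative sequence lengths, which variable-length
    FlashAttention kernels use to keep attention from crossing boundaries.
    """
    input_ids, position_ids, cu_seqlens = [], [], [0]
    for seq in sequences:
        input_ids.extend(seq)
        position_ids.extend(range(len(seq)))
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return (
        torch.tensor([input_ids]),     # shape (1, total_tokens), zero padding
        torch.tensor([position_ids]),  # restarts at 0 per packed sequence
        torch.tensor(cu_seqlens),      # boundaries for varlen attention
    )

# Example: three short "tokenized" sequences packed into a single row.
ids, pos, cu = pack_without_padding([[5, 9, 2], [7, 8], [3, 1, 4, 6]])
print(ids.shape, pos.tolist(), cu.tolist())
# torch.Size([1, 9]) [[0, 1, 2, 0, 1, 0, 1, 2, 3]] [0, 3, 5, 9]
```

Every position in the packed row now contributes useful gradient signal, which is where the throughput gain comes from.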

We helped establish parity between the FSDP and DeepSpeed backends in HF so that users can move between them seamlessly (HF blog, Accelerate documentation, PR). The two backends previously behaved differently because DeepSpeed internally upcasts weights loaded in bf16 to fp32, causing the loss curves to diverge. We raised a PR in HF Accelerate to upcast automatically for FSDP when mixed precision is enabled, and included a guide to help users achieve equivalence between the two backends. A sketch of the general pattern is shown below.
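As a rough illustration (not the exact code path of the Accelerate fix), the snippet below shows the pattern the guide describes: run compute in bf16 mixed precision while keeping trainable parameters in fp32, mirroring what DeepSpeed does internally. The checkpoint name is a placeholder.

```python
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator(mixed_precision="bf16")  # bf16 compute, as with DeepSpeed

model = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-3.0-2b-instruct",  # placeholder checkpoint stored in bf16
    torch_dtype=torch.bfloat16,
)

# Keeping trainable (master) weights in fp32 mirrors DeepSpeed's internal upcast,
# so optimizer updates accumulate with the same numerics under FSDP.
for param in model.parameters():
    if param.requires_grad:
        param.data = param.data.to(torch.float32)

model = accelerator.prepare(model)
```

With this setup, loss curves under FSDP and DeepSpeed track each other closely instead of drifting apart because of precision differences in the optimizer step.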

We have implemented fused operations and custom Triton kernels, adapted from Unsloth, to accelerate tuning, including fused low-rank adapter operations and fast kernels for RoPE, layer norm, and cross-entropy, and made them available through our acceleration library. As an example, these kernels improve throughput by ~40% and reduce memory requirements by ~30% for Mistral-7B. We also have Triton kernels for parallelizing expert computations in mixture-of-experts (MoE) models. A toy example of kernel fusion is shown below.
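The toy Triton kernel below, following the standard Triton tutorial pattern, only illustrates the idea of fusion: performing an add and an activation in a single pass over memory instead of two. It is not one of the kernels shipped in our acceleration library, which fuse RoPE, layer norm, cross-entropy, and LoRA operations.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fused_add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    # Fusion: add + ReLU in one kernel avoids materializing the
    # intermediate (x + y) tensor in GPU memory.
    tl.store(out_ptr + offsets, tl.maximum(x + y, 0.0), mask=mask)

def fused_add_relu(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    fused_add_relu_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
assert torch.allclose(fused_add_relu(x, y), torch.relu(x + y))
```

The same principle, applied to heavier operations such as cross-entropy over large vocabularies, is what drives the throughput and memory savings quoted above.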


Model Customization is Simple

We have developed and adopted several tools to make model customization simpler for our users. We built a resource estimator that estimates memory requirements and training time before a tuning job is even executed. It is based on a hybrid method that combines theoretical knowledge of model-training computations for different architectures and tuning techniques with regression models learned from empirical observations of actual tuning runs. This helps quantify the trade-off a user faces between how long a tuning job will take to complete and how much compute (and cost) they are willing to expend; a rough back-of-the-envelope version of such an estimate is sketched below.
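For intuition only, the sketch below shows the kind of first-order memory estimate the theoretical half of such an estimator starts from. It uses the well-known ~16 bytes per parameter for mixed-precision Adam (bf16 weights and gradients plus fp32 master weights and optimizer moments) and ignores activations, which the real estimator models per architecture and tuning technique.

```python
def estimate_full_finetune_memory_gib(num_params_billions: float) -> float:
    """Rough lower bound on GPU memory for full fine-tuning with Adam.

    Per parameter (mixed precision): 2 bytes bf16 weights + 2 bytes bf16
    gradients + 4 bytes fp32 master weights + 8 bytes fp32 Adam moments
    = 16 bytes. Activations, temporary buffers, and framework overhead
    come on top and depend on batch size, sequence length, and architecture.
    """
    bytes_per_param = 16
    return num_params_billions * 1e9 * bytes_per_param / 1024**3

# A 7B-parameter model needs on the order of ~104 GiB for model states alone,
# which is why such jobs are sharded (FSDP/DeepSpeed) or tuned with LoRA instead.
print(f"{estimate_full_finetune_memory_gib(7):.0f} GiB")
```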

We developed a trainer-controller framework that controls the training loop and performs actions such as stopping training, checkpointing, and logging, based on user-defined metrics and rules, using HF Trainer callbacks. This gives users the ability to automatically stop training early if it is not progressing well, or to avoid overfitting. We also have a pluggable mechanism to integrate experiment-tracking tools such as AimStack, wandb, and MLflow tracking, to monitor training in near real time. A minimal callback along these lines is sketched below.
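The minimal callback below illustrates the underlying HF Trainer callback mechanism with a single hard-coded rule (stop if the training loss stalls). The actual trainer-controller framework expresses such rules declaratively through user-defined metrics rather than custom code like this, and the patience and threshold values here are made up for illustration.

```python
from transformers import TrainerCallback

class StopOnStalledLoss(TrainerCallback):
    """Stop training early if the logged training loss stops improving."""

    def __init__(self, patience: int = 3, min_delta: float = 1e-3):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.stalled_steps = 0

    def on_log(self, args, state, control, logs=None, **kwargs):
        loss = (logs or {}).get("loss")
        if loss is None:
            return control
        if loss < self.best_loss - self.min_delta:
            self.best_loss = loss
            self.stalled_steps = 0
        else:
            self.stalled_steps += 1
            if self.stalled_steps >= self.patience:
                control.should_training_stop = True  # Trainer exits its loop cleanly
        return control

# Usage: trainer = Trainer(..., callbacks=[StopOnStalledLoss(patience=5)])
```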



We are just getting started, and there are several other advancements we are working on jointly with others in the community. These include adding tensor parallelism support with PyTorch DTensors in HF Transformers and Accelerate, PyTorch compile optimizations for our tuning stack built on HF libraries, advances in cluster-level job management, improving tuning efficiency on the IBM Spyre AI accelerator chip, and tools to improve training data quality. I am extremely proud of and thankful to the global research team behind these advancements. We intend to publish more detailed blogs on each of these topics in the coming months.

If you have prior experience in this area and are passionate about building the best platform for generative AI, we are hiring for our team at IBM’s India Research Lab and would love to hear from you!

