Can you use AutoML for Generative AI Development?
[Image: DALL-E, robots building robots]

If you've built a Generative AI app, you'll know that much of getting generative AI "to work" involves endless prompt engineering, testing an ever-expanding list of LLMs, and figuring out what “good” even looks like.

For someone like me who did traditional ML and deep learning for many years, this sounds very much like the undifferentiated work everyone disliked in traditional ML: should you use a random forest or a logistic regression? Should you use 40 trees or 60? What about tree depth?

Traditional ML practitioners used automation and techniques like Bayesian hyperparameter optimization, collectively termed AutoML, to remove this undifferentiated work.

While AutoML has not been without limitations (overfitting being the primary one), it has become an effective way to (1) empower teams with limited DS/ML expertise to perform simple data science and (2) give experts a surprisingly useful starting point. No DS/ML platform today would be complete without AutoML capabilities.

The success of AutoML for traditional ML raises the question:

Can we use AutoML techniques to similarly remove the undifferentiated work in building Generative AI and get to high-quality results faster?

That is the vision we set out to build with the Verta GenAI Workbench and it's been exciting to see the effectiveness of AutoML techniques for GenAI.

So, in this post, I’m going to draw the parallels between AutoML for traditional vs. generative AI, highlight what’s solved, what remains open, and where we can go from here.

First, a quick primer on AutoML for Traditional (predictive) ML:

AutoML takes an input dataset {X, y} and seeks to produce a function f such that f(X) → y maximizes some quality metric (e.g., accuracy). It does so by exploring the space of models (the f's), their hyperparameters, and different transformations of X and y.

Here are the steps involved and the output:

  1. Select candidate models
  2. Select model hyperparameter variations
  3. Select feature engineering strategies
  4. For all (or a smartly sampled subset of) combinations of the above, plus potential ensembles, train models on the training split of the dataset
  5. Evaluate models on the test split dataset (or via cross-validation)
  6. Select the model that maximizes the chosen quality metric
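
To make this loop concrete, here is a minimal sketch using scikit-learn. The candidate models, hyperparameter grids, and metric below are illustrative choices, not a prescription; real AutoML systems such as auto-sklearn also search feature pipelines and build ensembles.

```python
# Minimal sketch of the AutoML loop above using scikit-learn.
# Candidate models, grids, and the metric are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Steps 1-3: candidate models, hyperparameter variations, feature engineering.
candidates = {
    "logreg": (LogisticRegression(max_iter=1000),
               {"model__C": [0.1, 1.0, 10.0]}),
    "rf": (RandomForestClassifier(),
           {"model__n_estimators": [40, 60], "model__max_depth": [4, 8]}),
}

best_score, best_model = -1.0, None
for name, (model, grid) in candidates.items():
    pipeline = Pipeline([("scale", StandardScaler()), ("model", model)])
    # Steps 4-5: train each combination and evaluate via cross-validation.
    search = GridSearchCV(pipeline, grid, scoring="accuracy", cv=5)
    search.fit(X_train, y_train)
    if search.best_score_ > best_score:
        best_score, best_model = search.best_score_, search.best_estimator_

# Step 6: keep the model that maximizes the chosen quality metric.
print(best_model, best_model.score(X_test, y_test))
```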

Output from AutoML typically looks like this:

[Image: sample auto-sklearn output]


Now on to GenAI: how can we use the same AutoML techniques for generative AI?

AutoML for GenAI has the same philosophy as AutoML for traditional ML with a few tweaks as shown in the table below. For example, since LLMs are typically pre-trained, there is no need to perform model training. Similarly, instead of varying hyperparameters during model training, we vary model input parameters like prompt, temperature, chunking, and so on.

[Table: AutoML for traditional ML vs. AutoML for generative AI]
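
To make the analogy concrete, here is a minimal sketch of what the search looks like when the knobs are prompts and inference parameters rather than tree depths. The prompt templates, model identifiers, and the generate/score helpers are placeholders, not any real provider's API or the Workbench's implementation.

```python
import itertools

# Illustrative search space: prompts and inference parameters take the
# place of traditional hyperparameters. All values are placeholders.
prompt_templates = [
    "Summarize the following notes as a blog post:\n{notes}",
    "You are a technical writer. Turn these notes into a blog post:\n{notes}",
]
models = ["model-a", "model-b"]   # hypothetical model identifiers
temperatures = [0.2, 0.7]

def generate(prompt, model, temperature):
    # Placeholder for an LLM call; swap in your provider's SDK here.
    return f"<output of {model} at T={temperature}>"

def score(output):
    # Placeholder quality metric; in practice this is the hard part
    # (human ranking, pairwise Elo, or an LLM judge -- see below).
    return len(output)  # dummy score so the sketch runs end to end

notes = "raw engineering notes go here"
results = []
for template, model, temperature in itertools.product(
        prompt_templates, models, temperatures):
    output = generate(template.format(notes=notes), model, temperature)
    results.append(((template, model, temperature), score(output)))

# Analogous to step 6 in traditional AutoML: pick the best variant.
best_variant, best_score = max(results, key=lambda r: r[1])
print(best_variant, best_score)
```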


While, on the surface, it seems like AutoML for GenAI can skip several steps, the steps that remain are challenging to automate and currently ill-defined. Specifically, applying AutoML to generative AI raises three challenges:

(1) Experimenting with prompt variations: While hyperparameters are usually numeric (e.g., the “C” value) or drawn from a small number of categories (e.g., the gradient descent algorithm), the universe of prompts is much more open-ended and complex: you can view a prompt as a very high-dimensional vector (as some work does). As a result, techniques like Bayesian optimization are much more challenging to apply effectively.

(2) Datasets: Often, when beginning a GenAI project, the train/test dataset is hard to build. GenAI may represent a new type of task for which training data has never been captured or simply doesn't exist, e.g., when building a bot to turn engineering notes into blogs, there may be no data about past engineering notes to draw on.

(3) Evaluation & metrics: While computing accuracy in traditional ML is very formulaic, evaluating LLM results today is extremely ad-hoc, with many teams resorting to “vibe checks.” In addition, the automated LLM evaluators available today still have inconsistent performance. Finally, evaluations are further complicated by the fact that it can be hard to describe what good may look like unless you see some examples (“the language is too flowery”, “this is too long.”)

Some Solutions:

Although these challenges make AutoML for GenAI different from AutoML for traditional ML, we have found that these hurdles are not insurmountable. Here are some of the techniques we have been using in the Workbench.

Autoprompting and AI-powered prompt refinement: New and better ways to prompt LLMs are constantly being developed (e.g., Chain-of-Thought, Reason-and-Act (ReAct), etc.). In addition, meta-prompting work has begun to show promise for helping craft effective prompts. LLMs' natural writing ability, combined with meta-prompting, is now good enough to generate strong prompts automatically. In Verta PromptBrew, we utilize these techniques to create diverse, high-quality prompts. Automatically generated prompts are not perfect, but in most cases they are far better than what a beginner would write.

[Image: Verta PromptBrew creating prompts for the task "Write a LinkedIn post from a blog article"]
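
To illustrate the meta-prompting idea (this is a sketch, not the actual PromptBrew implementation), one common pattern is to ask an LLM to draft several candidate prompts for a task description. The `call_llm` helper and the meta-prompt wording below are assumptions; any chat-completion API can be slotted in.

```python
# Minimal meta-prompting sketch: ask an LLM to draft candidate prompts
# for a task description. `call_llm` is a placeholder for any chat API.
META_PROMPT = """You are an expert prompt engineer.
Write {n} different system prompts that would make an LLM perform this task well:

Task: {task}

Vary the style (step-by-step instructions, role-play, few-shot) across candidates.
Return one prompt per line."""

def call_llm(prompt: str) -> str:
    # Placeholder: swap in your provider's chat-completion call.
    raise NotImplementedError

def generate_candidate_prompts(task: str, n: int = 5) -> list[str]:
    response = call_llm(META_PROMPT.format(task=task, n=n))
    return [line.strip() for line in response.splitlines() if line.strip()]

# Example usage:
# candidates = generate_candidate_prompts("Write a LinkedIn post from a blog article")
```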

Datasets: Depending on the task, it is possible to synthetically generate a starter dataset for your GenAI app. Such approaches are frequently used for RAG problems, e.g., as described in this Hugging Face blog. Beyond synthetic data, this step can be hard to automate. The good news is that to get to decent results, you don't need hundreds of examples; you need tens. That order of magnitude makes the problem much more tractable.
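
As a sketch of what a synthetic starter dataset can look like (again, `call_llm` and the prompt wording are placeholders, and real pipelines usually add filtering and deduplication), one pattern is to have an LLM produce input/output pairs from whatever raw material does exist:

```python
import json

def call_llm(prompt: str) -> str:
    # Placeholder for any chat-completion API.
    raise NotImplementedError

GEN_EXAMPLE_PROMPT = """Here are some raw engineering notes:

{notes}

Write one realistic input/output pair for a bot that turns such notes into a
short blog paragraph. Return JSON with keys "input" and "output"."""

def synthesize_examples(raw_notes: list[str]) -> list[dict]:
    examples = []
    for notes in raw_notes:
        raw = call_llm(GEN_EXAMPLE_PROMPT.format(notes=notes))
        try:
            examples.append(json.loads(raw))
        except json.JSONDecodeError:
            # Drop malformed generations; real pipelines also filter for quality.
            continue
    return examples

# A few dozen such examples are typically enough to start ranking variants.
```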

Evaluation: LLM evaluation is hard, subjective, and an active research problem. However, a few techniques can go a long way.

(1) For AutoML, we are looking to establish a relative ranking of variants to pick the best one. Pairwise comparisons producing Elo scores are a perfect fit for this task. Moreover, you typically only need tens of comparisons, and that number can be reduced further by choosing the pairs to compare smartly. This is a key approach we use in the Workbench; a minimal sketch follows this list.

(2) Off-the-shelf LLM-based evaluators are improving daily and can be used to augment human labeling. However, we have found that although these evaluators provide a decent sanity check, they aren't great at fine-grained quality checks.

(3) Ultimately, we think that human labeling plus few-shot-prompting-based evaluators are the key to building evaluators that capture human preferences. This is where some of our latest experiments have focused.
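
As promised in (1), here is a minimal sketch of turning pairwise preferences into Elo-style ratings. The K-factor, starting ratings, and variant names are illustrative assumptions, not the Workbench's actual implementation; the preference votes can come from humans or from a few-shot LLM judge as in (3).

```python
# Minimal Elo-style ranking from pairwise comparisons between variants.
# K and the starting rating are conventional choices, not Workbench internals.
K = 32
ratings = {"variant_a": 1000.0, "variant_b": 1000.0, "variant_c": 1000.0}

def expected(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def record_comparison(winner: str, loser: str) -> None:
    """Update ratings after a single pairwise preference (human or LLM judge)."""
    e_w = expected(ratings[winner], ratings[loser])
    ratings[winner] += K * (1.0 - e_w)
    ratings[loser] -= K * (1.0 - e_w)

# A few dozen judgments are usually enough to separate good variants from bad.
record_comparison("variant_b", "variant_a")
record_comparison("variant_b", "variant_c")
record_comparison("variant_a", "variant_c")
print(sorted(ratings.items(), key=lambda kv: kv[1], reverse=True))
```

Because each update only needs a winner and a loser, the same loop works unchanged whether a human or an automated evaluator supplies the judgment.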

With these techniques, spanning prompting through evaluation, a Verta Workbench user can get from a user task description to a high-quality app in 23 minutes.

That's pretty darn great. Again, the goal of AutoML is not to get to the SOTA result; it is to get to a result that is "good enough" and iterate from there.

[Image: Verta Workbench leaderboard example]


I'm excited about the promise of AutoML for generative AI as a means to get to useful generative AI apps faster. If you've done these explorations yourself, we would love to hear your experiences and collaborate with you.

And if you want to see how well this can work in real life, give it a spin at app.verta.ai!

Comments

Vincent Valentine, CEO at Cognitive.Ai (7 months ago):
Excited to see the promising results! Manasi Vartak
Manasi Vartak, Chief AI Architect at Cloudera, previously Founder & CEO at Verta (acquired by Cloudera) (7 months ago):
If you are curious about what PromptBrew looks like without signing up, here's a separate link: https://www.verta.ai/promptbrew
