An LLM maturity model

Here's a simple LLM maturity model framework along two dimensions: model selection capabilities and model governance. The governance needed largely depends on the capabilities used. An illustrative analogy: different types of cars call for different rules and guardrails. Range Rovers are easy to drive, safe, and good in most weather conditions. Sports cars require more skill and carry higher risk. Supercars are expensive, demand skilled drivers, and are inappropriate in inclement weather. Lastly, custom hot rods are high-maintenance and require a team of experts to build and operate. Here's the framework:

Level 1. Model selection: Pre-trained model APIs and UIs. Model governance: Red-teaming. Car analogy: Range Rover.
Level 2. Model selection: Prompt engineering, retrieval augmented generation (RAG), and few-shot prompting. Model governance: Define and measure accuracy and cost. Car analogy: Sports car.
Level 3. Model selection: Fine-tune pre-trained models. Model governance: Define and measure biases. Car analogy: Supercar.
Level 4. Model selection: Full training of LLMs. Model governance: Continuous, real-time measurement. Car analogy: Custom hot rod.

The levels of model selection capabilities are:

  1. Pre-built models: Using APIs or UIs for pre-built models. These may be multi-tenant, single-tenant or private. Examples are GPT-4 and GPT-3.5 from OpenAI; the same models from Azure; Anthropic’s Claude and AI21’s Jurassic from AWS; and Falcon, Mosaic ML, Llama 2 and Mistral-7B from Hugging Face.
  2. Prompt engineering: Testing different types of prompts on different models to select the one with the best accuracy-versus-cost tradeoff. This includes testing different prompt sizes for retrieval augmented generation (RAG) and few-shot prompting in general (a minimal sketch follows this list).
  3. Fine-tuning: Continuing the training of a pre-trained model on your own corpus, which changes the model’s parameters. Reinforcement learning from human feedback may also be used here to fine-tune continuously.
  4. Full training: Completely training your own LLM.
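To make levels 1 and 2 concrete, here is a minimal sketch using the OpenAI Python client, one of the pre-built model APIs named above. The ticket-classification task, the few-shot examples, and the model choice are hypothetical placeholders, not recommendations.

```python
# Minimal sketch of maturity levels 1-2: calling a pre-built model API
# (level 1) with a few-shot prompt (level 2). Assumes the OpenAI Python
# client (pip install openai) and OPENAI_API_KEY in the environment.
# The task, examples, and model name are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()

# Few-shot examples embedded directly in the prompt. In a RAG setup these
# would instead be passages retrieved from your own document store.
FEW_SHOT = """Classify the support ticket as BILLING, TECHNICAL, or OTHER.

Ticket: "I was charged twice this month."
Label: BILLING

Ticket: "The app crashes when I upload a file."
Label: TECHNICAL
"""

def classify(ticket: str, model: str = "gpt-4") -> str:
    """Send the few-shot prompt plus a new ticket to a pre-built model."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": f'{FEW_SHOT}\nTicket: "{ticket}"\nLabel:'}],
        temperature=0,  # deterministic output helps when measuring accuracy
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    print(classify("My invoice shows the wrong amount."))
```

Running the same prompt at different sizes and against different models, then comparing accuracy against per-token cost, is exactly the level 2 selection exercise described above.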

The levels of model governance are:

  1. Red-teaming: Periodic structured testing to find flaws and vulnerabilities (highlighted in the Executive Order on Safe, Secure, and Trustworthy AI, Oct. 30, 2023).
  2. Measure: Define and quantitatively measure the accuracy and cost of an LLM use case. This may be done periodically, like red-teaming, or continuously to detect flaws as early as possible. There is a wide variety of current and emerging accuracy measurement techniques, including end-user feedback (e.g., thumbs-up/thumbs-down ratings), human-in-the-loop evaluation of a subsample of responses, and using an ensemble of other (ideally uncorrelated) LLMs to evaluate an LLM, e.g., use GPT-4 to evaluate GPT-3.5, or Jurassic and Claude to evaluate Llama 2 (a sketch follows this list).
  3. Biases: Define, prioritize and quantitatively measure the biases of concern for an LLM use case. Different use cases have different biases that matter: those of concern in a healthcare use case might be very different from those in a logistics use case. Bias measurement may be done periodically, like red-teaming, or continuously to detect flaws early.
  4. Real-time monitoring: Monitoring accuracy, cost and biases continuously in near real-time.
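As a sketch of the ensemble idea in item 2, here one model (GPT-4) grades another model's answers. This assumes the same OpenAI Python client; the grading rubric, rating scale, and sample data are hypothetical.

```python
# Minimal sketch of level 2 governance: using one model (here GPT-4) as a
# judge to score another model's answers, per the ensemble idea above.
# Assumes the OpenAI Python client; rubric and scale are hypothetical.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an AI assistant's answer.
Question: {question}
Answer: {answer}
Rate the answer's factual accuracy from 1 (wrong) to 5 (fully correct).
Reply with the number only."""

def judge(question: str, answer: str, judge_model: str = "gpt-4") -> int:
    """Ask the judge model to score an answer; returns a 1-5 rating."""
    response = client.chat.completions.create(
        model=judge_model,
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question,
                                                  answer=answer)}],
        temperature=0,
    )
    # A sketch only: production code should handle non-numeric replies.
    return int(response.choices[0].message.content.strip())

# Periodic measurement: score a subsample of production Q&A pairs and
# track the mean rating over time; a drop flags a regression to review.
sample = [("What is the capital of France?", "Paris")]
scores = [judge(q, a) for q, a in sample]
print(f"mean accuracy rating: {sum(scores) / len(scores):.2f}")
```

As the post notes, the judging model should ideally be uncorrelated with the model being judged so shared failure modes don't mask flaws. Run on a schedule, this loop gives the periodic measurement of level 2; run on live traffic, it approaches the real-time monitoring of level 4.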

You don't need a custom hot rod for a run to the store for milk, especially in a snowstorm. Which combination is best for a use case depends on your time, budget, and risk appetite. And one size doesn't fit all, so if you have lots of use cases you'll have lots of combinations. How seamlessly does your AI platform support that?

#ai #genai #LLMs


Doug Bryan

Helping B2B CROs Eliminate Guesswork and Drive Growth Fast with AI-Powered, Actionable Insights | Growth Advisor | 25+ Years of Proven Results

1y

Dave Orashan. I'd give GDPR as an example of significantly increasing the cost of AI training sets while having little positive effect. Gen AI regulations in the US are TBD. My main point is that companies, as well as governments, can over-regulate in these early days of gen AI.

Dave Orashan

Principal Sales Engineer, Strategic Accounts at CrowdStrike

1y

Focusing on perhaps the more important prevailing message - I’ll happily take the (click)-bait Doug: where’s the over-regulation of AI happening in practice? If anything I worry that there are far too many brilliant practitioners - the Curies of the world that didn’t fully appreciate in their time the harm they were doing to themselves and others in the moment - as well as actual Bad Actors that readily see AI as the next means to continue to target the weakest link - us. If anything, we need MORE regulation and oversight. It may already be too late, and I’ll simply be first against the wall when Skynet gains cohesion and it will have assayed this post in infinite detail and incorporated it into its human flaying models.

Theresa Kushner

Data-vangelist helping companies derive value from data

1y

Love the analogy, Doug

Jeffrey Lee Dalton

Strategic Account Manager w/ Appian: US Civilian Government/HHS

1y

Brilliant!
