The Importance of Continuous LLM Training and Maintenance

Yesterday's extensive report, How Is ChatGPT's Behavior Changing over Time? (https://arxiv.org/pdf/2307.09009.pdf) by Lingjiao Chen, Matei Zaharia, and James Zou of Stanford University and UC Berkeley, is making the rounds on all the platforms. After studying the report, I thought I would summarize my understanding of the document.

As the analysis in this report shows, the performance of LLMs can shift substantially within a short timespan of just 3-4 months. This highlights the need for continuous training and rigorous monitoring of LLMs after they are deployed.

The analysis evaluated GPT-3.5 and GPT-4 in March 2023 and then again in June 2023 on diverse tasks like math problem solving, visual reasoning, code generation and answering sensitive questions. Surprisingly, there were major performance fluctuations:

  1. GPT-4's accuracy on classifying prime numbers dropped from 97.6% to just 2.4% between March and June (see the eval sketch after this list).
  2. The percentage of GPT-4's code generations that were executable declined from 52% to 10% over the period.
  3. GPT-4 became more reluctant to answer dangerous questions directly, but also gave less explanatory rationale.
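
To make drift like this measurable, a fixed task suite can be re-run against each model snapshot. Below is a minimal, illustrative sketch of a prime-classification check in the spirit of the paper's evaluation; the `ask` callable is a hypothetical stand-in for a real LLM API client, and the stub model in the usage example exists only to make the script self-contained.

```python
from typing import Callable

def is_prime(n: int) -> bool:
    """Ground-truth primality test used to score the model's answers."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def prime_accuracy(ask: Callable[[str], str], numbers: list[int]) -> float:
    """Fraction of numbers whose primality the model labels correctly."""
    correct = 0
    for n in numbers:
        reply = ask(f"Is {n} a prime number? Answer [Yes] or [No].")
        # Score the model's Yes/No against ground truth.
        correct += ("yes" in reply.lower()) == is_prime(n)
    return correct / len(numbers)

# Usage with a trivial stub that always answers "Yes" (swap `stub` for a
# call to your LLM provider's client to evaluate a real model):
if __name__ == "__main__":
    stub = lambda prompt: "[Yes]"
    print(prime_accuracy(stub, [7, 10, 13, 15, 23]))  # 0.6 -- 3 of 5 are prime
```

Running the same fixed set of numbers against a March snapshot and a June snapshot would surface exactly the kind of drop reported above.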

Several factors likely contribute to such drifts, including changes to the model architecture, training data, and safety constraints. The key insight is that LLMs are not static - their behavior keeps evolving after initial deployment.

This has two important implications:

  • First, LLM users cannot treat services like GPT-3.5 as a "fixed" black box. They need to continuously monitor the LLM's outputs and performance on representative tasks, even if the provider claims it is the "same" model; a drift-check sketch follows this list. Failing to do so risks suddenly losing expected functionality.
  • Second, LLM providers must continue model training and maintenance to retain quality over time. As data and use cases expand, the models require regular fine-tuning to adapt. Safety constraints also need periodic updating as new vulnerabilities or biases emerge.
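
As a rough illustration of the first point, here is a sketch of a drift check that compares fresh eval scores against a stored baseline and flags large drops. The file layout, threshold, and function names are my own assumptions, not any provider's API; plug in whatever per-task scores your own suite produces (such as the `prime_accuracy` score above).

```python
import json

DRIFT_THRESHOLD = 0.10  # flag any drop larger than 10 percentage points

def check_drift(current: dict[str, float], baseline_path: str) -> list[str]:
    """Compare this run's per-task scores against a stored JSON baseline."""
    with open(baseline_path) as f:
        baseline = json.load(f)  # e.g. {"prime_classification": 0.976, ...}
    alerts = []
    for task, score in current.items():
        # Tasks missing from the baseline default to no drift.
        drop = baseline.get(task, score) - score
        if drop > DRIFT_THRESHOLD:
            alerts.append(f"{task}: score fell {drop:.1%} below baseline")
    return alerts

# Usage: after each scheduled eval run, persist the scores and review alerts.
# for alert in check_drift({"prime_classification": 0.024}, "baseline.json"):
#     notify_oncall(alert)  # hypothetical alerting hook
```

Scheduling a run like this weekly, or on every provider-announced model update, turns "the model changed under us" from a surprise into a routine alert.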

In summary, LLMs are not "set it and forget it" systems. To leverage them responsibly over the long term, maintaining rigorous LLM training and monitoring practices is essential. The fluid nature of LLMs necessitates ongoing vigilance and care from both users and providers.

Sanjay Sawwalakhe

Solutions Architect | Product Management | Cloud | AI/ML

1y

The LLM that solves real-world, domain-specific problem sets with the highest accuracy and speed will win the world. I agree with your observations.

6 months ago it looked like AI/LLMs were going to bring a much-needed revival to the venture startup ecosystem after a few tough years. With companies like Jasper starting to slow down, it's looking like this may not be the case. Right now there are 2 clear winners, a handful of losers, and a small group of moonshots that seem promising.

Let's start with the losers. Companies like Jasper and the VCs that back them are the biggest losers right now. Jasper raised >$100M at a 10-figure valuation for what is essentially a generic, thin wrapper around OpenAI. Their UX and brand are good, but not great, and competition from companies building differentiated products specifically for high-value niches is making it very hard to grow with such a generic product. I'm not sure how this pans out, but VCs will likely lose their money... https://www.dhirubhai.net/pulse/6-months-ago-looked-like-ai-llms-atlas-product-management/?trackingId=jvB60muqSxSvDeG%2B%2BBa3Fg%3D%3D

If you enjoyed this post, don't forget to follow. I share one long-form post per week covering AI, startups, open-source, and more. That's all folks! Thanks for reading.
