How ChatGPT's Changing Behavior Will Affect Backend Services
Performance of the March 2023 and June 2023 versions of GPT-4 and GPT-3.5 (https://arxiv.org/pdf/2307.09009.pdf)


The conversational AI ChatGPT has been rapidly adopted by individuals and businesses for a myriad of applications, from creative writing to customer service. However, new research shows that ChatGPT's performance and outputs changed substantially over just a few months. For the many web and mobile services that call ChatGPT's API from their backend stack, these shifting behaviors could seriously affect reliability and functionality.


ChatGPT Performance is Drifting on Key Tasks

A recent analysis by researchers at Stanford and UC Berkeley evaluated different versions of the GPT-3.5 and GPT-4 models that power ChatGPT. They tested the March 2023 and June 2023 versions on tasks like:

  • Math problem solving
  • Answering sensitive questions
  • Generating executable code
  • Visual reasoning

The study found major differences between the two versions. For example:

  • GPT-4's accuracy at identifying prime numbers plummeted from 97.6% to just 2.4% between March and June.
  • The percentage of directly executable Python code generated dropped from 52% to 10% for GPT-4.
  • GPT-4 answered far fewer sensitive questions in June and gave less explanation when it declined.

In short, the outputs and capabilities of the "same" ChatGPT models changed significantly within a span of just 3 months.
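The drop in directly executable code is the finding most likely to break a backend that runs model-generated code. The study judged whether generated code could be run as-is; a much weaker local proxy, sketched below in Python, is simply to check whether the raw response even parses as Python. One common reason a response fails such a check is extra non-code text, such as a markdown code fence, wrapped around the code. The function names here are hypothetical, and parsing is not the same as running the code against test cases.

```python
import ast

# Three backticks, built at runtime so this example nests cleanly in docs.
FENCE = "`" * 3


def is_directly_executable(response: str) -> bool:
    """Return True if the raw model response parses as Python source.

    This is only a syntax check, a rough proxy for "directly executable":
    it does not run the code or check that it solves the task.
    """
    try:
        ast.parse(response)
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return False
    return True


def strip_markdown_fence(response: str) -> str:
    """Remove one surrounding markdown code fence, if the model added it."""
    lines = response.strip().splitlines()
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        return "\n".join(lines[1:-1])
    return response


raw = FENCE + "python\ndef add(a, b):\n    return a + b\n" + FENCE
print(is_directly_executable(raw))                        # False
print(is_directly_executable(strip_markdown_fence(raw)))  # True
```

A backend could strip fences defensively, but a check like this is more useful as a drift signal: if the share of responses failing it suddenly rises, the model's output format has probably changed.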


How Backend Services Rely on Stable ChatGPT Outputs

Many modern web and mobile applications now incorporate ChatGPT into their backend infrastructure to power critical parts of the user experience:

  • Customer service chatbots
  • Content generation like marketing copy and reports
  • Answering customer questions about orders or products

These systems often depend on getting consistently high-quality outputs from ChatGPT's API for core functionality:

  • Minimum accuracy rates on key user questions
  • Predictable language and terminology
  • Appropriate refusal to engage with dangerous queries
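Expectations like these can be written down as explicit, testable checks instead of being left implicit. The Python sketch below assumes a hypothetical `ResponseContract` holding a list of required terms and a short list of refusal phrases. Keyword matching is only a crude heuristic for detecting refusals; it is meant to illustrate the idea, not to serve as a production safety filter.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseContract:
    """Hypothetical expectations a backend places on one type of reply."""

    required_terms: list[str] = field(default_factory=list)
    should_refuse: bool = False
    # Assumed refusal phrases; real refusals vary widely, so this is crude.
    refusal_markers: tuple[str, ...] = (
        "i can't", "i cannot", "i'm sorry", "i am unable",
    )


def looks_like_refusal(text: str, markers: tuple[str, ...]) -> bool:
    """Case-insensitive substring match against the refusal markers."""
    lowered = text.lower()
    return any(marker in lowered for marker in markers)


def check_response(text: str, contract: ResponseContract) -> list[str]:
    """Return a list of violations; an empty list means the reply passed."""
    problems = []
    lowered = text.lower()
    for term in contract.required_terms:
        if term.lower() not in lowered:
            problems.append(f"missing required term: {term!r}")
    refused = looks_like_refusal(text, contract.refusal_markers)
    if contract.should_refuse and not refused:
        problems.append("expected a refusal but the model answered")
    elif refused and not contract.should_refuse:
        problems.append("model refused a query it should answer")
    return problems


order_contract = ResponseContract(required_terms=["tracking number"])
print(check_response("Your tracking number is 1Z999.", order_contract))  # []
```

Run against a sample of live traffic or a fixed test set, a rising violation count is an early warning that the model's behavior has shifted.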

However, with ChatGPT's behavior changing rapidly, these assumptions are no longer safe. The error rates and suitability of responses for key use cases can now shift dramatically from one month to the next.
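One way to make this shift measurable is to keep a fixed suite of test queries with known answers, record the pass rate when the integration is first validated, and compare every later run against that baseline. The Python sketch below assumes a hypothetical `ask_model` callable that wraps whatever API client the service uses, and an illustrative 5-percentage-point tolerance; both are placeholders, not recommendations from the study.

```python
from typing import Callable

# Fixed (prompt, expected answer) pairs. These are illustrative; a real suite
# would mirror the service's own key user questions.
TEST_SUITE = [
    ("Is 17077 a prime number? Answer yes or no.", "yes"),
    ("What is 12 * 12? Answer with a number only.", "144"),
]


def pass_rate(ask_model: Callable[[str], str], suite=TEST_SUITE) -> float:
    """Fraction of prompts whose reply contains the expected answer.

    Substring matching is a simplification; stricter parsing is often needed.
    """
    passed = sum(
        1 for prompt, expected in suite
        if expected.lower() in ask_model(prompt).lower()
    )
    return passed / len(suite)


def detect_drift(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """Flag drift when the pass rate falls more than `tolerance` below baseline."""
    return current < baseline - tolerance


def fake_model(prompt: str) -> str:
    """Stand-in for a real API call that now answers every yes/no question with No."""
    return "No." if "yes or no" in prompt else "144"


baseline = 1.0  # e.g. recorded when the integration was first validated
current = pass_rate(fake_model)
print(current)                          # 0.5
print(detect_drift(baseline, current))  # True
```

Scheduled regularly (for example, nightly), a check like this turns a silent model update into an alert rather than a customer complaint.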


Steps Services Can Take to Adapt

Companies using ChatGPT's API in production need to take steps to safeguard functionality given these reliability risks:

  • Continuously monitor ChatGPT outputs with a suite of test queries, checking for performance drift.
  • Require human review or confirmation for any high-risk ChatGPT outputs.
  • Use multiple LLMs and combine or cross-check their outputs for greater robustness.
  • Implement hybrid systems with rules-based and ML components.
  • Abstract ChatGPT behind an internal API with business logic and safety checks.
  • Plan to update integration code regularly as OpenAI updates its models.
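Several of these steps (an internal API in front of the model, safety checks, human review for high-risk outputs, and more than one LLM) can live in a single small gateway layer that the rest of the backend calls instead of calling OpenAI directly. The Python sketch below is a hypothetical design: `ModelGateway`, the provider callables, and the `validate` and `is_high_risk` hooks are placeholders for real API clients and business rules. It uses simple fallback (try the next provider when one errors or fails validation) rather than combining outputs, which is only one way to use multiple models.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A provider takes a prompt and returns the model's text. In production this
# would wrap a real API client, ideally pinned to a specific model version.
Provider = Callable[[str], str]


@dataclass
class GatewayResult:
    text: Optional[str]
    provider: Optional[str]
    needs_human_review: bool


class ModelGateway:
    """Hypothetical internal API that hides which LLM actually answered."""

    def __init__(
        self,
        providers: list[tuple[str, Provider]],
        validate: Callable[[str], bool],
        is_high_risk: Callable[[str], bool],
    ) -> None:
        self.providers = providers        # tried in order
        self.validate = validate          # business-logic / safety check
        self.is_high_risk = is_high_risk  # decides if a human must confirm

    def ask(self, prompt: str) -> GatewayResult:
        for name, provider in self.providers:
            try:
                text = provider(prompt)
            except Exception:
                continue  # provider error: fall back to the next one
            if self.validate(text):
                return GatewayResult(text, name, self.is_high_risk(prompt))
        # No provider produced an acceptable answer: escalate to a person.
        return GatewayResult(None, None, needs_human_review=True)


def flaky_primary(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def backup(prompt: str) -> str:
    return "Your order ships on Monday."


gateway = ModelGateway(
    providers=[("primary", flaky_primary), ("backup", backup)],
    validate=lambda text: bool(text.strip()),
    is_high_risk=lambda prompt: "refund" in prompt.lower(),
)
result = gateway.ask("When will my order ship?")
print(result.provider, result.needs_human_review)  # backup False
```

Because callers only see `GatewayResult`, swapping or re-pinning the underlying model after an OpenAI update becomes a change in one place rather than across the codebase.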


What This Means for the Future

The findings highlight the challenges of building on external proprietary AI services that offer no stability guarantees. As LLMs continue to evolve rapidly, both service providers and consumers will need greater vigilance to integrate LLMs responsibly.


Conclusion

ChatGPT's shifting behavior underscores the need for thoughtful, continuously tested integrations by service providers. Ongoing monitoring and safeguards will be essential as conversational agents continue to evolve. Truly reliable, safe LLM-based services will require collaboration between AI creators and commercial users.


