AI and the End of Integration Testing
Every time there is a fundamental change in how software is built or delivered, there is a corresponding change in the toolchain and common practices for building it. The desktop era had waterfall and manual testing; the cloud era had CI/CD, automated testing, and a greater dependence on telemetry. AI and LLMs will be no different.
We are used to being able to perform both unit and integration tests on software before we release it. Both involve defining a fixed environment and domain that we can test inside of, looking for the behaviors we want. Integration testing, particularly for large-scale distributed services, is notoriously hard because you have to simulate a large environment and many interactions very accurately to get a meaningful and complete result. Programmers still do a lot of "smoke testing" and manual "checking", because fully integration testing anything is harder than we like to admit. As an industry, we rely on telemetry and user reporting to catch what we miss instead.
AI, particularly more independent agents, is going to finish breaking our ability to do integration testing in advance, and it is going to usher in a different kind of development pattern, or at least an evolution of the current one.
Why is integration testing going to become impossible? Largely because the "aperture" of what is possible for a program (or agent) is going to become essentially infinitely wide. We are already seeing some early attempts at this. The list of things an agent can interact with, and the actions it might take, is essentially infinite - as large as natural language, plus all of the APIs it can reach, plus the full complexity of the real world. Even if you could set up a full replication of "the real world" as a sandbox to test in, the combinatorial complexity is far too high to test even a representative sample.
What does this mean for testing? How do we build and operate safe software in a world where we can't test as much as we want? I don't know - no one fully does - but I think this will drive us to be more telemetry-focused, and to build "self-checking" systems that cost more in compute but do more checking and correction at runtime instead of at test time.
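To make that shift concrete, here is a minimal sketch of what runtime self-checking might look like, assuming hypothetical `propose_action`, `check_action`, and `revise_action` stand-ins for an agent, a checker model, and a repair step (none of these are real APIs): every step the agent proposes is verified, and corrected or escalated, before it takes effect.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    """Result of a runtime check on a proposed agent action."""
    ok: bool
    reason: str = ""

def run_with_runtime_checks(task, propose_action, check_action, revise_action,
                            max_attempts=3):
    """Run one agent step, verifying it at runtime instead of at test time."""
    action = propose_action(task)
    for attempt in range(1, max_attempts + 1):
        verdict = check_action(task, action)   # e.g. a classifier or judge model
        if verdict.ok:
            return action                      # safe to execute / emit
        # Record the failed check as telemetry, then ask for a corrected action.
        print(f"runtime check failed ({verdict.reason}); revising, attempt {attempt}")
        action = revise_action(task, action, verdict.reason)
    raise RuntimeError("action never passed runtime checks; escalate to a human")
```

The extra compute lives in `check_action`: that is exactly the cost moving from test time to runtime.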
This is hard to do - we aren't talking about measuring things like latency or crashes, but about measuring more "semantic" properties like "safe", "helpful", "nice", or "making progress". These will likely require their own classifiers and inference to do well, which is of course a hard security problem at scale. We will have to learn to use a lot more compute to monitor and manage agents at scale - single prompts can mostly be tested in isolation today, but more complex agents won't be testable in the same way.
We call this idea "semantic telemetry" because of that need to measure semantic properties in real time. It's a challenge! There's no absolute measure of, say, "helpful". There can only be examples and rubrics and, hopefully, a stable relative measure on some fixed scale. It may be that we will, as an industry, produce "common behavior rubrics" and start to do things like certify that an agent adheres to them - hard to say.
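As an illustration only (the rubric text, the 0-4 scale, and the `judge` callable below are assumptions, not a proposed standard), a semantic-telemetry event might look something like this: score an interaction against a rubric on a fixed relative scale, spend the extra inference to do it, and emit the result alongside ordinary telemetry.

```python
import json
import time

# Hypothetical rubric: anchors a relative "helpful" score, not an absolute measure.
HELPFULNESS_RUBRIC = {
    "property": "helpful",
    "scale": [0, 4],
    "levels": {
        0: "ignores or contradicts the user's request",
        2: "partially addresses the request, with gaps or digressions",
        4: "fully addresses the request with actionable, correct detail",
    },
}

def emit_semantic_telemetry(transcript, judge, rubric=HELPFULNESS_RUBRIC):
    """Score one interaction against a rubric and emit it as a telemetry event."""
    score = judge(rubric, transcript)          # extra inference, paid at runtime
    event = {
        "ts": time.time(),
        "property": rubric["property"],
        "score": score,
        "scale": rubric["scale"],
    }
    print(json.dumps(event))                   # stand-in for a real telemetry sink
    return event
```

The point is the shape of the measurement: relative, rubric-anchored, and collected continuously rather than asserted once in a test suite.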
There are other testing challenges coming too - regression testing is another that seems apparent, and it is similarly complex and similarly in the semantic realm. Because so much of what these agents do is open-ended, anything that depends heavily on fixed behaviors of the base model will likely be too brittle, but there will still be properties like verbosity, or maybe 'basic intelligence', that more complex programs depend on. How do we specify and test for these? Is there a real-time component where we can predict which model will work correctly and schedule an inference accordingly ("semantic scheduling" and optimization)?
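One way that scheduling question could be sketched, purely as an assumption about how it might work: the program declares minimum scores for the semantic properties it is brittle to, and the scheduler routes the request to a model whose continuously measured scores meet them. The model names and the score table below are invented for illustration; in practice the numbers would come from ongoing semantic-telemetry measurements like the rubric scoring above.

```python
# Hypothetical rolling averages of rubric scores per model, on a 0-4 scale.
MEASURED_SCORES = {
    "model-a": {"helpful": 3.6, "verbosity": 2.1, "basic_reasoning": 3.8},
    "model-b": {"helpful": 3.2, "verbosity": 1.2, "basic_reasoning": 3.1},
}

def schedule_model(requirements, scores=MEASURED_SCORES):
    """Return the first model whose measured semantic scores meet every minimum."""
    for model, measured in scores.items():
        if all(measured.get(prop, 0) >= minimum for prop, minimum in requirements.items()):
            return model
    raise LookupError("no available model currently satisfies the semantic requirements")

# Example: a program that is brittle to verbosity drift and needs baseline reasoning.
print(schedule_model({"verbosity": 2.0, "basic_reasoning": 3.5}))  # -> "model-a"
```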
There are likely even more problems that will emerge as larger teams begin to build more and more complex programs and agents that use LLM base models and other AI models as programming objects. The era of semantic engineering is beginning, and we will have to find the new development patterns that work for it.