Quest for the Holy Grail of Test Automation (AI Testing)

Preface

The advent of DevOps has played a significant role in the growth of test automation. While many automation tools look great on the surface, their promises ultimately fall short. Among the shortcomings are expensive proprietary products, a lack of uniform testing across software platforms, and a critical need for skilled engineers to maintain test suites - the last of which defeats the entire purpose of automation. These drawbacks force teams to depend on many automation products and integrations just to cover all the bases.

AI Deception

Perhaps more insidious than the rest of these drawbacks, many modern automation tools disingenuously boast "AI-driven" features. A commonly touted feature is "AI self-healing", which in layman's terms means the test will attempt to interact with similar, nearby, or historical elements if a prior interaction failed, and will update the test if the new interaction succeeds. To the keen eye, it is clear there is no real "AI" in these products.

Big Flaws

A major issue with current automation tools is that they are either too simplistic or too complex; there is no balance. Either the tool cannot accomplish common tasks, or it is too unwieldy at scale and becomes hard to maintain. These tools are designed around the inherently flawed principles of white and gray box testing.

Simple vs. Advanced

A simplistic automation tool would be a recorder that captures a tester's session as a list of coordinates with reference images or element paths for each interaction. On the other hand, an advanced automation tool would be an over-engineered proprietary automation editor with a smattering of built-in functionality.

Neither of these tools can accomplish the primary goal of test automation - testing a system as if a real human were using it and ensuring it works as expected. In a perfect world, automation would interact with the system under test exactly as a human would - with uncertainty, variance, and no understanding of the underlying system.

This conclusion posits the following question: how would an authentic "AI-driven" test automation tool actually be implemented? A good place to start is designing a high-level architecture with the AI technology we have right now.


High-level Implementation

The proposed AI testing tool is a solution for the current offerings that fail to adequately replace manual UAT. It does not target code-level testing such as API or unit testing.

Flowchart diagram of the proposed high-level implementation

The following is a high-level implementation:

1. AI testing performed is purely black box

  • AI testing does not rely on how the system was coded, but rather on how it is actively presented
  • AI testing can only interact with the system as a human would
  • By seeing what is presented
  • By making decisions based on what is presented
  • By taking actions as humans would (clicking, dragging, typing, tapping, swiping, etc.)

2. AI tests are defined in such a way that steps are inputs to Natural Language Processing (NLP)

  • Steps are written exactly as you would instruct a human to test it

3. AI testing can be initiated in many ways

  • AI tests can be defined as steps for re-use
  • AI tests can be run on the fly via a text prompt
  • A text prompt can initiate a whole test run or be used for sequentially entered steps

4. AI testing does not rely on learning a specific system - rather, it learns how to use technology as a whole and applies its knowledge on a moment-by-moment basis

  • AI testing is trained on how to identify and use common UI/UX designs
  • AI testing can be continually trained to identify and use the latest UI/UX concepts
  • AI testing is trained without bias toward any particular UI/UX and learns all software platforms

5. AI testing fills in the knowledge gaps by itself

  • AI testing takes the given information (NLP steps) and applies it to the current context (what it is seeing) to figure out what to do next
  • AI testing determines for itself whether what it was told to do was successfully accomplished
  • AI testing can recognize by itself when something unintended has occurred

6. AI testing is at least as fast as a human

  • AI testing makes decisions at parity with a human
  • AI testing has patience on par with a human
  • AI testing can troubleshoot as a human would

The common thread throughout this proposed high-level implementation is that the AI testing tool behaves only as a human can. It does not base decisions or actions on the underlying technology of the system under test; rather, it makes decisions based on the UI/UX patterns it was trained on and applies the test steps within the context it is actively presented with.


Trying a Common Scenario

Let's take a common UAT testing scenario and apply it to our proposed AI testing tool. Suppose we want to test a shopping website. The exchange below is what the AI testing tool was told and how it applied its knowledge to perform the test:

  • Test Step: "Open the browser and go to ebay.com"
  • AI: Knowing that a browser is an application commonly found on the desktop, it opens the first browser app it finds. Knowing that a browser is used to search the web or go directly to addresses, it enters "ebay.com" into the address bar and navigates to it
  • Test Step: "Search for shoes and add the first two to the cart"
  • AI: Knowing that search bars are a common pattern on websites, it looks for the most prominent search bar, types the word "shoes", and executes the search. Knowing that searches may take some time, it actively waits for search results to come in. Knowing that search results are usually in a prominently organized list, it finds the list of shoes. Knowing that adding a product to a shopping cart is a common pattern, it looks for the relevant "Add to Cart" buttons for the first two shoes it sees and adds them.
  • Test Step: "Go to the cart and remove the second item, then proceed to checkout"
  • AI: Knowing that a shopping cart icon is a common pattern, it looks for the relevant icon and uses it to access the cart. Knowing that a shopping cart page lists the items that were added to the cart, it looks for the ability to remove the second item and does so. Knowing that confirming a cart before checking out is a common pattern, it looks for the relevant button to continue with the checkout and uses it.
  • Test Step: "Fill out the checkout form with a random identity and credit card payment"
  • AI: Knowing that a checkout page is a common pattern that requires identity, address, and card payment information to be entered, it looks for the relevant fields and types in random information.
  • Test Step: "Finish the checkout and ensure the order was successful"
  • AI: Knowing that confirming a checkout page to place the order is a common pattern, it looks for the relevant button to confirm the order and then waits. Knowing that a results page indicating a successful order is a common pattern, it looks for relevant text like receipt information, a tracking link, or arrival dates to decide the order was successful.

The scenario above was only five steps written in natural language. Because the AI testing tool was trained to understand these UI/UX concepts, it was able to piece together the directions it was given with the current context of what it was seeing. This made the test a breeze to execute and required no hand-holding, coding, or internal information about the system. It executed the test almost exactly as a human would have, given the same steps.

Too Good To Be True?

So if all of this seems too good to be true, why has no one created a tool like this yet? The answer is simple: we are only now living in the crowning moment where available AI technology intersects with practical computer integration. Implementing that technology requires technologists like us to go out of our way to learn the foundations, concepts, and libraries. It also requires a unique set of skills to understand the needs of automated testing and how they map onto available AI technology. Not to mention, it requires an entirely new tool to be designed and built from the ground up.


Technical Implementation

Okay, the high-level implementation is great and all, but what do the specifics look like? What technologies would you implement and why?

Flowchart diagram of the proposed technical implementation

Here is a proposal on how to put this AI testing tool together by using technology that's available right now:

1. Start by defining your UI/UX training data folder structures

  • Create parent categories like "website", "android", "ios"
  • Create child categories like "menu", "button", "search" (a scaffolding sketch follows this list)
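
A minimal sketch of this scaffolding in Python; the category names are just the examples above, and the training_data/ui_ux root path is an assumption:

```python
from pathlib import Path

# Assumed layout: parent platform categories, each with child UI/UX categories
PLATFORMS = ["website", "android", "ios"]
WIDGETS = ["menu", "button", "search"]

root = Path("training_data/ui_ux")
for platform in PLATFORMS:
    for widget in WIDGETS:
        # e.g. training_data/ui_ux/website/menu
        (root / platform / widget).mkdir(parents=True, exist_ok=True)
```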

2. Aggregate images for your UI/UX training data

  • Download examples of website menus and place them under website > menu
  • Download examples of android buttons and place them under android > button
  • Download examples of ios search bars and place them under ios > search
  • Be creative about what the AI should learn, and repeat this for all combinations of your training data folder structures with relevant images (a download sketch follows this list)
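
A hedged sketch using the FastAI library discussed later in this article; the URL file name and destination path are assumptions, and the URLs can be gathered however you like:

```python
from pathlib import Path

from fastai.vision.all import download_images, get_image_files, verify_images

# Assumed: website_menu_urls.txt holds one image URL per line for this category
dest = Path("training_data/ui_ux/website/menu")
download_images(dest, url_file="website_menu_urls.txt")

# Remove any files that failed to download or are not readable images
failed = verify_images(get_image_files(dest))
failed.map(Path.unlink)
```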

3. Train your AI models one by one

  • Train an Image Classification model that can identify website menus using your training data from website > menu
  • Repeat this for all combinations of your training data folder structures
  • Apply your AI training skills: choose the right pre-trained model, improve training with validation sets, tune the learning rate, and continually reduce the loss
  • This will result in multiple models that you can export for deployment
  • Each AI model will be used in its respective situation to predict whether a UI/UX element is relevant to use (a training sketch follows this list)
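
A minimal training sketch with FastAI. The paths are assumptions, and note that a classifier also needs negative examples, so an "other" class folder alongside "menu" is assumed here:

```python
from fastai.vision.all import (
    ImageDataLoaders, Resize, error_rate, resnet34, vision_learner
)

# Assumes training_data/ui_ux/website/ holds one subfolder per class label
# (e.g. "menu" and "other"); from_folder reads folder names as class names
dls = ImageDataLoaders.from_folder(
    "training_data/ui_ux/website",
    valid_pct=0.2,          # hold out 20% of images as a validation set
    item_tfms=Resize(224),  # normalize sizes for the pre-trained backbone
)

learn = vision_learner(dls, resnet34, metrics=error_rate)  # transfer learning
learn.fine_tune(4)                # a few epochs on top of ImageNet weights
learn.export("website_menu.pkl")  # deployable artifact for the engine
```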

4. Create the foundations for your AI testing tool engine

  • Use computer vision libraries to provide screen information to the engine
  • Use relevant OS libraries to give the engine the ability to interact with the system (a sketch of both follows this list)
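
A sketch of those foundations, assuming Python with the pyautogui and OpenCV libraries (one plausible choice among many):

```python
import cv2        # OpenCV: image processing for the vision models
import numpy as np
import pyautogui  # cross-platform mouse/keyboard control

def see_screen() -> np.ndarray:
    """Capture the current screen as a BGR image the vision models can consume."""
    shot = pyautogui.screenshot()  # PIL image in RGB
    return cv2.cvtColor(np.array(shot), cv2.COLOR_RGB2BGR)

def act(x: int, y: int, action: str = "click", text: str = "") -> None:
    """Perform a human-style action at screen coordinates chosen by the models."""
    if action == "click":
        pyautogui.click(x, y)
    elif action == "type":
        pyautogui.click(x, y)  # focus the field first, as a human would
        pyautogui.typewrite(text, interval=0.05)  # human-like typing speed
```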

5. Define your NLP training data folder structures

  • Create categories like "desktop", "mobile"

6. Aggregate text examples for your NLP training data

  • Download plain text tabular data for "desktop" that includes phrases like "click", "right click", "download", "hamburger menu", "slider", "radio button", etc.
  • Download plain text tabular data for "mobile" that includes phrases like "tap", "swipe", "double tap", "hold tap", etc.

7. Train your NLP models one by one

  • Train a model that can predict whether a given phrase is related to desktop technology using the "desktop" training data
  • Train a model that can predict whether a given phrase is related to mobile technology using the "mobile" training data
  • Apply your AI training skills to improve the NLP predictions (a training sketch follows this list)
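
A minimal FastAI text-classification sketch; the four-row DataFrame is purely illustrative, as real training data would hold thousands of labeled phrases:

```python
import pandas as pd
from fastai.text.all import AWD_LSTM, TextDataLoaders, accuracy, text_classifier_learner

# Tiny illustrative sample of the "desktop" vs. "mobile" phrase data
df = pd.DataFrame({
    "text":  ["click the About menu item", "right click the file",
              "tap the hamburger menu",    "swipe left on the card"],
    "label": ["desktop", "desktop", "mobile", "mobile"],
})

# bs=2 only because the illustrative sample is tiny
dls = TextDataLoaders.from_df(df, text_col="text", label_col="label",
                              valid_pct=0.25, bs=2)
learn = text_classifier_learner(dls, AWD_LSTM, metrics=accuracy)
learn.fine_tune(4)

# Returns (label, class index, probabilities), e.g. ("desktop", ...)
print(learn.predict("click the About menu item"))
```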

8. Integrate your AI models to the AI testing tool engine

  • Hook up the NLP models to parse a text prompt input and predict whether the given phrases relate to desktop or mobile applications
  • Now that you have predicted whether the target application is desktop or mobile, search the relevant text against the computer vision output of what is currently on the screen
  • Suppose the current step is "click the About menu item": the prediction would be desktop (because clicking is a desktop action) and the UI/UX context would be a menu item with the desired target "About"
  • Hook up the UI/UX models to predict which sector of the current computer vision output matches a desktop > menu interface that includes the text "About"
  • Now use the relevant OS libraries to interact with the screen in the area where the prediction probability was highest for the "About" menu item
  • Repeat this concept of integrating the AI models so that the engine can smoothly understand the NLP input, predict where on the screen the relevant UI/UX element is located, and automatically determine what action to take against it according to the NLP instructions (a glue-code sketch follows this list)
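
A hypothetical glue-code sketch of that loop. Apart from FastAI's load_learner, everything here - the model file names, the extract_target_text and candidate_regions helpers, the region object, and the confidence threshold - is an assumption for illustration, not an established API:

```python
from fastai.learner import load_learner

nlp_model = load_learner("platform_classifier.pkl")  # NLP model from step 7
ui_models = {("desktop", "menu"): load_learner("website_menu.pkl")}  # from step 3

def run_step(step_text: str) -> None:
    platform = str(nlp_model.predict(step_text)[0])  # e.g. "desktop"
    target = extract_target_text(step_text)          # assumed helper -> "About"
    screen = see_screen()                            # capture from step 4

    best, best_score = None, 0.0
    for region in candidate_regions(screen):         # assumed segmentation + OCR
        label, _, probs = ui_models[(platform, "menu")].predict(region.image)
        score = float(probs.max())
        if label == "menu" and target in region.text and score > best_score:
            best, best_score = region, score

    if best is not None and best_score > 0.8:        # assumed confidence threshold
        act(*best.center, action="click")            # OS interaction from step 4
```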

9. Improve your AI testing tool with Collaborative Filtering so that it can make contextual decisions

  • Similar to defining your NLP training data, create tabular data where the columns say "Text A" has a related score for "Text B"
  • A higher related score means that "Text A" is closely related to "Text B"
  • A lower related score means that "Text A" is less relevant to "Text B"
  • For example, a cell of "Text A" could be "Shopping Cart" and a cell of "Text B" could be "Shopping Bag" and the related score would be a high number meaning they are closely related
  • Training a Collaborative Filtering model on a data set like this would let you predict whether a given text prompt has closely related words or phrases, which can be hooked back into your AI testing tool engine to improve UI/UX predictions about where to perform an action
  • Continuing the earlier example, Collaborative Filtering would let a text prompt of "Go to the shopping cart" succeed even when the engine only finds an icon labeled "Shopping Bag": the engine would know those phrases are closely related and produce a high-score prediction to interact with that icon, since it could not find "Shopping Cart" directly
  • This concept can be applied to unlimited other areas of UI/UX, or to domain-specific information on any subject (a training sketch follows this list)
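
A minimal sketch with FastAI's collaborative filtering tools; the three-row relatedness table is purely illustrative:

```python
import pandas as pd
from fastai.collab import CollabDataLoaders, collab_learner

# Relatedness table in the "Text A" / "Text B" / score shape described above
df = pd.DataFrame({
    "text_a": ["Shopping Cart", "Shopping Cart", "Checkout"],
    "text_b": ["Shopping Bag",  "Sign In",       "Place Order"],
    "score":  [4.8,             0.5,             4.5],
})

# bs=2 only because the illustrative sample is tiny
dls = CollabDataLoaders.from_df(
    df, user_name="text_a", item_name="text_b", rating_name="score",
    valid_pct=0.25, bs=2,
)
learn = collab_learner(dls, n_factors=50, y_range=(0, 5))  # scores bounded 0-5
learn.fit_one_cycle(10, 5e-3)  # learn latent factors for each phrase
```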

10. Perfect your engine with many models

  • Use web crawling or better queries to get better training data
  • Use ensembling or embedding techniques to train proficient models
  • Create your own AI models to train against with larger or more relevant data sets
  • Refine all your models so predictions are reasonable while avoiding overfitting (an ensembling sketch follows this list)
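
For instance, a simple ensembling sketch that averages the class probabilities of several trained models so no single model's quirks dominate a prediction; the model and input names are assumptions:

```python
import torch

def ensemble_predict(learners, item):
    # Average the per-class probabilities returned by each model's predict()
    probs = torch.stack([learn.predict(item)[2] for learn in learners])
    return probs.mean(dim=0)

# Assumed names: two variants of the menu classifier scoring one screen region
avg_probs = ensemble_predict([menu_model_a, menu_model_b], region_image)
```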

11. Build a product ecosystem around your AI testing tool engine

  • Ship the engine with a way to use it via desktop apps, mobile apps, or websites
  • Create a web app so that customers can store and organize their AI test cases online
  • Integrate test results from the engine to your web app
  • Add more features to the AI testing tool that integrate well with the web app
  • Create a compelling product showcase website
  • Develop product offerings such as SaaS and Self-Host licenses
  • Launch a company with your new AI testing tool


Possible, But Difficult

As outlined above, it is clearly possible to build an authentic AI testing tool with today's technology. The most difficult part from the AI perspective would be aggregating data for and training enough models that predictions can be made across all software platforms and all UI/UX patterns within them. It would be similarly difficult to aggregate data for and train the NLP models needed to correctly predict a test step's intentions from a contextual and actionable perspective. The next hardest part would be developing and fine-tuning the AI testing tool engine so that all the models, computer vision, and OS interactions work together in a human-like manner. The only settled science here is building a SaaS product around your new tool and making it a pleasant experience for your clients.

AI Ethics

So if this proposed AI testing tool is so good, couldn't it be used maliciously against software products in ways the owners did not intend? Yes. To deter malicious users, you could adopt a product model that makes botting unviable on a free tier, and enforce policies ensuring companies are authorized to use the AI testing tool on their own software and are background-checked for authenticity. Additional internal tooling could detect patterns of malicious use, and a legal clause could immediately terminate the contract of users deemed to be using the tool maliciously.


The Holy Grail

If you've made it this far, you're probably wondering what the benefit of all this effort would be. Market researchers valued the software testing market at roughly $52 billion USD in 2023, and it is expected to grow to roughly $70 billion USD by 2030. Imagine how much existing and new software will need to be tested between now and 2030. Now imagine how much money enterprises are spending on test automation tools and skilled professionals. You could legitimately disrupt (a buzzword, but this time it's for real) the entire industry by creating an AI testing tool like the one proposed in this article.

That's the benefit. Nearly every existing test automation tool would be rendered obsolete and woefully inefficient compared to your new AI testing tool, which effortlessly turns natural language steps into reliable, human-like automated testing. If the SaaS product is built with a high degree of usability and collaboration for enterprises, they will pay hand over fist for seats on your servers. An AI testing tool of this caliber, combined with SaaS, could easily surge to billions in sales, as every company that produces software would want to test with it. Think about that.

With the publishing of this article, the race is now officially on for the first adventurers to embark on the quest to obtain the holy grail of test automation.

Addendum

I wrote this article in July of 2023 and decided not to publish it because I wanted a head start on my own quest for the holy grail (selfish, I know). At that point, I had already spent a few years in test automation and DevOps, so I had gained adept knowledge of the job, the tooling, and the product landscape.

I promptly began learning FastAI, a library and API written on top of PyTorch which, according to its creators, "simplifies training fast and accurate neural nets using modern best practices." Their Python library and courses are the culmination of 7+ years of tireless and unpaid work to ensure AI does not become a mysterious, powerful force wielded solely by the few, but rather something openly shared and thoroughly understood by the many.

I owe a personal thank you to the renowned Dr. Jeremy Howard, Dr. Rachel Thomas, and the community that they created around FastAI. I highly recommend their free AI courses - even if you just want to learn how AI really works under the hood. (P.S. The deep learning neural network example created entirely in Microsoft Excel absolutely blew my mind).

I completed the Practical Deep Learning course by watching the YouTube videos offline on my phone while hiking in the Rocky Mountains throughout the autumn (it was amazing). However, it turns out that learning how AI works and starting to code on your own are two very different things (I should have been coding while following the courses instead of hiking - oops).

Eventually, I shelved my efforts and accepted that in the near future someone else would obtain the holy grail. That day is today, in March of 2024. A new open source AI test automation tool is why I am now publishing my (once secret) idea from half a year ago.

A New Hand Touches The Holy Grail

Skyvern AI test automation tool demo: Getting a Geico insurance quote

Skyvern is at the frontier, on its own journey to obtain the holy grail of test automation. According to their GitHub repository, which was open sourced just two weeks ago:

"Skyvern automates browser-based workflows using LLMs and computer vision. It provides a simple API endpoint to fully automate manual workflows, replacing brittle or unreliable automation solutions."

While Skyvern does not advertise itself as a test automation tool, I immediately recognized its potential to disrupt the ~$52B software testing industry.

High-level diagram of how Skyvern AI test tool works

It appears that Skyvern is still very new: their repository shows only 2 of the 12 items checked off on their proposed feature roadmap. It will be interesting to watch this tool develop and grow in capability over time. Even more interesting will be the competing tools that arise from this newly established era of authentic AI testing.

Go ahead and read their website and repository, as they explain everything in great detail - like how the flow of tasks, steps, and actions works, something I did not ideate in this article but which would be an integral part of the AI testing tool engine. I will be trying the tool out for myself and may write a follow-up article on how it goes.

“The best way to predict the future is to invent it.” - Alan Kay

Welcome to the new era of authentic AI test automation.
