Enhancing GUI Testing with AI and Squish: A Practical Guide

User interface testing seems simple at first glance. You open the application, click buttons, enter text, and check if everything works as expected. But if it were that easy, there wouldn’t be hundreds of tools for test automation, thousands of professionals specializing in this field, or countless bugs that still manage to slip past even the most attentive eye.

Manual testing is undoubtedly necessary. It allows you to evaluate the product from a real user's perspective, uncover unexpected interaction nuances, and notice rough edges that are difficult to formalize in scripts. However, this approach has obvious drawbacks: it is slow, expensive, and scales poorly. If you have a complex application with numerous screens and states, a single regression test before release can take days or even weeks. And performing this manually over and over again is far from enjoyable.

This is why test automation has become an industry standard. Once a test is written, it can be run repeatedly with minimal effort. But even automation isn’t perfect. Automated tests excel at mechanical verifications—clicking a button, checking results, comparing against a reference. However, they sometimes overlook issues that are obvious to a human. For example, if the text on a button suddenly turns white on a white background, tests might not detect it—after all, the button exists, is clickable, and technically "works."

This is where artificial intelligence comes into play. What if AI could analyze the interface, detect anomalies, and assist with testing as a human would—only faster and without fatigue? It sounds promising, but it’s not that simple. Fully entrusting testing to AI is not yet feasible—the technology hasn’t reached the required level, and trust in it remains low. However, integrating AI elements into traditional automation to make tests smarter and more flexible is already a practical goal.

In this article, we will explore how AI can be used for GUI testing, why a hybrid approach is the future of automation, and how Squish can help with this.

Introduction to Squish

In the following sections, we will examine examples based on Squish, a tool for automated graphical user interface testing. It is a powerful solution that allows for the creation of flexible test scenarios and supports various platforms and technologies.

Squish supports testing of desktop, mobile, and embedded applications, running on Windows, Linux, and macOS. Additionally, tested applications can be executed not only on these systems but also on iOS, Android, and embedded devices. This wide coverage makes it a universal tool for companies developing cross-platform solutions.

Unlike simple recorders, Squish enables the creation of full-fledged test scenarios with logic and conditions using Python, JavaScript, Perl, Tcl, and Ruby. Python remains the most popular choice due to its flexibility and automation capabilities.

Dynamic AI-driven Test Data Generation in Squish

Squish comes with prebuilt test examples, including a simple "Address Book" application. This application allows adding, editing, and viewing contact records.

Squish Address Book example

Along with it, Squish ships automated tests demonstrating how to verify the application's functionality. One such test involves populating the address book with data from a TSV file. This is a convenient and straightforward method, but it requires a pre-prepared dataset.

# -*- coding: utf-8 -*-

import names
import os

def main():
    startApplication('"' + os.environ["SQUISH_PREFIX"] + '/examples/qt/addressbook/addressbook"')
    invokeMenuItem("File", "New")
    table = waitForObject({"type": "QTableWidget"})
    test.verify(table.rowCount == 0)
    limit = 10  # To avoid testing 100s of rows since that would be boring
    for row, record in enumerate(testData.dataset("MyAddresses.tsv")):
        forename = testData.field(record, "Forename")
        surname = testData.field(record, "Surname")
        email = testData.field(record, "Email")
        phone = testData.field(record, "Phone")
        table.setCurrentCell(0, 0)  # always insert at the start
        addNameAndAddress((forename, surname, email, phone))
        checkNameAndAddress(table, record)
        if row > limit:
            break
    test.compare(table.rowCount, row + 1, "table contains as many rows as added data")
    closeWithoutSaving()

def invokeMenuItem(menu, item):
    activateItem(waitForObjectItem({"type": "QMenuBar"}, menu))
    snooze(0.2)
    activateItem(waitForObjectItem({'type': 'QMenu', 'title': menu}, item))

def addNameAndAddress(oneNameAndAddress):
    invokeMenuItem("Edit", "Add...")
    type(waitForObject(names.forename_LineEdit), oneNameAndAddress[0])
    type(waitForObject(names.surname_LineEdit), oneNameAndAddress[1])
    type(waitForObject(names.email_LineEdit), oneNameAndAddress[2])
    type(waitForObject(names.phone_LineEdit), oneNameAndAddress[3])
    clickButton(waitForObject(names.address_Book_Add_OK_QPushButton))

def closeWithoutSaving():
    sendEvent("QCloseEvent", waitForObject(names.mainWindow))
    clickButton(waitForObject(names.address_Book_No_QPushButton))

def checkNameAndAddress(table, record):
    for column in range(len(testData.fieldNames(record))):
        test.compare(table.item(0, column).text(),
                     testData.field(record, column))        

This script automatically populates the address book with test data from a TSV file in Squish. It:

  1. Launches the application and creates a new address book.
  2. Reads records from MyAddresses.tsv (first name, last name, email, phone).
  3. Adds them to the application via the UI: opens the form, enters data, confirms.
  4. Verifies that the data was added correctly.
  5. Closes the application without saving.

The main drawback is the fixed dataset. Any changes must be made manually. If we need to test the application's behavior in different languages or check how it handles exotic input data, we have to manually edit the file and add new rows.

But what if we use dynamic test data generation with LLM instead? This approach allows generating new, diverse test records at the press of a button, covering as many scenarios as possible. Let’s explore how this can be implemented in Squish.

Step 1. Installing the OpenAI API

To work with the OpenAI API, you need to install the corresponding package. In Squish, this can be done through the Python interpreter settings.

  1. Open Settings...
  2. Navigate to PyDev -> Interpreters -> Python Interpreter
  3. Click Manage with pip
  4. Install the package by running the following command:

install openai        

Once installed, the OpenAI API can be used in test scripts.

Step 2. Connecting the OpenAI API

Now, let's create an OpenAI client that will send requests and receive generated address records.

from openai import OpenAI

# Configurable model name for test data generation
GEN_MODEL = "gpt-3.5-turbo"

# Create an OpenAI client instance with the API key.
client = OpenAI(api_key="your_api_key_here")        

This initializes an OpenAI client that allows sending requests and receiving responses.

We will use GPT-3.5-turbo instead of a more powerful model like GPT-4 because it is faster and cheaper, making it ideal for generating large amounts of test data efficiently. Since test data generation does not require advanced reasoning capabilities, GPT-3.5-turbo provides an optimal balance between cost, speed, and quality.

Step 3. Creating a Data Generation Request

To generate diverse names, rare languages, and unusual phone numbers while ensuring valid email addresses, we define the following prompt:

TEST_DATA_PROMPT = (
    "Generate {limit} unique test addresses as a JSON array with objects containing: "
    "'Forename', 'Surname', 'Email', and 'Phone'. "
    "Use rare, exotic languages and scripts, ensuring names match the same language. "
    "Include valid corner cases. "
    "Emails should be in English. "
    "Use diverse phone formats, including international, local, and varied spacing. "
    "Ensure output is valid JSON with no extra text."
)        

Modifying this prompt allows fine-tuning the data generation.
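For example, a hypothetical variant of the prompt (shown here only as an illustration, reusing the same field names the test already expects) could target boundary-length values instead of exotic scripts:

LONG_VALUES_PROMPT = (
    "Generate {limit} unique test addresses as a JSON array with objects containing: "
    "'Forename', 'Surname', 'Email', and 'Phone'. "
    "Use very long names (40 or more characters) and unusually long but valid email addresses. "
    "Use diverse phone formats, including international, local, and varied spacing. "
    "Ensure output is valid JSON with no extra text."
)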

Step 4. Sending a Request to the API and Processing the Response

Now, let's create a function that sends a request to OpenAI and retrieves the response.

def askChatGpt(prompt):
    try:
        response = client.chat.completions.create(
            model=GEN_MODEL,
            messages=[
                {"role": "system", "content": "You are an assistant that helps with test automation."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=2048,
            n=1,
            temperature=0.9,
        )
        answer = response.choices[0].message.content.strip()
        return answer
    except Exception as e:
        test.log("Error calling ChatGPT (" + GEN_MODEL + "): " + str(e))
        return ""        

Breakdown of Request Parameters

  • max_tokens=2048 – limits the number of tokens in the response to prevent excessively long or truncated data.
  • n=1 – requests only one response variation since we do not need multiple test sets.
  • temperature=0.9 – controls randomness. At 0.9, the model produces more diverse responses, which is useful for testing.

Step 5. Processing the Response and Extracting JSON

ChatGPT may generate JSON with unnecessary characters, so the data must be cleaned before processing.

import json
import re

def parseApiResponse(rawOutput, apiName):
    test.log(f"Raw output from {apiName} for test data:")
    for line in rawOutput.splitlines():
        test.log(line)

    if not rawOutput:
        test.log(f"No test data generated by {apiName}.")
        return None

    # Remove code fences if present
    cleaned = re.sub(r"^```(?:json)?\s*|```$", "", rawOutput.strip(), flags=re.MULTILINE)
    cleaned = re.sub(r",(\s*[}\]])", r"\1", cleaned)

    try:
        data = json.loads(cleaned)
        return data
    except Exception as e:
        test.log("Error parsing generated JSON: " + str(e))
        test.log(f"{apiName} output that failed to parse: " + cleaned)
        return None        

Step 6. Generating Test Data

Now, let's combine everything into a single function:

def generateTestAddresses(limit):
    prompt = TEST_DATA_PROMPT.format(limit=limit)
    rawOutput = askChatGpt(prompt)
    data = parseApiResponse(rawOutput, "chatgpt")
    return data if data is not None else []        

Step 7. Integration into the Test Scenario

Replace reading from a TSV file with data generation:

def main():
    # ...
    
    limit = 16  # Number of test addresses to generate
    generatedData = generateTestAddresses(limit)
    if not generatedData:
        test.fatal("Failed to generate test data via ChatGPT.")
        return

    # Insert all address entries
    for record in generatedData:
        table.setCurrentCell(0, 0)  # Always insert at the start
        addAddressEntry((
            record.get("Forename", ""),
            record.get("Surname", ""),
            record.get("Email", ""),
            record.get("Phone", "")
        ))

    # ...        

Thanks to the OpenAI API, we have made testing dynamic, eliminating the need to manually edit test data files. Now, tests can verify application functionality in various languages, with different phone number formats and names, while adding new scenarios simply requires modifying the prompt.

We have new contacts from ChatGPT in our Address Book

However, prompts should be authored carefully, as overly aggressive queries for rare and exotic names may result in outputs like this:

{
  "Forename": "????????????",
  "Surname": "??????????????",
  "Email": "[email protected]",
  "Phone": "+1 (123) 456-7890"
}        

And the test revealed that the Address Book cannot properly handle such characters. So, unfortunately, the application is not suitable for users from Ancient Egypt.

Screenshot Verification with GPT-4o-mini

Automated user interface testing involves not only verifying user actions and data states but also analyzing the visual representation of elements. There are several issues that are difficult or impossible to detect using traditional scripted tests, such as:

  • Graphical artifacts – tearing, misalignment, rendering issues.
  • Truncated text – words that do not fit within a UI element.
  • Widget display correctness – for example, ensuring that a profile avatar actually contains a human image.
  • Graphical elements – verifying that charts or graphs truly display data rather than just occupying space on the screen.

These aspects can be partially automated using AI. Of course, AI does not replace manual testing, but it can automatically detect certain visual issues that are difficult to identify with scripts. This approach enables an additional layer of control in automated testing, providing a hybrid solution where traditional scripts are supplemented by AI capabilities.

Let’s explore how screenshot analysis can be automated in Squish using GPT-4o-mini.

Why GPT-4o-mini?

  • Optimized for cost and speed – it provides high accuracy at a lower cost than full-scale GPT-4 models.
  • Sufficient for UI verification – analyzing screenshots requires recognizing structured elements, which GPT-4o-mini handles well.
  • Reduced response randomness – we prioritize consistency in output rather than creativity.

Step 1. Selecting a Model for Screenshot Analysis

Now, let's define the AI models to be used in our automation:

# Configurable model names
GEN_MODEL = "gpt-3.5-turbo"  # Used for text-based test data generation
IMG_MODEL = "gpt-4o-mini"     # Used for screenshot analysis        

Step 2. Capturing a Screenshot of the Tested Element

First, we need to obtain an image of the relevant UI element (e.g., a contact list table) and store it in memory.

def captureScreenshot(objName):
    widget = waitForObject(objName)
    img = object.grabScreenshot(widget, {"delay": 0})
    return img        

This function allows capturing a screenshot of a specific element, such as a table, button, or window.
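For example, the contact table of the Address Book example can be captured with the same real-name lookup used in the data-driven test above (a minimal usage sketch):

# Capture the contact table of the Address Book example
img = captureScreenshot({"type": "QTableWidget"})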

Step 3. Preparing the Image for Analysis

Since the OpenAI API does not accept raw image objects, we must first convert the captured screenshot to Base64 format.

import base64
import os
import tempfile

def saveImageToTempFile(img):
    try:
        tempFile = tempfile.mktemp(suffix=".png")
        img.save(tempFile)
        return tempFile
    except Exception as e:
        test.log("Error saving image to temporary file: " + str(e))
        return None

def encodeImageFileToBase64(filePath):
    try:
        with open(filePath, "rb") as f:
            return base64.b64encode(f.read()).decode("utf-8")
    except Exception as e:
        test.log("Error encoding image from file: " + str(e))
        return ""

def encodeScreenshotToBase64(img):
    tempFile = saveImageToTempFile(img)
    if not tempFile:
        return ""
    base64Image = encodeImageFileToBase64(tempFile)
    try:
        os.remove(tempFile)
    except Exception as e:
        test.log("Error removing temporary file: " + str(e))
    return base64Image        

Step 4. Sending a Request to GPT-4o-mini for Image Analysis

Now that we have the screenshot converted to Base64 format, we can send it to GPT-4o-mini for analysis.

def analyzeScreenshotForMultipleObjects(img, analysis_items):
    base64Image = encodeScreenshotToBase64(img)
    if not base64Image:
        return {}

    keys_list = list(analysis_items.keys())
    keys_str = ", ".join([f'"{k}"' for k in keys_list])

    prompt_lines = [
        "Analyze the provided screenshot (image encoded in Base64) and answer the following questions.",
        "For each of the items below, indicate whether it is present in the image by answering with a boolean (true or false) and a brief explanation.",
        "Return your answer as a JSON object with exactly the following keys:",
        keys_str,
        "Each key should map to an object with two fields:",
        " - 'present': a boolean indicating presence (true if present, false if not).",
        " - 'explanation': a brief explanation for your answer.",
        "Do not include any additional text outside the JSON object."
    ]
    prompt = "\n".join(prompt_lines)

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64Image}"}}
            ]
        }
    ]

    try:
        response = client.chat.completions.create(
            model=IMG_MODEL,
            messages=messages,
            max_tokens=400,
            n=1,
            temperature=0.1,
        )
        rawOutput = response.choices[0].message.content.strip()
        # Remove code fences if present
        cleaned = re.sub(r"^```(?:json)?\s*|```$", "", rawOutput.strip(), flags=re.MULTILINE)
        data = json.loads(cleaned)
        return data
    except Exception as e:
        test.log(f"Error calling {IMG_MODEL} for multiple objects analysis: " + str(e))
        return {}        

Breakdown of Request Parameters

  • max_tokens=400 – limits the response size to ensure concise JSON output.
  • n=1 – requests only one response variation to maintain consistency.
  • temperature=0.1 – keeps randomness low for predictable and reliable answers.

This function allows multiple screenshot characteristics to be analyzed in a single API request, which is efficient because the same screenshot does not have to be uploaded for analysis several times.

Step 5. Processing the AI Response

The AI model returns a JSON object indicating the presence of each requested UI element. We need to parse and interpret this response.

import string

def parseBooleanResponse(responseText):
    if not responseText.strip():
        return None
    firstWord = responseText.strip().split()[0].lower()
    firstWord = firstWord.translate(str.maketrans("", "", string.punctuation))
    if firstWord == "true":
        return True
    if firstWord == "false":
        return False
    return None        

This function ensures that only valid boolean responses are considered, eliminating potential parsing errors.
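A quick sketch of how the helper behaves on typical model replies (the sample strings are illustrative):

# Illustrative checks of the boolean parser
test.compare(parseBooleanResponse("True. The table is visible."), True)
test.compare(parseBooleanResponse("false - no chart is present"), False)
test.verify(parseBooleanResponse("The image shows a table.") is None)  # not a clear boolean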

Step 6. Integrating Screenshot Analysis into the Test Scenario

Now, we integrate the automated UI verification into our test script.

def main():
    # ...

    # Capture a screenshot of the table after inserting test data
    img = captureScreenshot(table)

    # Define the analysis items and their expected results.
    analysis_items = {
        "table": "a table",
        "chart": "a chart",
        "rendering_artifacts": "rendering artifacts, errors, or glitches",
        "truncated_text": "text truncation (all text should be fully visible)"
    }
    expected_values = {
        "table": True,   # The screenshot must contain a table
        "chart": False,  # There should be no chart
        "rendering_artifacts": False,  # No graphical glitches should be present
        "truncated_text": False  # Text should be fully visible
    }

    # Request all analyses in one API call
    analysisResults = analyzeScreenshotForMultipleObjects(img, analysis_items)
    test.log("Combined screenshot analysis results:")
    test.log(json.dumps(analysisResults, indent=2))

    # Verify each analysis result against the expected value
    for key, expected in expected_values.items():
        result = analysisResults.get(key)
        if result is not None:
            boolResult = result.get("present")
            explanation = result.get("explanation", "")
            verifyMsg = f"Screenshot should {'contain' if expected else 'not contain'} {analysis_items[key]}."
            test.compare(boolResult, expected, verifyMsg)
            test.log(f"{key} analysis: {explanation}")
        else:
            test.fatal(f"Analysis result for {key} not found.")

    # ...        

Here is an expected output:

Log:   GPT-4o-mini analysis (table): true, the screenshot contains a table with columns labeled "Forename," "Surname," "Email," and "Phone," and it lists multiple entries in a structured format.

Log:   Verification point: The screenshot contains a table.


Log:   GPT-4o-mini analysis (chart): false, the screenshot contains a table with names, emails, and phone numbers, but it does not contain a chart.

Log:   Verification point: The screenshot does not contain a chart.


Log:   GPT-4o-mini analysis (rendering artifacts): false, the screenshot appears to display a table with names, emails, and phone numbers in various scripts without any visible rendering artifacts. The text is clear and properly aligned.

Log:   Verification point: The screenshot does not contain rendering artifacts, errors, or glitches.


Log:   GPT-4o-mini analysis (truncated text): true, the text in the screenshot appears to be truncated, as indicated by the ellipses ("...") at the end of several entries in the Email and Phone columns, suggesting that the full information is not displayed.

Fatal: Verification point failed: The screenshot contains text truncation (all text should be fully visible).        

If the same image needs to be analyzed for multiple characteristics (as in our case), the checks should be combined into a single API request. This approach reduces the number of API calls, lowers costs, and decreases test execution time.

GPT-4o-mini allows automatic screenshot analysis during tests, helping to detect graphical defects, truncated text, and other UI issues.

In the future, it will be possible to verify animations and videos, as well as detect specific elements on the screen, expanding the capabilities of automated testing.

The full source code of the test and a video demonstration are available.

Using Local AI Models for Text and Image Analysis

Using cloud-based AI services such as OpenAI provides powerful tools for generating test data and analyzing screenshots. However, it is not always convenient or possible to send data to external services. The reasons may vary: strict security requirements, restrictions on data transfer to the cloud, or the need to work without reliance on third-party APIs. In such cases, locally deployable models come to the rescue, allowing the same tasks to be performed without an internet connection.

One of the convenient tools for working with local models is Ollama. This software enables easy loading and running of neural network models directly on your computer or server. Ollama supports various language and multimodal models, including Llama 3.2 for text processing and LLaVA for image analysis.

Overview of Local Models

Llama 3.2 is a language model developed by Meta, optimized to run on local systems. It is weaker than GPT-4o but powerful enough for many tasks, including test data generation, natural language processing, and writing test scripts. Locally, it can be used for creating test input data, generating diverse text cases, and analyzing logs.

LLaVA is a multimodal model capable of working with images, such as analyzing screenshots and detecting specific elements within them. While it is less accurate and flexible than GPT-4o, it can be used for basic UI analysis and for verifying the presence of elements on the screen.

Installing and Running Ollama Locally

To get started with Ollama, install it on your system:

  • macOS & Linux:

curl -fsSL https://ollama.com/install.sh | sh        

Once installed, you can quickly test it via the command line. For example, to load and chat with Llama 3.2:

ollama run llama3.2        

To process images using LLaVA:

ollama run llava        

Using Ollama with Squish

I have also integrated the use of Ollama models into a Squish script. To use them, install the ollama-python package within Squish as described earlier.

The integration follows the same principles as for OpenAI models. The quality of test data generation is satisfactory when using llama3.2:3b. For better results, more powerful models can be considered. However, the quality of screen analysis with LLaVA:7b and LLaVA:13b is insufficient. While these models generate coherent textual responses, they often misinterpret images. Due to hardware limitations, I was unable to test LLaVA:34b.
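Below is a minimal sketch of what such an integration could look like, assuming the ollama-python package is installed. The function and model names are illustrative, not the exact code from my script:

import ollama

# Assumed local model tags; adjust to whatever you have pulled with "ollama pull"
LOCAL_GEN_MODEL = "llama3.2:3b"
LOCAL_IMG_MODEL = "llava:13b"

def askLlama(prompt):
    # Text generation with a local Llama model, mirroring askChatGpt()
    try:
        response = ollama.chat(
            model=LOCAL_GEN_MODEL,
            messages=[
                {"role": "system", "content": "You are an assistant that helps with test automation."},
                {"role": "user", "content": prompt}
            ]
        )
        return response["message"]["content"].strip()
    except Exception as e:
        test.log("Error calling Ollama (" + LOCAL_GEN_MODEL + "): " + str(e))
        return ""

def askLlava(prompt, imagePath):
    # Screenshot analysis with LLaVA; the image is passed as a file path
    try:
        response = ollama.chat(
            model=LOCAL_IMG_MODEL,
            messages=[{"role": "user", "content": prompt, "images": [imagePath]}]
        )
        return response["message"]["content"].strip()
    except Exception as e:
        test.log("Error calling Ollama (" + LOCAL_IMG_MODEL + "): " + str(e))
        return ""

The screenshot can be saved with saveImageToTempFile() from Step 3 and its path passed to askLlava(), while the rest of the parsing and verification logic stays the same.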

If you need local AI-based test data generation, Llama 3.2 and other local models can be a viable solution. For UI verification tasks, however, OpenAI's models currently remain the better choice.

Conclusion and Future Prospects

AI in GUI testing is not a replacement for traditional methods but a powerful complement. The optimal approach is a hybrid model: using proven automated tests in combination with AI for generating input data and analyzing the UI. This approach deepens testing and helps uncover issues that were previously inaccessible to scripts.

The development of AI in testing will move toward improving multimodal models, which will be able to analyze not only screenshots but also videos, tracking animations and element interactions. Specialized local models may emerge, allowing AI to be more precisely adapted to GUI testing tasks. Squish is already prepared for AI integration, thanks to its support for full scripting languages and flexibility in working with external APIs. This makes it easy to incorporate AI tools, such as test data generation or UI analysis, directly into existing automation scenarios.

Which AI methods are you already using in testing? Do you see potential for expanding their use in your projects? Share your experiences and discuss ideas—after all, knowledge exchange is key to finding the best solutions.
