Retrieval-Augmented Generation (RAG) for Quality Engineering: A Practical Guide

Why Quality Engineering Needs RAG

Imagine you have found the recipe for a fancy-looking pudding. You go to the grocery store and gather even fancier ingredients. You come back and make what you feel will change the world. Then you taste it and... YUCK. Despite the perfect recipe and premium ingredients, you have failed the real proof of the pudding: the eating. Culinary delight, customer delight: you get the drift.

Similarly, quality is the heart of software engineering. You can do your fancy programming and bring in the best of AI, yet all of it falls flat if you miss the quality mark. Quality Engineering (QE) within software engineering is rapidly evolving and embracing AI. However, as our friendly neighborhood hero reminds us, "With great power comes great responsibility." Generative AI can transform the way you ensure quality; however, you also owe yourself and your customers guardrails against hallucinations, outdated information, and inaccurate context.

This is where Retrieval-Augmented Generation (RAG) is worth considering. You can use your institutional data to build a custom RAG pipeline, integrate a retrieval mechanism, and generate inferences, all aimed at producing outcomes that are contextual to your organization:

  • Engineering outcomes such as reduced manual effort in test generation, defect triaging, and root cause analysis.
  • Outcomes contextualized through up-to-date insights into application performance and defect patterns.
  • A quality framework that is continuously recalibrated based on application behavior and user feedback.

How Can RAG Help in Quality Engineering?

As a starting point towards building a RAG-powered QE framework, let us look at the points of impact and how a custom RAG can elevate outcomes:

  • Requirements Strengthening: RAG can draw on historical defect data and user stories to identify hot spots and drive more targeted risk management strategies. As an example, you can build a custom RAG over Jira data and generate these kinds of actionable insights. A contextual-retrieval framework built on Jira can be extended into a RAG pipeline that integrates with a generative AI solution for insight generation.
  • Automated Test Generation: RAG can automatically generate new test cases by retrieving relevant past test cases based on current application changes or defect history. For example, when new code is added to a project, RAG can query past test cases from a vector database such as Pinecone or FAISS to identify similar scenarios and generate new test scripts. The architecture can involve setting up a knowledge base from Jira data, implementing a retrieval layer using FAISS, integrating OpenAI GPT (or another model), and generating new test cases that stay contextual to past data while addressing new requirements.

The flowchart can look like this (a code sketch of the core retrieval steps follows the list):

Flowchart Steps:

  1. Code Change Detection: Trigger: New code added to the application or changes in the codebase. Input: Code repository, commit history.
  2. Jira Data Retrieval (Knowledge Base Setup): Action: Retrieve relevant historical data from Jira (past defects, tickets, test cases, etc.). Tool: Jira API.
  3. Data Vectorization (FAISS/Pinecone): Action: Convert the retrieved Jira data (test cases, defects, application changes) into vector embeddings using an embedding model, and store them in a vector database. Tool: FAISS/Pinecone for efficient vector storage and retrieval.
  4. Retrieval of Relevant Past Test Cases: Action: Query the vector database for past test cases relevant to the current code changes. Input: Vectorized data (code changes or defect history). Output: A list of past test cases and scenarios that are contextually relevant.
  5. Contextualization via GPT (or other models): Action: Use OpenAI GPT (or an alternative model) to refine the retrieved test cases and adjust them based on new application requirements, context, and recent changes. Input: Retrieved test cases and contextual data from code changes. Output: New, refined, and contextualized test cases.
  6. Test Case Generation: Action: Generate new test cases by combining past scenarios with new application requirements and defect histories. Output: Automatically generated test scripts that are tailored to the current needs.
  7. Test Execution & Feedback Loop: Action: Execute the generated test cases on the application. Output: Test results (pass/fail). Feedback: If necessary, improve the model with feedback from failed tests or additional changes.
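To make steps 2 through 4 concrete, here is a minimal sketch, assuming the sentence-transformers library for embeddings and FAISS as the vector index. The fetch_jira_issues helper and the sample issue texts are hypothetical placeholders for a real Jira API integration:

```python
# Minimal sketch of steps 2-4: index historical Jira data in FAISS and
# retrieve test cases relevant to a code change.
# fetch_jira_issues() is a hypothetical helper wrapping the Jira REST API.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

# Step 2: pull historical test cases / defects from Jira (data is illustrative).
issues = [
    "Test login with expired session token",
    "Verify checkout total when a discount code is applied",
    "Regression: payment gateway timeout handling",
]  # in practice: issues = fetch_jira_issues(project="QE")

# Step 3: vectorize and store in a FAISS index.
embeddings = model.encode(issues, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product = cosine on normalized vectors
index.add(np.asarray(embeddings, dtype="float32"))

# Step 4: query with a description of the current code change.
change = "Refactored discount calculation in the checkout service"
query = model.encode([change], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype="float32"), k=2)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.2f}  {issues[i]}")
```

The retrieved test cases then feed the contextualization step (step 5), where a generative model adapts them to the new change.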

  • Defect Triaging & Root Cause Analysis (RCA): When a defect is reported, RAG can retrieve similar defects from historical databases and automatically suggest potential causes or resolutions based on previous findings. For instance, by integrating a framework such as Haystack, RAG can fetch test logs, error patterns, and past resolutions, speeding up the debugging process (see the sketch below).
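As a hedged illustration, here is a minimal defect-triage sketch using Haystack's 1.x-style API (class names differ in Haystack 2.x); the defect records and metadata fields are made up for the example:

```python
# Sketch: retrieve similar historical defects and surface their resolutions.
# Haystack 1.x-style API; the records and meta fields are illustrative.
from haystack import Document
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever

# Index past defect reports with their resolutions as metadata.
store = InMemoryDocumentStore(use_bm25=True)
store.write_documents([
    Document(content="Login times out after 30s under load",
             meta={"defect_id": "QE-101", "resolution": "Increased connection pool size"}),
    Document(content="Payment API returns 500 on empty cart",
             meta={"defect_id": "QE-214", "resolution": "Added null check in cart service"}),
])

retriever = BM25Retriever(document_store=store)

def suggest_resolutions(new_defect: str, top_k: int = 3) -> None:
    """Fetch the most similar past defects and print their recorded resolutions."""
    for doc in retriever.retrieve(query=new_defect, top_k=top_k):
        print(f"{doc.meta['defect_id']}: {doc.content} -> {doc.meta['resolution']}")

suggest_resolutions("Checkout endpoint throws HTTP 500 when cart is empty")
```

In a fuller setup, the retrieved resolutions would be passed to a generative model that drafts an RCA summary for the triage engineer to review.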

Example Flow Summary

  1. Code Change Detection: Detect new code changes via Git.
  2. Jira Data Retrieval: Query Jira API for historical defects related to the changed code.
  3. Data Vectorization: Use FAISS/Pinecone to index historical Jira data into vector embeddings.
  4. Retrieval of Relevant Test Cases: Query FAISS/Pinecone database for relevant past test cases or defects based on code changes.
  5. Contextualization via GPT: Refine the test cases using GPT with context from the new application changes (a sketch of this call follows the list).
  6. Test Case Generation: Generate tailored test scripts based on refined test cases and new requirements.
  7. Test Execution & Feedback Loop: Execute tests, analyze results, and update the knowledge base for continuous improvement.
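For step 5, here is a minimal sketch of the contextualization call using the OpenAI v1 Python client; the model name, prompt structure, and sample inputs are illustrative assumptions, not a prescribed implementation:

```python
# Sketch of step 5: ask GPT to adapt retrieved test cases to the new change.
# Uses the OpenAI v1 Python client; model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def contextualize_tests(retrieved_cases: list[str], code_change: str) -> str:
    """Refine retrieved past test cases against the current code change."""
    prompt = (
        "You are a QE assistant. Given these past test cases:\n"
        + "\n".join(f"- {c}" for c in retrieved_cases)
        + f"\n\nand this code change:\n{code_change}\n\n"
        "Generate updated test cases covering the change. Return one per line."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model can be swapped in
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(contextualize_tests(
    ["Verify checkout total when a discount code is applied"],
    "Refactored discount calculation in the checkout service",
))
```

The output of this step is what step 6 turns into executable test scripts, and step 7's pass/fail results can be written back to the knowledge base.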

  • Test Data Generation:

RAG can be used for test data generation in three phases:

Retrieval phase: A knowledge base of test data can be created that stores previous test data for test scenarios, requirements, edge cases, and so on. When new test data is needed for a new feature, the system retrieves the most relevant existing data.

Generation phase: Generative AI can produce new test data by extrapolating from or modifying the retrieved data. The model can also combine multiple sources of test data to create a more comprehensive test set.

Continuous learning: Over time, the RAG system learns from common defect patterns and business rules to ensure both the syntactic correctness and the relevance of the generated test data.

The flowchart could look like this:

  1. Set up a knowledge base from the test case repository, defect data, and test data examples, and store it in a vector database such as Elasticsearch or Pinecone.
  2. Retrieve from the knowledge base using a requirement as input, applying techniques such as Maximum Marginal Relevance and similarity scoring (a toy MMR sketch follows this list).
  3. Preprocess the retrieved data to ensure it is clean.
  4. Run the generation phase using OpenAI GPT or other models.
  5. Post-process and merge the retrieved and generated data.
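To illustrate the Maximum Marginal Relevance technique mentioned in step 2, here is a small self-contained sketch; the lambda weight and the random toy embeddings are assumptions for demonstration:

```python
# Toy sketch of Maximum Marginal Relevance (MMR) for step 2: pick test-data
# records that are relevant to the requirement yet diverse among themselves.
import numpy as np

def mmr(query_vec, doc_vecs, k=3, lam=0.7):
    """Return indices of k documents balancing relevance (lam) and diversity."""
    doc_vecs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    query_vec = query_vec / np.linalg.norm(query_vec)
    relevance = doc_vecs @ query_vec  # cosine similarity to the requirement
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        if not selected:
            best = candidates[int(np.argmax(relevance[candidates]))]
        else:
            # Penalize candidates that are too similar to already-selected docs.
            sim_to_sel = doc_vecs[candidates] @ doc_vecs[selected].T
            score = lam * relevance[candidates] - (1 - lam) * sim_to_sel.max(axis=1)
            best = candidates[int(np.argmax(score))]
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical embeddings: one requirement vector and five test-data vectors.
rng = np.random.default_rng(0)
print(mmr(rng.normal(size=8), rng.normal(size=(5, 8)), k=3))
```

A higher lam favors relevance to the requirement; a lower lam favors diversity among the selected test-data records, which helps cover edge cases.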

Choosing Between Public LLMs and Private LLMs for RAG in QE

One of the key decisions when implementing RAG in QE is whether to use public LLMs (like OpenAI’s GPT models) or build private, customized LLMs. The perspectives to consider are:

Public LLMs (e.g., GPT-4, Google Gemini)

Advantages:

  • Quick deployment: no model training or hosting required.
  • Cost-efficient to start (watch your access patterns to keep usage costs optimal).
  • Generalized knowledge across many domains.

Disadvantages:

  • Lack of specificity: limited awareness of your domain and institutional data.
  • Hallucinations: plausible-sounding but incorrect answers, especially on niche topics.

Private LLMs (e.g., Llama, GPT-NeoX, Mistral)

Advantages:

  • Custom models tuned to your domain data.
  • Control over data residency, deployment, and updates.
  • Performance optimization for your specific QE tasks.

Disadvantages:

  • Resource intensive: requires infrastructure and ML expertise.
  • Longer development cycle before delivering value.

Evaluation Criteria:

  • Data Availability: If you have large volumes of high-quality domain-specific data, building a private LLM using custom RAG may provide better results.
  • Cost & Resources: Public LLMs are easier and more cost-effective for initial implementation, but private models offer long-term value for highly specific tasks.
  • Security & Privacy: For sensitive or proprietary data, private LLMs provide better data control and confidentiality.

Conclusion: Empowering Quality Engineering with RAG

RAG can transform Quality Engineering by harnessing the power of institutional data. By seamlessly integrating historical insights with innovative AI technologies, RAG can drive efficiency based on organizational knowledge.

Whether you choose to deploy public LLMs for quick wins or invest in a tailored private LLM for long-term scalability, you gain the most by making the best possible use of your data. You will be able to use the power of AI to continuously learn and recalibrate, while staying relevant to your organizational context.

In a world where quality is the ultimate differentiator, integrating RAG into your QE pipeline is not just a technological advantage; it can be a strategic imperative. By combining historical data, generative AI, and continuous learning, you can ensure that the proof of the pudding is always to your customer's delight.
