
Diff Based Test Case Selection and Prioritization for Sanity Runs using MongoDB

Parmesh Ashwath

Senior Software Engineer (SDE III) at Amazon

Published: June 28, 2019

Test-Driven Development and Agile methodologies are adopted in software development to achieve faster releases and reduce the product's time to market. With this approach, it is essential that Sanity Testing during continuous deployment takes little time, yet the quality of the product must not be compromised.

During Sanity Runs, the complete test suite cannot be executed because of time and resource constraints, so it is important to identify the changes in the diff and execute only the test cases that are strongly associated with them. Currently, test scripts are either selected at random or chosen based on the knowledge of subject-matter experts (SMEs).

Below is a simple, automated Test Case Selection and Prioritization approach using MongoDB that can help pick the right set of test cases for Sanity Runs.

Data pre-requisite

During the complete regression test runs, we need to capture some additional details that will later help in test case prioritization. This data can be stored in any database; here we use MongoDB because of its aggregation support. Below is a sample of the document that gets stored:

{
    "_id" : ObjectId("5cfe17690820e75826298fbe"),
    "platform" : "",
    "product" : "",
    "file_name" : "pcc_pcrpt.c",
    "function_name" : "pcc_pcrpt_test_memb_del",
    "test_script" : "qrrf/edr/pce_pcep_2.py -v",
    "coverage" : 0.86,
    "covered_lines" : [101, 103, 105, 106, 107, 108, 109, 114, 116, 117, 118, 119],
    "defined_at_line" : NumberInt(101),
    "end_line" : NumberInt(120),
    "uncovered_lines" : [110, 111]
}

The file_name, function_name and test_script mapping is available from the nightly or regression runs.

platform and product fields can be used as filters to limit the results later.

The important fields here are covered_lines and uncovered_lines. They can be obtained by parsing the HTML coverage reports that most testing frameworks generate (for example, gcov report files for .c source files). From the sizes of these two lists, coverage is computed as

coverage = count(covered_lines) / (count(covered_lines) + count(uncovered_lines))
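As a quick check of the formula, the sample document lists 12 covered and 2 uncovered lines, which gives 12 / (12 + 2) ≈ 0.86. A minimal sketch in Python; the helper name function_coverage is illustrative and not part of the original code:

```python
def function_coverage(covered_lines, uncovered_lines):
    """Return the fraction of executable lines that were covered."""
    total = len(covered_lines) + len(uncovered_lines)
    # Guard against functions with no executable lines recorded.
    return len(covered_lines) / total if total else 0.0


# Values taken from the sample document.
covered = [101, 103, 105, 106, 107, 108, 109, 114, 116, 117, 118, 119]
uncovered = [110, 111]
print(round(function_coverage(covered, uncovered), 2))  # 0.86
```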

Also, defined_at_line and end_line can either be computed by running a regex parser over the source files or, in some cases, read directly from the coverage XML files.

In addition to the above fields, we can also capture execution time, which can be used later to add constraints.

Our Approach:

When code is committed to a version control system such as Git, we can easily obtain the diff of that commit and then use a diff parser to find all the files and functions that have changed. As a next step, we need to find the test scripts that cover most of the lines in those functions.

The function coverage value of a test script alone cannot be used to prioritize it, because a script with very low coverage might still cover branches that no other script covers. This is why we also store the covered lines in our database.

We mainly use the set intersection operator ($setIntersection) provided by MongoDB to select our test scripts.
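The article does not show its diff parser, so the following is only a rough sketch of what that step might look like. It assumes unified `git diff` output, tracks old-file line numbers (on the assumption that the stored function ranges were recorded against the pre-commit source), and matches files by base name because the sample document stores only "pcc_pcrpt.c". The helper names and the sample diff text are made up for illustration.

```python
import posixpath
import re

# Captures the old-file start line from a hunk header such as "@@ -104,3 +104,4 @@".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@")


def changed_lines(diff_text):
    """Map each changed file to the set of old-file line numbers it touches."""
    changes = {}
    current_file = None
    old_line = None  # None while outside a hunk
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            current_file, old_line = None, None
        elif old_line is None and line.startswith("--- "):
            path = line[4:].strip()
            if path != "/dev/null":  # /dev/null marks a newly added file
                current_file = path[2:] if path.startswith("a/") else path
        elif line.startswith("@@"):
            match = HUNK_RE.match(line)
            old_line = int(match.group(1)) if match else None
        elif old_line is not None and current_file is not None:
            if line.startswith("-"):
                changes.setdefault(current_file, set()).add(old_line)
                old_line += 1
            elif line.startswith("+"):
                # An insertion has no old-file line; attribute it to the current position.
                changes.setdefault(current_file, set()).add(old_line)
            elif line.startswith(" "):
                old_line += 1
    return changes


def changed_functions(changes, function_ranges):
    """function_ranges: (file_name, function_name, defined_at_line, end_line) tuples."""
    hits = set()
    for path, lines in changes.items():
        base = posixpath.basename(path)
        for file_name, function_name, start, end in function_ranges:
            if file_name == base and any(start <= n <= end for n in lines):
                hits.add((file_name, function_name))
    return hits


# Invented diff text; only the file and function names come from the sample document.
sample_diff = """diff --git a/src/pcc_pcrpt.c b/src/pcc_pcrpt.c
--- a/src/pcc_pcrpt.c
+++ b/src/pcc_pcrpt.c
@@ -104,3 +104,4 @@
 context_a
-old_line
+new_line_1
+new_line_2
 context_b
"""
ranges = [("pcc_pcrpt.c", "pcc_pcrpt_test_memb_del", 101, 120)]
print(changed_functions(changed_lines(sample_diff), ranges))
```

Line numbers drift as soon as the source changes, so a real pipeline would need to refresh the stored ranges regularly or map lines through the diff.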

Steps:

1. For a given file and function, build the lines_to_be_covered list, which contains the lines of this function that are yet to be covered. Initially this is every line number from the defined_at_line field through end_line.
2. Run the set intersection operator on the covered_lines field in our DB and lines_to_be_covered, and add a field called weight that holds the size of the intersected array.
3. Sort the results by the new weight field and pick the test script with the highest weight.
4. Take the set difference of lines_to_be_covered and the covered_lines of the selected script to get the lines that are still to be covered.
5. Using the lines_to_be_covered computed in the previous step, repeat steps 2-4 until no script covers any new lines.
6. Optionally, capture the other scripts that cover the given file and function, but give them a weight of 0 since they do not cover any new lines.

from pymongo import MongoClient

client = MongoClient()
db = client.database_name


def get_scripts(collection_name, component, filename, functionname):
    """Greedily select the test scripts that cover a changed function.

    Returns (test_scripts, data_found). `component` is currently unused.
    """
    collection = db[collection_name]
    query = {'file_name': filename, 'function_name': functionname}

    document = collection.find_one(query)
    if document is None:
        return ([], False)

    # Step 1: start with every line of the function, from its definition to its end.
    lines_to_be_covered = list(range(document['defined_at_line'], document['end_line'] + 1))

    test_scripts = []
    processed_ids = []
    while True:
        # Steps 2-3: weight = number of still-uncovered lines each script covers;
        # sort descending and keep only the best script.
        results = collection.aggregate([
            {'$match': query},
            {'$addFields': {
                'weight': {'$size': {'$setIntersection': ['$covered_lines', lines_to_be_covered]}}
            }},
            {'$sort': {'weight': -1}},
            {'$limit': 1},
        ])
        data = next(results, None)

        # Step 5: stop once no script covers any of the remaining lines.
        if data is None or data['weight'] == 0:
            break

        # Step 4: remove the lines that the selected script covers.
        lines_to_be_covered = list(set(lines_to_be_covered) - set(data['covered_lines']))
        test_scripts.append({'filename': filename, 'functionname': functionname,
                             'script': data['test_script'], 'weight': data['weight']})
        processed_ids.append(data['_id'])

    # Step 6: capture the other scripts that cover this file and function,
    # with weight 0 because they do not cover any new lines.
    others = {**query, '_id': {'$nin': processed_ids}, 'coverage': {'$ne': 0}}
    for doc in collection.find(others):
        test_scripts.append({'filename': filename, 'functionname': functionname,
                             'script': doc['test_script'], 'weight': 0})

    return (test_scripts, True)
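To see the selection loop at work without a database, the same greedy logic can be sketched over in-memory records. This is an illustration only: the function name select_scripts, the script names and the line sets are invented, and Python set intersection stands in for $setIntersection.

```python
def select_scripts(records, start_line, end_line):
    """Greedily pick scripts until no remaining line gets newly covered.

    records: list of dicts with 'test_script' and 'covered_lines' keys.
    Returns a list of (test_script, weight) tuples in selection order.
    """
    remaining = set(range(start_line, end_line + 1))
    selected = []
    while remaining and records:
        # weight = size of the intersection, mirroring $size over $setIntersection
        best = max(records, key=lambda r: len(remaining & set(r["covered_lines"])))
        weight = len(remaining & set(best["covered_lines"]))
        if weight == 0:
            break
        selected.append((best["test_script"], weight))
        remaining -= set(best["covered_lines"])
    return selected


# Hypothetical records for one function spanning lines 101-110.
records = [
    {"test_script": "script_a.py", "covered_lines": [101, 103, 105, 106]},
    {"test_script": "script_b.py", "covered_lines": [105, 106, 107, 108, 109]},
    {"test_script": "script_c.py", "covered_lines": [103, 105]},
]
print(select_scripts(records, 101, 110))
# [('script_b.py', 5), ('script_a.py', 2)]
```

script_c.py is never selected: once the other two scripts are chosen, it covers no line that is still uncovered, so it would receive weight 0 in the optional last step.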

After this, we can aggregate the weights of the test scripts and prioritize them. The number of test scripts that get executed depends on the resources available. Usually, we can pick the top 5 scripts if the diff is small.
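The article does not say how the weights are aggregated. One plausible reading is to sum each script's weights over all changed functions and keep the top N. The sketch below does this with entries shaped like those returned by get_scripts; the function name prioritize, the summing rule, and the sample function and script names are assumptions.

```python
from collections import defaultdict


def prioritize(per_function_results, top_n=5):
    """Sum each script's weight across functions and return the top_n scripts."""
    totals = defaultdict(int)
    for entry in per_function_results:
        totals[entry["script"]] += entry["weight"]
    # Highest total weight first; the script name breaks ties for a stable order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical results collected for two changed functions, f1 and f2.
results = [
    {"filename": "pcc_pcrpt.c", "functionname": "f1", "script": "a.py", "weight": 5},
    {"filename": "pcc_pcrpt.c", "functionname": "f2", "script": "a.py", "weight": 3},
    {"filename": "pcc_pcrpt.c", "functionname": "f1", "script": "b.py", "weight": 4},
    {"filename": "pcc_pcrpt.c", "functionname": "f2", "script": "c.py", "weight": 0},
]
print(prioritize(results, top_n=2))
# [('a.py', 8), ('b.py', 4)]
```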

Next Steps:

Consider the execution time of the test scripts so that Time-limited Test Case Prioritization can also be performed.
Try Reinforcement Learning for Test Case Prioritization instead of the set intersection method.
Use the call graph to find the dependencies of the function being changed and test the impacted functions as well.
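As a rough idea of what the first item could look like, the sketch below greedily picks scripts by weight per second of runtime until a time budget is used up. None of this is in the original work: the function name, the tuple layout, the ranking rule and the timings are all assumptions for illustration.

```python
def pick_within_budget(candidates, budget_seconds):
    """candidates: list of (script, weight, seconds) tuples.

    Returns the scripts chosen, in the order they were picked.
    """
    # Ignore scripts that cover nothing new or have no usable timing.
    usable = [c for c in candidates if c[1] > 0 and c[2] > 0]
    # Prefer scripts that cover the most new lines per second of runtime.
    ranked = sorted(usable, key=lambda c: c[1] / c[2], reverse=True)
    chosen, used = [], 0.0
    for script, _weight, seconds in ranked:
        if used + seconds <= budget_seconds:
            chosen.append(script)
            used += seconds
    return chosen


# Hypothetical weights and execution times in seconds.
candidates = [("a.py", 8, 120.0), ("b.py", 4, 30.0), ("c.py", 6, 200.0)]
print(pick_within_budget(candidates, 160))
# ['b.py', 'a.py']
```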
