Minimizing the scope of running unit tests in a mono repo

Context

Running unit tests in a mono repo can be time-consuming if not done carefully. For instance, if we change only a config file or a script file, there is ideally no need to run unit tests, since no source file has changed. In such cases the job can be skipped, saving some resources (pipeline runner, time). Similarly, if a particular sub-module is changed, there is no need to run the entire unit test suite, as the functionality of the unchanged modules should not be affected. In this case the scope should be limited to the module that changed and the modules that depend on it. This article is an attempt to achieve such optimizations.

Mechanism

GitLab CI lets you add a rule that triggers a job based on the type of file changed. For example, the job below will only run when java or kt files are changed:

rules:
    - if: '$PIPELINE_TYPE == "ci_build" && $CI_PIPELINE_SOURCE != "merge_request_event"'
      changes:
        - "**/*.java"
        - "**/*.kt"
      when: always

The snippet above is useful: the job is skipped if no source file has changed. But now let's say we have a mono repo containing 5 modules, A to E. Module A is imported in all the other 4. Module B is imported in C and D.

The following is the expected behavior for unit tests in this scenario.

  • If something is changed in A, unit tests of all other modules including A should run.
  • If module E is changed, only its unit tests should run.
  • If B is changed, unit tests in C and D should run as well.
  • A change in C or D should run only that module's own unit tests.
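
The expected behavior above can be checked with a small sketch. The `IMPORTED_BY` map and the `modules_to_test` function are hypothetical names used only for illustration; the map simply encodes the example (A is imported by B, C, D and E; B is imported by C and D), and only direct dependents are considered, as in the scenarios listed.

```python
# Hypothetical encoding of the example: module -> modules that import it.
IMPORTED_BY = {
    "A": {"B", "C", "D", "E"},
    "B": {"C", "D"},
    "C": set(),
    "D": set(),
    "E": set(),
}


def modules_to_test(changed):
    """Return the changed modules plus the modules that import them."""
    result = set(changed)
    for module in changed:
        # Add every module that directly imports the changed one.
        result |= IMPORTED_BY.get(module, set())
    return sorted(result)


for changed in (["A"], ["B"], ["C"], ["E"]):
    print(changed, "->", modules_to_test(changed))
```

A change in B yields B, C and D, while a change in E yields only E, which matches the list above.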

[Figure: Unit test dependency graph]

High level steps

  1. Detect file changes in first-level modules. Say we changed something in module B. Let's call the output the result list; at this point it contains only B.
  2. Next, iterate over the list of changed modules, find all other modules in the project that import them, and add those dependent modules to the result list.
  3. Once we have the exhaustive list, create the final unit test command by appending ":test " to each module name.
  4. Pass that command to the Gradle wrapper for execution. This way only the unit tests of the changed and dependent modules are run.
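
As a rough illustration of step 3, the command string could be assembled as follows. `build_test_command` is a hypothetical helper, not part of the actual pipeline, which builds the string inside a Gradle task.

```python
def build_test_command(modules):
    """Append ':test' to each module name and join the results with spaces."""
    # Sorting and de-duplicating keeps the command stable between runs.
    return " ".join(f"{name}:test" for name in sorted(set(modules)))


print(build_test_command(["B", "C", "D"]))
```

An empty result list yields an empty string, so a caller may want to skip the Gradle invocation in that case.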

Let's go through each step in detail.

Detect relevant changed modules

In this step, we run a git diff between the feature branch and the target branch and write the list of changed files to an output file.

script:
    - git checkout <master-branch>
    - git checkout $CI_COMMIT_REF_NAME
    - git diff --name-only $CI_COMMIT_REF_NAME $(git merge-base $CI_COMMIT_REF_NAME <master-branch>) > changed_files.txt
    - python scripts/python/git_changeset.py
    - source .env        

Once we have all the changed files, a Python script is invoked to find the corresponding changed modules. However, not every module in that list is relevant for unit tests: a module may appear only because of changes in config or resource files, and such modules need to be filtered out. Modules are selected on the basis of the file extensions specified in the pattern_set variable in the code snippet below.

import os


def get_changed_modules():
    # changed_files.txt is produced by the git diff step above
    with open("changed_files.txt", "r", encoding="utf-8") as f:
        lines = f.readlines()
    changeset = []
    # File types that should trigger unit tests; extend with <any-other-non- file type>
    pattern_set = {'.java', '.kt', '.proto', '.gradle'}
    for line in lines:
        folder = line.strip("\n").split("/")
        filename = folder[-1]
        ext = os.path.splitext(filename)[-1]
        if ext in pattern_set:
            # The first path segment is the top-level module name
            changeset.append(folder[0])
    cmd = ""
    for module_name in set(changeset):
        cmd = cmd + module_name + ","
    print("Modules changed for unit test ", cmd)
    return cmd


if __name__ == "__main__":
    # Append the result to .env so later steps can read it via `source .env`
    with open('.env', 'a') as writer:
        writer.write(f'CHANGED_MODULES={get_changed_modules()}\n')

The above snippet finds the relevant changed modules and sets an environment property to be read in later stages.
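
To see the filtering in isolation, here is a minimal sketch that applies the same extension check to an in-memory list instead of changed_files.txt. The function name `modules_from_paths` and the sample paths are made up for illustration.

```python
import os

# File types that should trigger unit tests (same set as in the script above).
SOURCE_EXTENSIONS = {".java", ".kt", ".proto", ".gradle"}


def modules_from_paths(paths):
    """Map changed file paths to top-level module names, keeping only source-like files."""
    modules = set()
    for path in paths:
        parts = path.strip().split("/")
        ext = os.path.splitext(parts[-1])[1]
        if ext in SOURCE_EXTENSIONS:
            modules.add(parts[0])
    return sorted(modules)


# Hypothetical change set: one source file, two non-source files, one root build file.
sample = [
    "B/src/main/java/com/example/Foo.java",
    "B/src/main/resources/application.yml",
    "E/README.md",
    "build.gradle",
]
print(modules_from_paths(sample))  # ['B', 'build.gradle']
```

Module E is dropped because only its README changed. A root-level build.gradle has no parent folder, so it shows up under its own file name, which the Gradle task in the next section treats as a signal to run the full test suite.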

Find dependent modules of changed modules

In this step, we write a Gradle task which takes a list of module names and returns the list of modules that depend on them.

task buildReverseDependencies {
    if(project.hasProperty('changedModules')){
        String changedModules = project.property('changedModules')
        println("changedModules : " +  changedModules.trim().split(","))
        doLast {
            String cmd = " "
            if(changedModules.contains("build.gradle")){
                cmd =  "test"
            }
            else if (!changedModules.trim().isEmpty()) {
                def items = changedModules.trim().split(",")
                def modulesUsingTarget = [] as Set
                items.each { moduleName ->
                    project.subprojects.each { component ->
                        def componentGradleFile = component.file("${component.name}.gradle")
                        if (componentGradleFile.exists()) {
                            def fileContents = componentGradleFile.text
                            def lines = fileContents.split('\n')
                            def inMultiLineComment = false
                            if (component.name == moduleName) {
                                modulesUsingTarget.add(moduleName)
                            }
                            lines.each { line ->
                                if (!inMultiLineComment) {
                                    // Count the line only if it is not a line comment and does not open a block comment
                                    if (line.contains(moduleName) && !line.contains('/*') && !line.trim().startsWith('//')) {
                                        modulesUsingTarget.add(component.name)
                                    }
                                }
                                if (line.contains('/*')) {
                                    inMultiLineComment = true
                                }
                                if (line.contains('*/')) {
                                    inMultiLineComment = false
                                }
                            }
                            }
                        }
                    }
                }
                for (String i in modulesUsingTarget) {
                    cmd = cmd + i + ":test "
                }
            }
            println("final_unit_test_command=" + cmd)
        }
    }
}        

In the above code snippet, "${component.name}.gradle" represents the build.gradle of that module in the mono repo.
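
The intent of the line scan (ignore `//` line comments and `/* ... */` block comments, then look for the module name) can be sketched in Python as below. `references_module` and the sample build file are hypothetical and only mirror the idea, not the actual Gradle task.

```python
def references_module(build_file_text, module_name):
    """Return True if module_name appears on a line that is not commented out."""
    in_block_comment = False
    for line in build_file_text.splitlines():
        if not in_block_comment:
            if (module_name in line
                    and "/*" not in line
                    and not line.strip().startswith("//")):
                return True
        # Track block comments so that lines inside them are skipped.
        if "/*" in line:
            in_block_comment = True
        if "*/" in line:
            in_block_comment = False
    return False


# Hypothetical build file of some module that depends on A only.
sample_build_file = """\
dependencies {
    implementation project(':A')
    // implementation project(':B')
    /*
    implementation project(':E')
    */
}
"""
print(references_module(sample_build_file, "A"))
print(references_module(sample_build_file, "B"))
```

This is a plain substring check, so a short module name can also match unrelated text; matching on the full project path (for example ':A') would likely be more robust.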

Trigger the unit test command

The snippet below invokes the Gradle task above to construct the final unit test command and passes that command to the Gradle wrapper to execute the tests.

    - unit_test_cmd=$(./gradlew -q buildReverseDependencies -PchangedModules=${CHANGED_MODULES} | grep "final_unit_test_command=" | cut -d '=' -f2)
    - echo unit test command $unit_test_cmd
    - ./gradlew $unit_test_cmd
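
For readers less familiar with the grep and cut pipeline, the extraction it performs is roughly equivalent to the following sketch. `extract_test_command` is a hypothetical name and the sample output is made up.

```python
MARKER = "final_unit_test_command="


def extract_test_command(gradle_output):
    """Return the text after the marker on the first matching line, or '' if absent."""
    for line in gradle_output.splitlines():
        if MARKER in line:
            return line.split(MARKER, 1)[1].strip()
    return ""


# Made-up output resembling what the Gradle task prints.
sample_output = "changedModules : [B]\nfinal_unit_test_command= B:test C:test D:test \n"
print(extract_test_command(sample_output))  # B:test C:test D:test
```

One difference worth noting: `cut -d '=' -f2` keeps only the second `=`-separated field, whereas this sketch keeps everything after the marker.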

Complete script

Here is the complete script of the unit_test job in the gitlab-ci.yaml file:

unit_test:
  only:
    variables:
      - $PIPELINE_TYPE == "ci_build"
  script:
    - git checkout <master-branch>
    - git checkout $CI_COMMIT_REF_NAME
    - git diff --name-only $CI_COMMIT_REF_NAME $(git merge-base $CI_COMMIT_REF_NAME <master-branch>) > changed_files.txt
    - python scripts/python/git_changeset.py
    - source .env
    - unit_test_cmd=$(./gradlew -q buildReverseDependencies -PchangedModules=${CHANGED_MODULES} | grep "final_unit_test_command=" | cut -d '=' -f2)
    - echo unit test command $unit_test_cmd
    - ./gradlew $unit_test_cmd

Final Thoughts

This was a combined attempt to optimize the unit test job, which started as a casual talk with my colleagues. We gave it a little more thought and tried a very naive way of doing it. If you have any feedback or suggestions for improvement, feel free to leave a comment.

Credits

Thanks to Tarun Jain Sandhya Akshat Gupta and Jimmy Parija for their contributions. Happy learning!









