Minimizing the scope of running unit tests in a mono repo
Context
Running unit tests in a mono repo can be time-consuming if not done carefully. For instance, if we change only a config file or a script file, there is ideally no need to run unit tests, because no source file has changed. In such cases the job can be skipped, which saves some resources (pipeline runner, time). Similarly, if a particular sub-module is changed, there is no need to run the entire unit test suite, as functionality in the unchanged modules should not be affected. In this case the scope should be limited to the changed module and the modules that depend on it. This is an attempt to achieve such optimizations.
Mechanism
GitLab CI lets us add a rule that triggers a job based on the type of file changed. For example, the job below runs only when java or kt files are changed:
rules:
  - if: '$PIPELINE_TYPE == "ci_build" && $CI_PIPELINE_SOURCE != "merge_request_event"'
    changes:
      - "**/*.java"
      - "**/*.kt"
    when: always
The snippet above is useful: the job is skipped if no source file has changed. But now let's say we have a mono repo containing 5 modules, A to E. Module A is imported in all other 4. Module B is imported in C and D.
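The expected behavior for unit tests follows from these dependencies: a change in A should trigger tests in all five modules, a change in B should trigger tests in B, C, and D, and a change in C, D, or E should trigger only that module's tests.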
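To make the expected scope concrete, here is a minimal sketch that derives the modules to test from the dependencies described above. The `IMPORTS` map and the `modules_to_test` function are hypothetical names used only for illustration; they are not part of the pipeline described in this post.

```python
# Hypothetical dependency map mirroring the example:
# each module maps to the set of modules it imports.
IMPORTS = {
    "A": set(),
    "B": {"A"},
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A"},
}


def modules_to_test(changed):
    """Return the changed modules plus every module that depends on them."""
    affected = set(changed)
    frontier = list(changed)
    while frontier:
        current = frontier.pop()
        # Any module importing `current` is affected as well.
        for module, deps in IMPORTS.items():
            if current in deps and module not in affected:
                affected.add(module)
                frontier.append(module)
    return affected


for module in sorted(IMPORTS):
    print(module, "->", sorted(modules_to_test({module})))
```

The loop follows dependents transitively, so a module that depends on a dependent of the changed module would also be picked up. In this example no module imports C, D, or E, so the transitive and direct results are the same.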
High level steps
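The approach has three steps: detect the relevant changed modules, find the modules that depend on them, and trigger the unit test command. Let's go through each step in detail.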
Detect relevant changed modules
In this step, we run a git diff between the feature branch and its merge base with the target branch, and write the list of changed files to an output file.
script:
  - git checkout <master-branch>
  - git checkout $CI_COMMIT_REF_NAME
  - git diff --name-only $CI_COMMIT_REF_NAME $(git merge-base $CI_COMMIT_REF_NAME <master-branch>) > changed_files.txt
  - python scripts/python/git_changeset.py
  - source .env
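The same diff can also be collected from Python rather than shell. The sketch below is an illustrative alternative, not the script used in this pipeline: `run_git`, `parse_name_only`, and `changed_files` are hypothetical helper names, and it assumes `git` is on the PATH and both refs exist locally. Diffing against the merge base means that commits made on the target branch after the feature branch was created are not counted as changes.

```python
import subprocess


def run_git(*args):
    """Run a git command and return its standard output as text."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def parse_name_only(output):
    """Turn `git diff --name-only` output into a list of non-empty paths."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def changed_files(feature_ref, target_ref):
    """List files that differ between the merge base and the feature branch."""
    base = run_git("merge-base", feature_ref, target_ref).strip()
    return parse_name_only(run_git("diff", "--name-only", base, feature_ref))
```

Calling `changed_files` with the feature branch name and the target branch name would return the same paths that the shell step writes to changed_files.txt.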
Once we have all the changed files, a Python script is invoked to find the corresponding changed modules. However, not every module is relevant for unit tests: some appear only because of changes in config or resource files and need to be filtered out. Modules are selected based on the file extensions listed in the pattern_set variable in the code snippet below.
import os


def get_changed_modules():
    # Read the file list produced by the git diff step.
    with open("changed_files.txt", 'r', encoding="utf-8") as f:
        lines = f.readlines()
    changeset = []
    # File types that should trigger unit tests; extend this set as needed.
    pattern_set = {'.java', '.kt', '.proto', '.gradle'}
    for line in lines:
        folder = line.strip("\n").split("/")
        filename = folder[-1]
        ext = os.path.splitext(filename)[-1]
        if ext in pattern_set:
            # First path segment: the module name, or the file name for root-level files.
            changeset.append(folder[0])
    cmd = ""
    for module_name in set(changeset):
        cmd = cmd + module_name + ","
    print("Modules changed for unit test ", cmd)
    return cmd


if __name__ == "__main__":
    # Append the result to .env so later steps can read it after `source .env`.
    with open('.env', 'a') as writer:
        writer.write(f'CHANGED_MODULES={get_changed_modules()}')
The snippet above finds the relevant changed modules and writes them to an environment variable that is read in later stages.
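To see the extension filter in action without touching the file system, here is a small sketch of the same idea as a pure function. The function name `modules_from_paths` and the sample paths are made up for illustration.

```python
import os

# Same extensions as the pattern_set described above.
SOURCE_EXTENSIONS = {".java", ".kt", ".proto", ".gradle"}


def modules_from_paths(paths, extensions=SOURCE_EXTENSIONS):
    """Return the sorted top-level folders of paths with a relevant extension."""
    modules = set()
    for path in paths:
        path = path.strip()
        if not path:
            continue
        if os.path.splitext(path)[1] in extensions:
            # The first path segment is treated as the module name.
            modules.add(path.split("/")[0])
    return sorted(modules)


# Hypothetical changed files: only A and D contain relevant file types.
sample = [
    "A/src/main/java/com/example/Foo.java",
    "B/src/main/resources/application.yml",
    "C/README.md",
    "D/D.gradle",
]
print(",".join(modules_from_paths(sample)))  # prints A,D
```

Modules B and C are dropped because their only changes are a resource file and a README, which is exactly the case where running their unit tests adds nothing.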
Find dependent modules of changed modules
In this step, we write a Gradle task that takes a list of module names and returns the list of modules that depend on them.
task buildReverseDependencies {
    if (project.hasProperty('changedModules')) {
        String changedModules = project.property('changedModules')
        println("changedModules : " + changedModules.trim().split(","))
        doLast {
            String cmd = " "
            if (changedModules.contains("build.gradle")) {
                // A root-level build.gradle changed, so run the full test suite.
                cmd = "test"
            }
            else if (!changedModules.trim().isEmpty()) {
                def items = changedModules.trim().split(",")
                def modulesUsingTarget = [] as Set
                items.each { moduleName ->
                    project.subprojects.each { component ->
                        def componentGradleFile = component.file("${component.name}.gradle")
                        if (componentGradleFile.exists()) {
                            def fileContents = componentGradleFile.text
                            def lines = fileContents.split('\n')
                            def inMultiLineComment = false
                            // The changed module itself always needs its tests run.
                            if (component.name == moduleName) {
                                modulesUsingTarget.add(moduleName)
                            }
                            lines.each { line ->
                                // Count a reference only outside of comments.
                                if (!inMultiLineComment) {
                                    if (line.contains(moduleName) && !line.contains('/*') && !line.trim().startsWith('//')) {
                                        modulesUsingTarget.add(component.name)
                                    }
                                }
                                if (line.contains('/*')) {
                                    inMultiLineComment = true
                                }
                                if (line.contains('*/')) {
                                    inMultiLineComment = false
                                }
                            }
                        }
                    }
                }
                for (String i in modulesUsingTarget) {
                    cmd = cmd + i + ":test "
                }
            }
            println("final_unit_test_command=" + cmd)
        }
    }
}
In the above code snippet, "${component.name}.gradle" represents the build.gradle of that module in the mono repo.
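The comment handling in the task is easier to follow in isolation. The sketch below mirrors that scan in Python, assuming the intent is to ignore references inside `//` line comments and `/* ... */` block comments. The function name `references_module` and the module names `core` and `payments` are hypothetical.

```python
def references_module(build_file_text, module_name):
    """Return True if module_name appears on a line that is not commented out."""
    in_block_comment = False
    for line in build_file_text.split("\n"):
        if (
            not in_block_comment
            and module_name in line
            and "/*" not in line
            and not line.strip().startswith("//")
        ):
            return True
        # Track whether the following lines sit inside a block comment.
        if "/*" in line:
            in_block_comment = True
        if "*/" in line:
            in_block_comment = False
    return False


active = "dependencies {\n    implementation project(':core')\n}"
line_commented = "dependencies {\n    // implementation project(':core')\n}"
block_commented = "/*\nimplementation project(':core')\n*/"

print(references_module(active, "core"))
print(references_module(line_commented, "core"))
print(references_module(block_commented, "core"))
```

Because this is a plain substring match, a module name that is a prefix of another name (for example `core` and a hypothetical `core-utils`) would also match. That errs on the side of running more tests rather than fewer.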
Trigger the unit test command
The snippet below invokes the Gradle task above to construct the final unit test command and passes it to the Gradle wrapper to execute the tests.
- unit_test_cmd=$(./gradlew -q buildReverseDependencies -PchangedModules=${CHANGED_MODULES} | grep "final_unit_test_command=" | cut -d '=' -f2)
- echo unit test command $unit_test_cmd
- ./gradlew $unit_test_cmd;
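For readers less familiar with the `grep` and `cut` pipeline, the sketch below shows roughly the same extraction in Python. The function name `extract_test_command` and the sample output are illustrative assumptions; the real Gradle output may contain additional lines.

```python
MARKER = "final_unit_test_command="


def extract_test_command(gradle_output, marker=MARKER):
    """Return the text after the marker on the first matching line, or ''."""
    for line in gradle_output.splitlines():
        if marker in line:
            # Keep everything after the first '=' and trim surrounding spaces.
            return line.split("=", 1)[1].strip()
    return ""


# Hypothetical task output for a change in module B.
sample_output = (
    "changedModules : [B]\n"
    "final_unit_test_command= B:test C:test D:test \n"
)
print(extract_test_command(sample_output))  # prints B:test C:test D:test
```

The resulting string is a space-separated list of Gradle task paths, which is why the shell step can pass it unquoted to `./gradlew`.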
Complete script
Here is the complete script of the unit_test job in the .gitlab-ci.yml file:
unit_test:
  only:
    variables:
      - $PIPELINE_TYPE == "ci_build"
  script:
    - git checkout <master-branch>
    - git checkout $CI_COMMIT_REF_NAME
    - git diff --name-only $CI_COMMIT_REF_NAME $(git merge-base $CI_COMMIT_REF_NAME <master-branch>) > changed_files.txt
    - python scripts/python/git_changeset.py
    - source .env
    - unit_test_cmd=$(./gradlew -q buildReverseDependencies -PchangedModules=${CHANGED_MODULES} | grep "final_unit_test_command=" | cut -d '=' -f2)
    - echo unit test command $unit_test_cmd
    - ./gradlew $unit_test_cmd;
Final Thoughts
This was a combined attempt to optimize the unit test job, which started as a casual talk with my colleagues. We gave it a little more thought and tried a very naive way of doing it. If you have any feedback or suggestions for improvement, feel free to leave a comment.
Credits
Thanks to Tarun Jain Sandhya Akshat Gupta and Jimmy Parija for their contributions. Happy learning!