Integrating GPU-Enabled Docker Container Into a Nextflow Pipeline
Belson Malcolm Kutambe
Bioinformatics Engineer | Nextflow | Snakemake | Python | Bash | SQL
Recently I had to use a Docker container inside a Nextflow pipeline. Here is what I learned:
Set up
Having Nextflow and Docker installed is not enough to run a GPU-enabled container inside a Nextflow pipeline. You also need to install additional GPU support for Docker, e.g. nvidia-docker. The whole process is documented in detail in this article: https://medium.com/@kepler_00/nanopore-gpu-basecalling-using-guppy-on-ubuntu-18-04-and-nvidia-docker-v2-with-a-rtx-2080-d875945e5c8d
Enable docker
Nextflow needs to know that you intend to use a Docker image inside your pipeline. There are two ways to do that:
1. Using the -with-docker command-line option to specify the Docker image at runtime
2. Enabling Docker in the config file and specifying the Docker image inside the process scope
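The second option can be sketched as a minimal nextflow.config. This is an illustration under assumptions, not the original configuration: the image name is hypothetical, and the commented-out runOptions line assumes a Docker setup that accepts the --gpus flag (nvidia-docker v2 setups typically use --runtime=nvidia instead).

```groovy
// nextflow.config (minimal sketch; the image name is hypothetical)

docker {
    // Tell Nextflow to run processes inside Docker containers
    enabled = true

    // Assumption: expose the GPUs to the container; adjust to your runtime
    // runOptions = '--gpus all'
}

process {
    // Image used by every process unless overridden per process
    container = 'example/gpu-basecaller:latest'
}

// Command-line equivalent of the first option (no config needed):
//   nextflow run main.nf -with-docker example/gpu-basecaller:latest
```

With the config file in place, a plain `nextflow run main.nf` picks up the container settings without any extra command-line options.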
I find the second option quite convenient!
Mounting a file system inside a docker container
1. Using the temp variable. According to the documentation, you can set the temp variable inside the docker scope to a host path that you want to mount. Under the hood, that path is mounted as /tmp inside the container.
You then access the mounted path inside the Docker container at /tmp.
2. Using the runOptions variable inside the docker scope, you can explicitly set the -v flag to mount a file or directory from the host machine.
3. Letting Nextflow manage the file system mounts, so that you only have to provide the necessary input files. For example, suppose I have a fast5_ch channel for the fast5 files, which I stage as an input to a process called basecalling. During execution, Nextflow will mount the fast5 paths specified in the input channel inside my Docker container.
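A minimal sketch of the third option, assuming a DSL2 pipeline. Only the names fast5_ch and basecalling come from the description above. The params.fast5_dir parameter, its default path, the output directory name and the basecaller command are all assumptions for illustration (the command assumes Guppy, as in the article linked in the Set up section).

```groovy
// main.nf (minimal sketch; paths and the basecaller command are assumptions)

params.fast5_dir = '/data/run01/fast5'   // hypothetical host directory

process basecalling {
    input:
    path fast5_dir          // staged into the task work directory by Nextflow

    output:
    path 'basecalled'

    script:
    """
    guppy_basecaller --input_path ${fast5_dir} --save_path basecalled --config dna_r9.4.1_450bps_fast.cfg --device cuda:0
    """
}

workflow {
    // One channel item per fast5 directory
    fast5_ch = Channel.fromPath(params.fast5_dir, type: 'dir')
    basecalling(fast5_ch)
}
```

Note that the script contains no -v flag or host path: the process only refers to the staged input name, and the mounting is left to Nextflow.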
The automatic management of file system mounts is one of the fine points of Nextflow. Thank you, Nextflow developers!!!
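For comparison, the two explicit mounting options (temp and runOptions) can be sketched in a single nextflow.config. Both host paths and the /reference mount point are hypothetical and would need to exist on your machine.

```groovy
// nextflow.config (minimal sketch; all paths are hypothetical)

docker {
    enabled = true

    // Option 1: mount a host directory as /tmp inside the container.
    // Inside the container, its contents are read from /tmp.
    temp = '/data/scratch'

    // Option 2: pass an explicit bind mount to `docker run`
    // (format: host_path:container_path).
    runOptions = '-v /data/reference:/reference'
}
```

The runOptions string is passed through to the docker run command, so any other docker run flags you need can go in the same string.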
Check for typos
Finally, I share a tricky situation I faced while integrating the Docker container into the pipeline. I got an error when running the pipeline.
On the surface, it seemed that the Docker image did not support GPU mode, but I had already done a successful test run of this image in GPU mode outside the Nextflow pipeline. Further analysis revealed that a typo in my config file was the cause: instead of writing enabled inside my docker scope, I had written enable. That single missing d caused all the mayhem.
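A minimal sketch of the difference. It assumes that Nextflow does not act on the misspelled key, so Docker is never switched on for the run; the exact behavior on unknown config keys may vary between Nextflow versions.

```groovy
// nextflow.config

// Broken: `enable` is not the option name, so Docker is not switched on
// docker {
//     enable = true
// }

// Fixed: the option in the docker scope is spelled `enabled`
docker {
    enabled = true
}
```

One way to catch this kind of mistake is to run `nextflow config`, which prints the resolved configuration, and check that docker.enabled appears with the value you expect.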
Acknowledgements
Big credit goes to the Nextflow developers for making it possible to integrate Docker inside pipelines and for providing excellent documentation on how to do it.