Boosting reproducibility, compatibility and scalability with Snakemake
The early stage of a theoretical project involves a lot of trial simulations to understand, in simple words, “what’s going on”. However, this is no excuse for a lack of reproducibility, compatibility and scalability in the final product once we have figured out how things work. Snakemake is an excellent way to achieve these characteristics in our simulations. It has several features that will make our lives a lot easier. Here I will just show a simple example that parallelizes a typical daily workflow.
Requirements
- Snakemake Python library installed:
  conda install -c conda-forge -c bioconda snakemake
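root_path = 'root'
names = ['A', 'B', 'C']

# Target rule: requesting final.result triggers the whole workflow.
rule all:
    input:
        "final.result"
    run:
        pass

# A single job that creates the initial prod.ABC file for every name.
rule setup:
    output:
        ABC = expand(root_path + "/{name}/prod.ABC", name=names)
    run:
        pass  # placeholder: replace with the real setup commands

# Runs once per name; the jobs are independent of each other.
rule simulation1:
    input:
        ABC = root_path + "/{name}/prod.ABC"
    output:
        DEF = root_path + "/{name}/prod.DEF"
    run:
        pass  # placeholder: must write the DEF output file

# Runs once per name, after the matching simulation1 job.
rule simulation2:
    input:
        DEF = root_path + "/{name}/prod.DEF"
    output:
        XYZ = root_path + "/{name}/prod.XYZ"
    run:
        pass  # placeholder: must write the XYZ output file

# Collects the prod.XYZ files of all names into one result.
rule gather:
    input:
        XYZ = expand(root_path + "/{name}/prod.XYZ", name=names)
    output:
        "final.result"
    run:
        pass  # placeholder: must write final.result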
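To see which concrete files the `{name}` wildcard stands for, here is a minimal plain-Python sketch that mimics what `expand` returns for the two patterns used above. The helper `expand_paths` is hypothetical and written only for illustration; it is not Snakemake's own implementation and handles just the single `{name}` wildcard.

```python
# Hypothetical stand-in for Snakemake's expand(); illustration only.
root_path = "root"
names = ["A", "B", "C"]


def expand_paths(pattern, names):
    """Fill the {name} wildcard once for each entry in names."""
    return [pattern.format(name=n) for n in names]


# Files created by the setup rule and consumed by the gather rule.
abc_files = expand_paths(root_path + "/{name}/prod.ABC", names)
xyz_files = expand_paths(root_path + "/{name}/prod.XYZ", names)

print(abc_files)
print(xyz_files)
```

The rules simulation1 and simulation2 leave `{name}` unexpanded, so Snakemake infers its value from the file being requested and schedules one job per name.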
This will produce the following directed acyclic graph (DAG):
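As a rough text stand-in for the graph, the dependencies implied by the rules can be sketched in plain Python with the standard-library `graphlib` module (Python 3.9+). The job labels such as `simulation1_A` are made up for this sketch, and the script only groups jobs by the order in which they become runnable; it does not call Snakemake.

```python
from graphlib import TopologicalSorter

names = ["A", "B", "C"]

# Map each job to the set of jobs it depends on (labels are hypothetical).
deps = {"setup": set(), "gather": set(), "all": {"gather"}}
for n in names:
    deps[f"simulation1_{n}"] = {"setup"}
    deps[f"simulation2_{n}"] = {f"simulation1_{n}"}
    deps["gather"].add(f"simulation2_{n}")

# Walk the graph level by level: every job in a level has all of its
# dependencies finished, so the jobs in one level could run in parallel.
sorter = TopologicalSorter(deps)
sorter.prepare()
levels = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    levels.append(ready)
    sorter.done(*ready)

for depth, jobs in enumerate(levels):
    print(depth, jobs)
```

The three simulation1 jobs share a level, as do the three simulation2 jobs, which is where the parallelism comes from when Snakemake is given several cores, for example `snakemake --cores 3`. If Graphviz is installed, `snakemake --dag | dot -Tsvg > dag.svg` should render the actual graph.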
Happy simulations!