Boosting reproducibility, compatibility and scalability with Snakemake

The early stage of a theoretical project involves a lot of trial simulations to understand, in simple words, “what’s going on”. However, this is no excuse for a lack of reproducibility, compatibility and scalability in the final product once we have figured out how things work. Snakemake is an excellent way to achieve these characteristics in our simulations. It has several features that make our lives a lot easier. Here I will just show a simple example that parallelizes a typical daily workflow.

Requirements

  • Snakemake Python library installed, for example with conda: conda install -c conda-forge -c bioconda snakemake
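# Snakefile
# Every `pass` below is a placeholder for the real code of that step.
root_path = 'root'          # top-level directory with one subdirectory per name
names = ['A', 'B', 'C']     # values that the {name} wildcard will take

# Target rule: asking for final.result pulls in every rule below.
rule all:
    input:
        "final.result"
    run:
        pass

# One job that prepares the starting prod.ABC file for every name.
rule setup:
    output:
        ABC = expand(root_path + "/{name}/prod.ABC", name=names)
    run:
        pass

# One job per {name}; jobs for different names do not depend on each other,
# so they can run in parallel.
rule simulation1:
    input:
        ABC = root_path + "/{name}/prod.ABC"
    output:
        DEF = root_path + "/{name}/prod.DEF"
    run:
        pass

# Second stage, again one job per {name}, fed by the output of simulation1.
rule simulation2:
    input:
        DEF = root_path + "/{name}/prod.DEF"
    output:
        XYZ = root_path + "/{name}/prod.XYZ"
    run:
        pass

# Waits for the prod.XYZ file of every name and combines them into final.result.
rule gather:
    input:
        XYZ = expand(root_path + "/{name}/prod.XYZ", name=names)
    output:
        "final.result"
    run:
        pass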
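
To make the role of expand and of the {name} wildcard more concrete, here is a minimal plain-Python sketch. It does not use Snakemake itself: expand_names and match_wildcard are hypothetical helpers written only for illustration, and the rule that a wildcard matches a single path component is a simplifying assumption, not a statement about Snakemake's internals.

```python
import re

root_path = "root"
names = ["A", "B", "C"]


def expand_names(pattern, names):
    """Hypothetical stand-in for expand(): fill {name} once per entry in names."""
    return [pattern.format(name=n) for n in names]


def match_wildcard(pattern, path):
    """Hypothetical helper: return the value {name} must take for pattern to equal path.

    Assumes the wildcard matches exactly one path component (a simplification).
    """
    regex = re.escape(pattern).replace(r"\{name\}", r"(?P<name>[^/]+)")
    found = re.fullmatch(regex, path)
    return found.group("name") if found else None


# The concrete files that rule gather asks for.
targets = expand_names(root_path + "/{name}/prod.XYZ", names)
print(targets)

# Asking for one of those files fixes {name} for the rule that produces it.
for target in targets:
    print(target, "->", match_wildcard(root_path + "/{name}/prod.XYZ", target))
```

Each of the three targets fixes a different value of {name}, so the Snakefile describes three independent simulation1/simulation2 chains, one per name.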

This workflow produces the following directed acyclic graph (DAG):

[Figure: DAG of the workflow]
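
Snakemake can print this graph itself (snakemake --dag, usually piped into Graphviz dot). To see why the graph allows parallel execution, the sketch below rebuilds the same dependencies in plain Python with the standard-library graphlib module (Python 3.9 or newer) and groups the jobs into levels that could run at the same time. The job labels such as simulation1[A] are hypothetical names chosen for this illustration, and the level grouping is a simplified picture, not Snakemake's actual scheduler.

```python
from graphlib import TopologicalSorter

names = ["A", "B", "C"]

# Map every job to the set of jobs it depends on, mirroring the rules above.
# The labels are hypothetical; Snakemake names its jobs differently.
deps = {"setup": set(), "gather": set(), "all": {"gather"}}
for n in names:
    deps[f"simulation1[{n}]"] = {"setup"}
    deps[f"simulation2[{n}]"] = {f"simulation1[{n}]"}
    deps["gather"].add(f"simulation2[{n}]")

# Repeatedly collect all jobs whose dependencies are finished: one level per pass.
sorter = TopologicalSorter(deps)
sorter.prepare()
levels = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    levels.append(ready)
    sorter.done(*ready)

for i, level in enumerate(levels):
    print(f"level {i}: {level}")
```

The two middle levels each contain three jobs that do not depend on one another, so with three or more cores available (for example snakemake --cores 3) they can run side by side.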

Happy simulations!

Alejandro Martínez León
PhD-Student in Biophysics

My research interests include molecular dynamics simulations, coding and theoretical biophysics.