FastQC and MultiQC Workflow for BaseSpace Data Merging and Quality Control

snakemake_basespace_merge_qc

This workflow runs FastQC on a PROJECT directory downloaded from BaseSpace. It merges the FASTQ files across lanes, runs FastQC on the merged files, and compiles an aggregate report with MultiQC.
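The lane-merging step can be pictured with the sketch below. It is an illustration, not the workflow's own code: it assumes the files follow the usual Illumina naming pattern `<sample>_S<n>_L<lane>_R<read>_001.fastq.gz`, and the function name `group_by_sample_and_read`, the regular expression, and the example file names are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed Illumina-style name: <sample>_S<n>_L<lane>_R<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_(?P<read>R[12])_001\.fastq\.gz$"
)


def group_by_sample_and_read(filenames):
    """Group FASTQ file names by (sample, read), ignoring the lane."""
    groups = defaultdict(list)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # skip anything that is not a lane-level FASTQ file
        groups[(match["sample"], match["read"])].append(name)
    # Sort so lanes are concatenated in the same order for R1 and R2.
    return {key: sorted(files) for key, files in groups.items()}


# Hypothetical file names, for illustration only.
files = [
    "A_S1_L002_R1_001.fastq.gz",
    "A_S1_L001_R1_001.fastq.gz",
    "A_S1_L001_R2_001.fastq.gz",
    "notes.txt",
]
groups = group_by_sample_and_read(files)
```

Each group would then become the input list for one merged file. Keeping the lane order identical for R1 and R2 matters for paired-end data, since read pairs are matched by position.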

Usage

Step 1: Install workflow

Clone this workflow to your local computer.

Step 2: Configure workflow

Edit config.yaml to point the workflow at your input BaseSpace PROJECT directory.

Step 3: Execute workflow

Test your configuration by performing a dry-run via

snakemake --use-conda -n

Execute the workflow locally via

snakemake --use-conda --cores $N

using $N cores or run it in a cluster environment via

snakemake --use-conda --cluster qsub --jobs 100

Code Snippets

shell: "cat {input} > {output}"

SnakeMake From line 75 of main/Snakefile

shell: "cat {input} > {output}"

SnakeMake From line 80 of main/Snakefile
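The merge rules can use plain `cat` even on compressed input, because a gzip file may contain several members back to back. The sketch below demonstrates this with Python's standard `gzip` module. It assumes the lane files are gzip-compressed (as BaseSpace FASTQ downloads typically are), and the read records are made up for illustration.

```python
import gzip

# Two hypothetical lane files, each compressed on its own.
lane1 = gzip.compress(b"@read1\nACGT\n+\nIIII\n")
lane2 = gzip.compress(b"@read2\nTTGA\n+\nIIII\n")

# Byte-level concatenation, equivalent to `cat lane1.fastq.gz lane2.fastq.gz`.
merged = lane1 + lane2

# gzip.decompress reads all members of a multi-member stream.
records = gzip.decompress(merged)
```

Here `records` holds both reads in lane order, so the merged file is a valid FASTQ archive without any recompression.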

wrapper:
    "v0.69.0/bio/fastqc"

SnakeMake FastQC From line 93 of main/Snakefile

wrapper:
    "v0.69.0/bio/fastqc"

SnakeMake FastQC From line 106 of main/Snakefile

wrapper:
    "0.62.0/bio/multiqc"

SnakeMake MultiQC From line 117 of main/Snakefile

wrapper:
    "v0.69.0/bio/fastqc"

SnakeMake FastQC From line 130 of main/Snakefile

wrapper:
    "v0.69.0/bio/fastqc"

SnakeMake FastQC From line 143 of main/Snakefile

wrapper:
    "0.62.0/bio/multiqc"

SnakeMake MultiQC From line 154 of main/Snakefile

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


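# MultiQC scans directories, so collect the unique parent directories of all inputs.
input_dirs = set(path.dirname(fp) for fp in snakemake.input)
# Split the requested report path into the directory and file name passed to MultiQC.
output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
# Shell redirection that sends both stdout and stderr to the rule's log file.
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# --force overwrites an existing report of the same name.
shell(
    "multiqc"
    " {snakemake.params}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_dirs}"
    " {log}"
)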

Python Snakemake MultiQC From line 3 of multiqc/wrapper.py
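To see what command the MultiQC wrapper ends up running, the path handling can be reproduced outside Snakemake. The sketch below is an approximation under stated assumptions: `build_multiqc_command` is a hypothetical helper, the paths are invented, the log redirection is left out, and the directories are sorted for a reproducible result (the wrapper itself uses an unordered set).

```python
from os import path


def build_multiqc_command(inputs, output, params=""):
    """Assemble a multiqc command line the way the wrapper derives its paths."""
    # Unique parent directories of the inputs; sorted here only for determinism.
    input_dirs = sorted(set(path.dirname(fp) for fp in inputs))
    output_dir = path.dirname(output)
    output_name = path.basename(output)
    parts = ["multiqc", params, "--force", "-o", output_dir, "-n", output_name]
    parts.extend(input_dirs)
    # Drop empty pieces (for example, when no extra params are given).
    return " ".join(part for part in parts if part)


# Hypothetical FastQC outputs and report path.
cmd = build_multiqc_command(
    ["qc/fastqc/A_R1_fastqc.zip", "qc/fastqc/A_R2_fastqc.zip"],
    "qc/multiqc.html",
)
# cmd == "multiqc --force -o qc -n multiqc.html qc/fastqc"
```

Because only the parent directories are passed on, MultiQC scans everything in those directories, not just the files listed as rule inputs.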

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path
from tempfile import TemporaryDirectory

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""

    base = path.basename(file_path)

    split_ind = 2 if base.endswith(".fastq.gz") else 1
    base = ".".join(base.split(".")[:-split_ind])

    return base


# Run fastqc, since there can be race conditions if multiple jobs
# use the same fastqc dir, we create a temp dir.
with TemporaryDirectory() as tempdir:
    shell(
        "fastqc {snakemake.params} --quiet -t {snakemake.threads} "
        "--outdir {tempdir:q} {snakemake.input[0]:q}"
        " {log:q}"
    )

    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")

    if snakemake.output.html != html_path:
        shell("mv {html_path:q} {snakemake.output.html:q}")

    if snakemake.output.zip != zip_path:
        shell("mv {zip_path:q} {snakemake.output.zip:q}")
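The wrapper finds FastQC's output files by deriving a base name from the input path. The function below is copied from the wrapper above and applied to a few hypothetical file names to show what it returns.

```python
from os import path


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""
    base = path.basename(file_path)
    # Strip two suffixes for ".fastq.gz", otherwise only the last one.
    split_ind = 2 if base.endswith(".fastq.gz") else 1
    return ".".join(base.split(".")[:-split_ind])


# Hypothetical input paths, for illustration only.
gz_name = basename_without_ext("merged/A_R1.fastq.gz")  # "A_R1"
plain_name = basename_without_ext("merged/A_R1.fastq")  # "A_R1"
fq_gz_name = basename_without_ext("merged/A_R1.fq.gz")  # "A_R1.fq"
```

Only the `.fastq.gz` suffix gets special handling. For a name such as `A_R1.fq.gz` the inner `.fq` is kept, so the derived `_fastqc.html` and `_fastqc.zip` paths may not match what FastQC actually writes. Naming merged files with `.fastq.gz` avoids that question.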