Transcriptomics Data Quality Control and Adapter Trimming Workflow
Implementation homework for transcriptomics
As described in the task:

- performed FastQC
- performed MultiQC
- trimmed adapters
- performed FastQC on the trimmed reads
- performed MultiQC on the trimmed reads
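As a side note, FastQC writes a tab-separated `summary.txt` (status, module name, file name) for each report, so the QC steps above can be sanity-checked programmatically. A minimal sketch (the helper name is hypothetical, not part of this workflow):

```python
def summarize_fastqc(summary_text: str) -> dict:
    """Tally PASS/WARN/FAIL statuses from a FastQC summary.txt,
    whose lines are tab-separated: status, module name, file name."""
    counts = {"PASS": 0, "WARN": 0, "FAIL": 0}
    for line in summary_text.splitlines():
        status = line.split("\t", 1)[0]
        if status in counts:
            counts[status] += 1
    return counts
```

For example, a sample failing the Adapter Content module before trimming would show up as one `FAIL` in this tally.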
The clearest indicator of successful adapter removal is the Adapter Content plot:

Before trimming | After trimming
---|---
(plot image) | (plot image)
- Used the STAR aligner
- Indexed the BAM with samtools index, then ran featureCounts
Here we specify strandedness (the featureCounts `-s` option):

- 0: unstranded
- 1: stranded
- 2: reversely stranded

Collibri is stranded, therefore 1; KAPA is reversely stranded, therefore 2.
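The kit-to-strandedness mapping can be sketched as a small function (a hedged illustration only; in this workflow the same decision is made in shell inside the featureCounts rule shown under Code Snippets):

```python
def featurecounts_strand(sample_path: str) -> int:
    """Map a library-prep kit (inferred from the file name) to the
    featureCounts -s value: 0 = unstranded, 1 = stranded, 2 = reverse."""
    if "Collibri" in sample_path:
        return 1  # Collibri libraries are stranded
    if "KAPA" in sample_path:
        return 2  # KAPA libraries are reversely stranded
    return 0  # fall back to unstranded if the kit is unknown
```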
- Used DESeq2 to perform DE analysis comparing UHRR vs HBR. As a result, obtained DE genes with padj values for Collibri and for KAPA, as well as volcano plots for Collibri and for KAPA. The repository includes a Snakemake file which will create the plots.
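For context, a volcano plot places each gene at x = log2 fold change and y = -log10(padj), flagging genes that pass both an effect-size and a significance cutoff. A minimal sketch of those coordinates (illustrative only; the cutoffs and helper name are assumptions, not taken from this workflow):

```python
import math

def volcano_coords(log2fc: float, padj: float,
                   fc_cut: float = 1.0, p_cut: float = 0.05):
    """Return (x, y, significant) for one gene on a volcano plot:
    x = log2 fold change, y = -log10(adjusted p-value)."""
    y = -math.log10(padj)
    significant = abs(log2fc) >= fc_cut and padj <= p_cut
    return log2fc, y, significant
```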
- Performed PCA using the DE genes: plots for Collibri and for KAPA.
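PCA on a samples-by-genes expression matrix can be sketched with NumPy's SVD (an illustration under the assumption of already log-transformed counts; this is not the plotting code used by the workflow):

```python
import numpy as np

def pca_scores(counts: np.ndarray, n_components: int = 2) -> np.ndarray:
    """PCA via SVD on a samples x genes matrix.
    Returns each sample's coordinates on the first principal components."""
    X = counts - counts.mean(axis=0)              # center each gene
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]  # project onto PCs
```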
Code Snippets
```
shell: "tar -xf {input} -C genome"
```

```
wrapper: "v1.28.0/bio/multiqc"
```

```
wrapper: "v1.28.0/bio/bbtools/bbduk"
```

```
wrapper: "v1.28.0/bio/multiqc"
```
```
shell:
    """
    echo {params.sjdbOverhang}
    STAR --runThreadN 4 --runMode genomeGenerate --genomeDir {output} \
        --genomeFastaFiles {input}
    """
```

```
shell:
    """
    STAR --genomeDir {input.index} --readFilesIn {input.fq1} \
        --outFileNamePrefix star/ --outSAMtype BAM SortedByCoordinate \
        --readFilesCommand gunzip -c --outStd BAM_SortedByCoordinate > {output.aln}
    """
```

```
shell:
    """
    echo {input.bam}
    samtools index {input.bam} {output.bai}
    """
```

```
shell:
    """
    export test={input.samples}
    if [[ "$test" == *"Collibri"* ]]; then
        strand="1"
    else
        strand="2"
    fi
    echo "strand"
    echo $strand
    samtools sort -n {input.samples} | featureCounts -p -t exon -g gene_id \
        -a {input.annotation} -o {output[0]} -s $strand
    """
```
```python
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

java_opts = get_java_opts(snakemake)
extra = snakemake.params.get("extra", "")
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."

if n == 1:
    reads = "in={}".format(snakemake.input.sample)
    trimmed = "out={}".format(snakemake.output.trimmed)
else:
    reads = "in={} in2={}".format(*snakemake.input.sample)
    trimmed = "out={} out2={}".format(*snakemake.output.trimmed)

singleton = snakemake.output.get("singleton", "")
if singleton:
    singleton = f"outs={singleton}"

discarded = snakemake.output.get("discarded", "")
if discarded:
    discarded = f"outm={discarded}"

stats = snakemake.output.get("stats", "")
if stats:
    stats = f"stats={stats}"

shell(
    "bbduk.sh {java_opts} t={snakemake.threads} "
    "{reads} "
    "{adapters} "
    "{extra} "
    "{trimmed} {singleton} {discarded} "
    "{stats} "
    "{log}"
)
```
```python
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
# Set this to False if multiqc should use the actual input directly
# instead of parsing the folders where the provided files are located
use_input_files_only = snakemake.params.get("use_input_files_only", False)

if not use_input_files_only:
    input_data = set(path.dirname(fp) for fp in snakemake.input)
else:
    input_data = set(snakemake.input)

output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "multiqc"
    " {extra}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_data}"
    " {log}"
)
```
Support
- Future updates