Transcriptomics Data Quality Control and Adapter Trimming Workflow
Implementation homework for transcriptomics
As described in the task:

- performed FastQC
- performed MultiQC
- trimmed adapters
- performed FastQC on the trimmed reads
- performed MultiQC on the trimmed reads
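As a side note, FastQC writes a tab-separated `summary.txt` (status, module name, file name) for each report, so the QC steps above can be sanity-checked programmatically. A minimal sketch (the helper name is hypothetical, not part of this workflow):

```python
def summarize_fastqc(summary_text: str) -> dict:
    """Tally PASS/WARN/FAIL statuses from a FastQC summary.txt,
    whose lines are tab-separated: status, module name, file name."""
    counts = {"PASS": 0, "WARN": 0, "FAIL": 0}
    for line in summary_text.splitlines():
        status = line.split("\t", 1)[0]
        if status in counts:
            counts[status] += 1
    return counts
```

For example, a sample failing the Adapter Content module before trimming would show up as one `FAIL` in this tally.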
The clearest indicator of successful adapter removal is the Adapter Content plot:

Before trimming | After trimming
---|---
(plot image) | (plot image)
- Used the STAR aligner
- Indexed the BAM with samtools index, then ran featureCounts
Here we specify strandedness (the featureCounts `-s` option):

- 0: unstranded
- 1: stranded
- 2: reversely stranded

Collibri is stranded, therefore 1; KAPA is reversely stranded, therefore 2.
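The kit-to-strandedness mapping can be sketched as a small function (a hedged illustration only; in this workflow the same decision is made in shell inside the featureCounts rule shown under Code Snippets):

```python
def featurecounts_strand(sample_path: str) -> int:
    """Map a library-prep kit (inferred from the file name) to the
    featureCounts -s value: 0 = unstranded, 1 = stranded, 2 = reverse."""
    if "Collibri" in sample_path:
        return 1  # Collibri libraries are stranded
    if "KAPA" in sample_path:
        return 2  # KAPA libraries are reversely stranded
    return 0  # fall back to unstranded if the kit is unknown
```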
- Used DESeq2 to perform DE analysis comparing UHRR vs HBR. As a result, obtained DE genes with padj values for Collibri and for KAPA, as well as volcano plots for Collibri and for KAPA. The repository includes a Snakemake file which will create the plots.
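For context, a volcano plot places each gene at x = log2 fold change and y = -log10(padj), flagging genes that pass both an effect-size and a significance cutoff. A minimal sketch of those coordinates (illustrative only; the cutoffs and helper name are assumptions, not taken from this workflow):

```python
import math

def volcano_coords(log2fc: float, padj: float,
                   fc_cut: float = 1.0, p_cut: float = 0.05):
    """Return (x, y, significant) for one gene on a volcano plot:
    x = log2 fold change, y = -log10(adjusted p-value)."""
    y = -math.log10(padj)
    significant = abs(log2fc) >= fc_cut and padj <= p_cut
    return log2fc, y, significant
```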
- Performed PCA using the DE genes: plots for Collibri and for KAPA.
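PCA on a samples-by-genes expression matrix can be sketched with NumPy's SVD (an illustration under the assumption of already log-transformed counts; this is not the plotting code used by the workflow):

```python
import numpy as np

def pca_scores(counts: np.ndarray, n_components: int = 2) -> np.ndarray:
    """PCA via SVD on a samples x genes matrix.
    Returns each sample's coordinates on the first principal components."""
    X = counts - counts.mean(axis=0)              # center each gene
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]  # project onto PCs
```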
Code Snippets
```
shell: "tar -xf {input} -C genome"
```

```
wrapper: "v1.28.0/bio/multiqc"
```

```
wrapper: "v1.28.0/bio/bbtools/bbduk"
```

```
wrapper: "v1.28.0/bio/multiqc"
```
```
shell:
    """
    echo {params.sjdbOverhang}
    STAR --runThreadN 4 --runMode genomeGenerate --genomeDir {output} \
        --genomeFastaFiles {input}
    """
```

```
shell:
    """
    STAR --genomeDir {input.index} --readFilesIn {input.fq1} \
        --outFileNamePrefix star/ --outSAMtype BAM SortedByCoordinate \
        --readFilesCommand gunzip -c --outStd BAM_SortedByCoordinate > {output.aln}
    """
```

```
shell:
    """
    echo {input.bam}
    samtools index {input.bam} {output.bai}
    """
```

```
shell:
    """
    export test={input.samples}
    if [[ "$test" == *"Collibri"* ]]; then
        strand="1"
    else
        strand="2"
    fi
    echo "strand"
    echo $strand
    samtools sort -n {input.samples} | featureCounts -p -t exon -g gene_id \
        -a {input.annotation} -o {output[0]} -s $strand
    """
```
```python
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

java_opts = get_java_opts(snakemake)
extra = snakemake.params.get("extra", "")
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."

if n == 1:
    reads = "in={}".format(snakemake.input.sample)
    trimmed = "out={}".format(snakemake.output.trimmed)
else:
    reads = "in={} in2={}".format(*snakemake.input.sample)
    trimmed = "out={} out2={}".format(*snakemake.output.trimmed)

singleton = snakemake.output.get("singleton", "")
if singleton:
    singleton = f"outs={singleton}"

discarded = snakemake.output.get("discarded", "")
if discarded:
    discarded = f"outm={discarded}"

stats = snakemake.output.get("stats", "")
if stats:
    stats = f"stats={stats}"

shell(
    "bbduk.sh {java_opts} t={snakemake.threads} "
    "{reads} "
    "{adapters} "
    "{extra} "
    "{trimmed} {singleton} {discarded} "
    "{stats} "
    "{log}"
)
```
```python
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
# Set this to False if multiqc should use the actual input directly
# instead of parsing the folders where the provided files are located
use_input_files_only = snakemake.params.get("use_input_files_only", False)

if not use_input_files_only:
    input_data = set(path.dirname(fp) for fp in snakemake.input)
else:
    input_data = set(snakemake.input)

output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "multiqc"
    " {extra}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_data}"
    " {log}"
)
```
Support
- Future updates