High throughput Next Generation Sequencing (NGS) data analysis using Python 3 Snakemake
Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output, operation, topic
- Lack of a description for a new keyword Nextflow .
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
Alex Soupir
Snakemake is a Bioinformatics tool for managing a workflow. This tool proves valuable when analyzing a large amount of data with multiple tools. This script was made as a learning tool for workflow manager. There is also Nextflow to manage large analysis workflows. Here, Snakemake was used to run everything that is usually run on Linux with RNA-Seq Analyses (here is my long winded version of an RNA-seq analysis on Mice p53 gene mutation).
The Snakemake file was developed for analyzing sequencing data from a recent publication - Dietary walnut altered gene expressions related to tumor growth, survival, and metastasis in breast cancer patients: a pilot clinical trial . The raw sequences were downloaded from the Sequence Read Archive Run Selector using sra-tools.
-
The genome and the gtf files were downloaded and an index was created of the genome.
-
The minikraken was downloaded and extracted.
-
Homo sapiens rRNA sequences were downloaded from NCBI.
-
PhiX sequences were downloaded from Illumina.
Conda Tools Used:
-
FastQC
-
Trimmomatic
-
STAR
-
featureCounts (conda Subread)
-
Bowtie2
-
bwa
-
Samtools
-
Bam2Fastx
-
MultiQC
Running
To run
Snakemake
, a big memory cluster node was used. To run, in the same folder as the snakefile I used
snakemake -j 80
which tells Snakemake to use 80 cores. Snakemake if not given a file name searches current directory for a file named
snakefile
.
The output of running Snakemake is a QC folder with results for all of the steps as well as MultiQC which makes nice HTML pages to summarize the results.
MultiQC doesn't have the ability to identify and create summaries for microbial contamination with KrakenUniq.
Code Snippets
35 36 | shell: "~/miniconda2/bin/fastqc {input} --threads {threads} -O {params}" |
50 51 | shell: "~/miniconda2/bin/trimmomatic PE -threads {threads} {input.read1} {input.read2} {output.fp} {output.fu} {output.rp} {output.ru} ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:2:keepBothReads SLIDINGWINDOW:4:20 TRAILING:3 MINLEN:36 2>{log}" |
62 63 | shell: "~/miniconda2/bin/fastqc {input} --threads {threads} -O {params}" |
77 78 79 80 | shell: """ ~/miniconda2/bin/STAR --runThreadN {threads} --genomeDir genomeIndex --readFilesIn {input.trimmed1} {input.trimmed2} --outFilterIntronMotifs RemoveNoncanonical --outFileNamePrefix {params.prefix} --outSAMtype BAM SortedByCoordinate --outReadsUnmapped Fastx """ |
87 88 89 90 91 92 93 | shell: "~/miniconda2/bin/featureCounts -a genome/gencode.v32.annotation.gtf -o {output.counts} {input.bam} -T {threads} rule PhiXContamination: input: trimmed1="Trimmed/{id}_forward_paired.fastq.gz", trimmed2="Trimmed/{id}_reverse_paired.fastq.gz" |
101 102 103 104 | shell: """ ~/miniconda2/bin/bowtie2 -p {threads} -x {params.bowtieGenome} -1 {input.trimmed1} -2 {input.trimmed2} -S {params.prefix}.sam &> {params.prefix}.out """ |
116 117 118 119 | shell: """ ~/miniconda2/bin/krakenuniq --preload --db minikraken_20171019_8GB --threads {threads} --paired --report-file {output.report} --fastq-input {input.unmapped1} {input.unmapped2} > {output.out} """ |
130 131 132 133 | shell: """ ~/miniconda2/bin/bwa mem -t {threads} {params} {input.trimmed1} {input.trimmed2} > {output.rnaSam} """ |
143 144 145 146 147 148 | shell: """ ~/miniconda2/bin/samtools view -@ {threads} -bS -o {output.rnaBam} {input.rnaSam} ~/miniconda2/bin/samtools flagstat -@ {threads} {output.rnaBam} > {output.rnaOut} ~/miniconda2/bin/samtools view -@ {threads} -u -f 12 -F 256 {output.rnaBam} > {output.rnaUnm} """ |
157 158 159 160 | shell: """ ~/miniconda2/bin/bamToFastq -i {input} -fq {output.fwd} -fq2 {output.rvs} """ |
174 175 176 177 178 179 180 | shell: """ cp {input.star} {output.starout} cp {input.rrna} {output.rrnaout} cp {input.micr} {output.microut} cp {input.phix} {output.phixout} """ |
185 186 | shell: "~/miniconda2/bin/multiqc QC/." |
Support
- Future updates
Related Workflows





