RNA-Seq Data Analysis Workflow using Snakemake and Salmon for Gene Expression Quantification and Differential Expression Analysis
Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
A simple Snakemake workflow for processing RNA-Seq using snakemake in conjunction with
salmon
. Data is
assumed to be paired-end
, and a two-treatment design is assumed, although multiple paired comparisons are also possible The steps implemented are:
-
Prepare the reference
-
Download the
fa
andgtf
files for the relevant build -
Prepare the set of decoy transcripts
-
Index the reference using the set of decoy transcripts
-
-
Pre-process raw fq files
-
Run QC and prepare a report on the raw fq files (
FastQC
/ngsReports
) -
Identify adapters (
bbtools
) -
Trim samples (
AdapterRemoval
)
-
-
Pre-process raw fq files trimmed samples
-
Run QC and prepare a report on the raw fq files (
FastQC
/ngsReports
)
-
Run QC and prepare a report on the raw fq files (
-
Align and quantify (
salmon quant
) -
Gene Level Differential Expression Analysis
-
Normalise using
cqn
-
Compare groups using quasi-likelihood approaches and
glmTreat
-
Perform an alternative analysis using
limma-voom
-
Required Setup
Prior to executing the workflow, please ensure the following steps have been performed
-
Place unprocessed
fastq
(orfastq.gz
) files indata/raw/fastq
. This path is hard-wired into the workflow and cannot be changed. If samples have been run across multiple lanes, please merge prior to running this workflow as all samples are expected to be in a single pair of files. -
Prepare a tab-separated file, usually
samples.tsv
and place this in theconfig
directory. The column namesample
is mandatory, however any other columns required in the analysis can be specified inconfig.yml
. When entering filenames in the sample column, you can leave off any suffixes (e.g. fastq.gz) as these are specified inconfig.yml
. File names are expected to finish in_R[12].fastq.gz
and only the section preceding this should be used for thesamples
column -
Edit
config.yml
as required. Fields cannot be changed
Snakemake implementation
The basic workflow is written
snakemake
and can be called using the following steps.
Firstly, set-up the required conda environments
snakemake \ --use-conda \ --conda-prefix '/home/steveped/mambaforge/envs/' \ --conda-create-envs-only \ --cores 1
Secondly, create and inspect the rulegraph
snakemake --rulegraph > workflow/rules/rulegraph.dot
dot -Tpdf workflow/rules/rulegraph.dot > workflow/rules/rulegraph.pdf
Finally, the workflow itself can be run using:
snakemake \ -p \ --use-conda \ --conda-prefix '/home/steveped/mambaforge/envs/' \ --notemp \ --keep-going \ --cores 16
-
All data (salmon quants, trimmed fastq etc) will be written to the
data
directory. Trimmed fastq andsalmon quant aux_info
files are marked as temporary and can be safely deleted after completion of the analysis -
RMarkdown files will be added to the
anaylsis
directory using modules provided in the workflow -
Compiled
html
files will be written to thedocs
folder, along with any topTable files
Note that this creates common environments able to be called by other workflows and is dependent on the user. For me, my global conda environments are stored in
/home/steveped/mambaforge/envs/
. For other users, this path will need to be modified.
Code Snippets
11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | shell: """ Rscript --vanilla {input} &>> {log} if [[ {params.git} == "True" ]]; then TRIES={params.tries} while [[ -f .git/index.lock ]] do if [[ "$TRIES" == 0 ]]; then echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log} exit 1 fi sleep {params.interval} ((TRIES--)) done git add {output} fi """ |
50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 | shell: """ ## Make sure existing files are not overwritten, ## unless explicitly requested if [[ ! -f {output.rmd} || {params.overwrite} == 'True' ]]; then cp {input.rmd} {output.rmd} fi R -e "rmarkdown::render_site('{output.rmd}')" &>> {log} if [[ {params.git} == "True" ]]; then TRIES={params.tries} while [[ -f .git/index.lock ]] do if [[ "$TRIES" == 0 ]]; then echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log} exit 1 fi sleep {params.interval} ((TRIES--)) done git add {output} fi """ |
96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 | shell: """ ## Make sure existing files are not overwritten, ## unless explicitly requested if [[ ! -f {output.rmd} || {params.overwrite} == 'True' ]]; then cp {input.rmd} {output.rmd} fi R -e "rmarkdown::render_site('{output.rmd}')" &>> {log} if [[ {params.git} == "True" ]]; then TRIES={params.tries} while [[ -f .git/index.lock ]] do if [[ "$TRIES" == 0 ]]; then echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log} exit 1 fi sleep {params.interval} ((TRIES--)) done git add {output} fi """ |
143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 | shell: """ ## Make sure existing files are not overwritten, ## unless explicitly requested if [[ ! -f {output.rmd} || {params.overwrite} == 'True' ]]; then cp {input.rmd} {output.rmd} fi R -e "rmarkdown::render_site('{output.rmd}')" &>> {log} if [[ {params.git} == "True" ]]; then TRIES={params.tries} while [[ -f .git/index.lock ]] do if [[ "$TRIES" == 0 ]]; then echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log} exit 1 fi sleep {params.interval} ((TRIES--)) done git add {output} fi """ |
185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 | shell: """ ## Make sure existing files are not overwritten, ## unless explicitly requested if [[ ! -f {output.rmd} || {params.overwrite} == 'True' ]]; then cp {input.rmd} {output.rmd} fi R -e "rmarkdown::render_site('{output.rmd}')" &>> {log} if [[ {params.git} == "True" ]]; then TRIES={params.tries} while [[ -f .git/index.lock ]] do if [[ "$TRIES" == 0 ]]; then echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log} exit 1 fi sleep {params.interval} ((TRIES--)) done git add {output} fi """ |
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 | shell: """ ## Make sure existing files are not overwritten, ## unless explicitly requested if [[ ! -f {output.rmd} || {params.overwrite} == 'True' ]]; then Rscript --vanilla \ {input.script} \ {params.comp} \ {output.rmd} &>>{log} fi if [[ {params.git} == "True" ]]; then TRIES={params.tries} while [[ -f .git/index.lock ]] do if [[ "$TRIES" == 0 ]]; then echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log} exit 1 fi sleep {params.interval} ((TRIES--)) done git add {output} fi """ |
65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 | shell: """ R -e "rmarkdown::render_site('{input.rmd}')" &>> {log} if [[ {params.git} == "True" ]]; then TRIES={params.tries} while [[ -f .git/index.lock ]] do if [[ "$TRIES" == 0 ]]; then echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log} exit 1 fi sleep {params.interval} ((TRIES--)) done git add {output} fi """ |
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 | shell: """ # Write to a separate temp directory for each run to avoid I/O clashes TEMPDIR=$(mktemp -d -t fqcXXXXXXXXXX) fastqc \ {params.extra} \ -t {threads} \ --outdir $TEMPDIR \ {input.fq} &> {log} # Move the files mv $TEMPDIR/*html $(dirname {output.html}) mv $TEMPDIR/*zip $(dirname {output.zip}) # Clean up the temp directory rm -rf $TEMPDIR ## Update the git repo if one is active if [[ {params.git} == "True" ]]; then TRIES={params.tries} while [[ -f .git/index.lock ]] do if [[ "$TRIES" == 0 ]]; then echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log} exit 1 fi sleep {params.interval} ((TRIES--)) done git add {output} fi """ |
62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 | shell: """ bbmerge.sh \ threads={threads} \ in1={input.r1} \ in2={input.r2} \ outa={output.fa} 2> {log} if [[ {params.git} == "True" ]]; then TRIES={params.tries} while [[ -f .git/index.lock ]] do if [[ "$TRIES" == 0 ]]; then echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log} exit 1 fi sleep {params.interval} ((TRIES--)) done git add {output} fi """ |
112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 | shell: """ AdapterRemoval \ --adapter1 {params.adapter1} \ --adapter2 {params.adapter2} \ --file1 {input.r1} \ --file2 {input.r2} \ {params.extra} \ --threads {threads} \ --maxns {params.maxns} \ --minquality {params.minqual} \ --minlength {params.minlength} \ --output1 {output.r1} \ --output2 {output.r2} \ --discarded /dev/null \ --singleton /dev/null \ --settings {output.log} &> {log} if [[ {params.git} == "True" ]]; then TRIES={params.tries} while [[ -f .git/index.lock ]] do if [[ "$TRIES" == 0 ]]; then echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log} exit 1 fi sleep {params.interval} ((TRIES--)) done git add {output.log} fi """ |
44 45 46 47 48 | shell: """ ## Download and compress to bgzip on the fly curl {params.url} 2> {log} | zcat | bgzip -c > {output} """ |
58 59 60 61 62 63 64 | shell: """ samtools faidx \ --gzi-idx {output.gzi} \ --fai-idx {output.fai} \ {input} """ |
76 77 78 79 80 81 | shell: """ samtools faidx {input.fa} \ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 MT X Y | \ gzip > {output} """ |
90 91 92 93 | shell: """ curl {params.url} 2> {log} | zcat | bgzip -c > {output} """ |
103 104 105 106 107 108 109 | shell: """ samtools faidx \ --gzi-idx {output.gzi} \ --fai-idx {output.fai} \ {input} """ |
124 125 126 127 128 129 130 131 132 133 134 | shell: """ zcat {input.fa} | \ egrep "{params.regexp}" | \ cut -f1 -d\ | \ sed -r 's/^>//g' > {output.ids} samtools faidx \ -r {output.ids} \ {input.fa} | \ bgzip -c > {output.fa} """ |
143 144 145 146 | shell: """ curl -o {output} {params.url} 2> {log} """ |
152 153 154 155 156 | shell: """ grep "^>" <(gunzip -c {input}) | cut -d " " -f 1 > {output} sed -i.bak -e 's/>//g' {output} """ |
164 165 166 167 | shell: """ cat {input.trans_fa} {input.ref_fa} > {output} """ |
177 178 179 180 181 182 183 184 | shell: """ salmon index \ -t {input.gentrome} \ -d {input.decoys} \ -p {threads} \ -i {output} 2> {log} """ |
50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 | shell: """ salmon quant \ -i {input.index} \ -l {params.lib} \ -p {threads} \ --numBootstraps {params.nBoot} \ --numGibbsSamples {params.nGibbs} \ --thinningFactor {params.thin} \ {params.extra} \ -o {params.dir} \ -1 {input.r1} \ -2 {input.r2} &>> {log} if [[ {params.git} == "True" ]]; then TRIES={params.tries} while [[ -f .git/index.lock ]] do if [[ "$TRIES" == 0 ]]; then echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log} exit 1 fi sleep {params.interval} ((TRIES--)) done git add {output.quant} {output.json} {output.meta} fi """ |
Support
- Future updates
Related Workflows





