Command-line bioinformatics workflows, created with the Snakemake workflow management tool.
Update (March 2021): this pipeline is no longer supported, as QIIME is now available in version 2.0 and QIIME 1.9 is no longer maintained.
Synopsis
This workflow describes a series of steps executed to get from raw fastq files, resulting from the amplicon sequencing of sample(s), to an OTU table describing the taxonomic determination summary for the analysed sample(s). It executes on the Linux command line, using the Snakemake workflow management system.
Workflow
- QC of input files
- Trim input files
- QC of trimmed files
- Join forward and reverse sequences
- QC of joined sequences
- Cluster sequences
- Pick representative sequences
- Detect and remove chimeric representative sequences/clusters
- Taxonomic classification
- Create OTU table
Setup
- Create a directory with input data. This should be paired Illumina sequences.
  - This can be a directory of symbolic links to data elsewhere on the file system (see the sketch after this list).
- In the config.yaml file:
  - Verify the working directory; this is the location of the pipeline output.
  - Verify the input directory; this can be an absolute path or a path relative to the working directory.
  - Verify the reference fasta and taxonomy; these can be absolute paths or paths relative to the working directory.
  - Verify the input sequence file extension.
  - Verify that the input_file_forward_postfix parameter corresponds to the naming of your raw files. Change it, if necessary.
- Optional: change tool parameters/paths in the config.yaml file.
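For example, a minimal sketch of preparing an input directory of symbolic links (the source path /archive/run42 and the file names are hypothetical; match them to your own raw data and to the input_file_forward_postfix setting):
$ mkdir -p input_data
$ ln -s /archive/run42/sampleA_S1_L001_R1_001.fastq input_data/
$ ln -s /archive/run42/sampleA_S1_L001_R2_001.fastq input_data/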
For details on the workflow tools, their versions, the arguments used, and the order of execution, see the Snakefile.
Execute
To check if the workflow will run correctly without executing the steps:
$ snakemake -np --configfile config.yaml
To execute the workflow:
$ snakemake --configfile config.yaml
Note: if you are not in the same directory as the Snakefile, you will need the extra parameter --snakefile with the path to the Snakefile (see the example below).
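For example, assuming the workflow was checked out to ~/snakemake-workflows (a hypothetical location):
$ snakemake --snakefile ~/snakemake-workflows/amplicon_workflow/Snakefile --configfile config.yaml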
Installation
This workflow runs on Linux. To install this workflow, either locally or on a cluster, you will need to have the following requirements installed.
Requirements
- Python 3.5+
- PyYAML 5.4+
- Snakemake 3.7.1
- FastQC 0.11.5
- Trimmomatic 0.36
- QIIME 1.9
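A minimal sketch of installing the requirements into a conda environment from the bioconda and conda-forge channels (this is an assumption: the project does not prescribe an installation method, and the package versions available on your channels may differ from those listed above):
$ conda create -n amplicon_workflow python=3.5
$ conda activate amplicon_workflow
$ conda install -c bioconda -c conda-forge snakemake fastqc trimmomatic qiime pyyaml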
Download the latest release of this project:
https://github.com/AAFC-MBB/snakemake-amplicon-metagenomics/releases
OR
Check out this project (requires git):
$ git clone https://github.com/AAFC-MBB/snakemake-workflows.git
Tests
The automated test is located in snakemake-workflows/amplicon_workflow/test/ . To run the test, first download the test data to the snakemake-workflows/amplicon_workflow/test/data/ directory (see the README in that directory for instructions). Before you execute the test, please note that the test runs the Snakefile with the test data and therefore uses the same output directory as a regular snakemake command (the default is snakemake-workflows/amplicon_workflow/data ). Therefore, if you already have input or intermediate workflow execution data in your data directory and would like to keep it, back it up first (see the example below).
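For example, a simple way to back up the default data directory before running the test (the backup name is arbitrary):
$ cp -a snakemake-workflows/amplicon_workflow/data snakemake-workflows/amplicon_workflow/data.bak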
Execute the tests:
$ ./test.sh -clean -run
Info
For more information about Snakemake, visit their website: https://bitbucket.org/snakemake/snakemake/wiki/Home
Authors
Licensing
See License file.
Code Snippets
shell:
    """
    # Step 0: initial QC of the raw input files with FastQC
    initial_data_quality_cmd="fastqc {input} --outdir=step0_initial_data_quality" ;\
    echo "Executed command:\n" $initial_data_quality_cmd ;\
    $initial_data_quality_cmd
    """
shell:
    """
    # Step 1: trim and quality-filter reads with Trimmomatic in paired-end mode
    touch null.fa
    java -jar {input.exec} PE -threads {threads} -phred33 {input.forward} {input.reverse} \
        {output.forward_paired} {output.forward_unpaired} {output.reverse_paired} {output.reverse_unpaired} \
        ILLUMINACLIP:{config[trimmomatic][ILLUMINACLIP]} \
        HEADCROP:{config[trimmomatic][HEADCROP]} \
        LEADING:{config[trimmomatic][LEADING]} \
        SLIDINGWINDOW:{config[trimmomatic][SLIDINGWINDOW]} \
        TRAILING:{config[trimmomatic][TRAILING]} \
        AVGQUAL:{config[trimmomatic][AVGQUAL]} \
        MINLEN:{config[trimmomatic][MINLEN]} \
        CROP:{config[trimmomatic][CROP]}
    rm null.fa
    """
shell:
    """
    # QC of the trimmed forward and reverse files with FastQC
    trimm_quality_cmd="fastqc {input.forward} --outdir=step1_trimmomatic/quality" ;\
    echo "Executed command:\n" $trimm_quality_cmd ;\
    $trimm_quality_cmd
    trimm_quality_cmd="fastqc {input.reverse} --outdir=step1_trimmomatic/quality" ;\
    echo "Executed command:\n" $trimm_quality_cmd ;\
    $trimm_quality_cmd
    """
shell:
    """
    # Step 2: join forward and reverse reads with fastq-join
    join_cmd="join_paired_ends.py -f {input.forward_paired} -r {input.reverse_paired} -o step2_join/ -m fastq-join" ;\
    echo "Executed command:\n" $join_cmd ;\
    $join_cmd ;\
    mv step2_join/fastqjoin.join.fastq {output.joined_seqs} ;\
    mv step2_join/fastqjoin.un1.fastq {output.unjoined_forward_seqs} ;\
    mv step2_join/fastqjoin.un2.fastq {output.unjoined_reverse_seqs}
    """
shell:
    """
    # QC of the joined sequences with FastQC
    join_quality_cmd="fastqc {input} --outdir=step2_join/quality" ;\
    echo "Executed command:\n" $join_quality_cmd ;\
    $join_quality_cmd
    """
shell:
    """
    # Convert fastq to fasta, rewriting each read header to carry a dot-separated sample ID prefix
    for file in {input.fastq}; do \
        sample_id=$(echo $file | rev | cut -d'/' -f1 | rev | cut -d'_' -f1-4) ;\
        s_id=$(echo $sample_id | sed -e 's/_/\./g') ;\
        echo "Converting file $file" ;\
        echo "Original sample id: $sample_id" ;\
        echo "New sample id: $s_id" ;\
        sed -n '1~4s/^@/>'"$s_id"'_/p;2~4p' "$file" >> {output} ;\
    done
    """
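The sed expression above turns four-line fastq records into fasta: the header line of each record (address 1~4) has its leading @ replaced by > plus the dot-separated sample ID, and the sequence line (address 2~4) is printed unchanged. A stand-alone demonstration of the same idea, with a made-up read name and sample ID:
$ printf '@M001:read1\nACGT\n+\nIIII\n' | sed -n '1~4s/^@/>sampleA.1_/p;2~4p'
>sampleA.1_M001:read1
ACGT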
shell:
    """
    # Step 4: cluster sequences into OTUs with uclust
    cluster_otus_cmd="pick_otus.py -i {input} -m uclust -s {config[pick_otus][s]} -o step4_pick_otu" ;\
    echo "Executed command:\n" $cluster_otus_cmd ;\
    $cluster_otus_cmd
    """
shell:
    """
    # Pick the longest sequence in each OTU as its representative
    pick_representatives_cmd="pick_rep_set.py -i {input.otu} -f {input.fasta} -m longest -o {output}" ;\
    echo "Executed command:\n" $pick_representatives_cmd ;\
    $pick_representatives_cmd
    """
shell:
    """
    # Detect chimeric representative sequences against the reference, then filter them out
    check_chimeric_seqs_cmd="parallel_identify_chimeric_seqs.py -i {input.dataset} -t {input.reference_txt} -r {input.reference_fasta} -m blast_fragments -o {output.chimeric_list} -O {config[threads]}" ;\
    echo "Executed command:\n" $check_chimeric_seqs_cmd ;\
    $check_chimeric_seqs_cmd
    remove_chimeric_seqs_cmd="filter_fasta.py -f {input.dataset} -o {output.rep_set} -s {output.chimeric_list} -n" ;\
    echo "Executed command:\n" $remove_chimeric_seqs_cmd ;\
    $remove_chimeric_seqs_cmd
    """
shell:
    """
    # Step 7: assign taxonomy to the representative sequences with the RDP classifier
    classify_cmd="parallel_assign_taxonomy_rdp.py -i {input.dataset} -o step7_classify \
        -r {input.reference_fasta} -t {input.reference_txt} --rdp_max_memory 10000 -c {config[assign_taxonomy][c]} -O {config[threads]}" ;\
    echo "Executed command:\n" $classify_cmd ;\
    $classify_cmd
    """
shell:
    """
    # Create the OTU table from the OTU clusters and their assigned taxonomy
    make_otu_cmd="make_otu_table.py -i {input.otu} -t {input.assigned_taxonomy} -o {output}" ;\
    echo "Executed command:\n" $make_otu_cmd ;\
    $make_otu_cmd
    """
shell:
    """
    # Convert the OTU table from biom format to tab-separated text, keeping the taxonomy metadata
    convert_otu_table_cmd="biom convert -i {input} -o {output} --to-tsv --header-key taxonomy" ;\
    echo "Executed command:\n" $convert_otu_table_cmd ;\
    $convert_otu_table_cmd
    """
Support
- Future updates