AmpSeeker is a Snakemake workflow for amplicon sequencing data analysis. The pipeline is a work in progress; it currently implements:
- BCL to Fastq conversion
- Genome alignment
- Variant calling
- Quality control
- Coverage
- Visualisation of reads in IGV
- VCF to DataFrame/.xlsx
- Allele frequency calculation
- Principal component analysis
- Geographic sample maps
The workflow uses a combination of papermill and Jupyter Book, so that users can visually explore the results in a local webpage for convenience.
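Under the hood, each analysis notebook is executed with papermill, which injects run-specific parameters, and the executed notebooks are then assembled into a static HTML book with Jupyter Book. A minimal sketch of the pattern (the notebook names and paths here are illustrative, not the workflow's actual files):

```bash
# Execute a parameterised notebook, injecting values via -p (hypothetical paths)
papermill notebooks/analysis.ipynb results/notebooks/analysis.ipynb \
    -k AmpSeq_python -p metadata_path config/metadata.tsv

# Assemble the executed notebooks into a browsable HTML book
jupyter-book build --all results/ampseeker-results
```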
Usage
Please see the wiki for more information on running the workflow. If you use this workflow in a paper, please give credit to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).
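As a rough sketch, a Snakemake workflow such as this one is usually launched from the repository root along the following lines (the exact config file and flags are documented in the wiki, so treat these as assumptions):

```bash
# Hypothetical invocation; see the wiki for the supported options
snakemake --cores 8 --use-conda --configfile config/config.yaml
```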
Testing
Test cases are in the subfolder `.test`. They are automatically executed via continuous integration with GitHub Actions.
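To reproduce the CI check locally, one would typically point Snakemake at the test directory, for example (assuming `.test` bundles its own config, as is conventional for Snakemake workflows):

```bash
# Run the bundled test case (flags are assumptions based on common practice)
snakemake --cores 2 --use-conda --directory .test
```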
Code Snippets
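The snippets below are excerpts from the workflow's Snakemake rules (their `shell:` and `wrapper:` directives) and from the wrapper scripts they reference.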
```
shell:
    """
    samtools faidx {input.ref} 2> {log}
    """
```
```
shell:
    """
    bwa index {input.ref} 2> {log}
    """
```
```
shell:
    """
    bwa mem -t {threads} -R {params.tag} {input.ref} {input.reads} 2> {log.align} |
    samtools sort -@{threads} -o {output} 2> {log.sort}
    """
```
```
shell:
    "samtools index {input} {output} 2> {log}"
```
```
shell:
    """
    bcftools mpileup -Ov -f {params.ref} -R {params.regions} --max-depth {params.depth} {input.bam} 2> {log.mpileup} |
    bcftools call -m -Ov 2> {log.call} |
    bcftools sort -Ov -o {output.calls} 2>> {log.call}
    """
```
```
shell:
    """
    bcftools mpileup -Ov -f {params.ref} --max-depth {params.depth} {input.bam} 2> {log.mpileup} |
    bcftools call -m -Ov 2> {log.call} |
    bcftools sort -Ov -o {output.calls} 2>> {log.call}
    """
```
```
shell:
    "snpEff download {params.ref} -dataDir {params.dir} 2> {log}"
```
```
shell:
    """
    snpEff eff {params.db} -dataDir {params.dir} -csvStats {output.csvStats} {input.calls} > {output.calls} 2> {log}
    """
```
```
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p metadata_path {input.metadata} -p genome_name {params.reference_name} -p reference_fasta {input.genome} -p reference_gff3 {input.gff3} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
```
```
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p metadata_path {input.metadata} -p dataset {params.dataset} -p vcf_path {input.vcf} -p colour_column {params.column} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
```
```
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p metadata_path {input.metadata} -p dataset {params.dataset} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
```
```
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p metadata_path {input.metadata} -p dataset {params.dataset} -p bed_path {input.bed} -p cohort_column {params.cohort_column} -p vcf_path {input.vcf} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
```
```
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p dataset {params.dataset} -p wkdir {params.wkdir} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
```
```
shell:
    "bcl-convert --bcl-input-directory {input.illumina_in_dir} --output-directory {output.reads_dir} --sample-sheet {input.sample_csv} --force 2> {log}"
```
```
shell:
    """
    while IFS="," read -r _ sample_id _ _ read1 read2; do
        if [[ $sample_id == "RGSM" ]]; then
            continue
        fi
        echo renaming $read1 and $read2 to ${{sample_id}}_1.fastq.gz and ${{sample_id}}_2.fastq.gz
        mv $read1 resources/reads/${{sample_id}}_1.fastq.gz
        mv $read2 resources/reads/${{sample_id}}_2.fastq.gz
    done < {input.fastq_list}
    """
```
```
shell:
    """
    python -m ipykernel install --user --name=AmpSeq_python 2> {log}
    """
```
```
shell:
    """
    jupyter-book build --all {input.pages} --path-output results/ampseeker-results &&
    ln -sf results/ampseeker-results/_build/html/index.html AmpSeeker-results.html
    """
```
```
shell:
    """
    papermill -k AmpSeq_python {input.input_nb} {output.out_nb} -p wkdir {params.wkdir} 2> {log}
    """
```
```
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p metadata_path {input.metadata} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
```
```
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p metadata_path {input.metadata} -p index_read_qc {params.index_qc} -p wkdir {params.wd} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
```
```
shell:
    """
    papermill {input.nb} {output.nb} -k AmpSeq_python -p metadata_path {input.metadata} -p bed_targets_path {input.targets} -p wkdir {params.wkdir} 2> {log}
    cp {output.nb} {output.docs_nb} 2>> {log}
    """
```
```
shell:
    """
    zcat resources/reads/*I1*.fastq.gz | fastqc stdin --outdir results/qc/index-read-qc/ 2>> {log}
    mv results/qc/index-read-qc/stdin_fastqc.html results/qc/index-read-qc/I1.html 2>> {log}
    mv results/qc/index-read-qc/stdin_fastqc.zip results/qc/index-read-qc/I1.zip 2>> {log}
    zcat resources/reads/*I2*.fastq.gz | fastqc stdin --outdir results/qc/index-read-qc/ 2>> {log}
    mv results/qc/index-read-qc/stdin_fastqc.html results/qc/index-read-qc/I2.html 2>> {log}
    mv results/qc/index-read-qc/stdin_fastqc.zip results/qc/index-read-qc/I2.zip 2>> {log}
    """
```
```
wrapper:
    "v1.25.0/bio/fastp"
```
```
shell:
    """
    mosdepth {params.prefix} {input.bam} --fast-mode --threads {threads} 2> {log}
    """
```
```
shell:
    """
    samtools flagstat {input.bam} > {output} 2> {log}
    """
```
```
shell:
    """
    qualimap bamqc -bam {input.bam} -outdir {output.folder} 2> {log}
    """
```
```
shell:
    """
    bcftools stats {input.vcf} > {output.stats} 2> {log}
    """
```
```
wrapper:
    "v2.2.1/bio/multiqc"
```
```
wrapper:
    "v1.17.4-17-g62b55d45/bio/bgzip"
```
```
shell:
    """
    tabix {input.calls} 2> {log}
    """
```
```
shell:
    """
    bcftools merge --threads {threads} -o {output.vcf} -O v {input.vcfs} --force-samples 2> {log}
    """
```
```
shell:
    """
    bcftools merge --threads {threads} -o {output.vcf} -O v {input.vcfs} 2> {log}
    """
```
```
shell:
    """
    bcftools merge --threads {threads} -o {output.vcf} -O v {input.vcfs} 2> {log}
    """
```
```
wrapper:
    "v1.17.4-17-g62b55d45/bio/bgzip"
```
```
shell:
    """
    tabix {input.vcfgz} 2> {log}
    """
```
```
shell:
    """
    bcftools merge -o {output.vcf} -Ov {input.vcf} 2> {log}
    """
```
```python
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    """
    (bgzip -c {extra} --threads {snakemake.threads} \
    {snakemake.input} > {snakemake.output}) {log}
    """
)
```
```python
__author__ = "Sebastian Kurscheid"
__copyright__ = "Copyright 2019, Sebastian Kurscheid"
__email__ = "sebastian.kurscheid@anu.edu.au"
__license__ = "MIT"

from snakemake.shell import shell
import re

extra = snakemake.params.get("extra", "")
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Assert input
n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."

# Input files
if n == 1:
    reads = "--in1 {}".format(snakemake.input.sample)
else:
    reads = "--in1 {} --in2 {}".format(*snakemake.input.sample)

# Output files
trimmed_paths = snakemake.output.get("trimmed", None)
if trimmed_paths:
    if n == 1:
        trimmed = "--out1 {}".format(snakemake.output.trimmed)
    else:
        trimmed = "--out1 {} --out2 {}".format(*snakemake.output.trimmed)

        # Output unpaired files
        unpaired = snakemake.output.get("unpaired", None)
        if unpaired:
            trimmed += f" --unpaired1 {unpaired} --unpaired2 {unpaired}"
        else:
            unpaired1 = snakemake.output.get("unpaired1", None)
            if unpaired1:
                trimmed += f" --unpaired1 {unpaired1}"

            unpaired2 = snakemake.output.get("unpaired2", None)
            if unpaired2:
                trimmed += f" --unpaired2 {unpaired2}"

        # Output merged PE reads
        merged = snakemake.output.get("merged", None)
        if merged:
            if not re.search(r"--merge\b", extra):
                raise ValueError(
                    "output.merged specified but '--merge' option missing from params.extra"
                )
            trimmed += f" --merged_out {merged}"
else:
    trimmed = ""

# Output failed reads
failed = snakemake.output.get("failed", None)
if failed:
    trimmed += f" --failed_out {failed}"

# Stats
html = "--html {}".format(snakemake.output.html)
json = "--json {}".format(snakemake.output.json)

shell(
    "(fastp --thread {snakemake.threads} "
    "{extra} "
    "{adapters} "
    "{reads} "
    "{trimmed} "
    "{json} "
    "{html} ) {log}"
)
```
```python
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"

from os import path

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
# Set this to False if multiqc should use the actual input directly
# instead of parsing the folders where the provided files are located
use_input_files_only = snakemake.params.get("use_input_files_only", False)

if not use_input_files_only:
    input_data = set(path.dirname(fp) for fp in snakemake.input)
else:
    input_data = set(snakemake.input)

output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "multiqc"
    " {extra}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_data}"
    " {log}"
)
```
URL: https://github.com/sanjaynagi/AmpSeeker
Name: ampseeker
Version: v0.1.0
Copyright: Public Domain
License: None