Snakemake pipeline for 16S, 18S and ITS metagenomics using qiime2

This workflow performs microbiome analysis using QIIME2 and PICRUSt2 for functional annotation. Functional annotation is only performed for 16S amplicon sequences.

Please note the following:

I analyze my data with qiime2 version 2020.6, so that is the version this pipeline has been tested with.
I have not tested the pipeline using deblur or vsearch even though I have implemented them, so use these methods at your own risk. I have tested the dada2 pipeline and it works well. Hence, I advise you to run the dada2 pipeline.
I provide 3 Snakefiles: Snakefile (16S, 18S and ITS), Snakefile.16S (16S and 18S) and Snakefile.ITS (ITS alone).
I will be happy to fix any bug that you might find, so please feel free to reach out to me at obadbotanist@yahoo.com

Please do not forget to cite the authors of the tools used.

The Pipeline does the following:

It renames your input files (optional) so that they conform to the required input format, i.e. 01.raw_data/{SAMPLE}_R{1|2}.fastq.gz for paired-end or 01.raw_data/{SAMPLE}.fastq.gz for single-end reads
Quality checks and summarizes the input reads using FastQC and MultiQC
Imports the reads into qiime2
Quality checks the input artifact using qiime2
Trims primers and adaptors from the imported artifact using cutadapt as implemented in qiime2
Quality checks the trimmed input artifact using qiime2
Denoises the reads (filtering, chimera checking and ASV table generation) using dada2 (default)
Assigns taxonomy to the representative sequences using scikit-learn and the database you provide. See the folder Create__DB for a pipeline that can be used to create the required databases
Excludes singletons and non-target taxa such as Mitochondria, Chloroplast etc. The taxa to be filtered can be set from within the Snakefile by editing the "taxa2filter" variable.
Excludes rare ASVs, i.e. ASVs with fewer sequences than 0.005% of the total number of sequences (Navas-Molina et al. 2013)
Builds a phylogenetic tree
Generates sample and group taxa plots
Performs core diversity analysis, i.e. alpha and beta diversity analysis along with the related statistical tests
Performs differential abundance testing using ANCOM
Performs functional annotation using PICRUSt2 for 16S sequences.
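
The rare-ASV cutoff in the list above is a proportion of the table's total frequency. The following is a minimal sketch of that arithmetic, assuming the total is read by hand from the feature-table summary; the function name and the example total are hypothetical and are not part of the pipeline.

```python
import math
from fractions import Fraction

def rare_asv_min_frequency(total_frequency, percent="0.005"):
    """Smallest per-ASV count that survives when ASVs below
    `percent` % of all sequences are dropped."""
    if total_frequency < 0:
        raise ValueError("total_frequency must be non-negative")
    # 0.005 % is a proportion of 0.00005, so divide the percentage by 100.
    # Fraction avoids floating-point surprises right at the cutoff.
    proportion = Fraction(str(percent)) / 100
    return math.ceil(total_frequency * proportion)

# Hypothetical example: a table holding 2,000,000 sequences in total.
print(rare_asv_min_frequency(2_000_000))  # prints 100
```

The result is the kind of value one would pass as a minimum frequency when filtering features; which config variable receives it depends on the Snakefile and is not assumed here.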

Authors

Olabiyi Obayomi (@olabiyi)

Before you start, make sure you have miniconda, qiime2, picrust2 and snakemake installed. You can optionally install my bioinfo environment which contains snakemake and many other useful bioinformatics tools.

Step 1: Install miniconda and qiime2 (optional)

See instructions on how to do so here

Step 2: Install picrust2 (optional)

See instructions on how to do so here

Step 3: Install Snakemake in a separate conda environment, or install my bioinfo environment, which contains Snakemake (optional)

Install Snakemake using conda:

conda create -c bioconda -c conda-forge -n snakemake snakemake

For installation details, see the instructions in the Snakemake documentation.

Step 4: Obtain a copy of this workflow

git clone https://github.com/olabiyi/snakemake-workflow-qiime2.git

Step 5: Configure workflow

Configure the workflow according to your needs by editing the files in the config/ folder. Adjust config.yaml to configure the workflow execution, and sample.tsv to specify your sample setup. Make sure your sample.tsv file does not contain any errors: the pipeline uses it to rename (move) your input files, so a wrong entry could cause you to lose data.
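
Because sample.tsv drives file renaming, it may be worth checking it before running anything. The following is a minimal sketch, assuming the four columns SampleID, Type, Old_name and New_name written by the sample.tsv command later in this README; the helper names `check_rows` and `check_sample_table` are hypothetical and not part of the workflow.

```python
import csv
import os

REQUIRED_COLUMNS = ("SampleID", "Type", "Old_name", "New_name")

def check_rows(rows, exists=os.path.exists):
    """Return a list of human-readable problems found in sample.tsv rows.

    `rows` is an iterable of dicts keyed by the column names above;
    `exists` is injectable so the check can be tried without real files.
    """
    problems = []
    first_use = {}  # New_name -> row number that first claimed it
    for number, row in enumerate(rows, start=1):
        missing = [c for c in REQUIRED_COLUMNS if not row.get(c)]
        if missing:
            problems.append(f"row {number}: empty or missing {', '.join(missing)}")
            continue
        old, new = row["Old_name"], row["New_name"]
        if new in first_use:
            # Two files moved to one target would overwrite each other.
            problems.append(
                f"row {number}: New_name {new} already used by row {first_use[new]}"
            )
        else:
            first_use[new] = number
        # Either the source is still there or the rename already happened.
        if not exists(old) and not exists(new):
            problems.append(f"row {number}: neither {old} nor {new} exists")
    return problems

def check_sample_table(path):
    """Read a tab-separated sample table and run check_rows on it."""
    with open(path, newline="") as handle:
        return check_rows(csv.DictReader(handle, delimiter="\t"))

# Only runs when the file is present in the current directory.
if os.path.exists("config/sample.tsv"):
    for problem in check_sample_table("config/sample.tsv"):
        print(problem)
```

An empty result only means these particular checks passed; it does not replace reading the table yourself.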

Step 6: Install bioinfo environment (Optional)

If you would like to use my bioinfo environment:

conda env create -f envs/bioinfo.yaml

Step 7: Running the pipeline

Activate the conda environment containing snakemake

source activate bioinfo

Set up the mapping file and raw data directories

[ -d 00.mapping/ ] || mkdir 00.mapping/
[ -d 01.raw_data/ ] || mkdir 01.raw_data/

Move your raw data to the 01.raw_data directory

# Delete anything that may be present in the raw data directory
rm -rf 01.raw_data/*
# Move your read files to the raw data directory. Every sample goes in its own directory; see the example in this repo
mv location/rawData/16S/* 01.raw_data/

Create metadata files

You need two metadata files: a general metadata file called metadata.tsv and a treatment metadata file called treatment-metadata.tsv. These files can be created and edited with Excel. Make sure to save them as metadata.tsv and treatment-metadata.tsv. The treatment-metadata.tsv file is used for making grouped bar plots, while metadata.tsv is used for core diversity analysis and general statistics. Please see the examples provided in this repository for specific formats.

Create the required MANIFEST FILE

# Get the sample names. This assumes that the folders in the 01.raw_data/ directory are named by sample.
SAMPLES=($(ls -1 01.raw_data/ | grep -Ev "MANIFEST|seq" - |sort -V))
# Get sample names for "samples" field in the config file
(echo -ne '[';echo ${SAMPLES[*]} | sed -E 's/ /, /g' | sed -E 's/(\w+)/"\1"/g'; echo -e ']') 
# Generate the MANIFEST file
(echo "sample-id,absolute-filepath,direction"; \
for SAMPLE in ${SAMPLES[*]}; do echo -ne "${SAMPLE},$PWD/01.raw_data/${SAMPLE}/${SAMPLE}_R1.fastq.gz,forward\n${SAMPLE},$PWD/01.raw_data/${SAMPLE}/${SAMPLE}_R2.fastq.gz,reverse\n";done) \
> 01.raw_data/MANIFEST
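
For readers who prefer Python to the shell one-liner, the same paired-end manifest can be sketched as below. This assumes the layout used above (one folder per sample holding {SAMPLE}_R1.fastq.gz and {SAMPLE}_R2.fastq.gz); the function name and the sample names are hypothetical.

```python
import os

def manifest_lines(raw_dir, samples):
    """Build the lines of a paired-end manifest in the CSV layout used above."""
    raw_dir = os.path.abspath(raw_dir)  # the manifest needs absolute paths
    lines = ["sample-id,absolute-filepath,direction"]
    for sample in samples:
        for read, direction in (("R1", "forward"), ("R2", "reverse")):
            path = os.path.join(raw_dir, sample, f"{sample}_{read}.fastq.gz")
            lines.append(f"{sample},{path},{direction}")
    return lines

# Hypothetical sample names; in practice these are the folder names
# found in 01.raw_data/.
for line in manifest_lines("01.raw_data", ["sampleA", "sampleB"]):
    print(line)
```

Writing the returned lines to 01.raw_data/MANIFEST, one per line, gives a file equivalent to the one produced by the shell command.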

Create config/sample.tsv file

(echo -ne "SampleID\tType\tOld_name\tNew_name\n"; \
for SAMPLE in ${SAMPLES[*]}; do echo -ne "${SAMPLE}\tForward\t01.raw_data/${SAMPLE}/${SAMPLE}_R1.fastq.gz\t01.raw_data/${SAMPLE}/${SAMPLE}_R1.fastq.gz\n${SAMPLE}\tReverse\t01.raw_data/${SAMPLE}/${SAMPLE}_R2.fastq.gz\t01.raw_data/${SAMPLE}/${SAMPLE}_R2.fastq.gz\n";done) \
> config/sample.tsv

Gzip the fastq files if they are not already gzipped, as required by this pipeline. This also helps to save disk space.

find 01.raw_data/ -type f -name '*.fastq' -exec gzip {} \;

Executing the Workflow

Import reads and check their quality to determine the truncation lengths for dada2

snakemake -pr --cores 10 --keep-going "04.QC/trimmed_reads_qual_viz.qzv" "04.QC/raw_reads_qual_viz.qzv"

Denoise reads: chimera removal, read merging, quality trimming and ASV feature table generation. Take a good look at 05.Denoise_reads/denoise_stats.qzv to check that you did not lose too many reads and that the reads merged well. If denoising was not successful, adjust the parameters you set for dada2 and then re-run.

snakemake -pr --cores 15 --keep-going "05.Denoise_reads/denoise_stats.qzv" "05.Denoise_reads/table_summary.qzv" "05.Denoise_reads/representative_sequences.qzv"
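
Judging whether too many reads were lost comes down to comparing the input and final read counts per sample. The following is a minimal sketch, assuming the denoising statistics have been copied into rows with the column names `sample-id`, `input` and `non-chimeric` as shown in the dada2 stats table; the 50% threshold, the function names and the example counts are hypothetical, not recommendations from the author.

```python
def read_retention(rows):
    """Map each sample to the fraction of input reads left after denoising."""
    retention = {}
    for row in rows:
        n_input = int(row["input"])
        n_final = int(row["non-chimeric"])
        retention[row["sample-id"]] = n_final / n_input if n_input else 0.0
    return retention

def flag_low_retention(rows, min_fraction=0.5):
    """Sample ids whose retained fraction falls below `min_fraction`."""
    return sorted(
        sample for sample, fraction in read_retention(rows).items()
        if fraction < min_fraction
    )

# Hypothetical counts for two samples.
stats = [
    {"sample-id": "S1", "input": "1000", "non-chimeric": "800"},
    {"sample-id": "S2", "input": "1000", "non-chimeric": "200"},
]
print(flag_low_retention(stats))  # prints ['S2']
```

Samples flagged this way are candidates for revisiting the truncation and trimming parameters before re-running the denoising step.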

Filter taxa - Examine "08.Filter_feature_table/taxa_filtered_table.qzv" to determine the threshold for filtering out rare taxa

snakemake -pr --cores 15 --keep-going "06.Assign_taxonomy/taxonomy.qzv" "07.Build_phylogenetic_tree/rooted-tree.qza" "08.Filter_feature_table/taxa_filtered_table.qzv"

Filter rare taxa and make relative abundance bar plots

snakemake -pr --cores 15 --keep-going "08.Filter_feature_table/filtered_table.qzv" "09.Taxa_bar_plots/group-bar-plot.qzv" "09.Taxa_bar_plots/samples-bar-plots.qzv"

Get the rarefaction depth for diversity analysis after viewing "08.Filter_feature_table/filtered_table.qzv", then run the complete pipeline

snakemake -pr --cores 15 --keep-going
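
Choosing a rarefaction depth is a trade-off between the number of sequences kept per sample and the number of samples kept at all, since samples with fewer sequences than the depth are dropped. The following is a minimal sketch of one way to explore that trade-off, assuming the per-sample frequencies have been copied from the table summary; the 75% share, the function names and the counts are hypothetical.

```python
import math

def choose_depth(sample_counts, min_sample_fraction=0.9):
    """Largest sampling depth that still keeps the requested share of samples."""
    if not sample_counts:
        raise ValueError("no samples given")
    if not 0 < min_sample_fraction <= 1:
        raise ValueError("min_sample_fraction must be in (0, 1]")
    counts = sorted(sample_counts.values(), reverse=True)
    n_keep = math.ceil(min_sample_fraction * len(counts))
    # The n_keep-th largest count is the deepest depth that n_keep samples reach.
    return counts[n_keep - 1]

def retained(sample_counts, depth):
    """Samples that survive rarefaction at `depth` (count >= depth)."""
    return sorted(s for s, n in sample_counts.items() if n >= depth)

# Hypothetical per-sample frequencies.
counts = {"S1": 5000, "S2": 12000, "S3": 800, "S4": 9000}
depth = choose_depth(counts, min_sample_fraction=0.75)
print(depth, retained(counts, depth))  # prints 5000 ['S1', 'S2', 'S4']
```

This is only a starting point; the alpha rarefaction plot produced by the pipeline is still the better guide to whether a given depth captures the diversity in your samples.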

Export the following files for downstream analysis with R Scripts

05.Denoise_reads/denoise_stats.qza -> Denoising statistics
06.Assign_taxonomy/taxonomy.qza -> Taxonomy assignments of the representative sequences
07.Build_phylogenetic_tree/rooted-tree.qza -> Phylogenetic tree for phylogenetic alpha diversity measurements
08.Filter_feature_table/filtered_table.qza -> ASV table
10.Diversity_analysis_{RAREFACTION_DEPTH}/bray_curtis_pcoa_results.qza -> Bray Curtis PCoA results
10.Diversity_analysis_{RAREFACTION_DEPTH}/bray_curtis_distance_matrix.qza -> Bray Curtis distance matrix
15.Function_annotation/picrust2_out_pipeline/pathways_out -> Picrust2 pathway output
15.Function_annotation/picrust2_out_pipeline/KO_metagenome_out -> Picrust2 KO / genes output

Code Snippets

shell:
    """
     [ -d logs/ ] || mkdir -p logs/
     cd logs/
     for RULE in {RULES}; do
      [ -d ${{RULE}}/ ] || mkdir -p ${{RULE}}/
     done
    """

SnakeMake From line 100 of main/Snakefile

run:
    for old,new in zip(metadata.Old_name,metadata.New_name):
        shell("[ -f {new} ] || mv {old} {new}".format(old=old, new=new))

SnakeMake From line 123 of main/Snakefile

run:
    for old,new in zip(metadata.Old_name,metadata.New_name):
        shell("mv {old} {new}".format(old=old, new=new))

SnakeMake From line 133 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    {params.PERL5LIB}
    set -u

      {params.program} --outdir {params.out_dir}/ \
         --threads {params.threads} {input.forward} {input.rev}

    """

SnakeMake From line 155 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    {params.PERL5LIB}
    set -u

      {params.program} \
          --interactive \
          -f {params.out_dir} \
          -o {params.out_dir}
    """

SnakeMake From line 182 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime tools import \
         --type 'SampleData[PairedEndSequencesWithQuality]' \
         --input-path {input.manifest_file} \
         --output-path {output} \
         --input-format PairedEndFastqManifestPhred33
    """

SnakeMake QIIME2.0 From line 211 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime cutadapt trim-paired \
         --i-demultiplexed-sequences {input} \
         --p-cores {params.cores} \
         --p-front-f {params.forward_primer} \
         --p-front-r {params.reverse_primer} \
         --o-trimmed-sequences {output} \
         --verbose

    """

SnakeMake Cutadapt QIIME2.0 From line 234 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime tools import \
         --type 'SampleData[SequencesWithQuality]' \
         --input-path {input.manifest_file} \
         --output-path {output} \
         --input-format SingleEndFastqManifestPhred33
    """

SnakeMake QIIME2.0 From line 264 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime cutadapt trim-single \
         --i-demultiplexed-sequences {input} \
         --p-cores {params.cores} \
         --p-front {params.forward_primer} \
         --o-trimmed-sequences {output} \
         --verbose

    """

SnakeMake Cutadapt QIIME2.0 From line 286 of main/Snakefile

shell:
    """
    {params.conda_activate}

     [ -d {params.out_dir} ] ||  mkdir -p {params.out_dir}
     # Merge reads then delete unnecessary files
     {params.program} \
        -f {input.forward} \
        -r {input.rev} \
        -j {params.threads} \
        -o {params.out_dir}/{wildcards.sample} \
        -m {params.max} \
        -n {params.min} \
        -t {params.min_trim} > {log} 2>&1


     rm -rf \
       {params.out_dir}/{wildcards.sample}.discarded.fastq \
       {params.out_dir}/{wildcards.sample}.unassembled.forward.fastq \
       {params.out_dir}/{wildcards.sample}.unassembled.reverse.fastq 

     mv {params.out_dir}/{wildcards.sample}.assembled.fastq {params.out_dir}/{wildcards.sample}.fastq

     # gzip to save disk space

     gzip {params.out_dir}/{wildcards.sample}.fastq

   """

SnakeMake From line 358 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime tools import \
         --type 'SampleData[SequencesWithQuality]' \
         --input-path {input.manifest_file} \
         --output-path {output} \
         --input-format SingleEndFastqManifestPhred33
    """

SnakeMake QIIME2.0 From line 400 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime cutadapt trim-single \
         --i-demultiplexed-sequences {input} \
         --p-cores {params.cores} \
         --p-front {params.forward_primer} \
         --o-trimmed-sequences {output} \
         --verbose
    """

SnakeMake Cutadapt QIIME2.0 From line 422 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime tools import \
         --type 'SampleData[PairedEndSequencesWithQuality]' \
         --input-path {input.manifest_file} \
         --output-path {output} \
         --input-format PairedEndFastqManifestPhred33
    """

SnakeMake QIIME2.0 From line 449 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime vsearch join-pairs \
         --i-demultiplexed-seqs {input} \
         --p-truncqual {params.trunc_qual} \
         --p-minlen {params.min_len} \
         --p-maxns {params.min_ns} \
         --p-minmergelen {params.men_merge_len} \
         --p-maxmergelen {params.max_merge_len} \
         --o-joined-sequences {output}
    """

SnakeMake QIIME2.0 VSEARCH From line 469 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime cutadapt trim-single \
         --i-demultiplexed-sequences {input} \
         --p-cores {params.cores} \
         --p-front {params.forward_primer} \
         --o-trimmed-sequences {output} \
         --verbose
    """

SnakeMake Cutadapt QIIME2.0 From line 494 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime demux summarize \
        --p-n 10000 \
        --i-data {input} \
        --o-visualization {output}
    """

SnakeMake QIIME2.0 From line 525 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime demux summarize \
        --p-n 10000 \
        --i-data {input} \
        --o-visualization {output}
    """

SnakeMake QIIME2.0 From line 548 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime demux summarize \
        --p-n 10000 \
        --i-data {input} \
        --o-visualization {output}
    """

SnakeMake QIIME2.0 From line 568 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    MODE={params.mode}

    if [ ${{MODE}} == "paired" ];then

        # Paired end
        qiime dada2 denoise-paired \
            --i-demultiplexed-seqs {input} \
            --o-table {output.table} \
            --o-representative-sequences {output.rep_seqs} \
            --o-denoising-stats {output.stats} \
            --p-trunc-len-f  {params.trun_len_forward} \
            --p-trunc-len-r {params.trun_len_reverse} \
            --p-trim-left-f {params.trim_len_forward} \
            --p-trim-left-r {params.trim_len_reverse} \
            --p-max-ee-f {params.max_forward_err} \
            --p-max-ee-r {params.max_reverse_err} \
            --p-n-threads {params.threads} 

    else

        # Single end
        qiime dada2 denoise-single \
            --i-demultiplexed-seqs {input} \
            --o-table {output.table} \
            --o-representative-sequences {output.rep_seqs} \
            --o-denoising-stats {output.stats} \
            --p-trunc-len  {params.trun_len_forward} \
            --p-trim-left {params.trim_len_forward} \
            --p-max-ee {params.max_forward_err} \
            --p-n-threads {params.threads}

    fi

    """

SnakeMake QIIME2.0 From line 603 of main/Snakefile

        shell:
            """
            set +u
            {params.conda_activate}
            set -u

            # Initial quality filtering process based on quality scores
            qiime quality-filter q-score \
              --i-demux {input} \
              --o-filtered-sequences {output.filtered_reads} \
              --o-filter-stats {output.filter_stats}

            # Tabulate the filter statistics
            qiime metadata tabulate \
              --m-input-file {output.filter_stats} \
              --o-visualization {output.filter_stats_viz}


            # Next, the Deblur workflow is applied using the qiime deblur denoise-16S method.
            # This method requires one parameter that is used in quality filtering,
            # --p-trim-length n, which truncates the sequences at position n.
            # In general, the Deblur developers recommend setting this value to a length
            # where the median quality score begins to drop too low.

            qiime deblur denoise-16S \
              --i-demultiplexed-seqs {output.filtered_reads} \
              --p-trim-length {params.trunc_length} \
              --o-representative-sequences {params.rep_seqs} \
              --o-table {output.table} \
              --p-sample-stats \
              --o-stats {output.stats}
            """

SnakeMake QIIME2.0 From line 662 of main/Snakefile

    shell:
        """
        set +u
        {params.conda_activate}
        set -u

        qiime feature-table summarize \
	       --i-table {input} \
	       --o-visualization {output}
        """

SnakeMake QIIME2.0 From line 704 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime feature-table tabulate-seqs \
       --i-data {input} \
       --o-visualization {output}
    """

SnakeMake QIIME2.0 From line 724 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    # Visualize dada2 denoise stats
    qiime metadata tabulate \
      --m-input-file {input} \
      --o-visualization {output}
    """

SnakeMake QIIME2.0 From line 745 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

     # Visualize deblur stats
     qiime deblur visualize-stats \
         --i-deblur-stats {input} \
         --o-visualization {output}
    """

SnakeMake QIIME2.0 From line 765 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

     # Assign taxonomy
     qiime feature-classifier classify-sklearn \
       --i-classifier {input.classifier} \
       --i-reads {input.rep_seqs} \
       --o-classification {output.raw} \
       --p-n-jobs {params.threads}

     # Tabulate taxonomy

     qiime metadata tabulate \
       --m-input-file {output.raw} \
       --o-visualization {output.viz}
    """

SnakeMake QIIME2.0 From line 792 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

     # Run the make phylogenetic tree pipeline
     # 1. Perform multiple sequence alignment with mafft
     # 2. Mask alignment
     # 3. Make tree with FastTree
     # 4. Root the tree


    qiime phylogeny align-to-tree-mafft-fasttree \
       --i-sequences {input} \
       --o-alignment {output.alignment} \
       --o-masked-alignment {output.masked_alignment} \
       --o-tree {output.unrooted_tree} \
       --o-rooted-tree {output.rooted_tree} \
       --p-n-threads {params.threads}
    """

SnakeMake QIIME2.0 MAFFT API (EBI) From line 828 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    # Remove singletons
    qiime feature-table filter-features \
      --i-table {input} \
      --p-min-frequency 2 \
      --o-filtered-table {output.table_raw}

    qiime feature-table summarize \
      --i-table {output.table_raw} \
      --o-visualization {output.table_viz}
    """

SnakeMake QIIME2.0 From line 886 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    # Filter out non-target assignments
    if [ {params.amplicon} == "ITS" ]; then

        # Retain only Fungi sequences
        qiime taxa filter-table \
          --i-table {input.table} \
          --i-taxonomy  {input.taxonomy} \
          --p-include  {params.taxa2exclude} \
          --o-filtered-table {output.table_raw}


    else

        # Filter out non-target assignments
        qiime taxa filter-table \
          --i-table {input.table} \
          --i-taxonomy  {input.taxonomy} \
          --p-exclude  {params.taxa2exclude} \
          --o-filtered-table {output.table_raw}

    fi

    # Use the total number of sequences ("Total frequency") in this summary
    # to determine the minimum frequency for filtering out rare taxa:
    # multiply the total number of sequences by 0.00005 (i.e. 0.005%)
    qiime feature-table summarize \
      --i-table {output.table_raw} \
      --o-visualization {output.table_viz}
    """

SnakeMake QIIME2.0 From line 920 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    # Remove rare OTUs / features with abundance less than 0.005%
    qiime feature-table filter-features \
      --i-table {input} \
      --p-min-frequency {params.minumum_frequency} \
      --o-filtered-table {output.table_raw}

    qiime feature-table summarize \
      --i-table {output.table_raw} \
      --o-visualization {output.table_viz}
    """

SnakeMake QIIME2.0 From line 969 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u


   # Samples bar plot
   qiime taxa barplot \
     --i-table {input.table} \
     --i-taxonomy {input.taxonomy} \
     --m-metadata-file {input.metadata} \
     --o-visualization  {output}
   """

SnakeMake QIIME2.0 From line 1001 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    # Group feature table by group in metadata file
    qiime feature-table group \
        --i-table  {input.table}  \
        --p-axis sample \
        --m-metadata-file {input.metadata} \
        --m-metadata-column '{params.category}' \
        --p-mode {params.mode} \
        --o-grouped-table {output.grouped_table}

    # Grouped bar plot
    qiime taxa barplot \
      --i-table {output.grouped_table} \
      --i-taxonomy {input.taxonomy} \
      --m-metadata-file {params.metadata} \
      --o-visualization  {output.bar_plot}
  """

SnakeMake QIIME2.0 From line 1032 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime diversity core-metrics-phylogenetic \
       --p-sampling-depth {params.depth} \
       --i-table {input.table} \
       --i-phylogeny {input.tree} \
       --m-metadata-file {input.metadata} \
       --p-n-jobs-or-threads 'auto' \
       --output-dir core_diversity/  && \
       mv core_diversity/* {diversity_dir}/ && \
       rm -rf core_diversity/
    """

SnakeMake QIIME2.0 From line 1078 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime diversity alpha-rarefaction \
       --p-max-depth {params.depth} \
       --i-table {input.table} \
       --i-phylogeny {input.tree} \
       --m-metadata-file {input.metadata} \
       --o-visualization {output}
    """

SnakeMake QIIME2.0 From line 1109 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    for metric in {alpha_diversity_metrics}; do

        qiime diversity alpha-group-significance \
           --i-alpha-diversity  {diversity_dir}/${{metric}}_vector.qza \
           --m-metadata-file {input.metadata} \
           --o-visualization {diversity_dir}/alpha_${{metric}}_significance.qzv

    done
    """

SnakeMake QIIME2.0 From line 1136 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    for distance in {distance_matrices}; do

        qiime diversity beta-group-significance \
           --i-distance-matrix  {diversity_dir}/${{distance}}_distance_matrix.qza \
           --m-metadata-file {input.metadata} \
           --m-metadata-column {params.category} \
           --o-visualization {diversity_dir}/beta_${{distance}}_significance.qzv

    done

    """

SnakeMake QIIME2.0 From line 1166 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    TAXON_LEVELS=(2 3 4 5 6)

    for TAXON_LEVEL in ${{TAXON_LEVELS[*]}}; do

        # Collapse ASV table at a taxonomy level of interest
        qiime taxa collapse \
            --i-table {input.table} \
            --i-taxonomy {input.taxonomy} \
            --p-level ${{TAXON_LEVEL}} \
            --o-collapsed-table {params.out_dir}/L${{TAXON_LEVEL}}-filtered_table.qza

    done
    """

SnakeMake QIIME2.0 From line 1200 of main/Snakefile

shell:
    "cp {input} {params.out_dir}/  && "
    "mv {params.out_dir}/{params.basename} {output}"

SnakeMake From line 1231 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    qiime composition add-pseudocount \
        --i-table {input} \
        --o-composition-table {output}

    """

SnakeMake QIIME2.0 From line 1246 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    # Apply ANCOM to identify ASV/OTUs that differ in abundance
    qiime composition ancom \
        --i-table {input.table} \
        --m-metadata-file {input.metadata} \
        --m-metadata-column {params.category} \
        --o-visualization {output}

    """

SnakeMake QIIME2.0 From line 1270 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    # Export feature table
    qiime tools export \
       --input-path {input.feature_table} \
       --output-path {params.out_dir}/


    # Export representative sequences
    qiime tools export --input-path {input.rep_seqs} --output-path {params.out_dir}/ && \
    mv {params.out_dir}/dna-sequences.fasta  {output.rep_seqs}

    # Export taxonomy
    qiime tools export \
         --input-path {input.taxonomy_table} \
         --output-path {params.out_dir}/


    # ---------------------- Add taxonomy to feature table ------------------------ #

    # Creating a TSV BIOM table
    biom convert \
        -i {params.out_dir}/feature-table.biom \
        -o {params.out_dir}/feature-table.tsv \
        --to-tsv

    # Next, we’ll need to modify the exported taxonomy file’s header before using it with BIOM software.

    # Before modifying that file, make a copy:
    cp {params.out_dir}/taxonomy.tsv {params.out_dir}/biom-taxonomy.tsv

    # Change the first line of biom-taxonomy.tsv (i.e. the header) to this:
    # Note that you’ll need to use tab characters in the header since this is a TSV file.
    #OTUID	taxonomy	confidence   

    (echo "#OTUID	taxonomy	confidence"; sed -e '1d' {params.out_dir}/biom-taxonomy.tsv) \
     > {params.out_dir}/tmp.tsv && \
     rm -rf {params.out_dir}/biom-taxonomy.tsv && \
     mv {params.out_dir}/tmp.tsv {params.out_dir}/biom-taxonomy.tsv 

    # Finally, add the taxonomy data to your .biom file:
    biom add-metadata \
         -i {params.out_dir}/feature-table.biom \
         -o {params.out_dir}/feature-table-with-taxonomy.biom \
         --observation-metadata-fp {params.out_dir}/biom-taxonomy.tsv \
         --sc-separated taxonomy

    # Creating a TSV BIOM table
    biom convert \
           -i  {params.out_dir}/feature-table-with-taxonomy.biom  \
           -o  {params.out_dir}/feature-table-with-taxonomy.biom.tsv \
           --to-tsv
    """

SnakeMake QIIME2.0 From line 1301 of main/Snakefile
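
The header rewrite in the export snippet above (drop the first line of the taxonomy table, prepend the header that `biom add-metadata` is given here) can also be sketched in Python. This assumes the exported taxonomy.tsv has three tab-separated columns; the function name and the example row are hypothetical.

```python
def to_biom_taxonomy(lines):
    """Return taxonomy.tsv lines with the header used for biom add-metadata."""
    lines = [line.rstrip("\n") for line in lines]
    if not lines:
        raise ValueError("taxonomy table is empty")
    n_columns = len(lines[0].split("\t"))
    if n_columns != 3:
        raise ValueError(f"expected 3 tab-separated columns, found {n_columns}")
    # Same effect as `sed -e '1d'` plus the echoed header in the shell snippet.
    return ["#OTUID\ttaxonomy\tconfidence"] + lines[1:]

# Hypothetical two-line table standing in for an exported taxonomy.tsv.
exported = [
    "Feature ID\tTaxon\tConfidence",
    "asv1\td__Bacteria; p__Firmicutes\t0.98",
]
for line in to_biom_taxonomy(exported):
    print(line)
```

The column check is an extra safeguard that the shell version does not perform; everything else mirrors it.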

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    # Remove the temporary output directory if it already exists
    [ -d picrust2_out_pipeline/ ] && rm -rf picrust2_out_pipeline/

    # ---- Run picrust2 pipeline for function annotation -------- #
    picrust2_pipeline.py \
        -s {input.rep_seqs} \
        -i {input.feature_table} \
        -o picrust2_out_pipeline/ \
        -p {params.threads} && \
        mv picrust2_out_pipeline/* {params.out_dir}/ && \
        rmdir picrust2_out_pipeline/
    """

SnakeMake PICRUSt2 From line 1382 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    # ----- Annotate your enzymes, KOs and pathways by adding a description column ------#
    # EC
    add_descriptions.py -i {input.ec} -m EC -o {output.ec}

    # Metacyc Pathway
    add_descriptions.py -i {input.pathway} -m METACYC -o {output.pathway}

    # KO
    add_descriptions.py -i {input.ko} -m KO -o {output.ko} 

    # Unzip the metagenome contribution files - these files describe the microbes' contribution to the function profiles
    #find {params.outdir} -type f -name "*contrib.tsv.gz" -exec gunzip {{}} \;
    """

SnakeMake From line 1421 of main/Snakefile

shell:
    """
        # Create an empty file
        mkdir -p {params.outdir} && touch {output.ko}
    """

SnakeMake From line 1452 of main/Snakefile

Created: 1yr ago

Updated: 1yr ago

Maintainers: public

URL: https://github.com/olabiyi/snakemake-workflow-qiime2

Name: snakemake-workflow-qiime2

Version: 1

License: None

Keywords:

Cutadapt MAFFT API (EBI) PICRUSt2 QIIME2.0 Snakemake VSEARCH
