🍄 Qiime2 ITS classifiers for the UNITE database

Version: v9.0-v25.07.2023-qiime2-

A pipeline to build Qiime2 taxonomy classifiers for the UNITE database.

Download a pre-trained classifier here! 🎁

Running Snakemake workflow

Set up:

Install Mambaforge and configure Bioconda.
Install the version of Qiime2 you want using the recommended environment name. (For a faster install, you can replace conda with mamba.)
Install Snakemake into an environment, then activate that environment.

Configure:

Open up config/config.yaml and configure it to your liking. (For example, you may need to update the name of your Qiime2 environment.)

Run:

snakemake --cores 8 --use-conda --resources mem_mb=10000

This takes about 15 hours on my machine.

Run on a slurm cluster:

More specifically, the University of Florida HiPerGator supercomputer, with access generously provided by the Kawahara Lab!

screen # We connect to a random login node, so we may not be able...
screen -r # to reconnect with this later on.
snakemake --jobs 12 --slurm \
 --use-envmodules --rerun-incomplete --latency-wait 10 \
 --default-resources slurm_account=kawahara slurm_partition=hpg-milan

Reports:

snakemake --report results/report.html
snakemake --forceall --dag --dryrun | dot -Tpdf > results/dag.pdf

Code Snippets

shell:
    """
    mkdir -p downloads

    # Version 9 update. Get DOIs from here: https://unite.ut.ee/repository.php
    # To get URLs you can download directly, plug them into this API:
    # https://api.plutof.ut.ee/v1/public/dois/?format=api&identifier=10.15156/BIO/2483915

    # 9.0	2023-07-18	Fungi	19 051	143 384	Current	https://doi.org/10.15156/BIO/2938079
    wget -qO- https://files.plutof.ut.ee/public/orig/FB/78/FB78E30E44793FB02E5A4D3AE18EB4A6621A2FAEB7A4E94421B8F7B65D46CA4A.tgz | \
      tar xz -C downloads --strip-components 1 # sh_qiime_release_25.07.2023.tgz      # normal

    # 9.0	2023-07-18	Fungi	19 051	187 443	Current	https://doi.org/10.15156/BIO/2938080
    wget -qO- https://files.plutof.ut.ee/public/orig/37/71/3771274B094D9CA6252DF01359756B60A2FBEEF87854CC01C2577182DBB123C7.tgz | \
      tar xz -C downloads --strip-components 1 # sh_qiime_release_s_25.07.2023.tgz    # add s for 97% singletons

    # 9.0	2023-07-18	All eukaryotes	19 451	215 454	Current	https://doi.org/10.15156/BIO/2938081
    wget -qO- https://files.plutof.ut.ee/public/orig/1C/C2/1CC2477429B3A703CC1C7A896A7EFF457BB0D471877CB8D18074959DBB630D10.tgz | \
      tar xz -C downloads --strip-components 1 # sh_qiime_release_all_25.07.2023.tgz  # add all for Euks

    # 9.0	2023-07-18	All eukaryotes	19 451	307 276	Current	https://doi.org/10.15156/BIO/2938082
    wget -qO- https://files.plutof.ut.ee/public/orig/7D/0C/7D0C329980D2C644CC157A8C76BBD11E78DB8B13286C98D4FEB6ECAC79D67D6F.tgz | \
      tar xz -C downloads --strip-components 1 # sh_qiime_release_s_all_25.07.2023.tgz # add s and all for 97% Euks singletons

    """

SnakeMake From line 40 of workflow/Snakefile
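
The comments in the download step say that a DOI can be plugged into the PlutoF API to find a directly downloadable URL. As a minimal sketch, the lookup URL could be built like this in Python. The function name `plutof_lookup_url` is hypothetical and not part of the workflow. The sketch only constructs the URL in the form shown in the comment; it does not fetch or parse the response, whose format is not described here.

```python
from urllib.parse import urlencode

# Endpoint taken from the comment in the download step.
PLUTOF_DOI_API = "https://api.plutof.ut.ee/v1/public/dois/"


def plutof_lookup_url(doi: str) -> str:
    """Build the PlutoF lookup URL for a DOI (hypothetical helper)."""
    # Accept either a bare DOI or a full https://doi.org/... link.
    prefix = "https://doi.org/"
    identifier = doi[len(prefix):] if doi.startswith(prefix) else doi
    # Keep '/' unescaped so the result matches the form used in the comment.
    query = urlencode({"format": "api", "identifier": identifier}, safe="/")
    return PLUTOF_DOI_API + "?" + query


if __name__ == "__main__":
    print(plutof_lookup_url("https://doi.org/10.15156/BIO/2938079"))
```

Opening the printed URL in a browser should show the record for that DOI, from which the file link can be copied into the `wget` commands.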

shell: "qiime tools import \
        --type FeatureData[Sequence] \
        --input-format MixedCaseDNAFASTAFormat \
        --input-path {input}/sh_refs_qiime_{wildcards.ver}_{wildcards.id}_{wildcards.type}{wildcards.date}_dev.fasta \
        --output-path {output}"

SnakeMake QIIME2.0 From line 72 of workflow/Snakefile

shell: "qiime tools import \
        --type FeatureData[Taxonomy] \
        --input-format HeaderlessTSVTaxonomyFormat \
        --input-path {input}/sh_taxonomy_qiime_{wildcards.ver}_{wildcards.id}_{wildcards.type}{wildcards.date}_dev.txt \
        --output-path {output}"

SnakeMake QIIME2.0 From line 84 of workflow/Snakefile
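
The two import steps above assemble their input file names from four wildcards. The sketch below mirrors those patterns with plain Python string formatting to show how they expand. The wildcard values used (`ver9`, `dynamic`, `99`, `s_all_`, `25.07.2023`) are assumptions for illustration, not values read from the workflow's config. Note that `{type}` and `{date}` are adjacent in the pattern, so a non-empty type presumably has to carry its own trailing underscore.

```python
# Patterns copied from the two `qiime tools import` rules.
REFS_PATTERN = "sh_refs_qiime_{ver}_{id}_{type}{date}_dev.fasta"
TAXONOMY_PATTERN = "sh_taxonomy_qiime_{ver}_{id}_{type}{date}_dev.txt"


def unite_input_files(ver, identity, kind, date):
    """Return the (sequence, taxonomy) file names for one wildcard combination."""
    # Map readable argument names onto the wildcard names used in the rules.
    wildcards = {"ver": ver, "id": identity, "type": kind, "date": date}
    return REFS_PATTERN.format(**wildcards), TAXONOMY_PATTERN.format(**wildcards)


if __name__ == "__main__":
    # Assumed example values; an empty `kind` would select the default release.
    refs, tax = unite_input_files("ver9", "dynamic", "s_all_", "25.07.2023")
    print(refs)  # sh_refs_qiime_ver9_dynamic_s_all_25.07.2023_dev.fasta
    print(tax)   # sh_taxonomy_qiime_ver9_dynamic_s_all_25.07.2023_dev.txt
```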

shell: "qiime feature-classifier fit-classifier-naive-bayes \
        --p-classify--chunk-size 10000 \
        --i-reference-reads    {input.ref} \
        --i-reference-taxonomy {input.tax} \
        --o-classifier {output}"