Bash scripts for running QIIME 1.9 and a Snakemake workflow to automate it using singularity container images.

public 1yr ago 0 bookmarks

View Workflow

MOTIVATION: Qiime 1.9 has reached the end of its life ycle and thus no longer maintained by its developers. Moreover, QIIME 1.9 is infamously known to be very difficult to install due its many dependencies. Notwithstanding, I still find OTU picking a lot easier using qiime 1.9 than qiime 2. Moreover, I had developed a set of scripts for pathogen analysis using amplicon sequences by leveraging some outputs generated by QIIME 1.9. Which meant that my pathogen analysis scripts no longer work. To solve these problems, I decided to develop a QIMME 1.9 workflow that uses containers to alleviate the burden of software installation and snakemake for workflow orchestration and reproducibility.

It is currently a work under progress, though. The workflow performs microbiome analysis, pathogen analysis and functional annotation using QIIME 1.9 and PICRUSt2.

Code Snippets

shell:
    """
    # Get the stats on the sequences using seqkit
    seqkit stats {input} > temp.txt

     # Sort the sequence statistics
     (sed -n '1p' temp.txt; awk 'NR>1{{print}}' temp.txt | \
       sort -V -k1,1) > {output} \
       && rm temp.txt
    """

SnakeMake seqkit From line 39 of main/Snakefile

shell:
    "fastqc --outdir {params.out_dir}/ "
    "--threads {params.threads} {input.forward} {input.rev}"

SnakeMake FastQC From line 64 of main/Snakefile

shell:
    "multiqc --interactive -f {params.out_dir} -o {params.out_dir}"

SnakeMake MultiQC From line 80 of main/Snakefile

shell:
    "trimmomatic PE "
    "-threads {threads} "
    "{input.forward} {input.rev} "
    "{output.r1} {output.r1_unpaired} "
    "{output.r2} {output.r2_unpaired} "
    "{params.trimmer} > {log} 2>&1 "

SnakeMake Trimmomatic From line 106 of main/Snakefile

shell:
    """
    # -a TGGAATTCTCGGGTGCCAAGG sequence here is the small RNA 3' adaptor
    # https://support-docs.illumina.com/SHARE/AdapterSeq/Content/SHARE/AdapterSeq/TruSeq/RNA/Small-RNA/TruSeqSmallRNA.htm
    # -A CTGTCTCTTATACAC sequece here is the nextera transposa sequence 
    cutadapt \
          -g '{params.forward_primer}' \
          -G '{params.rev_primer}' \
          -o {output.forward_reads} \
          -p {output.rev_reads} \
          --minimum-length  {params.minimum_length} \
          --quality-cutoff  {params.quality_cutoff} \
         {input.forward_reads} {input.rev_reads} > {log} 2>&1

    """

SnakeMake Cutadapt From line 130 of main/Snakefile

shell:
    """ 
     trim_galore \
       -o {params.out_dir} \
       --fastqc  \
       --paired {input.forward} {input.rev} > {log} 2>&1

     #Rename the files
     #Fastq files
    mv {params.out_dir}/{wildcards.sample}_R1_val_1.fq.gz {params.out_dir}/{wildcards.sample}_R1.fastq.gz
    mv {params.out_dir}/{wildcards.sample}_R2_val_2.fq.gz {params.out_dir}/{wildcards.sample}_R2.fastq.gz
     #HTML files
    mv {params.out_dir}/{wildcards.sample}_R1_val_1_fastqc.html {params.out_dir}/{wildcards.sample}_R1_fastqc.html
    mv {params.out_dir}/{wildcards.sample}_R2_val_2_fastqc.html {params.out_dir}/{wildcards.sample}_R2_fastqc.html
     #Zip files
    mv {params.out_dir}/{wildcards.sample}_R1_val_1_fastqc.zip {params.out_dir}/{wildcards.sample}_R1_fastqc.zip
    mv {params.out_dir}/{wildcards.sample}_R2_val_2_fastqc.zip {params.out_dir}/{wildcards.sample}_R2_fastqc.zip
    """

SnakeMake Trim_Galore From line 161 of main/Snakefile

shell:
    """
      multiqc \
          --interactive \
          -f {params.in_dir} \
          -o {params.out_dir}  > {log} 2>&1
    """

SnakeMake MultiQC From line 195 of main/Snakefile

shell:
    """
    # Get the stats on the sequences using seqkit
    seqkit stats {input} > temp.txt

     # Sort the sequence statistics
     (sed -n '1p' temp.txt; awk 'NR>1{{print}}' temp.txt | \
       sort -V -k1,1) > {output} \
       && rm temp.txt
    """

SnakeMake seqkit From line 210 of main/Snakefile

shell:
    "cat {input} > {output}"

SnakeMake From line 228 of main/Snakefile

shell:
    """
    validate_mapping_file.py \
       -m {input} \
       -o {params.out_dir}
    """

SnakeMake From line 239 of main/Snakefile

shell:
    """    
    [ -d {params.out_dir} ] ||  mkdir -p {params.out_dir}
    # Merge reads then delete unnecessary files
    pear \
        -f {input.forward} \
        -r {input.rev} \
        -j {params.threads} \
        -o {params.out_dir}/{wildcards.sample} \
        -m {params.max} \
        -n {params.min} \
        -t {params.min_trim} > {log} 2>&1

   rm -rf \
        {params.out_dir}/{wildcards.sample}.discarded.fastq \
        {params.out_dir}/{wildcards.sample}.unassembled.forward.fastq \
        {params.out_dir}/{wildcards.sample}.unassembled.reverse.fastq 
        mv {params.out_dir}/{wildcards.sample}.assembled.fastq {params.out_dir}/{wildcards.sample}.fastq

   # gzip to save memory

   #gzip {params.out_dir}/{wildcards.sample}.fastq
    """

SnakeMake PEAR From line 260 of main/Snakefile

shell:
    """
    seqkit stats {input} > {output} 2> {log}
    """

SnakeMake seqkit From line 291 of main/Snakefile

shell:
    """
    multiple_split_libraries_fastq.py --sampleid_indicator '.fastq' \
         -i {params.reads_dir} -o {params.out_dir} \
         -p {input.parameter_file}
    """

SnakeMake From line 311 of main/Snakefile

shell:
    """
    # Count all the sequences after split_labrary's quality filtering and concatenation
    count_seqs.py -i {input} > {output}
    """

SnakeMake From line 338 of main/Snakefile

shell:
    """
    # Split the seqs.fna file into per sample fna files 
    # this step is necessary because usearch fails when you have a lot of samples
    split_sequence_file_on_sample_ids.py  -i {input} -o  {params.out_dir}
    """

SnakeMake From line 359 of main/Snakefile

shell:
    """

    [ -d {params.out_dir}/{wildcards.sample}/ ] || mkdir -p {params.out_dir}/{wildcards.sample}/
    PROJECT_DIR=`pwd`
    cd {params.out_dir}/{wildcards.sample}/ 

      identify_chimeric_seqs.py \
         -m usearch61 \
         -i ${{PROJECT_DIR}}/{input.fasta} \
         -r {input.reference_database} \
         -o ${{PROJECT_DIR}}/{params.out_dir}/{wildcards.sample}/

    [ -d ${{PROJECT_DIR}}/{params.chimera_dir}/ ] || mkdir -p ${{PROJECT_DIR}}/{params.chimera_dir}/
    # Rename the chimera.txt file so that it will be sample specific
    mv ${{PROJECT_DIR}}/{params.out_dir}/{wildcards.sample}/chimeras.txt \
      ${{PROJECT_DIR}}/{params.chimera_dir}/{wildcards.sample}_chimera.txt && \
    rm -rf  ${{PROJECT_DIR}}/{params.out_dir}/{wildcards.sample}/
    """

SnakeMake From line 378 of main/Snakefile

shell:
    "cat {input} > {output}"

SnakeMake From line 404 of main/Snakefile

shell:
    """
    # Filter out the chimeric sequences fom seqs.fna
    filter_fasta.py \
       -s {input.chimeras} \
       -f {input.seqs} \
       -o {output} \
       -n
    """

SnakeMake From line 415 of main/Snakefile

shell:
    """
    set +u
    {params.conda_activate}
    set -u

    # Count all the sequences after chimera filtering
    count_seqs.py -i {input} > {output}
    """

SnakeMake From line 431 of main/Snakefile

shell:
    """
    #set +u;source activate qiime1;set -u
    if [ {params.method_class} == "denovo" ];then
    pick_de_novo_otus.py \
        -i {input.seqs} \
        -o {params.out_dir} \
        -p {input.parameter_file} \
        -f

   else

    pick_open_reference_otus.py \
        -i {input.seqs} \
        -r {input.reference_database} \
        -o {params.out_dir} \
        -p {input.parameter_file} \
        -m {params.method} \
        --suppress_align_and_tree \
        -f

    fi
    """   

SnakeMake From line 460 of main/Snakefile

shell:
    """
    filter_taxa_from_otu_table.py \
        -i {input} \
        -o {output} \
        -n {params.taxa2filter}
    """

SnakeMake From line 495 of main/Snakefile

shell:
    """
    # Filter-out the really rare otus "The recommended procedure is to discard those OTUs with a
    # number of sequences less than 0.005% of the total number of sequences" (Navas-Molina et al, 2013)
    filter_otus_from_otu_table.py \
        -i {input} \
        -o {output} \
        --min_count_fraction {params.min_freq}
    """

SnakeMake From line 511 of main/Snakefile

shell:
    """
    biom summarize-table \
        -i {input} \
        -o {output}
    """

SnakeMake From line 531 of main/Snakefile

    shell:
        """
        #  #--tree_fp
        if [ -d {params.out_dir} ]; then rm -rf {params.out_dir}; fi 

        set +u;source activate qiime1;set -u && \
        core_diversity_analyses.py \
		    -i  {input.otu_table} \
		    -o  {params.out_dir} \
		    --mapping_fp {input.mapping_file} \
		    --sampling_depth {params.depth} \
		    --parameter_fp {params.parameter_file} \
		    --categories {params.category} \
            --nonphylogenetic_diversity
        """

SnakeMake From line 552 of main/Snakefile

shell:
    """
    {params.conda_activate}
    # Remove the temporary output directory if it already exists
    [ -d picrust2_out_pipeline/ ] && rm -rf picrust2_out_pipeline/

    # ---- Run picrust2 pipeline for function annotation -------- #
    picrust2_pipeline.py \
        -s {input.rep_seqs} \
        -i {input.feature_table} \
        -o picrust2_out_pipeline/ \
        -p {threads} && \
        mv picrust2_out_pipeline/* {params.out_dir}/ && \
        rmdir picrust2_out_pipeline/
    """

SnakeMake PICRUSt2 From line 586 of main/Snakefile

shell:
    """
    {params.conda_activate}
    # ----- Annotate your enzymes, KOs and pathways by adding a description column ------#
    # EC
    add_descriptions.py -i {input.ec} -m EC -o {output.ec}
    # Metacyc Pathway
    add_descriptions.py -i {input.pathway} -m METACYC -o {output.pathway}
    # KO
    add_descriptions.py -i {input.ko} -m KO -o {output.ko} 

    # Unizip the metagenome contribution files - these files describe the micribes contribution the function profiles
    #find {params.outdir} -type f -name "*contrib.tsv.gz" -exec gunzip {{}} \;
    """

SnakeMake From line 622 of main/Snakefile

shell:
    """
        # Create an empty file
        mkdir -p {params.outdir} && touch {output.ko}
    """

SnakeMake From line 648 of main/Snakefile

shell:
    """
    # filter out the OTUs of what you are looking for from the given OTU table e.g. human pathogens
    # And convert biom formattted OTU table to tsv 
    filter_taxa_from_otu_table.py \
        -i {input.biom_table} \
        -o {output.biom} -p {input.list2search} && \
    biom convert     \
           -i  {output.biom}   \
           -o  {output.tsv}     \
           --to-tsv     \
           --header-key taxonomy     \
           --output-metadata-id "Consensus Lineage" > {log} 2>&1
    """

SnakeMake From line 667 of main/Snakefile

shell:
    """
    # Get the pathogenic OTU names
    OTUS=($(cut -f1 {input.tsv_table} | sed -e 1,2d))
    parallel -j {threads} "grep -wE '{{}}' {input.otu_map}  >> {output}" ::: ${{OTUS[*]}} \
        > {log} 2>&1
    """

SnakeMake From line 692 of main/Snakefile

shell:
    """
    filter_fasta.py \
        -f {input.seqs} \
        -o {output} \
        -m  {input.otu_map} \
        > {log} 2>&1
    """

SnakeMake From line 708 of main/Snakefile

ShowHide 22 more snippets with no or duplicated tags.

Comments

Support

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Created: 1yr ago

Updated: 1yr ago

Maitainers: public

URL: https://github.com/olabiyi/qiime1-scripts

Name: qiime1-scripts

Version: 1

Badge:

Insert copied code into your website to add a link to this workflow.

License: None

Keywords:

Sample annotation Gene functional annotation Feature table Cutadapt FastQC MultiQC PEAR PICRUSt2 seqkit Snakemake Trimmomatic Trim_Galore taxonomic and functional annotation

Future updates

Related Workflows

psychip_snakemake — Show Details View Workflow

ENCODE pipeline for histone marks developed for the psychENCODE project

public

psychip pipeline is an improved version of the ENCODE pipeline for histone marks developed for the psychENCODE project. The o...

raw sequence reads Alignment Sequence alignment report macs2 ucsc-bedclip bedGraphToBigWig BEDTools BWA Picard SAMtools Snakemake

Free

Near-real time tracking of SARS-CoV-2 in Connecticut

public

Repository containing scripts to perform near-real time tracking of SARS-CoV-2 in Connecticut using genomic data. This pipeli...

JSON nextclade Augur Biopython FOCUS Pandas Snakemake bs4 epiweeks geopy matplotlib numpy pycountry pycountry-convert uszipcode

Free

cellranger-snakemake-gke — Show Details View Workflow

snakemake workflow to run cellranger on a given bucket using gke.

public

A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...

macs2 ucsc-bedclip bedGraphToBigWig BEDTools BWA Picard SAMtools Snakemake

Free

ATLAS - Three commands to start analyzing your metagenome data

public

Metagenome-atlas is a easy-to-use metagenomic pipeline based on snakemake. It handles all steps from QC, Assembly, Binning, t...

raw sequence reads Genome assembly Annotation track checkm2 gunc prodigal snakemake-wrapper-utils MEGAHIT Atlas BBMap Biopython BioRuby Bwa-mem2 cd-hit CheckM DAS Diamond eggNOG-mapper v2 MetaBAT 2 Minimap2 MMseqs MultiQC Pandas Picard pyfastx SAMtools SemiBin Snakemake SPAdes SqueezeMeta TADpole VAMB CONCOCT ete3 gtdbtk h5py networkx numpy plotly psutil utils metagenomics

Free

175

rna-seq-star-deseq2 — Show Details View Workflow

RNA-seq workflow using STAR and DESeq2

public

This workflow performs a differential gene expression analysis with STAR and Deseq2. The usage of this workflow is described ...

Free

dna-seq-gatk-variant-calling — Show Details View Workflow

This Snakemake pipeline implements the GATK best-practices workflow

public

This Snakemake pipeline implements the GATK best-practices workflow for calling small germline variants. The usage of thi...

VCF raw sequence reads Variant calling genetic variants gatk rust-bio-tools snakemake-wrapper-utils tabix BCFtools BWA FastQC MultiQC Pandas Picard SAMtools Snakemake Trimmomatic Variant Effect Predictor (VEP) common matplotlib numpy seaborn DNA

Free