SpaceMicrobe Snakemake Workflow for 10X Visium Spatial Gene Expression Data
The SpaceMicrobe Snakemake workflow is part of the SpaceMicrobe computational framework to detect microbial reads in 10X Visium Spatial Gene Expression data.
The Snakemake workflow requires that spaceranger count has already been run on the spatial transcriptomics dataset. The input file required for the Snakemake workflow is the possorted_genome_bam.bam file from the spaceranger count outs folder.
The Snakemake workflow outputs the taxonomic classifications of the reads (a modified Kraken2 output file), which must be further processed with the R package microbiome10XVisium.
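Once spaceranger count has finished, the workflow is launched like any other Snakemake workflow. A minimal sketch, assuming the repository's Snakefile and a config entry pointing at the BAM; the config key shown here is a placeholder, not the workflow's actual schema:

    # Illustrative invocation only; consult the repository for the real config schema
    snakemake --cores 8 --use-conda \
        --config bam=/path/to/outs/possorted_genome_bam.bam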
Graph
Overview of the Snakemake workflow:
Reads that did not align to the host transcriptome/genome are extracted (samtools_view); the molecular (UMI) and spatial (10X barcode) information of each read is preserved in read2 (umi); and quality control is performed on read2 (cutadapt and fastp) to remove adapters and poly-A tails, trim low-quality bases, and enforce a minimum read length. The metagenomic profiler Kraken2 then performs taxonomic classification of the remaining reads (classify).
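In rule terms, the data flow for one sample looks like this (a summary of the description above, not copied from the Snakefile):

    # possorted_genome_bam.bam
    #   -> samtools_view        keep reads that did not align to the host (flag 4)
    #   -> samtools sort -n     group mates by read name
    #   -> bamtofastq + cat     convert to FASTQ and merge R1/R2 across subfolders
    #   -> umi_tools extract    move spatial barcode + UMI into the read names
    #   -> cutadapt + fastp     adapter/poly-A removal, quality and length filtering
    #   -> kraken2 (classify)   taxonomic classification of the surviving reads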
Code Snippets
shell:
    """
    samtools view \
        --output-fmt BAM \
        --bam \
        --require-flags 4 \
        --threads {threads} \
        {input} > {output}
    """
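--require-flags 4 keeps only reads with the SAM UNMAP bit set, i.e. reads that did not align to the host transcriptome/genome. samtools can decode the flag value itself:

    samtools flags 4
    # 0x4	4	UNMAP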
shell:
    """
    # name-sort (-n) so that mates of a read pair are grouped together
    samtools sort -n \
        --output-fmt BAM \
        --threads {threads} \
        {input} > {output}
    """
shell:
    """
    rm -rf {params.dir}/fastq && \
    mkdir -p {params.dir} && \
    bamtofastq \
        --nthreads={threads} \
        {input} \
        {params.dir}/fastq \
    && touch {output[0]}
    """
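bamtofastq writes one subfolder per flowcell, with FASTQ names that are not known in advance, which is why the rule touches a sentinel file ({output[0]}) to mark completion. The layout looks roughly like this (folder and file names are illustrative):

    fastq/<sample>_0_1_HXXXXXXXX/
        bamtofastq_S1_L001_R1_001.fastq.gz
        bamtofastq_S1_L001_R2_001.fastq.gz

This is also why the next two rules collect the reads with the {params.dir}/*/*_R1_*.fastq.gz and *_R2_* globs.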
shell:
    """
    rm -f {output}
    for name in {params.dir}/*/*_R1_*.fastq.gz; do
        cat $name >> {output}
    done
    """
shell:
    """
    rm -f {output}
    for name in {params.dir}/*/*_R2_*.fastq.gz; do
        cat $name >> {output}
    done
    """
shell:
    """
    umi_tools extract \
        --bc-pattern={params.bc} \
        --stdin {input.fq1} \
        --stdout {output.fq1} \
        --read2-in {input.fq2} \
        --read2-out={output.fq2}
    """
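The barcode pattern itself comes from params.bc and is not shown. For Visium the spatial barcode is 16 bp and the UMI 12 bp, so a plausible value (an assumption, not copied from the workflow) is:

    # C = spatial barcode position, N = UMI position (umi_tools pattern syntax)
    --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNNNN

umi_tools extract reads this pattern off read1 and appends the barcode and UMI to the read names of both mates, which is how the spatial information survives into the Kraken2 output.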
shell:
    """
    cutadapt \
        --front {params.adaptor} \
        --front {params.polyA} \
        --times 2 \
        --minimum-length 31 \
        --cores {threads} \
        --output {output.fq} \
        {input} > {output.report}
    """
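--times 2 allows cutadapt to remove up to two adapters from the same read, i.e. both the adapter and the poly-A sequence supplied via --front. The actual sequences live in rule params that are not shown; hypothetical values for illustration only:

    # Hypothetical params, not taken from the workflow:
    # adaptor: AAGCAGTGGTATCAACGCAGAGTACATGGG   (10x template switch oligo, assumed)
    # polyA:   A{30}                            (cutadapt notation for a 30-base A run)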
shell:
    """
    # sliding-window quality trimming (window 4, mean Q20), poly-X tail trimming,
    # and a 31 bp minimum length; adapter trimming is left to cutadapt
    fastp --in1 {input} \
        --cut_right_window_size 4 \
        --cut_right_mean_quality 20 \
        --disable_adapter_trimming \
        --length_required 31 \
        --thread {threads} \
        --dont_eval_duplication \
        --trim_poly_x \
        --out1 {output.fq} \
        --html {output.html} \
        --json {output.json}
    """
shell:
    """
    # --confidence 0.1 discards low-confidence classifications;
    # --memory-mapping avoids loading the whole database into RAM
    kraken2 --db {params.db} \
        --memory-mapping \
        --confidence 0.1 \
        --threads {threads} \
        --use-names \
        --gzip-compressed \
        --report-minimizer-data \
        --output {output.txt} \
        --report {output.report} \
        {input}
    """
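With --use-names, each line of the Kraken2 output file has five tab-separated columns: classification code (C/U), read name, taxon name with taxid, read length, and the LCA mapping. For example (read name, barcode, UMI and taxon are illustrative):

    C	A00684:42:H7YVTDSX2:1:1101:1000:2000_CGTTAGCCAACTGGTT_ACGTACGTACGT	Escherichia coli (taxid 562)	75	562:41

Note that the read name still carries the _<barcode>_<UMI> suffix added by umi_tools, which the next rule relies on.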
shell:
    """
    ### extract only classified reads & remove all human reads
    sed -e '/^U/d' -e '/sapiens/d' {input} > {output.f}
    ### extract spatial BC & UMIs into separate tabs
    sed -i 's/\_/\t/g' {output.f}
    ### extract taxid into separate tab
    sed -i -e 's/ (taxid /\t/g' -e 's/)//g' {output.f}
    ### extract only CB, UMI, taxid columns
    awk -F \"\t\" '{{ print $3,\"\t\",$4,\"\t\",$6 }}' {output.f} > {output.p}
    """
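Tracing the illustrative line above through this chain: splitting on _ turns the barcode and UMI into fields 3 and 4, splitting on " (taxid " (and dropping the closing bracket) turns the taxid into field 6, and the final awk keeps exactly those three fields:

    # result for the example line (barcode, UMI, taxid; tab-separated):
    CGTTAGCCAACTGGTT	ACGTACGTACGT	562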
shell:
    """
    multiqc \
        --force \
        --module fastp \
        --module kraken \
        --module cutadapt \
        --outdir {params.outdir} \
        {params.indir} \
    && rm -rf {BAMDIR}
    """