SpaceMicrobe Snakemake Workflow for 10X Visium Spatial Gene Expression Data
The SpaceMicrobe Snakemake workflow is part of the SpaceMicrobe computational framework to detect microbial reads in 10X Visium Spatial Gene Expression data.
The Snakemake workflow requires that spaceranger count has already been run on the spatial transcriptomics dataset. The input file required for the Snakemake workflow is the possorted_genome_bam.bam file from the spaceranger count outs folder.
The Snakemake workflow outputs the taxonomic classifications of the reads (a modified Kraken2 output file), which must be further processed with the R package microbiome10XVisium.
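Once spaceranger count has finished, the workflow is launched like any other Snakemake workflow. A minimal sketch, assuming the repository's Snakefile and a config entry pointing at the BAM; the config key shown here is a placeholder, not the workflow's actual schema:

    # Illustrative invocation only; consult the repository for the real config schema
    snakemake --cores 8 --use-conda \
        --config bam=/path/to/outs/possorted_genome_bam.bam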
Graph
Overview of the Snakemake workflow:
Reads that did not align to the host transcriptome/genome are extracted (samtools_view); the molecular (UMI) and spatial (10X barcode) information of each read is preserved in read2 (umi); and quality control is performed on read2 (cutadapt and fastp) to remove adapters and poly-A tails, trim low-quality bases, and enforce a minimum read length. The metagenomic profiler Kraken2 then performs taxonomic classification of the remaining reads (classify).
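In rule terms, the data flow for one sample looks like this (a summary of the description above, not copied from the Snakefile):

    # possorted_genome_bam.bam
    #   -> samtools_view        keep reads that did not align to the host (flag 4)
    #   -> samtools sort -n     group mates by read name
    #   -> bamtofastq + cat     convert to FASTQ and merge R1/R2 across subfolders
    #   -> umi_tools extract    move spatial barcode + UMI into the read names
    #   -> cutadapt + fastp     adapter/poly-A removal, quality and length filtering
    #   -> kraken2 (classify)   taxonomic classification of the surviving reads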
Code Snippets
shell:
    """
    samtools view \
        --output-fmt BAM \
        --bam \
        --require-flags 4 \
        --threads {threads} \
        {input} > {output}
    """
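--require-flags 4 keeps only reads with the SAM UNMAP bit set, i.e. reads that did not align to the host transcriptome/genome. samtools can decode the flag value itself:

    samtools flags 4
    # 0x4	4	UNMAP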
shell:
    """
    # name-sort (-n) so that mates of a read pair are grouped together
    samtools sort -n \
        --output-fmt BAM \
        --threads {threads} \
        {input} > {output}
    """
shell:
    """
    rm -rf {params.dir}/fastq && \
    mkdir -p {params.dir} && \
    bamtofastq \
        --nthreads={threads} \
        {input} \
        {params.dir}/fastq \
    && touch {output[0]}
    """
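bamtofastq writes one subfolder per flowcell, with FASTQ names that are not known in advance, which is why the rule touches a sentinel file ({output[0]}) to mark completion. The layout looks roughly like this (folder and file names are illustrative):

    fastq/<sample>_0_1_HXXXXXXXX/
        bamtofastq_S1_L001_R1_001.fastq.gz
        bamtofastq_S1_L001_R2_001.fastq.gz

This is also why the next two rules collect the reads with the {params.dir}/*/*_R1_*.fastq.gz and *_R2_* globs.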
shell:
    """
    rm -f {output}
    for name in {params.dir}/*/*_R1_*.fastq.gz; do
        cat $name >> {output}
    done
    """
shell:
    """
    rm -f {output}
    for name in {params.dir}/*/*_R2_*.fastq.gz; do
        cat $name >> {output}
    done
    """
shell:
    """
    umi_tools extract \
        --bc-pattern={params.bc} \
        --stdin {input.fq1} \
        --stdout {output.fq1} \
        --read2-in {input.fq2} \
        --read2-out={output.fq2}
    """
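The barcode pattern itself comes from params.bc and is not shown. For Visium the spatial barcode is 16 bp and the UMI 12 bp, so a plausible value (an assumption, not copied from the workflow) is:

    # C = spatial barcode position, N = UMI position (umi_tools pattern syntax)
    --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNNNN

umi_tools extract reads this pattern off read1 and appends the barcode and UMI to the read names of both mates, which is how the spatial information survives into the Kraken2 output.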
shell:
    """
    cutadapt \
        --front {params.adaptor} \
        --front {params.polyA} \
        --times 2 \
        --minimum-length 31 \
        --cores {threads} \
        --output {output.fq} \
        {input} > {output.report}
    """
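--times 2 allows cutadapt to remove up to two adapters from the same read, i.e. both the adapter and the poly-A sequence supplied via --front. The actual sequences live in rule params that are not shown; hypothetical values for illustration only:

    # Hypothetical params, not taken from the workflow:
    # adaptor: AAGCAGTGGTATCAACGCAGAGTACATGGG   (10x template switch oligo, assumed)
    # polyA:   A{30}                            (cutadapt notation for a 30-base A run)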
shell:
    """
    # sliding-window quality trimming (window 4, mean Q20), poly-X tail trimming,
    # and a 31 bp minimum length; adapter trimming is left to cutadapt
    fastp --in1 {input} \
        --cut_right_window_size 4 \
        --cut_right_mean_quality 20 \
        --disable_adapter_trimming \
        --length_required 31 \
        --thread {threads} \
        --dont_eval_duplication \
        --trim_poly_x \
        --out1 {output.fq} \
        --html {output.html} \
        --json {output.json}
    """
shell:
    """
    # --confidence 0.1 discards low-confidence classifications;
    # --memory-mapping avoids loading the whole database into RAM
    kraken2 --db {params.db} \
        --memory-mapping \
        --confidence 0.1 \
        --threads {threads} \
        --use-names \
        --gzip-compressed \
        --report-minimizer-data \
        --output {output.txt} \
        --report {output.report} \
        {input}
    """
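With --use-names, each line of the Kraken2 output file has five tab-separated columns: classification code (C/U), read name, taxon name with taxid, read length, and the LCA mapping. For example (read name, barcode, UMI and taxon are illustrative):

    C	A00684:42:H7YVTDSX2:1:1101:1000:2000_CGTTAGCCAACTGGTT_ACGTACGTACGT	Escherichia coli (taxid 562)	75	562:41

Note that the read name still carries the _<barcode>_<UMI> suffix added by umi_tools, which the next rule relies on.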
shell:
    """
    ### extract only classified reads & remove all human reads
    sed -e '/^U/d' -e '/sapiens/d' {input} > {output.f}
    ### extract spatial BC & UMIs into separate tabs
    sed -i 's/\_/\t/g' {output.f}
    ### extract taxid into separate tab
    sed -i -e 's/ (taxid /\t/g' -e 's/)//g' {output.f}
    ### extract only CB, UMI, taxid columns
    awk -F \"\t\" '{{ print $3,\"\t\",$4,\"\t\",$6 }}' {output.f} > {output.p}
    """
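Tracing the illustrative line above through this chain: splitting on _ turns the barcode and UMI into fields 3 and 4, splitting on " (taxid " (and dropping the closing bracket) turns the taxid into field 6, and the final awk keeps exactly those three fields:

    # result for the example line (barcode, UMI, taxid; tab-separated):
    CGTTAGCCAACTGGTT	ACGTACGTACGT	562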
shell:
    """
    multiqc \
        --force \
        --module fastp \
        --module kraken \
        --module cutadapt \
        --outdir {params.outdir} \
        {params.indir} \
    && rm -rf {BAMDIR}
    """