16S pipeline using QIIME2, created with Snakemake
QIIME2 workflow for 16S analysis, created with Snakemake.
Authors
- Ann-Kathrin Brüggemann (@AKBrueggemann)
- Thomas Battenfeld (@thomasbtf)
Usage
If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).
Step 1: Obtain a copy of this workflow
If you want to add your own changes to the workflow, create your own copy of it on GitHub and work on that:
- Create a new GitHub repository using this workflow as a template.
- Clone the newly created repository to your local system, into the place where you want to perform the data analysis.
If you just want to use this workflow locally, simply clone it or download it as a zip file.
Once the folder structure is on your local machine, please add a "data" folder manually.
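For example, from the top level of the cloned repository (assuming that is where the workflow expects the folder; adjust if your setup differs):
mkdir -p data   # folder that will hold your raw fastq files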
Step 2: Configure workflow
Configure the workflow according to your needs by editing the files in the config/ folder. Adjust config.yaml to configure the workflow execution, and metadata.txt to specify your sample setup.
Some important parameters in config.yaml that you should check and set according to your own fastq files are the primers for the forward and reverse reads, the datatype to be used by QIIME2, and the min-seq-length. Depending on the sequencing run, read lengths can vary.
The default parameters for filtering and truncation were validated with the help of a mock community and tuned to recover all bacteria from that community.
In addition, you need to adapt the metadata parameters to your data: please change the names of the metadata columns used in the config according to your own information.
If your metadata does not contain numeric values, please use the "reduced-analysis" option in the config file to run the workflow, as the full analysis currently cannot run on purely categorical metadata. We plan to fix this in the future.
The workflow can perform clustering and denoising either with vsearch, producing OTUs, or with DADA2, producing ASVs. You decide which mode to use by setting the variable "DADA2" to True (DADA2) or False (vsearch).
Please make sure that the names of your fastq files are correctly formatted. They should look like this:
samplename_SNumber_Lane_R1/R2_001.fastq.gz
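For instance, a paired-end sample called sampleA could be provided as the following two files (the names are purely illustrative and only follow the pattern above; the listing assumes your raw files live in the data folder created in Step 1):
ls data/
# sampleA_S1_L001_R1_001.fastq.gz
# sampleA_S1_L001_R2_001.fastq.gz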
Step 3: Install Snakemake
Create a snakemake environment using mamba via:
mamba create -c conda-forge -c bioconda -n snakemake snakemake
For installation details, see the instructions in the Snakemake documentation.
Step 4: Execute workflow
Activate the conda environment:
conda activate snakemake
Fill in the metadata.txt with the information about your samples:
Please be careful not to include spaces around the commas. If there is a column that you don't have any information for, please leave it empty and simply go on with the next column.
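As a minimal sketch, the first lines of metadata.txt might look like this (assuming it lives at config/metadata.txt as described in Step 2; the "condition" column and all values are purely illustrative, so keep the columns of the shipped metadata.txt and only fill in your own information):
head -n 3 config/metadata.txt
# sample_name,run_date,condition
# #q2:types,categorical,categorical
# sampleA,2022-02-24,treatment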
Test your configuration by performing a dry-run via
snakemake --use-conda -n
Executing the workflow takes two steps, using $N cores:
- Data preparation: snakemake --cores $N --use-conda data_prep
- Workflow execution: snakemake --cores $N --use-conda
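For example, with 8 cores:
snakemake --cores 8 --use-conda data_prep   # step 1: prepare the data
snakemake --cores 8 --use-conda             # step 2: run the full workflow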
Step 5: Investigate results
After successful execution, the workflow provides you with a compressed folder holding all relevant results, ready to decompress or to download to your local machine. The compressed file 16S-report.tar.gz holds several QIIME2 artifacts that can be inspected via qiime-view. The zipped folder report.zip contains the Snakemake HTML report, which includes graphics as well as the DAG of the executed jobs, and HTML files leading you directly to the QIIME2 results without the need for qiime-view.
This report can, e.g., be forwarded to your collaborators.
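As a sketch, assuming both archives have been downloaded into your current directory, they can be unpacked with standard tools:
tar -xzvf 16S-report.tar.gz   # QIIME2 artifacts for qiime-view
unzip report.zip              # Snakemake HTML report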
Step 6: Commit changes
Whenever you change something, don't forget to commit the changes back to your GitHub copy of the repository:
git commit -a
git push
Step 7: Obtain updates from upstream
Whenever you want to synchronize your workflow copy with new developments from upstream, do the following (a consolidated command sequence is shown after the list):
- Once, register the upstream repository in your local copy: git remote add -f upstream git@github.com:snakemake-workflows/16S.git, or git remote add -f upstream https://github.com/snakemake-workflows/16S.git if you have not set up SSH keys.
- Update the upstream version: git fetch upstream.
- Create a diff with the current version: git diff HEAD upstream/master workflow > upstream-changes.diff.
- Investigate the changes: vim upstream-changes.diff.
- Apply the modified diff via: git apply upstream-changes.diff.
- Carefully check whether you need to update the config files: git diff HEAD upstream/master config. If so, do it manually, and only where necessary, since you would otherwise likely overwrite your settings and samples.
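Putting these steps together, a typical synchronization (using the HTTPS remote) looks like this:
git remote add -f upstream https://github.com/snakemake-workflows/16S.git   # only needed once
git fetch upstream
git diff HEAD upstream/master workflow > upstream-changes.diff
vim upstream-changes.diff                 # inspect the changes
git apply upstream-changes.diff
git diff HEAD upstream/master config      # check whether the config needs manual updates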
Step 8: Contribute back
In case you have also changed or added steps, please consider contributing them back to the original repository (see the command sketch after the list):
- Fork the original repo to a personal or lab account.
- Clone the fork to your local system, to a different place than where you ran your analysis.
- Copy the modified files from your analysis to the clone of your fork, e.g., cp -r workflow path/to/fork. Make sure not to accidentally copy config file contents or sample sheets. Instead, manually update the example config files if necessary.
- Commit and push your changes to your fork.
- Create a pull request against the original repository.
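In command form, the steps above roughly translate to the following (the fork path and commit message are placeholders):
cp -r workflow path/to/fork               # copy only your modified workflow files, not configs or sample sheets
cd path/to/fork
git commit -a -m "describe your changes"
git push
# then open a pull request against the original repository on GitHub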
Testing
Test cases are in the subfolder .test. They are automatically executed via continuous integration with GitHub Actions.
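If you want to try the test setup locally, a dry run against the test directory is one way to check it; this exact invocation is an assumption, and the CI configuration under .github/workflows shows the command actually used by GitHub Actions:
snakemake --use-conda -n --directory .test   # hypothetical local dry run of the test cases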
Tools
A list of the tools used in this pipeline:
Tool | Link |
---|---|
QIIME2 | www.doi.org/10.1038/s41587-019-0209-9 |
Snakemake | www.doi.org/10.12688/f1000research.29032.1 |
FastQC | www.bioinformatics.babraham.ac.uk/projects/fastqc |
MultiQC | www.doi.org/10.1093/bioinformatics/btw354 |
pandas | pandas.pydata.org |
kraken2 | www.doi.org/10.1186/s13059-019-1891-0 |
vsearch | www.github.com/torognes/vsearch |
DADA2 | www.doi.org/10.1038/nmeth.3869 |
songbird | www.doi.org/10.1038/s41467-019-10656-5 |
bowtie2 | www.doi.org/10.1038/nmeth.1923 |
Ancom | www.doi.org/10.3402/mehd.v26.27663 |
Code Snippets
15 16 17 | shell: "mkdir {output.dirc} \n" "bowtie2-build {input} {output.dirc}/{params.filename}" |
32 33 34 35 36 37 | shell: "bowtie2 -p 8 -x {input.db}/bowtie_host_DB " "-1 {input.read1} " "-2 {input.read2} " "-S {output} " "2> {log} " |
51 52 53 | shell: "bowtie2 -x {input.db}/bowtie_host_DB {input.read1} -S {output} " "2> {log} " |
64 65 66 | shell: "samtools view -bS {input} > {output} " "2> {log} " |
79 80 81 82 | shell: "samtools view -b -f 12 -F 256 " "{input} > {output} " "2> {log} " |
95 96 97 98 99 100 101 | shell: "samtools sort -n -m 5G -@ 2 {input} -o {output.sorted} 2> {log} \n" "samtools fastq -@ 8 {output.sorted} " "-1 {output.read1} " "-2 {output.read2} " "-0 /dev/null -s /dev/null -n " "2> {log} " |
114 115 116 117 | shell: "samtools view -b -F 256 " "{input} > {output} " "2> {log} " |
129 130 131 132 | shell: "samtools sort -n -m 5G -@ 2 {input} -o {output.sorted} 2> {log} \n" "samtools fastq -0 {output.read1} {output.sorted} " "2> {log} " |
143 144 145 | shell: "samtools view -F 4 -c {input} > {output} " "2> {log} " |
163 164 | script: "../scripts/bowtie_frequency.py" |
13 14 15 16 17 18 19 20 21 22 23 | shell: "qiime feature-table filter-features " "--i-table {input.table} " "--m-metadata-file {input.direc}/chimeras.qza " "--p-no-exclude-ids " "--o-filtered-table {output.table} \n" "qiime feature-table filter-seqs " "--i-data {input.seqs} " "--m-metadata-file {input.direc}/chimeras.qza " "--p-no-exclude-ids " "--o-filtered-data {output.seqs}" |
41 42 43 44 45 46 47 48 49 50 51 | shell: "qiime feature-classifier classify-consensus-vsearch " "--i-query {input.query} " "--i-reference-reads {input.reference_reads} " "--i-reference-taxonomy {input.reference_taxonomy} " "--p-maxaccepts {params.maxaccepts} " "--p-maxrejects {params.maxrejects} " "--p-perc-identity {params.perc_identity} " "--p-threads 4 " "--o-classification {output} " "--verbose" |
64 65 66 67 68 69 | shell: "qiime taxa barplot " "--i-table {input.table} " "--i-taxonomy {input.taxonomy} " "--m-metadata-file config/pep/sample.tsv " "--o-visualization {output}" |
13 14 15 16 17 18 | shell: "qiime vsearch dereplicate-sequences " "--i-sequences {input} " "--o-dereplicated-table {output.table} " "--o-dereplicated-sequences {output.seqs} " "--verbose 2> {log}" |
34 35 36 37 38 39 40 41 42 | shell: "qiime vsearch cluster-features-de-novo " "--i-table {input.table} " "--i-sequences {input.seqs} " "--p-perc-identity {params.perc_identity} " "--p-threads {params.threads} " "--o-clustered-table {output.table} " "--o-clustered-sequences {output.seqs} " "--verbose 2> {log}" |
64 65 66 67 68 69 70 71 72 73 74 75 76 77 | shell: "qiime feature-classifier classify-consensus-vsearch " "--i-query {input.query} " "--i-reference-reads {input.reference_reads} " "--i-reference-taxonomy {input.reference_taxonomy} " "--p-maxaccepts {params.maxaccepts} " "--p-maxrejects {params.maxrejects} " "--p-perc-identity {params.perc_identity} " "--p-query-cov {params.query_cov} " "--p-min-consensus {params.min_consensus} " "--p-threads {params.threads} " "--o-classification {output.tax} " "--o-search-results {output.search} " "--verbose 2> {log} " |
92 93 94 95 96 97 98 99 | shell: "qiime phylogeny align-to-tree-mafft-fasttree " "--i-sequences {input} " "--o-alignment {output.alignment} " "--o-masked-alignment {output.masked_alignment} " "--o-tree {output.tree} " "--o-rooted-tree {output.rooted_tree} " "--verbose 2> {log}" |
115 116 117 118 119 120 121 122 | shell: "qiime diversity core-metrics-phylogenetic " "--i-phylogeny {input.phylogeny} " "--i-table {input.table} " "--p-sampling-depth {params.depth} " "--m-metadata-file {input.metadata} " "--output-dir {output} " "--verbose 2> {log}" |
137 138 139 140 141 142 143 | shell: "qiime repeat-rarefy repeat-rarefy " "--i-table {input.table} " "--p-sampling-depth {params.sampling_depth} " "--p-repeat-times {params.repeats} " "--o-rarefied-table {output.table} " "--verbose 2> {log}" |
21 22 23 24 25 26 27 28 29 | shell: "qiime quality-filter q-score " "--i-demux {input} " "--p-min-quality {params.min_quality} " "--p-min-length-fraction {params.min_length_frac} " "--p-max-ambiguous {params.max_ambig} " "--o-filtered-sequences {output.filtering} " "--o-filter-stats {output.stats} " "--verbose 2> {log}" |
52 53 54 55 56 57 58 59 60 | shell: "qiime quality-filter q-score " "--i-demux {input} " "--p-min-quality {params.min_quality} " "--p-min-length-fraction {params.min_length_frac} " "--p-max-ambiguous {params.max_ambig} " "--o-filtered-sequences {output.filtering} " "--o-filter-stats {output.stats} " "--verbose 2> {log}" |
79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 | shell: "qiime vsearch uchime-denovo " "--i-table {input.table} " "--i-sequences {input.seqs} " "--p-minh {params.minh} " "--output-dir {output.direc} \n" "qiime feature-table filter-features " "--i-table {input.table} " "--m-metadata-file {output.direc}/chimeras.qza " "--p-exclude-ids " "--o-filtered-table {output.table} \n" "qiime feature-table filter-seqs " "--i-data {input.seqs} " "--m-metadata-file {output.direc}/chimeras.qza " "--p-exclude-ids " "--o-filtered-data {output.seqs} " "--verbose 2> {log}" |
110 111 112 113 114 115 116 117 118 119 120 | shell: "qiime feature-table filter-seqs " "--i-data {input.seq} " "--m-metadata-file {input.seq} " "--p-where 'length(sequence) > {params.min_length}' " "--o-filtered-data {output.seq} \n" "qiime feature-table filter-features " "--i-table {input.table} " "--m-metadata-file {output.seq} " "--o-filtered-table {output.table} " "--verbose 2> {log}" |
155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 | shell: "qiime dada2 denoise-paired " "--i-demultiplexed-seqs {input} " "--p-trunc-len-f {params.trunc_len_f} " "--p-trunc-len-r {params.trunc_len_r} " "--p-trim-left-f {params.trim_left_f} " "--p-trim-left-r {params.trim_left_r} " "--p-max-ee-f {params.max_ee_f} " "--p-max-ee-r {params.max_ee_r} " "--p-trunc-q {params.trunc_q} " "--p-min-overlap {params.min_overlap} " "--p-pooling-method {params.pooling_method} " "--p-chimera-method {params.chimera_method} " "--p-min-fold-parent-over-abundance {params.min_fold_parent_over_abundance} " "--p-n-threads {params.threads} " "--p-n-reads-learn {params.n_reads_learn} " "--o-table {output.table} " "--o-representative-sequences {output.seq} " "--o-denoising-stats {output.stats} " "--verbose 2> {log}" |
202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 | shell: "qiime dada2 denoise-paired " "--i-demultiplexed-seqs {input} " "--p-trunc-len {params.trunc_len_f} " "--p-trim-left {params.trim_left_f} " "--p-max-ee {params.max_ee_f} " "--p-trunc-q {params.trunc_q} " "--p-pooling-method {params.pooling_method} " "--p-chimera-method {params.chimera_method} " "--p-min-fold-parent-over-abundance {params.min_fold_parent_over_abundance} " "--p-n-threads {params.threads} " "--p-n-reads-learn {params.n_reads_learn} " "--o-table {output.table} " "--o-representative-sequences {output.seq} " "--o-denoising-stats {output.stats} " "--verbose 2> {log}" |
237 238 | script: "../scripts/relative_abundance.py" |
253 254 255 256 257 258 259 260 261 262 263 264 265 266 | shell: "value=$(<{input.abundance}) \n" "echo $value \n" "qiime feature-table filter-features " "--i-table {input.table} " "--p-min-frequency $value " "--o-filtered-table {output.table} " "--verbose 2> {log} \n" "qiime feature-table filter-seqs " "--i-data {input.seqs} " "--i-table {output.table} " "--p-no-exclude-ids " "--o-filtered-data {output.seqs} " "--verbose 2> {log} " |
289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 | shell: "qiime quality-control exclude-seqs " "--i-query-sequences {input.seq} " "--i-reference-sequences {input.ref_seq} " "--p-threads {params.threads} " "--p-perc-identity {params.perc_identity} " "--p-perc-query-aligned {params.perc_query_aligned} " "--o-sequence-hits {output.human_hit} " "--o-sequence-misses {output.seq} " "--verbose 2> {log} \n" "qiime feature-table filter-features " "--i-table {input.table} " "--m-metadata-file {output.seq} " "--o-filtered-table {output.table} " "--verbose 2> {log} " |
316 317 318 319 320 321 322 | shell: "qiime taxa collapse " "--i-table {input.table} " "--i-taxonomy {input.taxonomy} " "--p-level 6 " "--o-collapsed-table {output} " "--verbose 2> {log} " |
337 338 339 340 341 342 343 344 345 346 347 348 349 | shell: "qiime taxa filter-table " "--i-table {input.table} " "--i-taxonomy {input.taxonomy} " "--p-exclude mitochondria,chloroplast " "--o-filtered-table {output.table} " "--verbose 2> {log} \n" "qiime taxa filter-seqs " "--i-sequences {input.seq} " "--i-taxonomy {input.taxonomy} " "--p-exclude mitochondria,chloroplast " "--o-filtered-sequences {output.seq} " "--verbose 2> {log} " |
14 15 16 17 18 19 20 21 22 23 24 25 26 | shell: "qiime tools export " "--input-path {input.table} " "--output-path {output.table_biom} \n" "qiime tools export " "--input-path {input.taxonomy} " "--output-path {output.taxa_biom} \n" "qiime feature-table presence-absence " "--i-table {input.table} " "--o-presence-absence-table {output.table_binary} \n" "qiime tools export " "--input-path {output.table_binary} " "--output-path {output.binary_biom}" |
38 39 40 41 42 | shell: "qiime metadata tabulate " "--m-input-file {input} " "--o-visualization {output} " "--verbose 2> {log}" |
113 114 | script: "../scripts/rename_qzv.py" |
188 189 | script: "../scripts/extract_reports.py" |
209 210 | script: "../scripts/extract_beta_corr.py" |
230 231 | script: "../scripts/extract_significance.py" |
251 252 | script: "../scripts/extract_significance.py" |
272 273 | script: "../scripts/extract_significance.py" |
291 292 | script: "../scripts/extract_significance.py" |
310 311 | script: "../scripts/extract_beta_corr.py" |
393 394 395 396 | shell: "snakemake --nolock --report {output} --report-stylesheet resources/custom-stylesheet.css " "{params.for_testing} " "> {log} 2>&1" |
413 414 | shell: "tar -czvf {output} {params.directory} " |
430 431 | script: "../scripts/yaml_to_table.py" |
477 478 479 480 481 482 483 484 | shell: """ mkdir results/{wildcards.date}/16S-report cp -r {input} results/{wildcards.date}/16S-report/ tar -czvf results/{wildcards.date}/16S-report.tar.gz results/{wildcards.date}/16S-report/ cp results/{wildcards.date}/16S-report.tar.gz {params.outpath} rm -r results/{wildcards.date}/16S-report """ |
14 15 16 17 18 19 20 21 22 23 24 25 26 | shell: "qiime tools export " "--input-path {input.table} " "--output-path {output.table_biom} \n" "qiime tools export " "--input-path {input.taxonomy} " "--output-path {output.taxa_biom} \n" "qiime feature-table presence-absence " "--i-table {input.table} " "--o-presence-absence-table {output.table_binary} \n" "qiime tools export " "--input-path {output.table_binary} " "--output-path {output.binary_biom}" |
99 100 | script: "../scripts/rename_qzv.py" |
174 175 | script: "../scripts/extract_reports.py" |
195 196 | script: "../scripts/extract_beta_corr.py" |
216 217 | script: "../scripts/extract_significance.py" |
237 238 | script: "../scripts/extract_significance.py" |
258 259 | script: "../scripts/extract_significance.py" |
277 278 | script: "../scripts/extract_significance.py" |
296 297 | script: "../scripts/extract_beta_corr.py" |
380 381 382 383 | shell: "snakemake --nolock --report {output} --report-stylesheet resources/custom-stylesheet.css " "{params.for_testing} " "> {log} 2>&1" |
400 401 | shell: "tar -czvf {output} {params.directory} " |
417 418 | script: "../scripts/yaml_to_table.py" |
465 466 467 468 469 470 471 472 | shell: """ mkdir results/{wildcards.date}/16S-report cp -r {input} results/{wildcards.date}/16S-report/ tar -czvf results/{wildcards.date}/16S-report.tar.gz results/{wildcards.date}/16S-report/ cp results/{wildcards.date}/16S-report.tar.gz {params.outpath} rm -r results/{wildcards.date}/16S-report """ |
14 15 | script: "../scripts/create_sample_metadata.py" |
16 17 18 19 20 21 | shell: "cd resources; " "wget {params.genomic}; " "wget {params.kraken}; " "wget {params.seq}; " "wget {params.tax}; " |
70 71 | shell: "gzip -dc {input} > {output} 2> {log}; " |
83 84 | script: "../scripts/lower_to_upper_fasta.py" |
96 97 98 99 100 101 | shell: "qiime tools import " "--input-path {input} " "--output-path {output} " "--type 'FeatureData[Sequence]' " "2> {log} " |
113 114 | shell: "tar -zxvf {input} -C resources;" |
132 133 134 135 136 137 138 | shell: "qiime tools import " "--type {params.datatype} " "--input-path {params.direc} " "--input-format CasavaOneEightSingleLanePerSampleDirFmt " "--output-path {output} " "2> {log} " |
167 168 169 170 171 172 173 | shell: "qiime tools import " "--type {params.datatype} " "--input-path {params.dirc} " "--input-format CasavaOneEightSingleLanePerSampleDirFmt " "--output-path {output} " "2> {log} " |
198 199 200 201 202 203 204 | shell: "qiime tools import " "--type {params.datatype} " "--input-path {params.dirc} " "--input-format CasavaOneEightSingleLanePerSampleDirFmt " "--output-path {output} " "2> {log} " |
228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 | shell: """ if [[ '${params.datatype}' == '$SampleData[PairedEndSequencesWithQuality]' ]] then qiime cutadapt trim-paired \ --i-demultiplexed-sequences {input} \ --p-adapter-f {params.adapter1} \ --p-front-f {params.primer1} \ --p-front-r {params.primer2} \ --p-adapter-r {params.adapter2} \ --p-error-rate {params.error_rate} \ --p-times {params.rep_times} \ --p-overlap {params.overlap} \ --p-minimum-length {params.min_length} \ --o-trimmed-sequences {output} \ --verbose 2> {log} else qiime cutadapt trim-single \ --i-demultiplexed-sequences {input} \ --p-cores 10 \ --p-adapter {params.primer1} \ --p-front {params.primer2} \ --p-error-rate {params.error_rate} \ --p-times {params.rep_times} \ --p-overlap {params.overlap} \ --p-minimum-length {params.min_length} \ --o-trimmed-sequences {output} \ --verbose 2> {log} fi """ |
283 284 285 286 287 288 289 290 291 292 | shell: "qiime vsearch merge-pairs " "--i-demultiplexed-seqs {input} " "--p-allowmergestagger " "--p-minovlen {params.minovlen} " "--p-minlen {params.minlen} " "--p-maxdiffs {params.maxdiffs} " "--p-threads {params.threads} " "--o-merged-sequences {output} " "--verbose 2> {log}" |
19 20 | shell: "(kraken2 --db {input.db} --memory-mapping --threads {params.threads} --paired --use-names --report {output.report} {input.read1} {input.read2}) 2> {log}" |
40 41 | shell: "(kraken2 --db {input.db} --memory-mapping --threads {params.threads} --use-names --report {output.report} {input.read}) 2> {log}" |
53 54 | wrapper: "v1.3.1/bio/fastqc" |
76 77 | wrapper: "v1.23.3/bio/multiqc" |
10 11 12 13 14 | shell: "qiime demux summarize " "--i-data {input} " "--o-visualization {output} " "--verbose 2> {log}" |
26 27 28 29 30 | shell: "qiime feature-table summarize " "--i-table {input} " "--o-visualization {output} " "--verbose 2> {log}" |
42 43 44 45 46 | shell: "qiime vsearch fastq-stats " "--i-sequences {input} " "--o-visualization {output} " "--verbose 2> {log}" |
58 59 60 61 62 | shell: "qiime metadata tabulate " "--m-input-file {input} " "--o-visualization {output} " "--verbose 2> {log}" |
76 77 78 79 80 | shell: "qiime feature-table tabulate-seqs " "--i-data {input} " "--o-visualization {output} " "--verbose 2> {log}" |
98 99 | script: "../scripts/extract_humancount.py" |
111 112 | shell: "mkdir {output}" |
127 128 129 130 131 132 133 134 | shell: "qiime feature-table heatmap " "--i-table {input} " "--m-sample-metadata-file config/pep/sample.tsv " "--m-sample-metadata-column {params.metadata} " "--p-cluster {params.cluster} " "--o-visualization {output} " "--verbose 2> {log}" |
147 148 149 150 151 152 153 | shell: "qiime taxa barplot " "--i-table {input.table} " "--i-taxonomy {input.taxonomy} " "--m-metadata-file config/pep/sample.tsv " "--o-visualization {output} " "--verbose 2> {log}" |
165 166 167 168 169 | shell: "qiime metadata tabulate " "--m-input-file {input} " "--o-visualization {output} " "--verbose 2> {log}" |
181 182 183 184 185 | shell: "qiime tools export " "--input-path {input} " "--output-path {output} " "--verbose 2> {log}" |
197 198 | script: "../scripts/rename_taxonomy.py" |
212 213 214 215 216 217 | shell: """ biom add-metadata -i {input.biom}/feature-table.biom -o {output.biom} --observation-metadata-fp {input.taxonomy} biom convert -i {output.biom} -o {output.txt} --to-tsv --header-key taxonomy """ |
231 232 233 234 235 236 | shell: """ biom add-metadata -i {input.biom}/feature-table.biom -o {output.biom} --observation-metadata-fp {input.taxonomy} biom convert -i {output.biom} -o {output.txt} --to-tsv --header-key taxonomy """ |
253 254 | script: "../scripts/binaryheatmap.py" |
271 272 | script: "../scripts/absolute_taxabarplot.py" |
288 289 290 291 292 293 294 295 296 297 298 299 300 | shell: "qiime tools export " "--input-path {input.table} " "--output-path {output.table_biom} \n" "qiime tools export " "--input-path {input.taxonomy} " "--output-path {output.taxa_biom} \n" "qiime feature-table presence-absence " "--i-table {input.table} " "--o-presence-absence-table {output.table_binary} \n" "qiime tools export " "--input-path {output.table_binary} " "--output-path {output.binary_biom}" |
320 321 | script: "../scripts/rename_qzv.py" |
369 370 | script: "../scripts/extract_reports.py" |
389 390 391 392 | shell: "snakemake --nolock --report {output} --report-stylesheet resources/custom-stylesheet.css " "{params.for_testing} " "> {log} 2>&1" |
409 410 | shell: "tar -czvf {output} {params.directory} " |
424 425 426 427 428 429 430 431 432 | shell: "qiime feature-table summarize " "--i-table {input.table_wh} " "--o-visualization {output.visual_wh} " "--verbose 2> {log} \n" "qiime feature-table summarize " "--i-table {input.table_woh} " "--o-visualization {output.visual_woh} " "--verbose 2> {log}" |
453 454 | script: "../scripts/sample_freq_difference.py" |
470 471 | script: "../scripts/yaml_to_table.py" |
502 503 504 505 506 507 508 509 | shell: """ mkdir results/{wildcards.date}/16S-report cp -r {input} results/{wildcards.date}/16S-report/ tar -czvf results/{wildcards.date}/16S-report.tar.gz results/{wildcards.date}/16S-report/ cp results/{wildcards.date}/16S-report.tar.gz {params.outpath} rm -r results/{wildcards.date}/16S-report """ |
12 13 14 15 16 17 | shell: "qiime diversity alpha " "--i-table {input} " "--p-metric {params.metric} " "--o-alpha-diversity {output} " "--verbose 2> {log}" |
29 30 31 32 33 34 | shell: "qiime diversity alpha-group-significance " "--i-alpha-diversity {input} " "--m-metadata-file config/pep/sample.tsv " "--o-visualization {output} " "--verbose 2> {log} " |
49 50 51 52 53 54 55 56 | shell: "qiime diversity alpha-correlation " "--i-alpha-diversity {input} " "--m-metadata-file {params.metadata} " "--p-method {params.method} " "--p-intersect-ids " "--o-visualization {output} " "--verbose 2> {log}" |
72 73 74 75 76 77 78 79 | shell: "qiime diversity beta " "--i-table {input} " "--p-metric {params.metric} " "--p-pseudocount {params.pseudocount} " "--p-n-jobs {params.n_jobs} " "--o-distance-matrix {output} " "--verbose 2> {log}" |
93 94 95 96 97 98 99 | shell: "qiime diversity beta-group-significance " "--i-distance-matrix {input} " "--m-metadata-file config/pep/sample.tsv " "--m-metadata-column {wildcards.metadata_column} " "--o-visualization {output.out} " "--verbose 2> {log}" |
117 118 119 120 121 122 123 124 125 126 127 128 129 | shell: "qiime diversity beta-correlation " "--i-distance-matrix {input} " "--m-metadata-file {params.metadata_file} " "--m-metadata-column {wildcards.metadata_column} " "--p-method {params.method} " "--p-permutations {params.permutations} " "--p-intersect-ids " "--p-label1 jaccard-distance-matrix " "--p-label2 {params.metadata} " "--o-metadata-distance-matrix {output.distance_matrix} " "--o-mantel-scatter-visualization {output.mantel_scatter_vis} " "--verbose 2> {log}" |
142 143 144 145 146 147 148 | shell: "qiime diversity alpha-phylogenetic " "--i-phylogeny {input.phylogeny} " "--i-table {input.table} " "--p-metric faith_pd " "--o-alpha-diversity {output} " "--verbose 2> {log} " |
165 166 167 168 169 170 171 172 173 | shell: "qiime diversity beta-phylogenetic " "--i-table {input.table} " "--i-phylogeny {input.phylogeny} " "--p-metric {params.metrics} " "--p-threads {params.threads} " "--p-variance-adjusted {params.variance_adjusted} " "--o-distance-matrix {output} " "--verbose 2> {log}" |
185 186 187 188 189 | shell: "qiime diversity pcoa " "--i-distance-matrix {input} " "--o-pcoa {output} " "--verbose 2> {log} " |
203 204 205 206 207 208 | shell: "qiime emperor plot " "--i-pcoa {input} " "--m-metadata-file {params.metadata} " "--o-visualization {output} " "--verbose 2> {log} " |
10 11 12 13 14 | shell: "qiime demux summarize " "--i-data {input} " "--o-visualization {output} " "--verbose 2> {log}" |
26 27 28 29 30 | shell: "qiime feature-table summarize " "--i-table {input} " "--o-visualization {output} " "--verbose 2> {log}" |
44 45 46 47 48 49 50 51 52 | shell: "qiime feature-table summarize " "--i-table {input.table_wh} " "--o-visualization {output.visual_wh} " "--verbose 2> {log} \n" "qiime feature-table summarize " "--i-table {input.table_woh} " "--o-visualization {output.visual_woh} " "--verbose 2> {log}" |
64 65 66 67 68 | shell: "qiime vsearch fastq-stats " "--i-sequences {input} " "--o-visualization {output} " "--verbose 2> {log}" |
80 81 82 83 84 | shell: "qiime metadata tabulate " "--m-input-file {input} " "--o-visualization {output} " "--verbose 2> {log}" |
98 99 100 101 102 | shell: "qiime feature-table tabulate-seqs " "--i-data {input} " "--o-visualization {output} " "--verbose 2> {log}" |
120 121 | script: "../scripts/extract_humancount.py" |
133 134 | shell: "mkdir {output}" |
149 150 151 152 153 154 155 156 | shell: "qiime feature-table heatmap " "--i-table {input} " "--m-sample-metadata-file config/pep/sample.tsv " "--m-sample-metadata-column {params.metadata} " "--p-cluster {params.cluster} " "--o-visualization {output} " "--verbose 2> {log}" |
169 170 171 172 173 174 175 | shell: "qiime taxa barplot " "--i-table {input.table} " "--i-taxonomy {input.taxonomy} " "--m-metadata-file config/pep/sample.tsv " "--o-visualization {output} " "--verbose 2> {log}" |
187 188 189 190 191 | shell: "qiime metadata tabulate " "--m-input-file {input} " "--o-visualization {output} " "--verbose 2> {log}" |
203 204 205 206 207 | shell: "qiime tools export " "--input-path {input} " "--output-path {output} " "--verbose 2> {log}" |
226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 | shell: "qiime diversity alpha-rarefaction " "--i-table {input.table} " "--i-phylogeny {input.phylogeny} " "--p-max-depth {params.max_depth} " "--m-metadata-file config/pep/sample.tsv " "--o-visualization {output.alpha} " "--verbose 2> {log} \n" "qiime diversity beta-rarefaction " "--i-table {input.table} " "--i-phylogeny {input.phylogeny} " "--p-metric {params.metric} " "--p-clustering-method {params.clustering_method} " "--m-metadata-file config/pep/sample.tsv " "--p-sampling-depth {params.sampling_depth} " "--o-visualization {output.beta} " "--verbose 2> {log}" |
257 258 259 260 261 262 263 264 265 266 267 268 269 | shell: "qiime gneiss correlation-clustering " "--i-table {input} " "--o-clustering {output.tree} \n" "qiime gneiss dendrogram-heatmap " "--i-table {input} " "--i-tree {output.tree} " "--m-metadata-file config/pep/sample.tsv " "--m-metadata-column {params.metadata} " "--p-color-map seismic " "--o-visualization {output.heatmap_gneiss} " "--verbose 2> {log}" |
281 282 | script: "../scripts/rename_taxonomy.py" |
296 297 298 299 300 301 | shell: """ (biom add-metadata -i {input.biom}/feature-table.biom -o {output.biom} --observation-metadata-fp {input.taxonomy}) > {log} 2>&1 (biom convert -i {output.biom} -o {output.txt} --to-tsv --header-key taxonomy) > {log} 2>&1 """ |
315 316 317 318 319 320 | shell: """ (biom add-metadata -i {input.biom}/feature-table.biom -o {output.biom} --observation-metadata-fp {input.taxonomy}) > {log} 2>&1 (biom convert -i {output.biom} -o {output.txt} --to-tsv --header-key taxonomy) > {log} 2>&1 """ |
337 338 | script: "../scripts/binaryheatmap.py" |
355 356 | script: "../scripts/absolute_taxabarplot.py" |
374 375 376 377 378 379 380 381 382 383 384 385 | shell: "songbird multinomial " "--input-biom {input} " "--metadata-file config/pep/sample.tsv " "--formula {params.formula} " "--epochs 10000 " "--differential-prior {params.differential_prior} " "--min-sample-count {params.min_sample_count} " "--min-feature-count {params.min_feature_count} " "--summary-interval {params.summary_interval} " "--summary-dir {output} " "2> {log}" |
401 402 | script: "../scripts/featuretable_songbird.py" |
425 426 427 428 429 430 431 432 | shell: "qurro " "--ranks {params.differentials} " "--table {input.table} " "--sample-metadata {params.metadata_file} " "--feature-metadata {input.feature_metadata} " "--output-dir {output.plot} " "2> {log} " |
448 449 450 451 452 453 454 455 456 457 | shell: "qiime composition add-pseudocount " "--i-table {input} " "--o-composition-table {output.pseudocount_table} \n" "qiime composition ancom " "--i-table {output.pseudocount_table} " "--m-metadata-file {params.metadata_file} " "--m-metadata-column {params.metadata_column} " "--o-visualization {output.ancom_output} " "--verbose 2> {log}" |
478 479 | script: "../scripts/sample_freq_difference.py" |
498 499 | script: "../scripts/prepare_diversity.py" |
516 517 | script: "../scripts/plot_distance.py" |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | import pandas as pd import matplotlib as mpl import matplotlib.pyplot as plt import numpy as np from pylab import savefig import sys sys.stderr = open(snakemake.log[0], "w") df = pd.read_csv(str(snakemake.input), delimiter="\t", header=1, index_col="taxonomy") df.drop("#OTU ID", axis=1, inplace=True) df_reduced = df.groupby(df.index).sum() species = df_reduced.index.unique() for i in df_reduced.index: if "Unassigned" not in i: df_reduced.index = df_reduced.index.str.split("; s__").str[0] df_reduced_second = df_reduced.groupby(df_reduced.index).sum() for i in df_reduced_second.index: if "Chloroplast" in i: df_reduced_second.drop(index=i, inplace=True) df_trans = df_reduced_second.T df_trans.replace(-np.inf, 0, inplace=True) np.random.seed(100) mycolors = np.random.choice( list(mpl.colors.XKCD_COLORS.keys()), len(species), replace=False ) fig = df_trans.plot(kind="bar", stacked=True, color=mycolors, figsize=(16, 8)) handles, labels = fig.get_legend_handles_labels() lgd = fig.legend( handles, labels, loc="center left", bbox_to_anchor=(1, 0.5), borderaxespad=0.0, fontsize=8, ) plt.yscale("log") if len(df) < 40: plt.xticks(fontsize=7, rotation=45) if len(df) > 40: plt.xticks(fontsize=5, rotation=45) if len(df) > 100: plt.xticks(fontsize=4, rotation=45) plt.xlabel("Sample name") plt.ylabel("Log10 abundance") plt.title("Taxa-bar-plot of log10(abundance)") plt.savefig(str(snakemake.output), bbox_extra_artists=(lgd,), bbox_inches="tight") |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 | import pandas as pd import matplotlib.pyplot as plt import seaborn as sb import numpy as np from pylab import savefig import sys sys.stderr = open(snakemake.log[0], "w") # Reading information from a file, holding binary information whether an OTU is present or absent in a sample # Reading the file, deleting unnecessary columns df = pd.read_csv(str(snakemake.input), delimiter="\t", header=1, index_col="taxonomy") del df["#OTU ID"] # Creating a specific colour scheme my_colors = [(0.2, 0.6, 0.3), (0.1, 0.4, 0.5)] fig, ax = plt.subplots(figsize=(20, 15)) # Creating a heatmap with seaborn heatmap = sb.heatmap( df, cmap=my_colors, linewidth=0.01, linecolor="Black", cbar=False, yticklabels=True ) # Setting the properties for the legend cbar = heatmap.figure.colorbar(heatmap.collections[0]) cbar.set_ticks([0.25, 0.75]) cbar.set_ticklabels(["absent", "present"]) heatmap.set_yticklabels(heatmap.get_ymajorticklabels(), fontsize=6) if len(df) < 40: plt.xticks(fontsize=8, rotation=45) if len(df) > 40: plt.xticks(fontsize=6, rotation=45) if len(df) > 100: plt.xticks(fontsize=5, rotation=45) heatmap.set(xlabel="Sample", ylabel="Taxonomic classification of bacteria") # Setting the orientation of the plot plt.subplots_adjust(bottom=0.2, left=0.4) plt.show() figure = heatmap.get_figure() figure.savefig(str(snakemake.output), dpi=400) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | import pandas as pd import matplotlib.pyplot as plt import seaborn as sb import numpy as np import os from pylab import savefig import sys sys.stderr = open(snakemake.log[0], "w") path = os.path.dirname(str(snakemake.input[0])) inputs = [] for file in os.listdir(path): if file.endswith(".txt"): inputs.append(os.path.join(path, file)) # concatenate all txt files in a file with open(str(snakemake.output), "w") as outfile: i = 0 outfile.write("Sample: Difference without human " + "\n") while i < len(inputs): with open(inputs[i], encoding="utf-8", errors="ignore") as infile: path_name = inputs[i].split("_")[0] name = path_name.split("/")[-1] + ": " + infile.read() outfile.write(name) i += 1 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 | import os import pandas as pd from os.path import isfile, join from os import path, getcwd, listdir from os import mkdir from shutil import move, copy2 from datetime import date, datetime import sys sys.stderr = open(snakemake.log[0], "w") # Creating a metadata sample-sheet, that holds additional information concerning the samples. # Metadata information is needed to be able to create plots and for metadata-specific analysis. config = snakemake.config DATA_PATH = str(config["data"]) IN_PATH = str(config["input"]) # today = date.today().strftime("%Y-%m-%d") incoming_files = [] # Running through all files in the IN_PATH and checking them # Adding the files to a list, when they meet the requirements for f in listdir(IN_PATH): if ( path.isfile(path.join(IN_PATH, f)) and "ndetermined" not in f and ".fastq.gz" in f and os.stat(IN_PATH + f).st_size > 100 ): incoming_files.append(f) else: print(f, "not used") metadata = pd.read_csv(str(snakemake.input), header=0, delimiter=",") date = metadata["run_date"].iloc[1] print(date) # Adding a direcory for the specific date as subdirectory to the DATA_PATH DATA_PATH += date if not os.path.isdir(DATA_PATH): mkdir(DATA_PATH) # Checking if some files are already present in the DATA_PATH, those are not copied data_files = [f for f in listdir(DATA_PATH) if path.isfile(path.join(DATA_PATH, f))] files_not_to_copy = [f for f in data_files if f in incoming_files] files_to_copy = [f for f in incoming_files if f not in data_files] for file in files_to_copy: if file.endswith(".fastq.gz") and not "ndetermined" in file: copy2(IN_PATH + file, DATA_PATH) print(files_to_copy) # Reading the sample names and the metadata from the file-names and the metadata.csv file, that needs to be provided files = os.listdir(DATA_PATH) print(files) sample_list = [] for name in files: sample = name.split("_")[0] sample_list.append(sample) sample_list = list(set(sample_list)) metadata = pd.read_csv(str(snakemake.input), header=0, delimiter=",") metadata.columns = metadata.columns.str.lower() # Samples, that are not mentioned in the metadata.csv are excluded from the sample-metadata-sheet for name in sample_list: if name not in metadata["sample_name"].tolist(): print("No metadata was found for " + name) sample_list.remove(name) # Also remove the fastq file from folder? # Error if names don't fit metadata_name_list = metadata["sample_name"].tolist() metadata_name_list.remove("#q2:types") if len(sample_list) == len(metadata_name_list): if set(sample_list) != set(metadata_name_list): print( "There seems to be a discrepancy between fastq file and metadata. Please check the spelling in the metadata.txt." ) raise Exception( "There seems to be a discrepancy between fastq file and metadata. Please check the spelling in the metadata.txt." ) # Replacing empty metadata-columns with 0 and after that removing them from the file metadata.fillna(0, inplace=True) metadata.rename(columns={"sample_name": "sample-ID"}, inplace=True) metadata.to_csv(snakemake.output.metadata, sep="\t", index=False) # Creating a sample_info df, that holds paths to the fastq files and the date. 
# Common functions can read information from sample_info.txt instead from the data directories data = [metadata["sample-ID"], metadata["run_date"]] header = ["sample-ID", "run_date"] sample_info = pd.concat(data, axis=1, keys=header) if "SampleData[PairedEndSequencesWithQuality]" in str(snakemake.params.datatype): sample_info["path1"] = "" sample_info["path2"] = "" elif "SampleData[SequencesWithQuality]" in str(snakemake.params.datatype): sample_info["path1"] = "" sample_info.set_index("sample-ID", inplace=True) sample_info.drop(labels=["#q2:types"], axis=0, inplace=True) i = 0 while i < len(sample_info.index): for file in files_to_copy: if sample_info.index[i] in file and len(sample_info.index[i]) == len( file.split("_")[0] ): if "SampleData[PairedEndSequencesWithQuality]" in str( snakemake.params.datatype ): if "R1" in file: sample_info.loc[sample_info.index[i], "path1"] = ( DATA_PATH + "/" + file ) elif "R2" in file: sample_info.loc[sample_info.index[i], "path2"] = ( DATA_PATH + "/" + file ) elif "SampleData[SequencesWithQuality]" in str(snakemake.params.datatype): if "R1" in file: sample_info.loc[sample_info.index[i], "path1"] = ( DATA_PATH + "/" + file ) else: continue i = i + 1 sample_info.to_csv(snakemake.output.sample_info, sep=",", mode="a") |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | import os import pandas as pd import shutil from os.path import isfile, join from os import path, getcwd, listdir from os import mkdir from shutil import copy, copy2 from datetime import date, datetime import sys sys.stderr = open(snakemake.log[0], "w") # Function to extract results from the qiime2 artifacts, already saved in a zip file """print("results/2022-02-24/visual/unzipped/") os.walk("results/2022-02-24/visual/unzipped/") for x in os.walk("results/2022-02-24/visual/unzipped/"): print(x[0])""" # Reading the zip-files from the directory dirlist = os.listdir(str(snakemake.input)) # Getting the subdirectories of the files subdir = [] i = 0 while i < len(dirlist): directory = str(snakemake.input) + "/" + dirlist[i] + "/" subdir.append(directory) i = i + 1 """z = 0 while z < len(subdir): if path.isdir(subdir[z]): dir = os.listdir(subdir[z]) print(dir) subdir[z] = subdir[z] + dir[0] + "/" print(subdir[z]) z = 1 + z else: z = 1 + z""" # Iterating through the different subdirectories and copying the important information into the output directory b = 0 path_name = str(snakemake.output) print(path_name) plot_name = path_name.split("/")[-1] while b < len(subdir): datadir = subdir[b] + "data/" if plot_name == subdir[b].split("/")[-2]: html = datadir shutil.copytree(html, str(snakemake.output)) b = b + 1 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 | import os import shutil import gzip import zipfile from os.path import isfile, join from os import path, getcwd, listdir from os import mkdir from shutil import copy, copy2 from datetime import date, datetime import sys sys.stderr = open(snakemake.log[0], "w") file = str(snakemake.input) name = os.path.splitext(file)[0] shutil.copy(file, name + ".zip") zipped_file = name + ".zip" with zipfile.ZipFile(zipped_file, "r") as zip_ref: name = zipped_file.split("/")[-1] new_dir = str(snakemake.params.between) + "/" + name zip_ref.extractall(os.path.splitext(new_dir)[0] + "/") directory = os.listdir(str(snakemake.params.between)) subdir = os.listdir(str(snakemake.params.between) + "/" + directory[0]) orig_dir = str(snakemake.params.between) + "/" + directory[0] + "/" + subdir[0] new_dir = str(snakemake.params.between) + "/" + directory[0] print(new_dir) dir_name = new_dir.split("/")[-1] for f in os.listdir(orig_dir): path = orig_dir + "/" + f print(path) unzipped_path = str(snakemake.params.between) + "/" + dir_name + "/" + f shutil.move(path, unzipped_path) os.rmdir(orig_dir) dirlist = os.listdir(str(snakemake.params.between)) subdir = [] i = 0 while i < len(dirlist): directory = str(snakemake.params.between) + "/" + dirlist[i] + "/" subdir.append(directory) i = i + 1 b = 0 while b < len(subdir): datadir = subdir[b] + "/" + "data/" if "human-count" in subdir[b]: html = datadir shutil.copytree(html, snakemake.output.human_count) b += 1 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 | import os import pandas as pd import shutil from os.path import isfile, join from os import path, getcwd, listdir from os import mkdir from shutil import copy, copy2 from datetime import date, datetime import sys sys.stderr = open(snakemake.log[0], "w") # Function to extract results from the qiime2 artifacts, already saved in a zip file """print("results/2022-02-24/visual/unzipped/") os.walk("results/2022-02-24/visual/unzipped/") for x in os.walk("results/2022-02-24/visual/unzipped/"): print(x[0])""" # Reading the zip-files from the directory dirlist = os.listdir(str(snakemake.input)) # Getting the subdirectories of the files subdir = [] i = 0 while i < len(dirlist): directory = str(snakemake.input) + "/" + dirlist[i] + "/" subdir.append(directory) i = i + 1 """z = 0 while z < len(subdir): if path.isdir(subdir[z]): dir = os.listdir(subdir[z]) print(dir) subdir[z] = subdir[z] + dir[0] + "/" print(subdir[z]) z = 1 + z else: z = 1 + z""" # Iterating through the different subdirectories and copying the important information into the output directory b = 0 while b < len(subdir): datadir = subdir[b] + "data/" if "beta-rarefaction" in subdir[b]: svg = datadir + "heatmap.svg" shutil.copy(svg, snakemake.output.beta_svg) html = datadir shutil.copytree(html, snakemake.output.beta_html) elif "heatmap" in subdir[b] and "gneiss" not in subdir[b]: svg = datadir + "feature-table-heatmap.svg" shutil.copy(svg, snakemake.output.heatmap) elif "taxonomy" in subdir[b]: tsv = datadir + "metadata.tsv" shutil.copy(tsv, snakemake.output.taxonomy_tsv) elif "taxa-bar-plots" in subdir[b]: html = datadir shutil.copytree(html, snakemake.output.taxa_barplot) elif "heatmap_gneiss" in subdir[b]: svg = datadir + "heatmap.svg" shutil.copy(svg, snakemake.output.gneiss) if "alpha-rarefaction" in subdir[b]: html = datadir shutil.copytree(html, snakemake.output.alpha_html) if "paired-seqs" in subdir[b]: html = datadir shutil.copytree(html, snakemake.output.paired_seqs) if "fastq_stats" in subdir[b]: html = datadir shutil.copytree(html, snakemake.output.fastq_stats) if "demux-joined-filter-stats" in subdir[b]: html = datadir shutil.copytree(html, snakemake.output.demux_filter_stats) if "dada2-stats" in subdir[b]: html = datadir shutil.copytree(html, snakemake.output.dada2) b = b + 1 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | import os import pandas as pd import shutil from os.path import isfile, join from os import path, getcwd, listdir from os import mkdir from shutil import copy, copy2 from datetime import date, datetime import sys sys.stderr = open(snakemake.log[0], "w") # Reading the zip-files from the directory dirlist = os.listdir(str(snakemake.input)) # Getting the subdirectories of the files subdir = [] i = 0 while i < len(dirlist): directory = str(snakemake.input) + "/" + dirlist[i] + "/" subdir.append(directory) i = i + 1 # Iterating through the different subdirectories and copying the important information into the output directory b = 0 path_name = str(snakemake.output) print(path_name) plot_name = path_name.split("/")[-1] while b < len(subdir): datadir = subdir[b] + "data/" if plot_name == subdir[b].split("/")[-2]: html = datadir shutil.copytree(html, str(snakemake.output)) b = b + 1 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | import pandas as pd import os import sys sys.stderr = open(snakemake.log[0], "w") differentials_df = pd.read_csv( str(snakemake.params), header=0, index_col="featureid", sep="\t" ) taxa_df = pd.read_csv(str(snakemake.input.taxa), header=1, sep="\t") taxa_small = taxa_df[["#OTU ID", "taxonomy"]].copy() taxa_small_new = taxa_small.set_index(["#OTU ID"]) diff_taxa_df = pd.concat([differentials_df, taxa_small_new], axis=1) diff_taxa_df_new = diff_taxa_df.set_index(["taxonomy"]) diff_taxa_df_new.to_csv(str(snakemake.output.diff), sep="\t", index=True) taxa_small_new.rename_axis("featurue-id", inplace=True) taxa_small_new.to_csv(str(snakemake.output.feature_meta), sep="\t", index=True) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | import pandas as pd import sys sys.stderr = open(snakemake.log[0], "w") fasta = open(str(snakemake.input), "r") fasta_upper_file = open(str(snakemake.output), "w") print("read!") for x in fasta.read(): fasta_upper = x.upper() fasta_upper_file.write(fasta_upper) print("ready") fasta.close() fasta_upper_file.close() |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 | import os import shutil import sys import pandas as pd import seaborn as sb import numpy as np import matplotlib.pyplot as plt sys.stderr = open(snakemake.log[0], "w") directories = os.listdir(str(snakemake.input)) i = 0 while i < len(directories): output_name = os.path.split(str(snakemake.output))[1] name = output_name.split(".")[0] if name in directories[i]: df = pd.read_csv( str(snakemake.input) + "/" + directories[i] + "/data" + "/distance-matrix.tsv", sep="\t", header=0, index_col=0, ) matrix = np.triu(df) fig, ax = plt.subplots(figsize=(20, 15)) heatmap = sb.heatmap( df, mask=matrix, cbar_kws={"label": "Distance"}, annot=True, fmt=".1f" ) heatmap.set(xlabel="Samplename", ylabel="Samplename") figure = heatmap.get_figure() figure.savefig(str(snakemake.output), dpi=400) i = i + 1 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | import os import shutil import sys import zipfile sys.stderr = open(snakemake.log[0], "w") zipped_files = [] i = 0 os.mkdir(str(snakemake.output)) while i < len(snakemake.input): file = str(snakemake.input[i]) name_path = os.path.splitext(file)[0] name = os.path.split(name_path)[1] output_path = str(snakemake.output) + "/" + name + ".zip" shutil.copy2(file, output_path) zipped_files.append(output_path) i += 1 j = 0 while j < len(zipped_files): with zipfile.ZipFile(zipped_files[j], "r") as zip_ref: name = zipped_files[j].split("/")[-1] print(name) new_dir = str(snakemake.output) + "/" + name print(new_dir) zip_ref.extractall(os.path.splitext(new_dir)[0] + "/") os.remove(new_dir) j = j + 1 directory = os.listdir(str(snakemake.output)) print(directory) z = 0 while z < len(directory): subdir = os.listdir(str(snakemake.output) + "/" + directory[z]) orig_dir = str(snakemake.output) + "/" + directory[z] + "/" + subdir[0] new_dir = str(snakemake.output) + "/" + directory[z] for f in os.listdir(orig_dir): path = orig_dir + "/" + f shutil.move(path, new_dir) os.rmdir(orig_dir) z = z + 1 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 | import pandas as pd import gzip import shutil import os import zipfile import sys sys.stderr = open(snakemake.log[0], "w") # Extracting the number of total features over all samples from the sample-table. # Multiplying the number with the relative abundance filtering value to create a threshold for the # qiime filtering. # Reading the sample-table, creating a zip file file = str(snakemake.input) name = os.path.splitext(file)[0] shutil.copy(file, name + ".zip") filename = name + ".zip" # Extract zip files to folder with zipfile.ZipFile(filename, "r") as zip_ref: name = filename.split("/")[-1] dir_name = os.path.dirname(str(snakemake.input)) new_dir = dir_name + "/" + name zip_ref.extractall(os.path.splitext(new_dir)[0] + "/") name = name.split(".")[0] directory = os.path.dirname(str(snakemake.input)) + "/" + name # Moving the folder inventory one folder up b = 0 subdir = os.listdir(directory) while b < len(subdir): orig_dir = directory + "/" + subdir[b] new_dir = directory for f in os.listdir(orig_dir): path = orig_dir + "/" + f shutil.move(path, new_dir) b = b + 1 # Read the specific csv holding the information, creating a dataframe, adding up all feature frequencies datadir = str(snakemake.output.feature_table) + "/" csv = datadir + "sample-frequency-detail.csv" frequency = pd.read_csv(csv, header=0, delimiter=",") number = frequency["0"].sum() # Creating the abundance threshold and storing it in an output file abundance = float(str(snakemake.params)) endnumber = number * abundance endnumber = int(endnumber) if endnumber == 0: endnumber += 1 with open(str(snakemake.output.abundance), "w") as f: f.write(str(endnumber)) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | import pandas as pd import gzip import shutil import os import zipfile import sys sys.stderr = open(snakemake.log[0], "w") # Renaming the qiime2 artifacts for viewing to zip files so the information can be accessed # Iterating over the input files and renaming them to zip i = 0 zipped_files = [] while i < len(snakemake.input): file = snakemake.input[i] name = os.path.splitext(file)[0] shutil.copy(file, name + ".zip") zipped_files.append(name + ".zip") i = i + 1 z = 0 # opening the zip files and saving them to the output directory while z < len(zipped_files): with zipfile.ZipFile(zipped_files[z], "r") as zip_ref: name = zipped_files[z].split("/")[-1] new_dir = str(snakemake.output) + "/" + name zip_ref.extractall(os.path.splitext(new_dir)[0] + "/") os.remove(zipped_files[z]) z = z + 1 directory = os.listdir(str(snakemake.output)) # Iterating through the directory holding the unzipped files, moving the file content one folder up. # Folder created while unzipping can be removed, because it gives no further information j = 0 while j < len(directory): subdir = os.listdir(str(snakemake.output) + "/" + directory[j]) orig_dir = str(snakemake.output) + "/" + directory[j] + "/" + subdir[0] new_dir = str(snakemake.output) + "/" + directory[j] for f in os.listdir(orig_dir): path = orig_dir + "/" + f shutil.move(path, new_dir) os.rmdir(orig_dir) j = j + 1 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | import pandas as pd import os import sys sys.stderr = open(snakemake.log[0], "w") # Renaming the headers in the taxonomy.tsv file, so that they match with the headers of the table-biom file. # When the headers are equal, the files can be merged into one file. taxonomy_file = str(snakemake.input) + "/" + "taxonomy.tsv" print(os.listdir(str(snakemake.input))[0]) taxonomy_df = pd.read_csv(taxonomy_file, delimiter="\t", header=0) taxonomy_df.rename( columns={"Feature ID": "#OTU ID", "Taxon": "taxonomy", "Consensus": "confidence"}, inplace=True, ) taxonomy_df.to_csv(str(snakemake.output), sep="\t", index=False) |
import pandas as pd
import sys  # needed for the stderr redirection below; this import was missing in the original script

sys.stderr = open(snakemake.log[0], "w")

whuman = str(snakemake.params.visual_wh)
wohuman = str(snakemake.params.visual_woh)

whuman_df = pd.read_csv(whuman, sep=",", header=0, index_col=0)
wohuman_df = pd.read_csv(wohuman, sep=",", header=0, index_col=0)
whuman_df.rename(columns={"0": "whuman"}, inplace=True)
wohuman_df.rename(columns={"0": "wohuman"}, inplace=True)
combined = pd.concat([whuman_df, wohuman_df], axis=1)

# Drop samples whose feature counts with and without human reads are identical
for sample in combined.index:
    if combined.at[sample, "whuman"] == combined.at[sample, "wohuman"]:
        combined.drop([sample], inplace=True)
# combined.drop_duplicates(subset = ["whuman", "wohuman"], inplace = True)

# Compute the per-sample difference in feature counts
for sample in combined.index:
    difference = combined.at[sample, "whuman"] - combined.at[sample, "wohuman"]
    combined.at[sample, "difference"] = difference

combined.index.name = "Sample"
combined.rename(
    columns={"whuman": "Features with human", "wohuman": "Features without human"},
    inplace=True,
)
combined.to_csv(str(snakemake.output))
import sys
from pathlib import Path
import oyaml as yaml
from prettytable import PrettyTable
import argparse
import os

"""
Example :

Input YAML :
~~~~~~~~~~
metadata:
  name: nginx-deployment
  labels:
    app: nginx

Output Table :
~~~~~~~~~~~~
metadata:
+--------+------------------+----------+-----------------------------------------------------------+
| Field  | Example Value    | Required | Description                                               |
+--------+------------------+----------+-----------------------------------------------------------+
| name   | nginx-deployment | --       | Lorem ipsum dolor sit amet, consecteteur adipiscing elit. |
| labels | --               | --       | Lorem ipsum.                                              |
| app    | nginx            | --       | Lorem ipsum.                                              |
+--------+------------------+----------+-----------------------------------------------------------+
"""

# parser = argparse.ArgumentParser(description='YAML file to (HTML) table converter',
#                                  epilog='text table will be printed as STDOUT - html table will be save in html file ')
# parser.add_argument('--inputFile', dest='inputfile', required=True, help="input yaml file to process")
# parser.add_argument('--out', dest='format', choices=['txt', 'html', 'text'], help="convert yaml to text table or html "
#                                                                                   "table")
# parser.add_argument('--outpath', dest='outputfile', required=True, help="path to the output file")
# args = parser.parse_args()

sys.stderr = open(snakemake.log[0], "w")

outputFmt = "html"
INPUT_YAML = str(snakemake.input)

if outputFmt == "text" or outputFmt == "txt":
    PRINT_HTML = False
else:
    PRINT_HTML = True

in_file = Path(INPUT_YAML)
if not in_file.is_file():
    sys.exit("Input file [" + INPUT_YAML + "] does not exist")

SPACE_CHAR = "~"
OUTPUT_HTMl = str(snakemake.output)

CSS_TEXT = """
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/js/bootstrap.min.js"></script>
<style>
body {
    padding-left: 20px;
}
th:nth-child(1) {
    width: 200px;
}
/* the second */
th:nth-child(2) {
    width: 200px;
}
/* the third */
th:nth-child(3) {
    width: 100px;
}
/* the fourth */
th:nth-child(4) {
    width: 420px;
}
pre {
    white-space: -moz-pre-wrap; /* Mozilla, supported since 1999 */
    white-space: -pre-wrap;     /* Opera */
    white-space: -o-pre-wrap;   /* Opera */
    white-space: pre-wrap;      /* CSS3 - Text module (Candidate Recommendation) http://www.w3.org/TR/css3-text/#white-space */
    word-wrap: break-word;      /* IE 5.5+ */
    width: 725px
}
</style>
</head>
"""


def listToString(inList):
    """Convert list to String"""
    ret = ""
    for line in inList:
        ret = ret + line
    return ret


def printDic(inDictionary, inPTable, indent):
    """
    Iterate over the dictionary.
    If needed, call this function again (recursively) until we reach a plain key : value dictionary.
    Add key, value, isItRequired, description to the pretty table object (inPTable).
    """
    global SPACE_CHAR  # no-space character that will be replaced when we print this table to text/html
    # Go over the dictionary
    for item in inDictionary:
        if isinstance(item, dict):
            # If it is again a dictionary, call this function with the new dictionary
            inPTable.add_row([SPACE_CHAR, SPACE_CHAR])
            printDic(item, inPTable, indent)
        else:
            # Two ways to get the next item, based on input type
            if isinstance(inDictionary, dict):
                moreStuff = inDictionary.get(item)
            elif isinstance(inDictionary, list):
                # If it is a simple array/list we just print all its values and we are done
                for _item in inDictionary:
                    inPTable.add_row([indent + _item, SPACE_CHAR + SPACE_CHAR])
                break

            # if it is a dictionary or list, process it accordingly
            if isinstance(moreStuff, dict):
                inPTable.add_row([indent + item, SPACE_CHAR + SPACE_CHAR])
                printDic(moreStuff, inPTable, SPACE_CHAR + SPACE_CHAR + indent)
            elif isinstance(moreStuff, list):
                # If we are not in a nested call (as indent is empty string) we add one extra row in the table (for clarity)
                # if indent is "":
                #     inPTable.add_row([SPACE_CHAR, SPACE_CHAR, SPACE_CHAR])
                #
                # inPTable.add_row([indent + item, "", ""])
                for dicInDic in moreStuff:
                    if dicInDic is not None:
                        if isinstance(dicInDic, dict):
                            printDic(
                                dicInDic,
                                inPTable,
                                SPACE_CHAR + SPACE_CHAR + SPACE_CHAR + SPACE_CHAR + indent,
                            )
                        else:
                            inPTable.add_row([indent + item, dicInDic])
            else:
                # Most of the calls will eventually end up here -
                # this prints key, value, isItRequired, Lorem ipsum (description)
                inPTable.add_row([indent + item, inDictionary[item]])


"""
Read the given yaml file, process each top-level element, build an (HTML) table
out of it and print it to console/file
"""
with open(INPUT_YAML) as file:
    # The FullLoader parameter handles the conversion from YAML
    # scalar values to the Python dictionary format
    yaml_file_object = yaml.load(file, Loader=yaml.FullLoader)

    if PRINT_HTML:
        html_st = []
        f = open(OUTPUT_HTMl, "w")
        html_st.append(CSS_TEXT)

    i = 0
    for key in yaml_file_object:
        body_st = []
        prettyTable = PrettyTable()
        prettyTable.field_names = ["Field", "Example Value"]
        if not PRINT_HTML:
            prettyTable.align["Field"] = "l"
            prettyTable.align["Example Value"] = "l"

        if isinstance(yaml_file_object, list):
            dic = yaml_file_object[i]
            i += 1
        elif isinstance(yaml_file_object, dict):
            dic = yaml_file_object.get(key)

        if isinstance(dic, dict) or isinstance(dic, list):
            printDic(dic, prettyTable, "")
            if isinstance(yaml_file_object, dict):
                yaml_snippet = yaml.dump({key: dic})
            else:
                yaml_snippet = yaml.dump(dic)
        else:
            prettyTable.add_row([key, dic])
            yaml_snippet = yaml.dump({key: dic})

        if isinstance(yaml_file_object, dict):
            if PRINT_HTML:
                body_st.append("<h2>" + key + "</h2>")
            else:
                print("=> " + key + ":")

        table = prettyTable.get_html_string(
            attributes={
                "name": key,
                "id": key,
                "class": "table table-striped table-condensed",
                "style": "width: 1450px;table-layout: fixed;overflow-wrap: "
                "break-word;",
            }
        )
        table = table.replace(SPACE_CHAR, " ")
        body_st.append(table)

        if PRINT_HTML:
            html_st.append(" ".join(body_st))
        else:
            print(str(prettyTable).replace(SPACE_CHAR, " "))

    if PRINT_HTML:
        html_st.append("</html>")
        f.write(" ".join(html_st))
        f.close()
        print("File " + OUTPUT_HTMl + " has been generated")
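A sketch of how this converter might be attached to the workflow to render the configuration as an HTML overview. The single input, single output and single log file match the script's interface; the rule name, paths and the conda environment file are assumptions.

rule config_to_html:                              # hypothetical rule name
    input:
        "config/config.yaml",
    output:
        "results/report/config_overview.html",    # hypothetical path
    log:
        "logs/config_to_html.log",
    conda:
        "../envs/yaml_table.yaml"                  # hypothetical env providing oyaml and prettytable
    script:
        "../scripts/yaml_to_html_table.py"         # hypothetical script name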
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
# Set this to False if multiqc should use the actual input directly
# instead of parsing the folders where the provided files are located
use_input_files_only = snakemake.params.get("use_input_files_only", False)

if not use_input_files_only:
    input_data = set(path.dirname(fp) for fp in snakemake.input)
else:
    input_data = set(snakemake.input)

output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "multiqc"
    " {extra}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_data}"
    " {log}"
)
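A minimal sketch of a rule using this MultiQC helper via the script directive. The params keys extra and use_input_files_only, the single report output and the log file correspond to what the helper reads; the sample names, paths and script location are made up.

rule multiqc:                                      # hypothetical rule name
    input:
        expand(
            "results/fastqc/{sample}_fastqc.zip",  # hypothetical inputs
            sample=["A", "B"],
        ),
    output:
        "results/multiqc/multiqc_report.html",
    params:
        extra="",                    # additional command line arguments for multiqc
        use_input_files_only=False,  # default: pass the parent directories of the inputs
    log:
        "logs/multiqc.log",
    script:
        "../scripts/multiqc.py"      # hypothetical location of the helper above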
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path
import re
from tempfile import TemporaryDirectory

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""
    base = path.basename(file_path)
    # Remove file extension(s) (similar to the internal fastqc approach)
    base = re.sub("\\.gz$", "", base)
    base = re.sub("\\.bz2$", "", base)
    base = re.sub("\\.txt$", "", base)
    base = re.sub("\\.fastq$", "", base)
    base = re.sub("\\.fq$", "", base)
    base = re.sub("\\.sam$", "", base)
    base = re.sub("\\.bam$", "", base)
    return base


# Run fastqc, since there can be race conditions if multiple jobs
# use the same fastqc dir, we create a temp dir.
with TemporaryDirectory() as tempdir:
    shell(
        "fastqc {snakemake.params} -t {snakemake.threads} "
        "--outdir {tempdir:q} {snakemake.input[0]:q}"
        " {log}"
    )

    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")

    if snakemake.output.html != html_path:
        shell("mv {html_path:q} {snakemake.output.html:q}")

    if snakemake.output.zip != zip_path:
        shell("mv {zip_path:q} {snakemake.output.zip:q}")
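And a corresponding sketch for the FastQC helper. The named outputs html and zip and the positional params string mirror what the script expects (snakemake.output.html, snakemake.output.zip, snakemake.params); the input file name only follows the naming scheme required for the raw fastq files, and the rule name, paths and script location are assumptions.

rule fastqc:                                        # hypothetical rule name
    input:
        "data/{sample}_S1_L001_R1_001.fastq.gz",    # hypothetical raw read file
    output:
        html="results/fastqc/{sample}.html",
        zip="results/fastqc/{sample}_fastqc.zip",
    params:
        "--quiet",                                  # forwarded verbatim to the fastqc call
    threads: 2
    log:
        "logs/fastqc/{sample}.log",
    script:
        "../scripts/fastqc.py"                      # hypothetical location of the helper above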
Support
- Future updates