This is the template for a new Snakemake workflow. Replace this text with a comprehensive description covering the purpose and domain. Insert your code into the respective folders, i.e. `scripts`, `rules`, and `envs`. Define the entry point of the workflow in the `Snakefile` and the main configuration in the `config.yaml` file.
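For orientation, a minimal entry point might look like the sketch below. This is an illustrative sketch only; the `config["samples"]` key, the target paths, and the included file name are assumptions, not taken from this template.

```python
# Minimal Snakefile sketch -- names and paths are illustrative placeholders
configfile: "config/config.yaml"

# Pull in rule definitions from the rules/ folder (file name is hypothetical)
include: "rules/common.smk"

rule all:
    # Target rule: request the final outputs of the workflow here
    input:
        expand("results/{sample}.done", sample=config["samples"])
```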
Usage
If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).
Step 1: Obtain a copy of this workflow
- Create a new GitHub repository using this workflow as a template.
- Clone the newly created repository to your local system, into the place where you want to perform the data analysis.
Step 2: Configure workflow
Configure the workflow according to your needs by editing the files in the `config/` folder. Adjust `config.yaml` to configure the workflow execution, and `samples.tsv` to specify your sample setup.
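The exact keys and columns depend on this workflow's schema; as a rough sketch, the two files could look like this (all keys and column names below are illustrative assumptions, although `fq1`/`fq2` echo the input names used in the code snippets further down):

```yaml
# config/config.yaml -- illustrative keys only
samples: "config/samples.tsv"   # path to the sample sheet
threads: 8                      # default number of cores per rule
```

```
# config/samples.tsv (tab-separated; columns are illustrative)
sample  fq1                   fq2
A       data/A_R1.fastq.gz    data/A_R2.fastq.gz
B       data/B_R1.fastq.gz    data/B_R2.fastq.gz
```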
Step 3: Install Snakemake
Install Snakemake using conda:

```bash
conda create -c bioconda -c conda-forge -n snakemake snakemake
```

For installation details, see the instructions in the Snakemake documentation.
Step 4: Execute workflow
Activate the conda environment:

```bash
conda activate snakemake
```

Test your configuration by performing a dry-run via

```bash
snakemake --use-conda -n
```

Execute the workflow locally via

```bash
snakemake --use-conda --cores $N
```

using `$N` cores, or run it in a cluster environment via

```bash
snakemake --use-conda --cluster qsub --jobs 100
```
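For repeated cluster runs, the same options can be collected in a Snakemake profile. The sketch below assumes a Snakemake version that still supports `--cluster`, and the profile name is hypothetical:

```yaml
# myprofile/config.yaml -- minimal cluster profile sketch
# (invoke with: snakemake --profile myprofile)
use-conda: true
cluster: "qsub"
jobs: 100
```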
Step 5: Investigate results
After successful execution, you can create a self-contained interactive HTML report with all results via:

```bash
snakemake --report report.html
```

This report can, e.g., be forwarded to your collaborators. An example (using some trivial test data) can be seen here.
Step 6: Commit changes
Whenever you change something, don't forget to commit the changes back to your GitHub copy of the repository:

```bash
git commit -a
git push
```
Code Snippets
Genome annotation with Prokka:

```python
shell:
    "prokka {params.prokka} --cpus {threads} --outdir {output} --prefix {wildcards.sample} {input} 2> {log}"
```
Assembly with SPAdes, followed by filtering the scaffolds with seqkit:

```python
shell:
    """
    spades.py {params.spades} --threads {threads} -1 {input.fq1} -2 {input.fq2} -o {output[0]} 2> {log}
    seqkit seq {params.seqkit} {output[0]}/scaffolds.fasta > {output[1]}
    """
```
Assembly with SKESA, followed by filtering the contigs with seqkit:

```python
shell:
    """
    skesa {params.skesa} --cores {threads} --reads {input.fq1},{input.fq2} --contigs_out {output[0]} 2> {log}
    seqkit seq {params.seqkit} {output[0]} > {output[1]}
    """
```
Assembly evaluation with QUAST, comparing the SPAdes and SKESA assemblies:

```python
shell:
    "quast.py {params.quast} --threads {threads} -l spades,skesa -o {output} {input[0]} {input[1]} 2> {log}"
```
Selecting the better of the two assemblies from the QUAST report; the number of predicted genes is weighted three times as heavily as the other metrics:

```python
run:
    import pandas as pd
    from shutil import copy

    # Load the QUAST report: rows are metrics, columns are the assemblies
    quast = pd.read_csv(f"{input.quast}/report.tsv", sep="\t", header=0).set_index("Assembly", drop=False)
    quast.drop('Assembly', axis='columns', inplace=True)

    # One score per assembly (here: spades, skesa)
    score = {i: 0 for i in quast.columns.to_list()}

    number_contigs = quast.loc['# contigs'].to_dict()
    largest_contig = quast.loc['Largest contig'].to_dict()
    total_length = quast.loc['Total length'].to_dict()
    n50 = quast.loc['N50'].to_dict()
    n75 = quast.loc['N75'].to_dict()
    predict_genes = quast.loc['# predicted genes (unique)'].to_dict()

    # Award points: the fewest contigs and the largest value of every
    # other metric each score one point; predicted genes score three
    score[min(number_contigs, key=number_contigs.get)] += 1
    score[max(largest_contig, key=largest_contig.get)] += 1
    score[max(total_length, key=total_length.get)] += 1
    score[max(n50, key=n50.get)] += 1
    score[max(n75, key=n75.get)] += 1
    score[max(predict_genes, key=predict_genes.get)] += 3

    # Copy the winning assembly to the output location
    assembly = max(score, key=score.get)
    print(score)
    if assembly == 'spades':
        copy(f'{input.spades}', f'{output[0]}')
    elif assembly == 'skesa':
        copy(f'{input.skesa}', f'{output[0]}')
```
Assembly completeness assessment with BUSCO:

```python
shell:
    "busco {params.busco} --cpu {threads} -i {input} --out_path $(dirname {output}) -o $(basename {output}) 2> {log}"
```
Read trimming with Trim Galore:

```python
shell:
    "trim_galore {params.trim} --basename {wildcards.sample} --cores {threads} --output_dir {output.out_dir} {input} 2> {log}"
```
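The snippets above show only each rule's `shell` or `run` directive. For context, a full rule wraps such a directive with its inputs, outputs, and resources; the skeleton below is a hypothetical reconstruction around the Trim Galore snippet (rule name, paths, params, and environment file are assumptions, not taken from the repository):

```python
# Hypothetical rule skeleton -- only the shell command comes from the snippet above
rule trim_galore:
    input:
        ["reads/{sample}_R1.fastq.gz", "reads/{sample}_R2.fastq.gz"]
    output:
        out_dir=directory("results/trimmed/{sample}")
    params:
        trim="--paired --gzip"
    threads: 4
    log:
        "logs/trim_galore/{sample}.log"
    conda:
        "../envs/trim_galore.yaml"
    shell:
        "trim_galore {params.trim} --basename {wildcards.sample} --cores {threads} --output_dir {output.out_dir} {input} 2> {log}"
```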