This is the template for a new Snakemake workflow. Replace this text with a comprehensive description covering the purpose and domain.
Insert your code into the respective folders, i.e. scripts, rules, and envs. Define the entry point of the workflow in the Snakefile and the main configuration in the config.yaml file.
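Following the standard snakemake-workflows layout (a sketch; your copy may differ in detail), the repository is organized like this:

workflow/
├── Snakefile
├── rules/
├── scripts/
└── envs/
config/
├── config.yaml
└── samples.tsv
.test/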
Authors
- Chao Di (@dic)
Usage
If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).
Step 1: Obtain a copy of this workflow
- Create a new GitHub repository using this workflow as a template.
- Clone the newly created repository to your local system, into the place where you want to perform the data analysis.
Step 2: Configure workflow
Configure the workflow according to your needs by editing the files in the config/ folder. Adjust config.yaml to configure the workflow execution, and samples.tsv to specify your sample setup.
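For example, a minimal samples.tsv needs at least a sample column, since the bundled R scripts look up samples$sample (the second sample name below is illustrative):

sample
Ad5input1
Ad5input2

A matching config.yaml would then point at the sample sheet; the samples: key shown here is the usual snakemake-workflows convention, not something this repository confirms:

samples: config/samples.tsv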
Step 3: Install Snakemake
Install Snakemake using conda:
conda create -c bioconda -c conda-forge -n snakemake snakemake
For installation details, see the instructions in the Snakemake documentation.
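Once the environment is activated (see Step 4), you can verify the installation via

snakemake --version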
Step 4: Execute workflow
Activate the conda environment:
conda activate snakemake
Test your configuration by performing a dry-run via
snakemake --use-conda -n
Execute the workflow locally via
snakemake --use-conda --cores $N
using $N cores, or run it in a cluster environment via
snakemake --use-conda --cluster qsub --jobs 100
or
snakemake --use-conda --drmaa --jobs 100
If you want to pin not only the software stack but also the underlying OS, use
snakemake --use-conda --use-singularity
in combination with any of the modes above. See the Snakemake documentation for further details.
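For example, a containerized cluster run combining these flags could look as follows; the qsub resource options are placeholders for whatever your scheduler expects:

snakemake --use-conda --use-singularity --cluster "qsub -l nodes=1:ppn={threads}" --jobs 100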
Step 5: Investigate results
After successful execution, you can create a self-contained interactive HTML report with all results via:
snakemake --report report.html
This report can, e.g., be forwarded to your collaborators. An example (using some trivial test data) can be seen here.
Step 6: Commit changes
Whenever you change something, don't forget to commit the changes back to your GitHub copy of the repository:
git commit -a
git push
Step 7: Obtain updates from upstream
Whenever you want to synchronize your workflow copy with new developments from upstream, do the following.
- Once, register the upstream repository in your local copy:
  git remote add -f upstream git@github.com:snakemake-workflows/J2seq.git
  or
  git remote add -f upstream https://github.com/snakemake-workflows/J2seq.git
  if you have not set up SSH keys.
- Update the upstream version:
  git fetch upstream
- Create a diff with the current version:
  git diff HEAD upstream/master workflow > upstream-changes.diff
- Investigate the changes:
  vim upstream-changes.diff
- Apply the modified diff via:
  git apply upstream-changes.diff
- Carefully check whether you need to update the config files:
  git diff HEAD upstream/master config
  If so, do it manually, and only where necessary, since you would otherwise likely overwrite your settings and samples.
Step 8: Contribute back
In case you have also changed or added steps, please consider contributing them back to the original repository:
- Fork the original repo to a personal or lab account.
- Clone the fork to your local system, to a different place than where you ran your analysis.
- Copy the modified files from your analysis to the clone of your fork, e.g.,
  cp -r workflow path/to/fork
  Make sure not to accidentally copy config file contents or sample sheets. Instead, manually update the example config files if necessary.
- Commit and push your changes to your fork.
- Create a pull request against the original repository.
Testing
Test cases are in the subfolder .test. They are automatically executed via continuous integration with GitHub Actions.
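To execute the test case locally before pushing, you can point Snakemake at that directory; a sketch, assuming .test contains its own config and data:

snakemake --use-conda --cores 2 --directory .test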
Code Snippets
The following excerpts are taken from the workflow's rules and scripts.
shell:
    '''
    # Keep the header and output BAM for reads in the Ad5 region
    samtools view -h -b {input} Ad5 > {output} 2> {log}
    '''
shell:
    '''
    samtools index {input} {output.bai}
    '''
shell:
    '''
    multiqc ../results/fastq_screen_output -o {params.outdir} &> {log}
    '''
script:
    "../scripts/featureCount.R"
script:
    "../scripts/featureCount.R"
script:
    "../scripts/featureCount.R"
script:
    "../scripts/featureCount_segments.R"
shell:
    "samtools index {input} {output}"
shell:
    "samtools index {input} {output}"
shell:
    '''
    rm -rf ../results/multiqc/star/
    multiqc STAR_align -o {params.outdir} &> {log}
    '''
shell:
    '''
    TEcount --sortByPos --format BAM --mode multi -b {input} \\
        --GTF {params.gene} --TE {params.TE} \\
        --stranded forward --project "../results/TEcount/TEcount.{wildcards.sample}"
    '''
shell:
    '''
    rm -f {output}
    # First column: feature IDs, with quotes stripped and "gene/TE" renamed to "gene_TE"
    cut -f1 ../results/TEcount/TEcount.Ad5input1.cntTable | sed 's/"//g;s/gene\/TE/gene_TE/g' > {output}
    # Append the count column of every sample, cleaning the BAM-derived column names
    for i in {input}; do
        cut -f2 $i | sed 's/merged_bam\///g;s/_merged_dedup_sorted.bam//g' > foo
        paste {output} foo > foo1
        mv foo1 {output}
    done
    rm -f foo foo1
    '''
library(Rsubread)
library(dplyr)
library(mgsub)

# Redirect all output and messages to the Snakemake log file
log <- file(snakemake@log[[1]], open = "wt")
sink(log)
sink(log, type = "message")

## Count RPFs on CDS for each gene, using featureCounts
## Run all BAMs together
samples <- read.table(snakemake@input[["samples"]], header = TRUE)
bamfiles <- paste0("./merged_bam/", as.vector(samples$sample), "_merged_dedup_sorted.bam")
## To run a single BAM file instead:
# bamfiles <- snakemake@input[["bamfile"]]

RPFcounts <- featureCounts(files = bamfiles,
                           annot.ext = snakemake@input[["gtf"]],
                           isGTFAnnotationFile = TRUE,
                           GTF.featureType = snakemake@params[["featureType"]],
                           GTF.attrType = "gene_name",
                           strandSpecific = snakemake@params[["strand"]],
                           countMultiMappingReads = FALSE,
                           juncCounts = TRUE,
                           nthreads = snakemake@threads[[1]])

write.table(RPFcounts$counts, file = snakemake@output[[1]], sep = "\t",
            quote = FALSE, row.names = TRUE, col.names = NA)
library(Rsubread)
library(dplyr)
library(mgsub)

# Redirect all output and messages to the Snakemake log file
log <- file(snakemake@log[[1]], open = "wt")
sink(log)
sink(log, type = "message")

## Count RPFs on custom segments (SAF annotation) for each gene, using featureCounts
## Run all BAMs together
samples <- read.table(snakemake@input[["samples"]], header = TRUE)
bamfiles <- paste0("./merged_bam/", as.vector(samples$sample), "_merged_dedup_sorted.bam")
## To run a single BAM file instead:
# bamfiles <- snakemake@input[["bamfile"]]

RPFcounts <- featureCounts(files = bamfiles,
                           annot.ext = snakemake@input[["saf"]],
                           isGTFAnnotationFile = FALSE,
                           fracOverlap = 1,
                           strandSpecific = snakemake@params[["strand"]],
                           countMultiMappingReads = FALSE,
                           juncCounts = TRUE,
                           nthreads = snakemake@threads[[1]])

write.table(RPFcounts$counts, file = snakemake@output[[1]], sep = "\t",
            quote = FALSE, row.names = TRUE, col.names = NA)
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"

from os import path

from snakemake.shell import shell

# Collect the unique parent directories of all input files
input_dirs = set(path.dirname(fp) for fp in snakemake.input)
output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Run MultiQC over the input directories, writing the report to the requested path
shell(
    "multiqc"
    " {snakemake.params}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_dirs}"
    " {log}"
)