ZARP: An automated workflow for processing of RNA-seq data
ZARP (Zavolan-Lab Automated RNA-Seq Pipeline)
...is a generic RNA-Seq analysis workflow that allows users to process and analyze Illumina short-read sequencing libraries with minimal effort. The workflow relies on publicly available bioinformatics tools and currently handles single- or paired-end stranded bulk RNA-seq data. The workflow is developed in Snakemake, a widely used workflow management system in the bioinformatics community.
In the current ZARP implementation, reads are pre-processed, aligned and quantified with state-of-the-art tools to give meaningful initial insights into the quality and composition of an RNA-Seq library. This reduces hands-on time for bioinformaticians and lets experimentalists rapidly assess their data. Additional reports summarise the results of the individual steps and provide useful visualisations.
Requirements
The workflow has been tested on:
- CentOS 7.5
- Debian 10
- Ubuntu 16.04, 18.04
NOTE: Currently, we only support Linux execution.
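Because only Linux is supported, a launcher script could check the platform before starting the workflow. The following is a minimal sketch; the helper name `is_supported_os` is hypothetical and not part of ZARP.

```shell
# Hypothetical helper (not part of ZARP): succeed only for the Linux kernel name.
is_supported_os() {
    [ "$1" = "Linux" ]
}

# uname -s prints the kernel name, e.g. "Linux" or "Darwin".
kernel="$(uname -s)"
if is_supported_os "$kernel"; then
    echo "OS check passed: $kernel"
else
    echo "Unsupported OS: $kernel (only Linux is supported)" >&2
fi
```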
Code Snippets
shell:
    "(cat {input.reads} > {output.reads}) \
    1> {log.stdout} 2> {log.stderr}"
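Most snippets in this section share one pattern: the commands are grouped in a subshell `( ... )`, and a single pair of redirections sends the group's standard output and standard error to separate log files. The sketch below reproduces that pattern outside Snakemake; all file names are hypothetical stand-ins for the `{input...}`, `{output...}` and `{log...}` placeholders.

```shell
# Hypothetical paths; in the workflow these come from Snakemake placeholders.
workdir="$(mktemp -d)"
printf '@r1\nACGT\n+\nIIII\n' > "$workdir/lane1.fastq"
printf '@r2\nTTGA\n+\nIIII\n' > "$workdir/lane2.fastq"

# Group the commands in a subshell so that one pair of redirections
# captures the stdout and stderr of every command in the group.
(cat "$workdir/lane1.fastq" "$workdir/lane2.fastq" > "$workdir/merged.fastq"; \
 echo "merged 2 files"; \
 echo "no warnings" >&2) \
 1> "$workdir/merge.stdout.log" 2> "$workdir/merge.stderr.log"

merged_lines="$(wc -l < "$workdir/merged.fastq")"
echo "lines in merged file: $merged_lines"
```

Anything placed after the closing parenthesis other than a redirection is a shell syntax error, which is why optional extra arguments belong inside the group.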
shell:
    "(mkdir -p {output.outdir}; \
    fastqc --outdir {output.outdir} \
    --threads {threads} \
    {params.additional_params} \
    {input.reads}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(mkdir -p {output.outdir}; \
    fastqc --outdir {output.outdir} \
    --threads {threads} \
    {params.additional_params} \
    {input.reads}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(mkdir -p {params.output_dir}; \
    chmod -R 777 {params.output_dir}; \
    STAR \
    --runMode genomeGenerate \
    --sjdbOverhang {params.sjdbOverhang} \
    --genomeDir {params.output_dir} \
    --genomeFastaFiles {input.genome} \
    --runThreadN {threads} \
    --outFileNamePrefix {params.outFileNamePrefix} \
    --sjdbGTFfile {input.gtf} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
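The STAR manual suggests setting `--sjdbOverhang` to the read length minus one. How the workflow fills `{params.sjdbOverhang}` is not shown here, so the following is only a sketch of deriving such a value from a FASTQ file; the file and variable names are hypothetical.

```shell
workdir="$(mktemp -d)"
# Tiny hypothetical FASTQ with two 10-nt reads.
printf '@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nTTGACCATGA\n+\nIIIIIIIIII\n' \
    > "$workdir/reads.fastq"

# Sequence lines are every 4th line starting at line 2; keep the longest.
max_read_length="$(awk 'NR % 4 == 2 && length($0) > max { max = length($0) }
                        END { print max + 0 }' "$workdir/reads.fastq")"

# Overhang = read length - 1, following the STAR manual's recommendation.
sjdb_overhang=$((max_read_length - 1))
echo "sjdbOverhang: $sjdb_overhang"
```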
shell:
    "(sort \
    -k1,1 -k4,4n -k5,5nr {input.gtf} > {output.gtf} \
    ) 1> {log.stdout} 2> {log.stderr}"
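The sort keys above order GTF records by sequence name, then by ascending start, then by descending end, so a feature that spans others starting at the same position comes first. A minimal sketch on a made-up four-line GTF follows; the records are hypothetical, and `LC_ALL=C` is added here only to make the ordering of sequence names reproducible across locales.

```shell
workdir="$(mktemp -d)"
# Hypothetical GTF records (tab-separated), deliberately out of order.
printf 'chr2\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g3";\n'  > "$workdir/in.gtf"
printf 'chr1\tsrc\texon\t500\t600\t.\t+\t.\tgene_id "g2";\n' >> "$workdir/in.gtf"
printf 'chr1\tsrc\texon\t100\t150\t.\t+\t.\tgene_id "g1";\n' >> "$workdir/in.gtf"
printf 'chr1\tsrc\tgene\t100\t300\t.\t+\t.\tgene_id "g1";\n' >> "$workdir/in.gtf"

# Same keys as the rule above: name, start ascending, end descending.
LC_ALL=C sort -k1,1 -k4,4n -k5,5nr "$workdir/in.gtf" > "$workdir/sorted.gtf"

# Summarise the resulting order as name:start-end, space-separated.
sorted_order="$(awk -F '\t' '{ printf "%s%s:%s-%s", sep, $1, $4, $5; sep = " " }
                             END { print "" }' "$workdir/sorted.gtf")"
echo "$sorted_order"
```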
shell:
    "(gffread \
    -w {output.transcriptome} \
    -g {input.genome} \
    {params.additional_params} \
    {input.gtf}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(cat {input.transcriptome} {input.genome} \
    1> {output.genome_transcriptome}) \
    2> {log.stderr}"
shell:
    "(salmon index \
    --transcripts {input.genome_transcriptome} \
    --decoys {input.chr_names} \
    --index {output.index} \
    --kmerLen {params.kmerLen} \
    --threads {threads} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
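The two rules above build a decoy-aware salmon index: the transcriptome is concatenated with the genome, and the genome sequence names are passed via `--decoys`. How the workflow produces `{input.chr_names}` is not shown in these snippets, so the sketch below only illustrates one common way to prepare both files. The FASTA contents and file names are hypothetical.

```shell
workdir="$(mktemp -d)"
# Hypothetical miniature transcriptome and genome.
printf '>tx1 gene=g1\nACGT\n>tx2 gene=g2\nGGCC\n' > "$workdir/transcriptome.fa"
printf '>chr1 assembled\nACGTACGT\n>chr2 assembled\nTTGACCAA\n' > "$workdir/genome.fa"

# Transcripts first, genome second, as in the concatenation rule above.
cat "$workdir/transcriptome.fa" "$workdir/genome.fa" \
    > "$workdir/genome_transcriptome.fa"

# Decoy names: genome FASTA headers without ">" and without the description.
grep '^>' "$workdir/genome.fa" | cut -d ' ' -f 1 | sed 's/^>//' \
    > "$workdir/chr_names.txt"

n_records="$(grep -c '^>' "$workdir/genome_transcriptome.fa")"
decoys="$(paste -s -d ' ' "$workdir/chr_names.txt")"
echo "records: $n_records, decoys: $decoys"
```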
shell:
    "(mkdir -p {params.output_dir}; \
    chmod -R 777 {params.output_dir}; \
    kallisto index \
    {params.additional_params} \
    -i {output.index} \
    {input.transcriptome}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(gtf2bed12 \
    --gtf {input.gtf} \
    --bed12 {output.bed12} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(samtools sort \
    -o {output.bam} \
    -@ {threads} \
    {params.additional_params} \
    {input.bam}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(samtools index \
    {params.additional_params} \
    {input.bam} {output.bai};) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(calculate-tin.py \
    -i {input.bam} \
    -r {input.transcripts_bed12} \
    --names {params.sample} \
    -p {threads} \
    {params.additional_params} \
    > {output.TIN_score};) 2> {log.stderr}"
shell:
    "(salmon quantmerge \
    --quants {params.salmon_in} \
    --genes \
    --names {params.sample_name_list} \
    --column {params.salmon_merge_on} \
    --output {output.salmon_out} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(salmon quantmerge \
    --quants {params.salmon_in} \
    --names {params.sample_name_list} \
    --column {params.salmon_merge_on} \
    --output {output.salmon_out} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(merge_kallisto.R \
    --input {params.tables} \
    --names {params.sample_name_list} \
    --txOut FALSE \
    --anno {input.gtf} \
    --output {params.dir_out} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(merge_kallisto.R \
    --input {params.tables} \
    --names {params.sample_name_list} \
    --output {params.dir_out} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(zpca-tpm \
    --tpm {input.tpm} \
    --out {output.out} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(zpca-tpm \
    --tpm {input.tpm} \
    --out {output.out} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(mkdir -p {params.out_dir}; \
    chmod -R 777 {params.out_dir}; \
    STAR \
    --runMode inputAlignmentsFromBAM \
    --runThreadN {threads} \
    --inputBAMfile {input.bam} \
    --outWigType bedGraph \
    --outFileNamePrefix {params.prefix} \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(cp {input.plus} {output.plus}; \
    cp {input.minus} {output.minus};) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(mkdir -p {output.temp_dir}; \
    alfa -a {input.gtf} \
    -g {params.genome_index} \
    --chr_len {input.chr_len} \
    --temp_dir {output.temp_dir} \
    -p {threads} \
    -o {params.out_dir} \
    {params.additional_params}) \
    &> {log}"
shell:
    "(mkdir -p {output.temp_dir}; \
    cd {params.out_dir}; \
    alfa \
    -g {params.genome_index} \
    --bedgraph {params.plus} {params.minus} {params.name} \
    -s {params.alfa_orientation} \
    --temp_dir {params.temp_dir} \
    {params.additional_params}) \
    &> {log}"
shell:
    "(python {input.script} \
    --config {output.multiqc_config} \
    --intro-text '{params.multiqc_intro_text}' \
    --custom-logo '{params.logo_path}' \
    --url '{params.url}' \
    --author-name '{params.author_name}' \
    --author-email '{params.author_email}' \
    {params.additional_params}) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(multiqc \
    --outdir {output.multiqc_report} \
    --config {input.multiqc_config} \
    {params.additional_params} \
    {params.results_dir} \
    {params.log_dir};) \
    1> {log.stdout} 2> {log.stderr}"
shell:
    "(sortBed \
    -i {input.bg} \
    {params.additional_params} \
    > {output.sorted_bg};) 2> {log.stderr}"
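The bedGraph is sorted because bedGraphToBigWig expects its input ordered by chromosome and then by start position. The sketch below shows that ordering on made-up data, using plain `sort` as a stand-in for `sortBed` so that it runs without bedtools. The records and the `LC_ALL=C` setting are assumptions of this example.

```shell
workdir="$(mktemp -d)"
# Hypothetical bedGraph records (chrom, start, end, value), out of order.
printf 'chr2\t0\t10\t1.5\n'  > "$workdir/in.bg"
printf 'chr1\t50\t60\t2\n'  >> "$workdir/in.bg"
printf 'chr1\t0\t50\t3\n'   >> "$workdir/in.bg"

# Stand-in for sortBed: order by chromosome name, then numeric start.
LC_ALL=C sort -k1,1 -k2,2n "$workdir/in.bg" > "$workdir/sorted.bg"

# Summarise the resulting order as chrom:start, space-separated.
bg_order="$(awk -F '\t' '{ printf "%s%s:%s", sep, $1, $2; sep = " " }
                         END { print "" }' "$workdir/sorted.bg")"
echo "$bg_order"
```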
shell:
    "(bedGraphToBigWig \
    {params.additional_params} \
    {input.sorted_bg} \
    {input.chr_sizes} \
    {output.bigWig};) \
    1> {log.stdout} 2> {log.stderr}"
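bedGraphToBigWig takes a two-column chromosome sizes file (sequence name, length in bases) as its second argument. These snippets do not show where the workflow obtains `{input.chr_sizes}`, so the following is only a sketch of computing such a table directly from a FASTA file; the genome content and file names are hypothetical.

```shell
workdir="$(mktemp -d)"
# Hypothetical genome with multi-line sequences.
printf '>chr1 assembled\nACGTACGT\nACGT\n>chr2\nTTGA\n' > "$workdir/genome.fa"

# For each record, sum the lengths of its sequence lines and emit name<TAB>length.
awk '/^>/ { if (seen) print name "\t" len
            name = substr($1, 2); len = 0; seen = 1; next }
          { len += length($0) }
     END  { if (seen) print name "\t" len }' \
    "$workdir/genome.fa" > "$workdir/chr_sizes.txt"

chr_sizes="$(awk -F '\t' '{ printf "%s%s:%s", sep, $1, $2; sep = " " }
                          END { print "" }' "$workdir/chr_sizes.txt")"
echo "$chr_sizes"
```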