circRNA quantification, differential expression analysis and miRNA target prediction of RNA-Seq data
nf-core/circrna is a best-practice analysis pipeline for the quantification, miRNA target prediction and differential expression analysis of circular RNAs in paired-end RNA sequencing data.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.
- Install Nextflow (>=21.04.0).
- Install any of Docker, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (please only use Conda as a last resort; see docs).
- Download the pipeline and test it on a minimal dataset with a single command:
  nextflow run nf-core/circrna -profile test,<docker/singularity/podman/shifter/charliecloud/conda>
- Start running your own analysis!
  nextflow run nf-core/circrna -profile <docker/singularity/podman/shifter/charliecloud/conda> --module 'circrna_discovery, mirna_prediction, differential_expression' --tool 'circexplorer2' --input 'samples.csv' --input_type 'fastq' --phenotype 'phenotype.csv'
Code Snippets
"""
grep -vf ${workflow.projectDir}/bin/unwanted_biotypes.txt $gtf > filt.gtf
mv $bed circs.bed

annotate_outputs.sh $exon_boundary &> ${prefix}.log
mv master_bed12.bed ${prefix}.bed.tmp

awk -v FS="\t" '{print \$11}' ${prefix}.bed.tmp > mature_len.tmp
awk -v FS="," '{for(i=t=0;i<NF;) t+=\$++i; \$0=t}1' mature_len.tmp > mature_length

paste ${prefix}.bed.tmp mature_length > ${prefix}.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
    ucsc: $VERSION
END_VERSIONS
"""
"""
# remove redundant biotypes from GTF.
grep -vf ${workflow.projectDir}/bin/unwanted_biotypes.txt $gtf > filt.gtf

# generate circrna BED file.
tail -n +2 $circrna_matrix | awk '{print \$1}' > IDs.txt
ID_to_BED.sh IDs.txt
cat *.bed > merged.txt && rm IDs.txt && rm *.bed && mv merged.txt circs.bed

# Re-use annotation script to identify the host gene.
annotate_outputs.sh $exon_boundary &> annotation.log
awk -v OFS="\t" '{print \$4, \$14}' master_bed12.bed > circrna_host-gene.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
    ucsc: $VERSION
END_VERSIONS
"""
"""
awk '{if(\$13 >= ${bsj_reads}) print \$0}' ${prefix}.txt | awk -v OFS="\t" '{print \$1,\$2,\$3,\$6,\$13}' > ${prefix}_${meta.tool}.bed

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_${meta.tool}.bed > ${prefix}_${meta.tool}_circs.bed
"""
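The two awk commands above keep rows whose back-splice junction (BSJ) read count meets `bsj_reads`, then emit a six-column BED line whose name field has the form `chrom:start-end:strand`. The same idea can be sketched in Python as follows. The function name `to_circ_bed` and the sample rows are illustrative assumptions, not part of the pipeline.

```python
def to_circ_bed(rows, bsj_reads):
    """Filter (chrom, start, end, strand, count) rows and add a circRNA ID.

    Hypothetical helper that mirrors the awk logic; not the pipeline's code.
    """
    out = []
    for chrom, start, end, strand, count in rows:
        if count < bsj_reads:
            continue  # drop junctions with too few supporting reads
        circ_id = f"{chrom}:{start}-{end}:{strand}"
        # BED6 layout: chrom, start, end, name, score (read count), strand
        out.append((chrom, start, end, circ_id, count, strand))
    return out


# Made-up example rows, for illustration only.
rows = [("chr1", 100, 500, "+", 5), ("chr2", 10, 90, "-", 1)]
filtered = to_circ_bed(rows, bsj_reads=2)
```

The same pattern recurs in the circRNA_finder, CIRIquant, DCC, find_circ and segemehl snippets, where only the input column positions differ.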
"""
gtfToGenePred \
    $args \
    $gtf \
    ${prefix}.genepred

awk -v OFS="\t" '{print \$12, \$1, \$2, \$3, \$4, \$5, \$6, \$7, \$8, \$9, \$10}' ${prefix}.genepred > ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    ucsc: $VERSION
END_VERSIONS
"""
"""
mkdir -p star_dir && mv *.tab *.junction *.sam star_dir
postProcessStarAlignment.pl --starDir star_dir/ --outDir ./

awk '{if(\$5 >= ${bsj_reads}) print \$0}' ${prefix}.filteredJunctions.bed | awk -v OFS="\t" -F"\t" '{print \$1,\$2,\$3,\$6,\$5}' > ${prefix}_circrna_finder.bed

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_circrna_finder.bed > ${prefix}_circrna_finder_circs.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    circRNA_finder: $VERSION
END_VERSIONS
"""
"""
prepare_circ_test.R

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
END_VERSIONS
"""
"""
circ_test.R $circ_csv $linear_csv $phenotype

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    aod: \$(Rscript -e "library(aod); cat(as.character(packageVersion('aod')))")
    ggplot2: \$(Rscript -e "library(ggplot2); cat(as.character(packageVersion('ggplot2')))")
    plyr: \$(Rscript -e "library(plyr); cat(as.character(packageVersion('plyr')))")
END_VERSIONS
"""
"""
CIRIquant \\
    -t ${task.cpus} \\
    -1 ${reads[0]} \\
    -2 ${reads[1]} \\
    --config $yml \\
    --no-gene \\
    -o ${prefix} \\
    -p ${prefix}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bwa: \$(echo \$(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*\$//')
    ciriquant : \$(echo \$(CIRIquant --version 2>&1) | sed 's/CIRIquant //g' )
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    stringtie: \$(stringtie --version 2>&1)
    hisat2: $VERSION
END_VERSIONS
"""
"""
grep -v "#" ${prefix}.gtf | awk '{print \$14}' | cut -d '.' -f1 > counts
grep -v "#" ${prefix}.gtf | awk -v OFS="\t" '{print \$1,\$4,\$5,\$7}' > ${prefix}.tmp
paste ${prefix}.tmp counts > ${prefix}_unfilt.bed

awk '{if(\$5 >= ${bsj_reads}) print \$0}' ${prefix}_unfilt.bed > ${prefix}_filt.bed
grep -v '^\$' ${prefix}_filt.bed > ${prefix}_ciriquant

awk -v OFS="\t" '{\$2-=1;print}' ${prefix}_ciriquant > ${prefix}_ciriquant.bed
rm ${prefix}.gtf

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_ciriquant.bed > ${prefix}_ciriquant_circs.bed
"""
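The `\$2-=1` step above shifts the start coordinate down by one. This is the usual conversion from 1-based, end-inclusive GTF coordinates to 0-based, half-open BED coordinates. A minimal Python sketch, with a hypothetical function name:

```python
def gtf_interval_to_bed(chrom, start, end):
    """Convert a 1-based, end-inclusive interval to 0-based, half-open.

    Hypothetical helper for illustration. Only the start moves; the end
    keeps its value because BED's exclusive end equals GTF's inclusive end.
    """
    return chrom, start - 1, end


bed = gtf_interval_to_bed("chr1", 101, 500)
# Both representations describe the same 400 bases.
length = bed[2] - bed[1]
```

The DCC filtering snippet applies the same one-base shift before building its `_circs.bed` file.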
"""
BWA=`which bwa`
HISAT2=`which hisat2`
STRINGTIE=`which stringtie`
SAMTOOLS=`which samtools`

touch travis.yml
printf "name: ciriquant\ntools:\n bwa: \$BWA\n hisat2: \$HISAT2\n stringtie: \$STRINGTIE\n samtools: \$SAMTOOLS\n\nreference:\n fasta: ${fasta_path}\n gtf: ${gtf_path}\n bwa_index: ${bwa_path}/${bwa_prefix}\n hisat_index: ${hisat2_path}/${hisat2_prefix}" >> travis.yml
"""
"""
python ${workflow.projectDir}/bin/circRNA_counts_matrix.py > matrix.txt
## handle non-canon chromosomes here (https://stackoverflow.com/questions/71479919/joining-columns-based-on-number-of-fields)
n_samps=\$(ls *.bed | wc -l)
canon=\$(awk -v a="\$n_samps" 'BEGIN {print a + 4}')
awk -v n="\$canon" '{ for (i = 2; i <= NF - n + 1; ++i) { \$1 = \$1"-"\$i; \$i=""; } } 1' matrix.txt | awk -v OFS="\t" '\$1=\$1' > circRNA_matrix.txt
Rscript ${workflow.projectDir}/bin/reformat_count_matrix.R

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    argparser: \$(Rscript -e "library(argparser); cat(as.character(packageVersion('argparser')))")
    dplyr: \$(Rscript -e "library(dplyr); cat(as.character(packageVersion('dplyr')))")
END_VERSIONS
"""
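The awk one-liner above expects each matrix row to have `n_samps + 4` fields. When a row has more, it folds the surplus leading fields into the first one, joined with `-`. The sketch below reproduces that folding in Python. The function name and the example row are assumptions made for illustration, since the exact layout of `matrix.txt` is not shown here.

```python
def collapse_id_fields(fields, n_samples):
    """Fold surplus leading fields into the first field, joined by '-'.

    Hypothetical helper mirroring the awk loop
    `for (i = 2; i <= NF - n + 1; ++i)` with n = n_samples + 4.
    """
    expected = n_samples + 4
    extra = len(fields) - expected
    if extra <= 0:
        return list(fields)  # row already has the expected width
    merged = "-".join(fields[: extra + 1])
    return [merged] + list(fields[extra + 1:])


# Made-up row with one surplus field in the chromosome name.
row = ["chrUn", "KI270742v1", "100", "200", "+", "3", "0"]
fixed = collapse_id_fields(row, n_samples=2)
```

After folding, every row has the same number of columns, which is what the downstream R reformatting step needs to read the matrix as a table.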
"""
## make list of files for R to read
ls *.bed > samples.csv

## Add catch for empty bed file and delete
bash ${workflow.projectDir}/bin/check_empty.sh

## Use intersection of "n" (params.tool_filter) circRNAs called by tools
## remove duplicate IDs, keep highest count.
Rscript ${workflow.projectDir}/bin/consolidate_algorithms_intersection.R samples.csv $tool_filter $duplicates_fun
mv combined_counts.bed ${prefix}.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    argparser: \$(Rscript -e "library(argparser); cat(as.character(packageVersion('argparser')))")
    dplyr: \$(Rscript -e "library(dplyr); cat(as.character(packageVersion('dplyr')))")
END_VERSIONS
"""
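The comments in the snippet above describe the consolidation rule: keep circRNAs supported by `tool_filter` tools, and resolve duplicate IDs by keeping the highest count. The R script itself is not shown, so the following is only a sketch of that rule under two assumptions: "intersection of n" is read as "called by at least n tools", and duplicates are resolved with `max`. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def consolidate(calls_by_tool, tool_filter):
    """Keep circRNAs called by >= tool_filter tools, with their highest count.

    calls_by_tool maps a tool name to {circ_id: bsj_count}. This layout is
    an assumption for illustration, not the R script's actual input.
    """
    support = defaultdict(list)
    for calls in calls_by_tool.values():
        for circ_id, count in calls.items():
            support[circ_id].append(count)
    return {
        circ_id: max(counts)
        for circ_id, counts in support.items()
        if len(counts) >= tool_filter
    }


# Made-up calls from two tools.
calls = {
    "circexplorer2": {"chr1:100-500:+": 3, "chr2:10-90:-": 1},
    "dcc": {"chr1:100-500:+": 5},
}
kept = consolidate(calls, tool_filter=2)
```

Requiring support from several tools trades sensitivity for fewer false-positive back-splice junctions.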
"""
# Strip tool name from BED files (no consolidation prior to this step for 1 tool)
for b in *.bed; do
    basename=\${b%".bed"};
    sample_name=\${basename%"_${tool_name}"};
    mv \$b \${sample_name}.bed
done

python ${workflow.projectDir}/bin/circRNA_counts_matrix.py > matrix.txt
## handle non-canon chromosomes here (https://stackoverflow.com/questions/71479919/joining-columns-based-on-number-of-fields)
n_samps=\$(ls *.bed | wc -l)
canon=\$(awk -v a="\$n_samps" 'BEGIN {print a + 4}')
awk -v n="\$canon" '{ for (i = 2; i <= NF - n + 1; ++i) { \$1 = \$1"-"\$i; \$i=""; } } 1' matrix.txt | awk -v OFS="\t" '\$1=\$1' > circRNA_matrix.txt
Rscript ${workflow.projectDir}/bin/reformat_count_matrix.R

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    argparser: \$(Rscript -e "library(argparser); cat(as.character(packageVersion('argparser')))")
    dplyr: \$(Rscript -e "library(dplyr); cat(as.character(packageVersion('dplyr')))")
END_VERSIONS
"""
"""
sed -i 's/^chr//g' $gtf

mkdir ${prefix} && mv ${prefix}.Chimeric.out.junction ${prefix} && printf "${prefix}/${prefix}.Chimeric.out.junction" > samplesheet

DCC @samplesheet -D -an $gtf -Pi -ss -F -M -Nr 1 1 -fg -A $fasta -N -T ${task.cpus}

awk '{print \$6}' CircCoordinates >> strand
paste CircRNACount strand | tail -n +2 | awk -v OFS="\t" '{print \$1,\$2,\$3,\$5,\$4}' >> ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    dcc: \$(DCC --version)
END_VERSIONS
"""
"""
sed -i 's/^chr//g' $gtf

mkdir ${prefix} && mv ${prefix}.Chimeric.out.junction ${prefix} && printf "${prefix}/${prefix}.Chimeric.out.junction" > samplesheet
mkdir ${prefix}_mate1 && mv ${prefix}_mate1.Chimeric.out.junction ${prefix}_mate1 && printf "${prefix}_mate1/${prefix}_mate1.Chimeric.out.junction" > mate1file
mkdir ${prefix}_mate2 && mv ${prefix}_mate2.Chimeric.out.junction ${prefix}_mate2 && printf "${prefix}_mate2/${prefix}_mate2.Chimeric.out.junction" > mate2file

DCC @samplesheet -mt1 @mate1file -mt2 @mate2file -D -an $gtf -Pi -ss -F -M -Nr 1 1 -fg -A $fasta -N -T ${task.cpus}

awk '{print \$6}' CircCoordinates >> strand
paste CircRNACount strand | tail -n +2 | awk -v OFS="\t" '{print \$1,\$2,\$3,\$5,\$4}' >> ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    dcc: \$(DCC --version)
END_VERSIONS
"""
"""
awk '{if(\$5 >= ${bsj_reads}) print \$0}' ${prefix}.txt > ${prefix}_dcc.filtered
awk -v OFS="\t" '{\$2-=1;print}' ${prefix}_dcc.filtered > ${prefix}_dcc.bed
awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_dcc.bed > ${prefix}_dcc_circs.bed
"""
"""
## prepDE && circRNA counts headers are sorted such that uppercase precedes lowercase, i.e. Z before a
## reformat the phenotype file to match the order of the samples.
head -n 1 $phenotype > header
tail -n +2 $phenotype | LC_COLLATE=C sort > sorted_pheno
cat header sorted_pheno > tmp && rm phenotype.csv && mv tmp phenotype.csv

DEA.R $gene_matrix $phenotype $circrna_matrix $species ensembl_database_map.txt
mv boxplots/ circRNA/

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    argparser: \$(Rscript -e "library(argparser); cat(as.character(packageVersion('argparser')))")
    biomart: \$(Rscript -e "library(biomaRt); cat(as.character(packageVersion('biomaRt')))")
    deseq2: \$(Rscript -e "library(DESeq2); cat(as.character(packageVersion('DESeq2')))")
    dplyr: \$(Rscript -e "library(dplyr); cat(as.character(packageVersion('dplyr')))")
    enhancedvolcano: \$(Rscript -e "library(EnhancedVolcano); cat(as.character(packageVersion('EnhancedVolcano')))")
    gplots: \$(Rscript -e "library(gplots); cat(as.character(packageVersion('gplots')))")
    ggplot2: \$(Rscript -e "library(ggplot2); cat(as.character(packageVersion('ggplot2')))")
    ggpubr: \$(Rscript -e "library(ggpubr); cat(as.character(packageVersion('ggpubr')))")
    ihw: \$(Rscript -e "library(IHW); cat(as.character(packageVersion('IHW')))")
    pvclust: \$(Rscript -e "library(pvclust); cat(as.character(packageVersion('pvclust')))")
    pcatools: \$(Rscript -e "library(PCAtools); cat(as.character(packageVersion('PCAtools')))")
    pheatmap: \$(Rscript -e "library(pheatmap); cat(as.character(packageVersion('pheatmap')))")
    rcolorbrewer: \$(Rscript -e "library(RColorBrewer); cat(as.character(packageVersion('RColorBrewer')))")
END_VERSIONS
"""
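The `LC_COLLATE=C sort` step above orders phenotype rows by byte value, so uppercase sample names come before lowercase ones, matching the column order of the count matrices. Python's default string comparison behaves the same way for ASCII names, which gives a compact way to illustrate it. The column names and sample IDs below are made up.

```python
def sort_phenotype(lines):
    """Keep the header first and sort the remaining rows in byte order.

    Hypothetical helper: for ASCII text, sorted() on str matches the
    C-locale ordering produced by `LC_COLLATE=C sort`.
    """
    header, *rows = lines
    return [header] + sorted(rows)


# Illustrative phenotype table; the column names are assumptions.
pheno = ["sample,condition", "a1,control", "Z1,treated", "B2,control"]
ordered = sort_phenotype(pheno)
```

A locale-aware sort such as `en_US.UTF-8` would typically interleave upper and lower case, which is why the snippet pins the collation explicitly.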
"""
## FASTA sequences (bedtools does not like the extra annotation info - split will not work properly)
cut -d\$'\t' -f1-12 ${prefix}.bed > bed12.tmp
bedtools getfasta -fi $fasta -bed bed12.tmp -s -split -name > circ_seq.tmp

## clean fasta header
grep -A 1 '>' circ_seq.tmp | cut -d: -f1,2,3 > ${prefix}.fa && rm circ_seq.tmp

## add backsplice sequence for miRanda Targetscan, publish canonical FASTA to results.
rm $fasta
bash ${workflow.projectDir}/bin/backsplice_gen.sh ${prefix}.fa

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
"""
unmapped2anchors.py $bam | gzip > ${prefix}_anchors.qfa.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    find_circ: $VERSION
END_VERSIONS
"""
"""
grep CIRCULAR $bed | \
    grep -v chrM | \
    awk '\$5>=${bsj_reads}' | \
    grep UNAMBIGUOUS_BP | grep ANCHOR_UNIQUE | \
    maxlength.py 100000 \
    > ${prefix}.txt

tail -n +2 ${prefix}.txt | awk -v OFS="\t" '{print \$1,\$2,\$3,\$6,\$5}' > ${prefix}_find_circ.bed

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_find_circ.bed > ${prefix}_find_circ_circs.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    find_circ: $VERSION
END_VERSIONS
"""
"""
INDEX=`find -L ./ -name "*.rev.1.bt2" | sed "s/.rev.1.bt2//"`
[ -z "\$INDEX" ] && INDEX=`find -L ./ -name "*.rev.1.bt2l" | sed "s/.rev.1.bt2l//"`
[ -z "\$INDEX" ] && echo "Bowtie2 index files not found" 1>&2 && exit 1

bowtie2 \\
    --threads $task.cpus \\
    --reorder \\
    --mm \\
    -D 20 \\
    --score-min=C,-15,0 \\
    -q \\
    -x \$INDEX \\
    -U $anchors | \\
    find_circ.py --genome=$fasta --prefix=${prefix} --stats=${prefix}.sites.log --reads=${prefix}.sites.reads > ${prefix}.sites.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
    find_circ: $VERSION
END_VERSIONS
"""
"""
$handleGzip_R1

mapsplice.py \\
    -c $chromosomes \\
    -x $gtf_prefix \\
    -1 ${read1} \\
    -p ${task.cpus} \\
    --bam \\
    --gene-gtf $gtf \\
    -o $prefix \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    mapsplice: $VERSION
END_VERSIONS
"""
"""
$handleGzip_R1
$handleGzip_R2

mapsplice.py \\
    -c $chromosomes \\
    -x $gtf_prefix \\
    -1 ${read1} \\
    -2 ${read2} \\
    -p ${task.cpus} \\
    --bam \\
    --gene-gtf $gtf \\
    -o $prefix \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    mapsplice: $VERSION
END_VERSIONS
"""
"""
## reformat and sort miRanda, TargetScan outputs, convert to BED for overlaps.
tail -n +2 $targetscan | sort -k1,1 -k4n | awk -v OFS="\t" '{print \$1, \$2, \$4, \$5, \$9}' | awk -v OFS="\t" '{print \$2, \$3, \$4, \$1, "0", \$5}' > targetscan.bed
tail -n +2 $miranda | sort -k2,2 -k7n | awk -v OFS="\t" '{print \$2, \$1, \$3, \$4, \$7, \$8}' | awk -v OFS="\t" '{print \$2, \$5, \$6, \$1, \$3, \$4}' | sed 's/^[^-]*-//g' > miranda.bed

## intersect, consolidate miRanda, TargetScan information about miRs.
## -wa to output miRanda hits - targetscan makes it difficult to resolve duplicate miRNAs at MRE sites.
bedtools intersect -a miranda.bed -b targetscan.bed -wa > ${prefix}.mirnas.tmp
bedtools intersect -a targetscan.bed -b miranda.bed | awk '{print \$6}' > mirna_type

## remove duplicate miRNA entries at MRE sites.
## strategy: sort by circs, sort by start position, sort by site type - the goal is to take the best site type (i.e. rank site type found at MRE site).
paste ${prefix}.mirnas.tmp mirna_type | sort -k3,3 -k2n -k7r | awk -v OFS="\t" '{print \$4,\$1,\$2,\$3,\$5,\$6,\$7}' | awk -F "\t" '{if (!seen[\$1,\$2,\$3,\$4,\$5,\$6]++)print}' | sort -k1,1 -k3n > ${prefix}.mirna_targets.tmp

echo -e "circRNA\tmiRNA\tStart\tEnd\tScore\tEnergy_KcalMol\tSite_type" | cat - ${prefix}.mirna_targets.tmp > ${prefix}.mirna_targets.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
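The de-duplication strategy in the snippet above keeps one record per (circRNA, miRNA, start, end, score, energy) combination. Sorting the site type in reverse order first means the first record seen is the preferred one. A rough Python sketch of that "sort, then keep first seen" pattern follows. The function name, record layout and example values are assumptions, and ranking site types by reverse string order simply mirrors `sort -k7r`.

```python
def best_site_per_hit(records):
    """Keep one record per hit, preferring the highest-sorting site type.

    Each record is (circ, mirna, start, end, score, energy, site_type).
    Hypothetical helper that mirrors awk's `!seen[...]++` idiom.
    """
    best = {}
    # Reverse string order on site_type, as `sort -k7r` does.
    for rec in sorted(records, key=lambda r: r[6], reverse=True):
        best.setdefault(rec[:6], rec)  # first seen wins
    # Final ordering by circRNA, then start, as in `sort -k1,1 -k3n`.
    return sorted(best.values(), key=lambda r: (r[0], r[2]))


# Made-up hits; the first two differ only in site type.
hits = [
    ("circ1", "miR-1", 10, 17, 150.0, -20.1, "7mer-m8"),
    ("circ1", "miR-1", 10, 17, 150.0, -20.1, "8mer-1a"),
    ("circ1", "miR-2", 30, 37, 140.0, -18.0, "6mer"),
]
unique_hits = best_site_per_hit(hits)
```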
"""
check_samplesheet.py \\
    $samplesheet \\
    samplesheet.valid.csv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""
"""
grep ';C;' ${prefix}.sngl.bed | awk -v OFS="\t" '{print \$1,\$2,\$3,\$6}' | sort | uniq -c | awk -v OFS="\t" '{print \$2,\$3,\$4,\$5,\$1}' > ${prefix}_collapsed.bed

awk -v OFS="\t" -v BSJ=${bsj_reads} '{if(\$5>=BSJ) print \$0}' ${prefix}_collapsed.bed > ${prefix}_segemehl.bed

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_segemehl.bed > ${prefix}_segemehl_circs.bed
"""
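Unlike the other tools, the segemehl snippet above has one line per supporting read, so it derives the BSJ count itself: `sort | uniq -c` counts identical (chrom, start, end, strand) lines before the threshold is applied. The equivalent counting step can be sketched in Python with `collections.Counter`. The function name and the records are illustrative assumptions.

```python
from collections import Counter


def collapse_junctions(records, bsj_reads):
    """Count identical junction records and keep those at or above threshold.

    Hypothetical helper: records are (chrom, start, end, strand) tuples,
    one per supporting read, as produced by the grep/awk stage.
    """
    counts = Counter(records)
    return sorted(
        (chrom, start, end, strand, n)
        for (chrom, start, end, strand), n in counts.items()
        if n >= bsj_reads
    )


# Made-up reads: three support one junction, one supports another.
reads = [("chr1", 100, 500, "+")] * 3 + [("chr2", 10, 90, "-")]
collapsed = collapse_junctions(reads, bsj_reads=2)
```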
"""
cat *.tab | awk -v BSJ=${bsj_reads} '(\$7 >= BSJ && \$6==0)' | cut -f1-6 | sort | uniq > dataset.SJ.out.tab
"""
"""
for file in \$(ls *.gtf); do sample_id=\${file%".transcripts.gtf"}; touch samples.txt; printf "\$sample_id\t\$file\n" >> samples.txt ; done

prepDE.py -i samples.txt
"""
"""
bash ${workflow.projectDir}/bin/targetscan_format.sh $mature
"""
"""
##format for targetscan
cat $fasta | grep ">" | sed 's/>//g' > id
cat $fasta | grep -v ">" > seq
paste id seq | awk -v OFS="\t" '{print \$1, "0000", \$2}' > ${prefix}_ts.txt
# run targetscan
targetscan_70.pl mature.txt ${prefix}_ts.txt ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    targetscan: $VERSION
END_VERSIONS
"""
"""
INDEX=`find -L ./ -name "*.rev.1.bt2" | sed "s/\\.rev.1.bt2\$//"`
[ -z "\$INDEX" ] && INDEX=`find -L ./ -name "*.rev.1.bt2l" | sed "s/\\.rev.1.bt2l\$//"`
[ -z "\$INDEX" ] && echo "Bowtie2 index files not found" 1>&2 && exit 1

bowtie2 \\
    -x \$INDEX \\
    $reads_args \\
    --threads $task.cpus \\
    $unaligned \\
    $args \\
    2> ${prefix}.bowtie2.log \\
    | samtools $samtools_command $args2 --threads $task.cpus -o ${prefix}.bam -

if [ -f ${prefix}.unmapped.fastq.1.gz ]; then
    mv ${prefix}.unmapped.fastq.1.gz ${prefix}.unmapped_1.fastq.gz
fi

if [ -f ${prefix}.unmapped.fastq.2.gz ]; then
    mv ${prefix}.unmapped.fastq.2.gz ${prefix}.unmapped_2.fastq.gz
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    pigz: \$( pigz --version 2>&1 | sed 's/pigz //g' )
END_VERSIONS
"""
"""
mkdir bowtie2
bowtie2-build $args --threads $task.cpus $fasta bowtie2/${fasta.baseName}
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
END_VERSIONS
"""
"""
mkdir bowtie
bowtie-build --threads $task.cpus $fasta bowtie/${fasta.baseName}
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie: \$(echo \$(bowtie --version 2>&1) | sed 's/^.*bowtie-align-s version //; s/ .*\$//')
END_VERSIONS
"""
"""
mkdir bwa
bwa \\
    index \\
    $args \\
    -p bwa/${fasta.baseName} \\
    $fasta

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bwa: \$(echo \$(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*\$//')
END_VERSIONS
"""
"""
mkdir bwa

touch bwa/genome.amb
touch bwa/genome.ann
touch bwa/genome.bwt
touch bwa/genome.pac
touch bwa/genome.sa

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bwa: \$(echo \$(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*\$//')
END_VERSIONS
"""
"""
cat ${readList.join(' ')} > ${prefix}.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
"""
cat ${read1.join(' ')} > ${prefix}_1.merged.fastq.gz
cat ${read2.join(' ')} > ${prefix}_2.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
"""
touch ${prefix}.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
"""
touch ${prefix}_1.merged.fastq.gz
touch ${prefix}_2.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
"""
CIRCexplorer2 \\
    annotate \\
    -r $gene_annotation \\
    -g $fasta \\
    -b $junctions \\
    -o ${prefix}.txt \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    circexplorer2: \$(echo \$(CIRCexplorer2 --version 2>&1) )
END_VERSIONS
"""
"""
touch ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    circexplorer2: \$(echo \$(CIRCexplorer2 --version 2>&1) )
END_VERSIONS
"""
"""
CIRCexplorer2 \\
    parse \\
    $aligner \\
    $fusions \\
    -b ${prefix}.bed \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    circexplorer2: \$( echo \$(CIRCexplorer2 --version 2>&1) )
END_VERSIONS
"""
"""
touch ${prefix}.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    circexplorer2: \$( echo \$(CIRCexplorer2 --version 2>&1) )
END_VERSIONS
"""
"""
printf "%s %s\\n" $rename_to | while read old_name new_name; do
    [ -f "\${new_name}" ] || ln -s \$old_name \$new_name
done
fastqc $args --threads $task.cpus $renamed_files

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
"""
touch ${prefix}.html
touch ${prefix}.zip

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
"""
INDEX=`find -L ./ -name "*.1.ht2" | sed 's/\\.1.ht2\$//'`
hisat2 \\
    -x \$INDEX \\
    -U $reads \\
    $strandedness \\
    --known-splicesite-infile $splicesites \\
    --summary-file ${prefix}.hisat2.summary.log \\
    --threads $task.cpus \\
    $seq_center \\
    $unaligned \\
    $args \\
    | samtools view -bS -F 4 -F 256 - > ${prefix}.bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hisat2: $VERSION
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
INDEX=`find -L ./ -name "*.1.ht2" | sed 's/\\.1.ht2\$//'`
hisat2 \\
    -x \$INDEX \\
    -1 ${reads[0]} \\
    -2 ${reads[1]} \\
    $strandedness \\
    --known-splicesite-infile $splicesites \\
    --summary-file ${prefix}.hisat2.summary.log \\
    --threads $task.cpus \\
    $seq_center \\
    $unaligned \\
    --no-mixed \\
    --no-discordant \\
    $args \\
    | samtools view -bS -F 4 -F 8 -F 256 - > ${prefix}.bam

if [ -f ${prefix}.unmapped.fastq.1.gz ]; then
    mv ${prefix}.unmapped.fastq.1.gz ${prefix}.unmapped_1.fastq.gz
fi
if [ -f ${prefix}.unmapped.fastq.2.gz ]; then
    mv ${prefix}.unmapped.fastq.2.gz ${prefix}.unmapped_2.fastq.gz
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hisat2: $VERSION
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
mkdir hisat2
$extract_exons
hisat2-build \\
    -p $task.cpus \\
    $ss \\
    $exon \\
    $args \\
    $fasta \\
    hisat2/${fasta.baseName}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hisat2: $VERSION
END_VERSIONS
"""
"""
miranda \\
    $mirbase \\
    $query \\
    $args \\
    -out ${prefix}.out

echo "miRNA\tTarget\tScore\tEnergy_KcalMol\tQuery_Start\tQuery_End\tSubject_Start\tSubject_End\tAln_len\tSubject_Identity\tQuery_Identity" > ${prefix}.txt
grep -A 1 "Scores for this hit:" ${prefix}.out | sort | grep ">" | cut -c 2- | tr ' ' '\t' >> ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    miranda: \$(echo \$(miranda -v | sed -n 4p | sed 's/^.*miranda v//; s/microRNA.*\$//' ))
END_VERSIONS
"""
"""
touch ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    miranda: \$(echo \$(miranda -v | sed -n 4p | sed 's/^.*miranda v//; s/microRNA.*\$//' ))
END_VERSIONS
"""
"""
multiqc \\
    --force \\
    $args \\
    $config \\
    $extra_config \\
    .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""
"""
touch multiqc_data
touch multiqc_plots
touch multiqc_report.html

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""
"""
samtools \\
    index \\
    -@ ${task.cpus-1} \\
    $args \\
    $input

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${input}.bai
touch ${input}.crai
touch ${input}.csi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
samtools sort $args -@ $task.cpus -o ${prefix}.bam -T $prefix $bam
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${prefix}.bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
samtools \\
    view \\
    --threads ${task.cpus-1} \\
    ${reference} \\
    ${readnames} \\
    $args \\
    -o ${prefix}.${file_type} \\
    $input \\
    $args2

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${prefix}.bam
touch ${prefix}.cram

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
mkdir -p $prefix

segemehl.x \\
    -t $task.cpus \\
    -d $fasta \\
    -i $index \\
    $reads \\
    $args \\
    -o ${prefix}/${prefix}.${suffix}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    segemehl: \$(echo \$(segemehl.x 2>&1 | grep "ge5dee" | awk -F Z '{print substr(\$1, 2, 6)}' ))
END_VERSIONS
"""
"""
mkdir -p $prefix
touch ${prefix}/${prefix}.${suffix}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    segemehl: \$(echo \$(segemehl.x 2>&1 | grep "ge5dee" | awk -F Z '{print substr(\$1, 2, 6)}' ))
END_VERSIONS
"""
"""
segemehl.x \\
    -t $task.cpus \\
    -d $fasta \\
    -x ${prefix}.idx \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    segemehl: \$(echo \$(segemehl.x 2>&1 | grep "ge5dee" | awk -F Z '{print substr(\$1, 2, 6)}' ))
END_VERSIONS
"""
"""
touch ${prefix}.idx

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    segemehl: \$(echo \$(segemehl.x 2>&1 | grep "ge5dee" | awk -F Z '{print substr(\$1, 2, 6)}' ))
END_VERSIONS
"""
"""
STAR \\
    --genomeDir $index \\
    --readFilesIn $reads \\
    --runThreadN $task.cpus \\
    --outFileNamePrefix $prefix. \\
    $out_sam_type \\
    $ignore_gtf \\
    $seq_center \\
    $args

$mv_unsorted_bam

if [ -f ${prefix}.Unmapped.out.mate1 ]; then
    mv ${prefix}.Unmapped.out.mate1 ${prefix}.unmapped_1.fastq
    gzip ${prefix}.unmapped_1.fastq
fi
if [ -f ${prefix}.Unmapped.out.mate2 ]; then
    mv ${prefix}.Unmapped.out.mate2 ${prefix}.unmapped_2.fastq
    gzip ${prefix}.unmapped_2.fastq
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
"""
touch ${prefix}Xd.out.bam
touch ${prefix}.Log.final.out
touch ${prefix}.Log.out
touch ${prefix}.Log.progress.out
touch ${prefix}.sortedByCoord.out.bam
touch ${prefix}.toTranscriptome.out.bam
touch ${prefix}.Aligned.unsort.out.bam
touch ${prefix}.unmapped_1.fastq.gz
touch ${prefix}.unmapped_2.fastq.gz
touch ${prefix}.tab
touch ${prefix}.Chimeric.out.junction
touch ${prefix}.out.sam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
"""
mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
"""
samtools faidx $fasta
NUM_BASES=`gawk '{sum = sum + \$2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' ${fasta}.fai`

mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    --genomeSAindexNbases \$NUM_BASES \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
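The gawk expression above sums the sequence lengths in the `.fai` index and derives `--genomeSAindexNbases` as log2(genome length) / 2 - 1, capped at 14 and rounded to the nearest integer. The same arithmetic in Python, as a small sketch with a hypothetical function name:

```python
import math


def sa_index_nbases(genome_length):
    """Return min(14, log2(genome_length) / 2 - 1), rounded to an integer.

    Hypothetical helper mirroring the gawk expression; genome_length is
    the total number of bases across all sequences in the FASTA index.
    """
    value = math.log2(genome_length) / 2 - 1
    return round(min(14.0, value))


small = sa_index_nbases(2 ** 20)        # log2 = 20, so 20 / 2 - 1 = 9
large = sa_index_nbases(3_100_000_000)  # about 14.8 before the cap
```

Scaling the suffix-array index to genome size matters mainly for small genomes, where the cap of 14 would be too large.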
"""
mkdir star
touch star/Genome
touch star/Log.out
touch star/SA
touch star/SAindex
touch star/chrLength.txt
touch star/chrName.txt
touch star/chrNameLength.txt
touch star/chrStart.txt
touch star/exonGeTrInfo.tab
touch star/exonInfo.tab
touch star/geneInfo.tab
touch star/genomeParameters.txt
touch star/sjdbInfo.txt
touch star/sjdbList.fromGTF.out.tab
touch star/sjdbList.out.tab
touch star/transcriptInfo.tab

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
"""
stringtie \\
    $bam \\
    $strandedness \\
    $reference \\
    -o ${prefix}.transcripts.gtf \\
    -A ${prefix}.gene.abundance.txt \\
    $coverage \\
    $ballgown \\
    -p $task.cpus \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    stringtie: \$(stringtie --version 2>&1)
END_VERSIONS
"""
"""
touch ${prefix}.transcripts.gtf
touch ${prefix}.gene.abundance.txt
touch ${prefix}.coverage.gtf
touch ${prefix}.ballgown

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    stringtie: \$(stringtie --version 2>&1)
END_VERSIONS
"""
"""
[ ! -f ${prefix}.fastq.gz ] && ln -s $reads ${prefix}.fastq.gz
trim_galore \\
    $args \\
    --cores $cores \\
    --gzip \\
    ${prefix}.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    trimgalore: \$(echo \$(trim_galore --version 2>&1) | sed 's/^.*version //; s/Last.*\$//')
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""
"""
[ ! -f ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz
[ ! -f ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz
trim_galore \\
    $args \\
    --cores $cores \\
    --paired \\
    --gzip \\
    ${prefix}_1.fastq.gz \\
    ${prefix}_2.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    trimgalore: \$(echo \$(trim_galore --version 2>&1) | sed 's/^.*version //; s/Last.*\$//')
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""
Support
- https://workflowhub.eu/workflows/271