Analysis of long non-coding RNAs from RNA-seq datasets
A Nextflow-based pipeline for comprehensive analyses of long non-coding RNAs from RNA-seq datasets
Introduction
Recently, long noncoding RNAs (lncRNAs) have captured widespread attention for their critical roles in diverse biological processes and their important implications in a variety of human diseases and cancers. Identification and profiling of lncRNAs is a fundamental step toward understanding their functions and regulatory mechanisms. However, RNA sequencing-based lncRNA discovery is currently limited by the complicated operation and implementation of the tools involved. Therefore, we present LncPipe, a one-stop, multi-tool integrated pipeline focused on characterizing lncRNAs from raw transcriptome sequencing data. The pipeline is built on the popular workflow framework Nextflow and is composed of four core procedures: read alignment, assembly, identification and quantification. It offers several distinctive features, such as a well-designed lncRNA annotation strategy, optimized computational efficiency, diversified classification and an interactive analysis report. LncPipe also gives users additional control: the pipeline can be interrupted, parameters can be reset from the command line, the main script can be modified directly, and analysis can be resumed from a previous checkpoint.
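As a minimal sketch of that command-line control, the invocation below shows how a run might be launched and later resumed from its last checkpoint. The --reads and --aligner parameter names are illustrative placeholders, not the pipeline's confirmed options, so consult the pipeline documentation for what is actually exposed; -profile and -resume are standard Nextflow flags.

    # launch the pipeline (parameter names here are hypothetical examples)
    nextflow run nf-core/lncpipe -profile docker --reads '*_{1,2}.fq.gz' --aligner hisat

    # after adjusting a parameter or recovering from an interruption, resume from cached results
    nextflow run nf-core/lncpipe -profile docker --reads '*_{1,2}.fq.gz' --aligner hisat -resume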
Documentation
The nf-core/lncpipe pipeline comes with documentation about the pipeline, found in the docs/ directory:
- Pipeline configuration
- Schematic diagram
Acknowledgment
Thanks to Shifu Chen, the author of AfterQC/fastp, for adding gzip output support to meet the requirements of LncPipe. Thanks also to Hongwan Zhang and Yan Wang from the SYSUCC Cancer bioinformatics platform for internal testing.
Many thanks as well to @apeltzer, @ewels and the other wonderful people from nf-core who helped polish the code and structure of LncPipe.
Citation
For details of LncPipe, please read the article below:
Qi Zhao, Yu Sun, Dawei Wang, Hongwan Zhang, Kai Yu, Jian Zheng, Zhixiang Zuo. LncPipe: A Nextflow-based pipeline for identification and analysis of long non-coding RNAs from RNA-Seq data. J Genet Genomics. 2018 Jul 20;45(7):399-401
Code Snippets
'''
set -o pipefail
touch filenames.txt
perl -lpe 's/ ([^"]\\S+) ;/ "$1" ;/g' !{gencode_annotation_gtf} > gencode_annotation_gtf_mod.gtf
perl -lpe 's/ ([^"]\\S+) ;/ "$1" ;/g' !{lncipedia_gtf} > lncipedia_mod.gtf
echo gencode_annotation_gtf_mod.gtf >> filenames.txt
echo lncipedia_mod.gtf >> filenames.txt
stringtie --merge -o merged_lncRNA.gtf filenames.txt
cat gencode_annotation_gtf_mod.gtf |grep "protein_coding" > gencode_protein_coding.gtf
gffcompare -r gencode_protein_coding.gtf -p !{task.cpus} merged_lncRNA.gtf
awk '$3 =="u"||$3=="x"{print $5}' gffcmp.merged_lncRNA.gtf.tmap |sort|uniq|perl !{baseDir}/bin/extract_gtf_by_name.pl merged_lncRNA.gtf - > merged.filter.gtf
mv merged.filter.gtf known.lncRNA.gtf
'''

'''
set -o pipefail
cuffmerge -o merged_lncRNA !{lncRNA_gtflistfile}
cat !{gencode_annotation_gtf} |grep "protein_coding" > gencode_protein_coding.gtf
cuffcompare -o merged_lncRNA -r gencode_protein_coding.gtf -p !{task.cpus} merged_lncRNA/merged.gtf
awk '$3 =="u"||$3=="x"{print $5}' merged_lncRNA/merged_lncRNA.merged.gtf.tmap |sort|uniq|perl !{baseDir}/bin/extract_gtf_by_name.pl merged_lncRNA/merged.gtf - > merged.filter.gtf
mv merged.filter.gtf known.lncRNA.gtf
'''

"""
mkdir star_index
STAR \
    --runMode genomeGenerate \
    --runThreadN ${star_threads} \
    --sjdbGTFfile $gencode_annotation_gtf \
    --sjdbOverhang 149 \
    --genomeDir star_index/ \
    --genomeFastaFiles $fasta
"""

"""
bowtie2-build !{fasta} genome_bt2
"""

"""
# for the human genome this step needs more than 160 GB of memory and takes a very long time (6+ hours); we therefore recommend downloading a pre-built genome index from the HISAT2 website
extract_splice_sites.py !{gencode_annotation_gtf} > genome_ht2.ss
extract_exons.py !{gencode_annotation_gtf} > genome_ht2.exon
hisat2-build -p !{hisat2_index_threads} --ss genome_ht2.ss --exon genome_ht2.exon !{fasta} genome_ht2
"""

'''
fastqc -t !{task.cpus} !{fastq_file[0]} !{fastq_file[1]}
'''

'''
after.py -z -1 !{fastq_file[0]} -g ./
'''

'''
after.py -z -1 !{fastq_file[0]} -2 !{fastq_file[1]} -g ./
'''

'''
fastp -i !{fastq_file[0]} -o !{samplename}.qc.gz -h !{samplename}_fastp.html
'''

'''
fastp -i !{fastq_file[0]} -I !{fastq_file[1]} -o !{samplename}_1.qc.fq.gz -O !{samplename}_2.qc.fq.gz -h !{samplename}_fastp.html
'''

"""
STAR --runThreadN !{task.cpus} \
    --twopassMode Basic \
    --genomeDir !{star_index} \
    --readFilesIn !{pair} \
    --readFilesCommand zcat \
    --outSAMtype BAM SortedByCoordinate \
    --chimSegmentMin 20 \
    --outFilterIntronMotifs RemoveNoncanonical \
    --outFilterMultimapNmax 20 \
    --alignIntronMin 20 \
    --alignIntronMax 1000000 \
    --alignMatesGapMax 1000000 \
    --outFilterType BySJout \
    --alignSJoverhangMin 8 \
    --alignSJDBoverhangMin 1 \
    --outFileNamePrefix !{file_tag_new}
"""

'''
STAR --runThreadN !{task.cpus} \
    --twopassMode Basic \
    --genomeDir !{star_index} \
    --readFilesIn !{pair[0]} !{pair[1]} \
    --readFilesCommand zcat \
    --outSAMtype BAM SortedByCoordinate \
    --chimSegmentMin 20 \
    --outFilterIntronMotifs RemoveNoncanonical \
    --outFilterMultimapNmax 20 \
    --alignIntronMin 20 \
    --alignIntronMax 1000000 \
    --alignMatesGapMax 1000000 \
    --outFilterType BySJout \
    --alignSJoverhangMin 8 \
    --alignSJDBoverhangMin 1 \
    --outFileNamePrefix !{file_tag_new}
'''

'''
tophat -p !{task.cpus} -G !{gtf} --no-novel-juncs -o !{samplename}_thout --library-type !{strand_str} !{index_base} !{pair}
'''

'''
tophat -p !{task.cpus} -G !{gtf} --no-novel-juncs -o !{samplename}_thout --library-type !{strand_str} !{index_base} !{pair[0]} !{pair[1]}
'''

'''
mkdir tmp
hisat2 -p !{task.cpus} --dta -x !{index_base} -U !{pair} -S !{file_tag_new}.sam 2> !{file_tag_new}.hisat2_summary.txt
sambamba view -S -f bam -t !{task.cpus} !{file_tag_new}.sam -o temp.bam
sambamba sort -o !{file_tag_new}.sort.bam --tmpdir ./tmp -t !{task.cpus} temp.bam
rm !{file_tag_new}.sam
rm temp.bam
'''

'''
mkdir tmp
hisat2 -p !{task.cpus} --dta -x !{index_base} -1 !{pair[0]} -2 !{pair[1]} -S !{file_tag_new}.sam 2> !{file_tag_new}.hisat2_summary.txt
sambamba view -S -f bam -t !{hisat2_threads} !{file_tag_new}.sam -o temp.bam
sambamba sort -o !{file_tag_new}.sort.bam --tmpdir ./tmp -t !{task.cpus} temp.bam
rm !{file_tag_new}.sam
'''

'''
mkdir tmp
hisat2 -p !{task.cpus} --dta --rna-strandness !{params.hisat_strand} -x !{index_base} -U !{pair} -S !{file_tag_new}.sam 2> !{file_tag_new}.hisat2_summary.txt
sambamba view -S -f bam -t !{hisat2_threads} !{file_tag_new}.sam -o temp.bam
sambamba sort -o !{file_tag_new}.sort.bam --tmpdir ./tmp -t !{hisat2_threads} temp.bam
rm !{file_tag_new}.sam
rm temp.bam
'''

'''
mkdir tmp
hisat2 -p !{task.cpus} --dta --rna-strandness !{params.hisat_strand} -x !{index_base} -1 !{pair[0]} -2 !{pair[1]} -S !{file_tag_new}.sam 2> !{file_tag_new}.hisat2_summary.txt
sambamba view -S -f bam -t !{task.cpus} !{file_tag_new}.sam -o temp.bam
sambamba sort -o !{file_tag_new}.sort.bam --tmpdir ./tmp -t !{task.cpus} temp.bam
rm !{file_tag_new}.sam
'''

'''
# run stringtie
stringtie -p !{task.cpus} -G !{gencode_annotation_gtf} -l stringtie_!{file_tag_new} -o stringtie_!{file_tag_new}_transcripts.gtf !{alignment_bam}
'''

'''
# run stringtie (stranded, --rf)
stringtie -p !{task.cpus} -G !{gencode_annotation_gtf} --rf -l stringtie_!{file_tag_new} -o stringtie_!{file_tag_new}_transcripts.gtf !{alignment_bam}
'''

'''
stringtie --merge -p !{task.cpus} -o merged.gtf !{gtf_filenames}
'''

'''
# run cufflinks
cufflinks -g !{gencode_annotation_gtf} \
    -b !{fasta} \
    --library-type !{strand_str} \
    --max-multiread-fraction 0.25 \
    --3-overhang-tolerance 2000 \
    -o Cufout_!{file_tag_new} \
    -p !{task.cpus} !{alignment_bam}

mv Cufout_!{file_tag_new}/transcripts.gtf Cufout_!{file_tag_new}_transcripts.gtf
'''

'''
# run cufflinks
cufflinks -g !{gencode_annotation_gtf} \
    -b !{fasta} \
    --library-type !{strand_str} \
    --max-multiread-fraction 0.25 \
    --3-overhang-tolerance 2000 \
    -o Cufout_!{file_tag_new} \
    -p !{task.cpus} !{alignment_bam}

mv Cufout_!{file_tag_new}/transcripts.gtf Cufout_!{file_tag_new}_transcripts.gtf
'''

'''
mkdir CUFFMERGE
cuffmerge -o CUFFMERGE \
    -s !{fasta} \
    -p !{task.cpus} \
    !{gtf_filenames}
'''

'''
fastqc -t !{task.cpus} !{fastq_file[0]} !{fastq_file[1]}
'''

'''
after.py -z -1 !{fastq_file[0]} -g ./
'''

'''
after.py -z -1 !{fastq_file[0]} -2 !{fastq_file[1]} -g ./
'''

'''
fastp -i !{fastq_file[0]} -o !{samplename}.qc.gz -h !{samplename}_fastp.html
'''

'''
fastp -i !{fastq_file[0]} -I !{fastq_file[1]} -o !{samplename}_1.qc.fq.gz -O !{samplename}_2.qc.fq.gz -h !{samplename}_fastp.html
'''

'''
#!/bin/sh
gffcompare -r !{gencode_annotation_gtf} -p !{task.cpus} !{mergeGtfFile} -o merged_lncRNA
'''

'''
# filter novel lncRNA candidates based on the cuffmerge-assembled transcripts
awk '$3 =="x"||$3=="u"||$3=="i"{print $0}' !{comparedTmap} > novel.gtf.tmap
# exclude transcripts shorter than 200 nt
awk '$10 >200{print}' novel.gtf.tmap > novel.longRNA.gtf.tmap
# extract gtf
awk '{print $5}' novel.longRNA.gtf.tmap |perl !{baseDir}/bin/extract_gtf_by_name.pl !{mergedGTF} - > novel.longRNA.gtf
awk '{if($3=="exon"){print $0}}' novel.longRNA.gtf > novel.longRNA.format.gtf
perl !{baseDir}/bin/get_exoncount.pl novel.longRNA.format.gtf > novel.longRNA.exoncount.txt
# gtf2gff3
# check whether required
# get fasta from gtf
gffread novel.longRNA.gtf -g !{fasta} -w novel.longRNA.fa -W
'''

'''
PLEK.py -fasta !{novel_lncRNA_fasta} \
        -out novel.longRNA.PLEK.out \
        -thread !{task.cpus}
exit 0
'''

'''
cpat.py -g !{novel_lncRNA_fasta} \
        -x !{baseDir}/bin/cpat_model/Human_Hexamer.tsv \
        -d !{baseDir}/bin/cpat_model/Human_logitModel.RData \
        -o novel.longRNA.CPAT.out
'''

'''
cpat.py -g !{novel_lncRNA_fasta} \
        -x !{baseDir}/bin/cpat_model/Mouse_Hexamer.tsv \
        -d !{baseDir}/bin/cpat_model/Mouse_logitModel.RData \
        -o novel.longRNA.CPAT.out
'''

'''
cpat.py -g !{novel_lncRNA_fasta} \
        -x !{baseDir}/bin/cpat_model/zebrafish_Hexamer.tsv \
        -d !{baseDir}/bin/cpat_model/zebrafish_logitModel.RData \
        -o novel.longRNA.CPAT.out
'''

'''
cpat.py -g !{novel_lncRNA_fasta} \
        -x !{baseDir}/bin/cpat_model/fly_Hexamer.tsv \
        -d !{baseDir}/bin/cpat_model/fly_logitModel.RData \
        -o novel.longRNA.CPAT.out
'''

'''
# merge transcripts
perl !{baseDir}/bin/integrate_novel_transcripts.pl > novel.longRNA.txt
awk '$4 >1{print $1}' novel.longRNA.txt|perl !{baseDir}/bin/extract_gtf_by_name.pl !{cuffmergegtf} - > novel.longRNA.stringent.gtf
# retain lncRNA only by coding ability
awk '$4 >1&&$5=="lncRNA"{print $1}' novel.longRNA.txt|perl !{baseDir}/bin/extract_gtf_by_name.pl !{cuffmergegtf} - > novel.lncRNA.stringent.gtf
awk '$4 >1&&$5=="TUCP"{print $1}' novel.longRNA.txt|perl !{baseDir}/bin/extract_gtf_by_name.pl !{cuffmergegtf} - > novel.TUCP.stringent.gtf
'''

'''
gffcompare -G -o filter \
    -r !{knowlncRNAgtf} \
    -p !{task.cpus} !{novel_lncRNA_stringent_Gtf}
awk '$3 =="u"||$3=="x"{print $5}' filter.novel.lncRNA.stringent.gtf.tmap |sort|uniq| \
    perl !{baseDir}/bin/extract_gtf_by_name.pl !{novel_lncRNA_stringent_Gtf} - > novel.lncRNA.stringent.filter.gtf

# rename lncRNAs according to neighbouring protein coding genes
awk '$3 =="gene"{print }' !{gencode_protein_coding_gtf} | perl -F'\\t' -lane '$F[8]=~/gene_id "(.*?)";/ && print join qq{\\t},@F[0,3,4],$1,@F[5,6,1,2,7,8,9]' - | \
    sort-bed - > gencode.protein_coding.gene.bed
gtf2bed < novel.lncRNA.stringent.filter.gtf |sort-bed - > novel.lncRNA.stringent.filter.bed
gtf2bed < !{knowlncRNAgtf} |sort-bed - > known.lncRNA.bed
perl !{baseDir}/bin/rename_lncRNA_2.pl gencode_annotation_gtf_mod.gtf lncipedia_mod.gtf
# mv lncRNA.final.v2.gtf all_lncRNA_for_classifier.gtf
grep -v NA-1-1 lncRNA.final.v2.gtf > all_lncRNA_for_classifier.gtf
perl !{baseDir}/bin/rename_proteincoding.pl !{gencode_protein_coding_gtf} > protein_coding.final.gtf
cat all_lncRNA_for_classifier.gtf protein_coding.final.gtf > final_all.gtf
gffread final_all.gtf -g !{fasta} -w final_all.fa -W
gffread all_lncRNA_for_classifier.gtf -g !{fasta} -w lncRNA.fa -W
gffread protein_coding.final.gtf -g !{fasta} -w protein_coding.fa -W

# classification
perl !{baseDir}/bin/lincRNA_classification.pl all_lncRNA_for_classifier.gtf !{gencode_protein_coding_gtf} lncRNA_classification.txt
'''

'''
gffcompare -G -o filter \
    -r !{knowlncRNAgtf} \
    -p !{task.cpus} !{novel_lncRNA_stringent_Gtf}
awk '$3 =="u"||$3=="x"{print $5}' filter.novel.lncRNA.stringent.gtf.tmap |sort|uniq| \
    perl !{baseDir}/bin/extract_gtf_by_name.pl !{novel_lncRNA_stringent_Gtf} - > novel.lncRNA.stringent.filter.gtf

# rename lncRNAs according to neighbouring protein coding genes
awk '$3 =="gene"{print }' !{gencode_protein_coding_gtf} | perl -F'\\t' -lane '$F[8]=~/gene_id "(.*?)";/ && print join qq{\\t},@F[0,3,4],$1,@F[5,6,1,2,7,8,9]' - | \
    sort-bed - > gencode.protein_coding.gene.bed
gtf2bed < novel.lncRNA.stringent.filter.gtf |sort-bed - > novel.lncRNA.stringent.filter.bed
gtf2bed < !{knowlncRNAgtf} |sort-bed - > known.lncRNA.bed
perl !{baseDir}/bin/rename_lncRNA_2.pl non_human_mod.gtf
# mv lncRNA.final.v2.gtf all_lncRNA_for_classifier.gtf
grep -v NA-1-1 lncRNA.final.v2.gtf > all_lncRNA_for_classifier.gtf
perl !{baseDir}/bin/rename_proteincoding.pl !{gencode_protein_coding_gtf} > protein_coding.final.gtf
cat all_lncRNA_for_classifier.gtf protein_coding.final.gtf > final_all.gtf
gffread final_all.gtf -g !{fasta} -w final_all.fa -W
gffread all_lncRNA_for_classifier.gtf -g !{fasta} -w lncRNA.fa -W
gffread protein_coding.final.gtf -g !{fasta} -w protein_coding.fa -W

# classification
perl !{baseDir}/bin/lincRNA_classification.pl all_lncRNA_for_classifier.gtf !{gencode_protein_coding_gtf} lncRNA_classification.txt
'''

'''
cpat.py -g !{lncRNA_final_cpat_fasta} \
        -x !{baseDir}/bin/cpat_model/Human_Hexamer.tsv \
        -d !{baseDir}/bin/cpat_model/Human_logitModel.RData \
        -o lncRNA.final.CPAT.out
'''

'''
cpat.py -g !{lncRNA_final_cpat_fasta} \
        -x !{baseDir}/bin/cpat_model/Mouse_Hexamer.tsv \
        -d !{baseDir}/bin/cpat_model/Mouse_logitModel.RData \
        -o lncRNA.final.CPAT.out
'''

'''
cpat.py -g !{lncRNA_final_cpat_fasta} \
        -x !{baseDir}/bin/cpat_model/zebrafish_Hexamer.tsv \
        -d !{baseDir}/bin/cpat_model/zebrafish_logitModel.RData \
        -o lncRNA.final.CPAT.out
'''

'''
cpat.py -g !{lncRNA_final_cpat_fasta} \
        -x !{baseDir}/bin/cpat_model/fly_Hexamer.tsv \
        -d !{baseDir}/bin/cpat_model/fly_logitModel.RData \
        -o lncRNA.final.CPAT.out
'''

'''
cpat.py -g !{final_coding_gene_for_CPAT} \
        -x !{baseDir}/bin/cpat_model/Human_Hexamer.tsv \
        -d !{baseDir}/bin/cpat_model/Human_logitModel.RData \
        -o protein_coding.final.CPAT.out
'''

'''
#!/usr/bin/perl -w
# since CPAT arbitrarily transforms gene names into upper case, we apply the 'uc' function to keep the gene names consistent
use strict;
open OUT,">basic_charac.txt" or die;

open FH,"all_lncRNA_for_classifier.gtf" or die;
my %class;
my %g2t;
my %trans_len;
my %exon_num;
while(<FH>){
    chomp;
    my @field=split "\t";
    $_=~/gene_id "(.+?)"/;
    my $gid=$1;
    $_=~/transcript_id "(.+?)"/;
    my $tid=uc($1);
    $class{$tid}=$field[1];
    $g2t{$tid}=$gid;
    my $len=$field[4]-$field[3];
    $trans_len{$tid}=(exists $trans_len{$tid})?$trans_len{$tid}+$len:$len;
    $exon_num{$tid}=(exists $exon_num{$tid})?$exon_num{$tid}+1:1;
}

open FH,"protein_coding.final.gtf" or die;
while(<FH>){
    chomp;
    my @field=split "\t";
    $_=~/gene_id "(.+?)"/;
    my $gid=uc($1);
    $_=~/transcript_id "(.+?)"/;
    my $tid=$1;
    $class{$tid}="protein_coding";
    $g2t{$tid}=$gid;
    my $len=$field[4]-$field[3];
    $trans_len{$tid}=(exists $trans_len{$tid})?$trans_len{$tid}+$len:$len;
    $exon_num{$tid}=(exists $exon_num{$tid})?$exon_num{$tid}+1:1;
}

my %lin_class;
open IN,"lncRNA_classification.txt" or die; #change the file name
while(<IN>){
    chomp;
    my @data = split /\\t/,$_;
    $lin_class{$data[0]} = $data[1];
}

open FH,"lncRNA.final.CPAT.out" or die;
<FH>;
while(<FH>){
    chomp;
    my @field=split "\t";
    my $tid=uc($field[0]);
    my $class;
    if (defined($lin_class{$tid})){
        $class = $lin_class{$tid};
    }else{
        $class = 'NA';
    }
    print OUT $g2t{$tid}."\t".$tid."\t".$class{$tid}."\t".$field[5]."\t".$trans_len{$tid}."\t".$exon_num{$tid}."\t".$class."\n";
}

open FH,"protein_coding.final.CPAT.out" or die;
<FH>;
while(<FH>){
    chomp;
    my @field=split "\t";
    my $tid=uc($field[0]);
    my $class;
    if (defined($lin_class{$tid})){
        $class = $lin_class{$tid};
    }else{
        $class = 'protein_coding';
    }
    print OUT $g2t{$tid}."\t".$tid."\t".$class{$tid}."\t".$field[5]."\t".$trans_len{$tid}."\t".$exon_num{$tid}."\t".$class."\n";
}
'''

'''
sambamba view !{bamfile} > !{samplename}.sam # resolves error caused by bam and htseq version conflicts
htseq-count -t exon -i gene_id -s no -r pos -f sam !{samplename}.sam !{final_gtf} > !{samplename}.htseq.count
rm !{samplename}.sam
'''

'''
sambamba view !{bamfile} > !{samplename}.sam # resolves error caused by bam and htseq version conflicts
htseq-count -t exon -i gene_id -r pos -f sam !{samplename}.sam !{final_gtf} > !{samplename}.htseq.count
rm !{samplename}.sam
'''

'''
# index kallisto reference
kallisto index -i transcripts.idx !{transript_fasta}
'''

'''
# quantification by kallisto in single end mode
kallisto quant -i !{kallistoIndex} -o !{file_tag_new}_kallisto -t !{task.cpus} -b 100 --single -l 180 -s 20 !{pair}
mv !{file_tag_new}_kallisto/abundance.tsv !{file_tag_new}_abundance.tsv
'''

'''
# quantification by kallisto
kallisto quant -i !{kallistoIndex} -o !{file_tag_new}_kallisto -t !{task.cpus} -b 100 !{pair[0]} !{pair[1]}
mv !{file_tag_new}_kallisto/abundance.tsv !{file_tag_new}_abundance.tsv
'''

'''
# index kallisto reference
kallisto index -i transcripts.idx !{transript_fasta}
'''

'''
# quantification by kallisto in single end mode
kallisto quant -i !{kallistoIndex} -o !{file_tag_new}_kallisto -t !{task.cpus} -b 100 --single -l 180 -s 20 !{pair}
mv !{file_tag_new}_kallisto/abundance.tsv !{file_tag_new}_abundance.tsv
'''

'''
# quantification by kallisto
kallisto quant -i !{kallistoIndex} -o !{file_tag_new}_kallisto -t !{task.cpus} -b 100 !{pair[0]} !{pair[1]}
mv !{file_tag_new}_kallisto/abundance.tsv !{file_tag_new}_abundance.tsv
'''

'''
perl !{baseDir}/bin/get_map_table.pl final_all.gtf > map.file
R CMD BATCH !{baseDir}/bin/get_htseq_matrix.R
'''

'''
perl !{baseDir}/bin/get_map_table.pl --gtf_file=final_all.gtf > map.file
R CMD BATCH !{baseDir}/bin/get_kallisto_matrix.R
'''

"""
Rscript -e "library(LncPipeReporter);run_reporter(input='.', output = 'reporter.html',output_dir='./LncPipeReports',de.method=\'${detools}\',theme = 'npg',cdf.percent = ${lncRep_cdf_percent},max.lncrna.len = ${lncRep_max_lnc_len},min.expressed.sample = ${lncRep_min_expressed_sample}, ask = FALSE)"
"""

"""
perl -F':|,' -lanE'BEGIN{say qq{SampleID\tcondition}} $del = shift @F; say qq{$_\t$del} for @F' ${design} > design.matrix
Rscript -e "library(LncPipeReporter);run_reporter(input='.', output = 'reporter.html',output_dir='./LncPipeReports',de.method=\'${detools}\',theme = 'npg',cdf.percent = ${lncRep_cdf_percent},max.lncrna.len = ${lncRep_max_lnc_len},min.expressed.sample = ${lncRep_min_expressed_sample}, ask = FALSE)"
"""

"""
Rscript -e "library(LncPipeReporter);run_reporter(input='.', output = 'reporter.html',output_dir='./LncPipeReports',de.method=\'${detools}\',theme = 'npg',cdf.percent = ${lncRep_cdf_percent},max.lncrna.len = ${lncRep_max_lnc_len},min.expressed.sample = ${lncRep_min_expressed_sample}, ask = FALSE)"
"""

"""
Rscript -e "library(LncPipeReporter);run_reporter(input='.', output = 'reporter.html',output_dir='./LncPipeReports',de.method=\'${detools}\',theme = 'npg',cdf.percent = ${lncRep_cdf_percent},max.lncrna.len = ${lncRep_max_lnc_len},min.expressed.sample = ${lncRep_min_expressed_sample}, ask = FALSE)"
"""

Support
- Future updates