Analysis of Dual RNA-seq data - an experimental method for interrogating host-pathogen interactions through simultaneous RNA-seq.
Dual RNA-seq pipeline
nf-core/dualrnaseq is a bioinformatics pipeline built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.
Introduction
nf-core/dualrnaseq is specifically used for the analysis of Dual RNA-seq data, interrogating host-pathogen interactions through simultaneous RNA-seq.
This pipeline was initially tested with eukaryotic hosts, including human and mouse, and with pathogens including Salmonella enterica, Orientia tsutsugamushi, Streptococcus pneumoniae, Escherichia coli and Mycobacterium leprae. The workflow should work with any eukaryotic host and bacterial pathogen for which a reference genome and annotation are available.
Method
The workflow merges host and pathogen genome annotations, taking into account differences in annotation conventions, then processes raw data from FastQ inputs (FastQC, BBDuk), quantifies gene expression (STAR and HTSeq; STAR, Salmon and tximport; or Salmon in quasimapping mode and tximport), and summarises the results (MultiQC). It also generates a number of custom summary plots and separate results tables for the pathogen and host. See the output documentation for more details.
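One way to picture the annotation merge is as a relabelling step: host and pathogen GFF files often use different feature types (for example `exon` versus `gene`), so the chosen features are given one shared label before the files are concatenated. The Python sketch below illustrates that idea only. The function name and the toy records are hypothetical, and the label `quant` is inferred from the `-t quant` and `--sjdbGTFfeatureExon quant` flags in the snippets further down, not from the source of `replace_feature_gff.sh`.

```python
def unify_feature_type(gff_lines, features, new_feature="quant"):
    """Relabel column 3 of selected GFF records with one shared feature name."""
    wanted = set(features)
    relabelled = []
    for line in gff_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        # Leave comments, directives and malformed records untouched.
        if line.startswith("#") or len(cols) != 9:
            relabelled.append(line)
            continue
        if cols[2] in wanted:
            cols[2] = new_feature
        relabelled.append("\t".join(cols))
    return relabelled


# Toy records (hypothetical): host quantified on exons, pathogen on genes and sRNAs.
host = ["chr1\tensembl\texon\t1\t100\t.\t+\t.\tParent=transcript:T1"]
pathogen = [
    "NC_1\tRefSeq\tgene\t5\t50\t.\t-\t.\tlocus_tag=STM0001",
    "NC_1\tRefSeq\tregion\t1\t1000\t.\t+\t.\tID=region1",
]

# Pathogen records first, then host, as in the `cat` commands of the pipeline.
merged = unify_feature_type(pathogen, ["gene", "sRNA"]) + unify_feature_type(host, ["exon"])
for record in merged:
    print(record)
```

After this step a single feature name can be passed to the quantification tool for both organisms, while records that were not selected (the `region` line here) keep their original type.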
Workflow
The workflow diagram below gives a simplified visual overview of how dualrnaseq has been designed.
Documentation
The nf-core/dualrnaseq pipeline comes with documentation about the pipeline, found in the docs/ directory:
- Pipeline configuration
Credits
nf-core/dualrnaseq was coded and written by Bozena Mika-Gospodorz and Regan Hayward.
We thank the following people for their extensive assistance in the development of this pipeline:
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #dualrnaseq channel (you can join with this invite).
Citations
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x. ReadCube: Full Access Link
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
Code Snippets
"""
echo $workflow.manifest.version > v_pipeline.txt
echo $workflow.nextflow.version > v_nextflow.txt
python --version > v_python.txt
R --version > v_r.txt
cutadapt --version > v_cutadapt.txt
fastqc --version > v_fastqc.txt
multiqc --version > v_multiqc.txt
STAR --version > v_star.txt
htseq-count . . --version > v_htseq.txt
samtools --version > v_samtools.txt
gffread --version > v_gffread.txt
salmon --version > v_salmon.txt
scrape_software_versions.py &> software_versions_mqc.yaml
"""
From line 721 of master/main.nf
'''
python !{workflow.projectDir}/bin/check_replicates.py -s !{sample_name} 2>&1
'''

'''
cp -n !{f_ext} !{base_name_file}.fasta
'''

'''
gunzip -f -S .zip !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.fasta
'''

'''
gunzip -f !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.fasta
'''

'''
echo "Your pathogen genome files appear to have the wrong extension. \n Currently, the pipeline only supports .fasta or .fa, or compressed files with .zip or .gz extensions."
'''
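The four genome-staging branches above (plain copy, `.zip`, `.gz`, and the error message) amount to a dispatch on the file extension. A minimal Python sketch of that decision follows. It assumes the extension is the only criterion, the function name and action labels are hypothetical, and how the pipeline itself derives `base_name_file` is not shown in the snippets.

```python
FASTA_EXTENSIONS = (".fasta", ".fa")
COMPRESSED_EXTENSIONS = (".zip", ".gz")


def staging_action(filename):
    """Return (action, target_name) for a genome file, based only on its extension."""
    if filename.endswith(FASTA_EXTENSIONS):
        # Uncompressed FASTA: copy to a name with a uniform .fasta extension.
        return ("copy", filename.rsplit(".", 1)[0] + ".fasta")
    for ext in COMPRESSED_EXTENSIONS:
        if filename.endswith(ext):
            inner = filename[: -len(ext)]
            if inner.endswith(FASTA_EXTENSIONS):
                # Compressed FASTA: decompress first, then normalise the name.
                return ("decompress_then_copy", inner.rsplit(".", 1)[0] + ".fasta")
    # Anything else corresponds to the "wrong extension" message.
    return ("unsupported", None)


for name in ["genome.fa", "genome.fasta.gz", "genome.fa.zip", "genome.txt"]:
    print(name, staging_action(name))
```

Normalising every accepted input to a single `.fasta` name means later steps do not need to know which form the user supplied.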
'''
cp -n !{f_ext} !{base_name_file}.fasta
'''

'''
gunzip -f -S .zip !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.fasta
'''

'''
gunzip -f !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.fasta
'''

'''
echo "Your host genome files appear to have the wrong extension. \n Currently, the pipeline only supports .fasta or .fa, or compressed files with .zip or .gz extensions."
'''

'''
cp -n !{f_ext} !{base_name_file}.gff3
'''

'''
gunzip -f -S .zip !{f_ext}
cp -n !{base_name_file} !{base_name_file}.gff3
'''

'''
gunzip -f !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.gff3
'''

'''
echo "Your pathogen GFF file appears to be in the wrong format or has the wrong extension. \n Currently, the pipeline only supports .gff or .gff3, or compressed files with .zip or .gz extensions."
'''

'''
cp -n !{f_ext} !{base_name_file}.gff3
'''

'''
gunzip -f -S .zip !{f_ext}
cp -n !{base_name_file} !{base_name_file}.gff3
'''

'''
gunzip -f !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.gff3
'''

'''
echo "Your host GFF file appears to be in the wrong format or has the wrong extension. \n Currently, the pipeline only supports .gff or .gff3, or compressed files with .zip or .gz extensions."
'''

'''
cp -n !{f_ext} !{base_name_file}.gff3
'''

'''
gunzip -f -S .zip !{f_ext}
cp -n !{base_name_file} !{base_name_file}.gff3
'''

'''
gunzip -f !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.gff3
'''

'''
echo "Your host GFF file appears to be in the wrong format or has the wrong extension. \n Currently, the pipeline only supports .gff or .gff3, or compressed files with .zip or .gz extensions."
'''

'''
cp -n !{f_ext} !{base_name_file}.gff3
'''

'''
gunzip -f -S .zip !{f_ext}
cp -n !{base_name_file} !{base_name_file}.gff3
'''

'''
gunzip -f !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.gff3
'''

'''
echo "Your host GFF tRNA file appears to be in the wrong format or has the wrong extension. \n Currently, the pipeline only supports .gff or .gff3, or compressed files with .zip or .gz extensions."
'''
"""
cat $pathogen_fa $host_fa > host_pathogen.fasta
"""

"""
cat $host_gff_genome $host_gff_tRNA > ${outfile_name}
"""

"""
$workflow.projectDir/bin/replace_feature_gff.sh $gff ${outfile_name} $features
"""

"""
$workflow.projectDir/bin/replace_feature_gff.sh $gff ${outfile_name} $features
"""

"""
$workflow.projectDir/bin/replace_attribute_gff.sh $gff ${outfile_name} $host_attribute $pathogen_attribute
"""

"""
cat $pathogen_gff_genome $host_gff > host_pathogen_htseq.gff
"""

"""
python $workflow.projectDir/bin/extract_annotations_from_gff.py -gff $gff -f $features -a $pathogen_attribute -org pathogen -q_tool htseq -o ${outfile_name}
"""

"""
python $workflow.projectDir/bin/extract_annotations_from_gff.py -gff $gff -f $features -a $host_attribute -org host -q_tool htseq -o ${outfile_name}
"""

"""
$workflow.projectDir/bin/extract_reference_names_from_fasta_files.sh reference_host_names.txt $host_fa
"""

"""
$workflow.projectDir/bin/extract_reference_names_from_fasta_files.sh reference_pathogen_names.txt $pathogen_fa
"""

"""
$workflow.projectDir/bin/replace_attribute_gff.sh $gff ${outfile_name} parent Parent
"""

"""
$workflow.projectDir/bin/replace_attribute_gff.sh $gff ${outfile_name} parent $host_attribute
"""

"""
cat $host_gff_genome $host_gff_tRNA > ${outfile_name}
"""

"""
$workflow.projectDir/bin/replace_feature_gff.sh $gff ${outfile_name} $features
"""

"""
$workflow.projectDir/bin/replace_attribute_gff.sh $gff ${outfile_name} parent $pathogen_attribute
"""

"""
python $workflow.projectDir/bin/extract_annotations_from_gff.py -gff $gff -f $features -a parent -org pathogen -q_tool salmon -o ${outfile_name}
"""

"""
python $workflow.projectDir/bin/extract_annotations_from_gff.py -gff $gff -f quant -a parent -org host -q_tool salmon -o ${outfile_name}
"""

"""
gffread -w $outfile_name -g $host_fa $gff
"""

"""
python $workflow.projectDir/bin/gff_to_fasta_transcriptome.py -fasta $host_fa -gff $gff -f $features -a $attribute -o $outfile_name
"""

"""
cat $host_tr_fa $host_tRNA_tr_fa > host_transcriptome.fasta
"""

"""
python $workflow.projectDir/bin/gff_to_fasta_transcriptome.py -fasta $pathogen_fa -gff $gff -f $features -a $attribute -o $outfile_name
"""

"""
cat $pathogen_tr_fa $host_tr_fa > host_pathogen_transcriptome.fasta
"""

"""
$workflow.projectDir/bin/replace_feature_gff.sh $gff ${outfile_name} $features
"""

"""
cat $pathogen_gff_genome $host_gff > host_pathogen_star_alignment_mode.gff
"""
"""
fastqc --quiet --threads $task.cpus --noextract $reads $fastqc_params
"""

"""
cutadapt -j ${task.cpus} -q $q_value -a $adapter_seq_3 -m 1 -o ${name_out} $reads $cutadapt_params
"""

"""
cutadapt -j ${task.cpus} -q $q_value -a ${adapter_seq_3[0]} -A ${adapter_seq_3[1]} -o ${name_1} -p ${name_2} -m 1 ${reads[0]} ${reads[1]} $cutadapt_params
"""

"""
bbduk.sh -Xmx1g in=$reads out=${name_out} ref=$adapters minlen=$minlen qtrim=$qtrim trimq=$trimq ktrim=$ktrim k=$k mink=$mink hdist=$hdist &> $fileoutput $bbduk_params
"""

"""
bbduk.sh -Xmx1g in1=${reads[0]} in2=${reads[1]} out1=${name_1} out2=${name_2} ref=$adapters minlen=$minlen qtrim=$qtrim trimq=$trimq ktrim=$ktrim k=$k mink=$mink hdist=$hdist $bbduk_params tpe tbo &> $fileoutput
"""

"""
fastqc --threads ${task.cpus} --quiet --noextract $reads $fastqc_params
"""

"""
$workflow.projectDir/bin/count_total_reads.sh $fastq >> total_raw_reads_fastq.tsv
"""

"""
$workflow.projectDir/bin/collect_total_raw_read_pairs.py -i $tsv
"""
'''
grep ">" !{host_fa} | cut -d " " -f 1 > decoys.txt
sed -i -e 's/>//g' decoys.txt
cat !{host_pathogen_transcriptome_fasta} !{host_fa} > gentrome.fasta
'''
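The three shell commands above prepare the inputs for a decoy-aware Salmon index: the decoy list holds the host genome sequence names, and the "gentrome" is the combined transcriptome followed by the host genome. The Python sketch below mirrors the same text processing on in-memory lines. The function names and the toy sequences are hypothetical.

```python
def decoy_names(genome_fasta_lines):
    """Mimic `grep ">" | cut -d " " -f 1` followed by `sed 's/>//g'`."""
    names = []
    for line in genome_fasta_lines:
        if ">" in line:  # grep ">" keeps every line containing '>'
            first_field = line.rstrip("\n").split(" ")[0]  # cut -d " " -f 1
            names.append(first_field.replace(">", ""))  # sed removes every '>'
    return names


def build_gentrome(transcriptome_lines, genome_lines):
    """Transcripts first, decoy (genome) sequences last, as in the `cat` command."""
    return list(transcriptome_lines) + list(genome_lines)


# Toy inputs (hypothetical).
host_genome = [">chr1 Homo sapiens chromosome 1", "ACGTACGT", ">chr2", "GGCCGGCC"]
transcriptome = [">STM0001", "ATGAAA", ">ENST0001", "ATGCCC"]

print(decoy_names(host_genome))
print(build_gentrome(transcriptome, host_genome)[:2])
```

Only the text up to the first space of each header is kept, so the decoy names match the sequence identifiers rather than the full description lines.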
"""
salmon index -t $gentrome -i transcripts_index --decoys $decoys -k $kmer_length -p ${task.cpus} $keepDuplicates $salmon_sa_params_index
"""

"""
salmon quant -p ${task.cpus} -i $index -l $libtype -r $reads $softclip --incompatPrior $incompatPrior $UnmappedNames --validateMappings $dumpEq $writeMappings -o $sample_name $salmon_sa_params_mapping
"""

"""
salmon quant -p ${task.cpus} -i $index -l $libtype -1 ${reads[0]} -2 ${reads[1]} $softclip --incompatPrior $incompatPrior $UnmappedNames --validateMappings $dumpEq $writeMappings -o $sample_name $salmon_sa_params_mapping
"""

"""
$workflow.projectDir/bin/split_quant_tables_salmon.sh $transcriptome_pathogen $transcriptome_host salmon/*/quant.sf "quant.sf"
"""

"""
$workflow.projectDir/bin/salmon_extract_ambig_uniq_transcripts_genes.R salmon/*/quant.sf salmon/*/aux_info/ambig_info.tsv $sample_name $annotations
"""

"""
$workflow.projectDir/bin/salmon_host_comb_ambig_uniq.R salmon/*/aux_info/*_host_quant_ambig_uniq.sf
"""

"""
$workflow.projectDir/bin/salmon_pathogen_comb_ambig_uniq.R salmon/*/aux_info/*_pathogen_quant_ambig_uniq.sf
"""

"""
python $workflow.projectDir/bin/collect_quantification_data.py -i $input_quantification -q salmon -a $gene_attribute -org both
"""

"""
$workflow.projectDir/bin/split_quant_tables_salmon.sh $transcriptome_pathogen $transcriptome_host $quant_table "quant_salmon.tsv"
pathonen_tab=\$(if [ \$(cat pathogen_quant_salmon.tsv | wc -l) -gt 1 ]; then echo "true"; else echo "false"; fi)
host_tab=\$(if [ \$(cat host_quant_salmon.tsv | wc -l) -gt 1 ]; then echo "true"; else echo "false"; fi)
"""
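The `wc -l ... -gt 1` tests above flag whether each split table holds anything beyond its first line. Assuming that first line is a header, a table with a single line contains no quantified features. A small Python sketch of the same check follows. The function name and example tables are hypothetical.

```python
def has_data_rows(table_text):
    """Equivalent of `[ $(wc -l) -gt 1 ]`: more than one newline-terminated line."""
    # wc -l counts newline characters, so the same is done here.
    return table_text.count("\n") > 1


header_only = "Name\tsample_1_NumReads\n"
with_rows = "Name\tsample_1_NumReads\nSTM0001\t42\n"

print(has_data_rows(header_only), has_data_rows(with_rows))
```

A flag like this lets downstream steps for one organism be skipped when, for example, no pathogen reads were quantified.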
"""
$workflow.projectDir/bin/combine_quant_annotations.py -q $quantification_table -annotations $annotation_table -a $attribute -org pathogen
"""

"""
$workflow.projectDir/bin/combine_quant_annotations.py -q $quantification_table -annotations $annotation_table -a $attribute -org host
"""

"""
$workflow.projectDir/bin/tximport.R salmon $annotations $sample_name
"""

"""
python $workflow.projectDir/bin/collect_quantification_data.py -i $input_quantification -q salmon -a gene_id -org host_gene_level
"""

"""
$workflow.projectDir/bin/combine_annotations_salmon_gene_level.py -q $quantification_table -annotations $annotation_table -a gene_id -org host
"""

"""
python $workflow.projectDir/bin/scatter_plots.py -q $quant_table -a $attribute -org pathogen
"""

"""
python $workflow.projectDir/bin/scatter_plots.py -q $quant_table -a $attribute -org host
"""

"""
$workflow.projectDir/bin/extract_processed_reads.sh salmon/*/aux_info/meta_info.json $sample_name salmon
"""

"""
cat $process_reads > processed_reads_salmon.tsv
"""

"""
python $workflow.projectDir/bin/mapping_stats.py -q_p $quant_table_pathogen -q_h $quant_table_host -total_processed $total_processed_reads -total_raw $total_raw_reads -a $attribute -t salmon -o salmon_host_pathogen_total_reads.tsv
"""

"""
python $workflow.projectDir/bin/plot_mapping_statistics_salmon.py -i $stats
"""

'''
python !{workflow.projectDir}/bin/RNA_class_content.py -q !{quant_table} -a !{attribute} -annotations !{gene_annotations} -q_tool salmon -org pathogen 2>&1
'''

'''
python !{workflow.projectDir}/bin/RNA_class_content.py -q !{quant_table} -a !{attribute} -annotations !{gene_annotations} -rna !{rna_classes_to_replace} -q_tool salmon -org host 2>&1
'''

"""
python $workflow.projectDir/bin/plot_RNA_class_stats_each.py -i $stats_table
"""

"""
python $workflow.projectDir/bin/plot_RNA_class_stats_combined.py -i $stats_table -org pathogen
"""

"""
python $workflow.projectDir/bin/plot_RNA_class_stats_each.py -i $stats_table
"""

"""
python $workflow.projectDir/bin/plot_RNA_class_stats_combined.py -i $stats_table -org host
"""
"""
mkdir index
STAR --runThreadN ${task.cpus} --runMode genomeGenerate --genomeDir index/ --genomeFastaFiles $fasta --sjdbGTFfile $gff --sjdbGTFfeatureExon exon --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang $sjdbOverhang $star_salmon_index_params
"""

"""
mkdir $sample_name
STAR --runThreadN ${task.cpus} --genomeDir . --sjdbGTFfile $gff $readFilesCommand --readFilesIn $reads --outSAMtype BAM Unsorted --outSAMunmapped $outSAMunmapped --outSAMattributes $outSAMattributes --outFileNamePrefix $sample_name/$sample_name --sjdbGTFfeatureExon quant --sjdbGTFtagExonParentTranscript parent --quantMode TranscriptomeSAM --quantTranscriptomeBan $quantTranscriptomeBan --outFilterMultimapNmax $outFilterMultimapNmax --outFilterType $outFilterType --limitBAMsortRAM $limitBAMsortRAM --alignSJoverhangMin $alignSJoverhangMin --alignSJDBoverhangMin $alignSJDBoverhangMin --outFilterMismatchNmax $outFilterMismatchNmax --outFilterMismatchNoverReadLmax $outFilterMismatchNoverReadLmax --alignIntronMin $alignIntronMin --alignIntronMax $alignIntronMax --alignMatesGapMax $alignMatesGapMax --winAnchorMultimapNmax $winAnchorMultimapNmax $star_salmon_alignment_params
"""

"""
mkdir $sample_name
STAR --runThreadN ${task.cpus} --genomeDir . --sjdbGTFfile $gff $readFilesCommand --readFilesIn ${reads[0]} ${reads[1]} --outSAMtype BAM Unsorted --outSAMunmapped $outSAMunmapped --outSAMattributes $outSAMattributes --outFileNamePrefix $sample_name/$sample_name --sjdbGTFfeatureExon quant --sjdbGTFtagExonParentTranscript parent --quantMode TranscriptomeSAM --quantTranscriptomeBan $quantTranscriptomeBan --outFilterMultimapNmax $outFilterMultimapNmax --outFilterType $outFilterType --limitBAMsortRAM $limitBAMsortRAM --alignSJoverhangMin $alignSJoverhangMin --alignSJDBoverhangMin $alignSJDBoverhangMin --outFilterMismatchNmax $outFilterMismatchNmax --outFilterMismatchNoverReadLmax $outFilterMismatchNoverReadLmax --alignIntronMin $alignIntronMin --alignIntronMax $alignIntronMax --alignMatesGapMax $alignMatesGapMax --winAnchorMultimapNmax $winAnchorMultimapNmax $star_salmon_alignment_params
"""

"""
salmon quant -p ${task.cpus} -t $transcriptome -l $libtype -a $bam_file --incompatPrior $incompatPrior -o $sample_name $salmon_alignment_based_params
"""

"""
$workflow.projectDir/bin/split_quant_tables_salmon.sh $transcriptome_pathogen $transcriptome_host salmon/*/quant.sf "quant.sf"
"""

"""
$workflow.projectDir/bin/salmon_extract_ambig_uniq_transcripts_genes.R salmon/*/quant.sf salmon/*/aux_info/ambig_info.tsv $sample_name $annotations
"""

"""
$workflow.projectDir/bin/salmon_host_comb_ambig_uniq.R salmon/*/aux_info/*_host_quant_ambig_uniq.sf
"""

"""
$workflow.projectDir/bin/salmon_pathogen_comb_ambig_uniq.R salmon/*/aux_info/*_pathogen_quant_ambig_uniq.sf
"""

"""
$workflow.projectDir/bin/tximport.R salmon $annotations $sample_name
"""

"""
python $workflow.projectDir/bin/collect_quantification_data.py -i $input_quantification -q salmon -a gene_id -org host_gene_level
"""

"""
python $workflow.projectDir/bin/collect_quantification_data.py -i $input_quantification -q salmon -a $gene_attribute -org both
"""

"""
$workflow.projectDir/bin/split_quant_tables_salmon.sh $transcriptome_pathogen $transcriptome_host $quant_table "quant_salmon.tsv"
pathonen_tab=\$(if [ \$(cat pathogen_quant_salmon.tsv | wc -l) -gt 1 ]; then echo "true"; else echo "false"; fi)
host_tab=\$(if [ \$(cat host_quant_salmon.tsv | wc -l) -gt 1 ]; then echo "true"; else echo "false"; fi)
"""

"""
$workflow.projectDir/bin/combine_quant_annotations.py -q $quantification_table -annotations $annotation_table -a $attribute -org pathogen
"""

"""
$workflow.projectDir/bin/combine_quant_annotations.py -q $quantification_table -annotations $annotation_table -a $attribute -org host
"""

"""
$workflow.projectDir/bin/combine_annotations_salmon_gene_level.py -q $quantification_table -annotations $annotation_table -a gene_id -org host
"""
"""
$workflow.projectDir/bin/extract_processed_reads.sh $Log_final_out $sample_name star
"""

"""
cat $process_reads > processed_reads_star.tsv
"""

"""
python $workflow.projectDir/bin/scatter_plots.py -q $quant_table -a $attribute -org pathogen
"""

"""
python $workflow.projectDir/bin/scatter_plots.py -q $quant_table -a $attribute -org host
"""

"""
$workflow.projectDir/bin/extract_processed_reads.sh salmon_alignment_mode/*/aux_info/meta_info.json $sample_name salmon_alignment
"""

"""
cat $process_reads > processed_reads_salmon_alignment.tsv
"""

"""
python $workflow.projectDir/bin/mapping_stats.py -q_p $quant_table_pathogen -q_h $quant_table_host -total_processed $total_processed_reads -total_raw $total_raw_reads -a $attribute --star_processed $total_processed_reads_star -t salmon_alignment -o salmon_alignment_host_pathogen_total_reads.tsv
"""

"""
python $workflow.projectDir/bin/plot_mapping_statistics_salmon_alignment.py -i $stats
"""

'''
python !{workflow.projectDir}/bin/RNA_class_content.py -q !{quant_table} -a !{attribute} -annotations !{gene_annotations} -q_tool salmon -org pathogen 2>&1
'''

'''
python !{workflow.projectDir}/bin/RNA_class_content.py -q !{quant_table} -a !{attribute} -annotations !{gene_annotations} -rna !{rna_classes_to_replace} -q_tool salmon -org host 2>&1
'''

"""
python $workflow.projectDir/bin/plot_RNA_class_stats_each.py -i $stats_table
"""

"""
python $workflow.projectDir/bin/plot_RNA_class_stats_each.py -i $stats_table
"""

"""
python $workflow.projectDir/bin/plot_RNA_class_stats_combined.py -i $stats_table -org pathogen
"""

"""
python $workflow.projectDir/bin/plot_RNA_class_stats_combined.py -i $stats_table -org host
"""
"""
mkdir index
STAR --runThreadN ${task.cpus} --runMode genomeGenerate --genomeDir index/ --genomeFastaFiles $fasta --sjdbGTFfile $gff --sjdbGTFfeatureExon exon --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang $sjdbOverhang $star_index_params
"""

"""
mkdir $sample_name
STAR --runThreadN ${task.cpus} --genomeDir . --sjdbGTFfile $gff $readFilesCommand --readFilesIn $reads --outSAMtype BAM SortedByCoordinate --outSAMunmapped $outSAMunmapped --outSAMattributes $outSAMattributes --outWigType $outWigType --outWigStrand $outWigStrand --outFileNamePrefix $sample_name/$sample_name --sjdbGTFfeatureExon exon --sjdbGTFtagExonParentTranscript Parent --outFilterMultimapNmax $outFilterMultimapNmax --outFilterType $outFilterType --limitBAMsortRAM $limitBAMsortRAM --alignSJoverhangMin $alignSJoverhangMin --alignSJDBoverhangMin $alignSJDBoverhangMin --outFilterMismatchNmax $outFilterMismatchNmax --outFilterMismatchNoverReadLmax $outFilterMismatchNoverReadLmax --alignIntronMin $alignIntronMin --alignIntronMax $alignIntronMax --alignMatesGapMax $alignMatesGapMax --winAnchorMultimapNmax $winAnchorMultimapNmax $star_alignment_params
"""

"""
mkdir $sample_name
STAR --runThreadN ${task.cpus} --genomeDir . --sjdbGTFfile $gff $readFilesCommand --readFilesIn ${reads[0]} ${reads[1]} --outSAMtype BAM SortedByCoordinate --outSAMunmapped $outSAMunmapped --outSAMattributes $outSAMattributes --outWigType $outWigType --outWigStrand $outWigStrand --outFileNamePrefix $sample_name/$sample_name --sjdbGTFfeatureExon exon --sjdbGTFtagExonParentTranscript Parent --outFilterMultimapNmax $outFilterMultimapNmax --outFilterType $outFilterType --limitBAMsortRAM $limitBAMsortRAM --alignSJoverhangMin $alignSJoverhangMin --alignSJDBoverhangMin $alignSJDBoverhangMin --outFilterMismatchNmax $outFilterMismatchNmax --outFilterMismatchNoverReadLmax $outFilterMismatchNoverReadLmax --alignIntronMin $alignIntronMin --alignIntronMax $alignIntronMax --alignMatesGapMax $alignMatesGapMax --winAnchorMultimapNmax $winAnchorMultimapNmax $star_alignment_params
"""

"""
$workflow.projectDir/bin/remove_crossmapped_reads_BAM.sh $alignment $workflow.projectDir/bin $host_reference $pathogen_reference $cross_mapped_reads $bam_file_without_crossmapped
"""

"""
$workflow.projectDir/bin/remove_crossmapped_read_pairs_BAM.sh $alignment $workflow.projectDir/bin $host_reference $pathogen_reference $cross_mapped_reads $bam_file_without_crossmapped
"""

"""
$workflow.projectDir/bin/extract_processed_reads.sh $Log_final_out $sample_name star
"""

"""
cat $process_reads > processed_reads_star.tsv
"""

'''
!{workflow.projectDir}/bin/count_uniquely_mapped_reads.sh !{alignment} !{host_reference_names} !{pathogen_reference_names} !{sample_name} !{name}
'''

'''
!{workflow.projectDir}/bin/count_uniquely_mapped_read_pairs.sh !{alignment} !{host_reference_names} !{pathogen_reference_names} !{sample_name} !{name}
'''

"""
python $workflow.projectDir/bin/combine_tables.py -i $stats -o uniquely_mapped_reads_star.tsv -s uniquely_mapped_reads
"""

"""
$workflow.projectDir/bin/count_cross_mapped_reads.sh $cross_mapped_reads
"""

'''
!{workflow.projectDir}/bin/count_multi_mapped_reads.sh !{alignment} !{host_reference_names} !{pathogen_reference_names} !{sample_name} !{name}
'''

'''
!{workflow.projectDir}/bin/count_multi_mapped_read_pairs.sh !{alignment} !{host_reference_names} !{pathogen_reference_names} !{sample_name} !{name}
'''

"""
python $workflow.projectDir/bin/combine_tables.py -i $stats -o multi_mapped_reads_star.tsv -s multi_mapped_reads
"""

"""
python $workflow.projectDir/bin/mapping_stats.py -total_raw $total_raw_reads -total_processed $total_processed_reads -m_u $uniquely_mapped_reads -m_m $multi_mapped_reads -c_m $cross_mapped_reads -t star -o star_mapping_stats.tsv
"""

"""
python $workflow.projectDir/bin/plot_mapping_stats_star.py -i $stats
"""
"""
htseq-count -n $task.cpus -t quant -f bam -r pos $st $gff -i $host_attr -s $stranded --max-reads-in-buffer=$max_reads_in_buffer -a $minaqual $htseq_params > $name_file2
sed -i '1{h;s/.*/'"$sample_name"'/;G}' "$name_file2"
"""
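The `sed` expression above is terse. On the first line only, `h` saves the line to the hold space, `s/.*/.../` replaces it with the sample name, and `G` appends the saved line back after a newline. The net effect is that the sample name becomes a new first line and the original counts follow unchanged. A Python sketch of the same transformation follows. The function name and toy counts are hypothetical.

```python
def prepend_sample_header(count_lines, sample_name):
    """Insert the sample name as a new first line, like sed '1{h;s/.*/NAME/;G}'."""
    if not count_lines:
        # sed has no first line to act on, so an empty file stays empty.
        return []
    return [sample_name] + list(count_lines)


htseq_output = ["STM0001\t42", "ENSG0001\t7", "__no_feature\t3"]
for line in prepend_sample_header(htseq_output, "sample_1"):
    print(line)
```

Labelling each per-sample count file this way allows the files to be combined later into one table with a column per sample.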
"""
python $workflow.projectDir/bin/collect_quantification_data.py -i $input_quantification -q htseq -a $host_attribute
"""

"""
$workflow.projectDir/bin/calculate_TPM_HTSeq.R $input_quantification $host_attribute $gff_pathogen $gff_host
"""
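The source of `calculate_TPM_HTSeq.R` is not shown here, so the following is only the textbook definition of transcripts per million (TPM), written in Python for illustration: each count is divided by the feature length, and the resulting rates are scaled to sum to one million. How the pipeline's script derives lengths from the two GFF files is not something this sketch claims to reproduce. The function name and the toy numbers are hypothetical.

```python
def tpm(counts, lengths):
    """Textbook TPM: length-normalised counts rescaled to sum to 1e6."""
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        # No reads at all: report zeros rather than dividing by zero.
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Toy data (hypothetical): raw counts and feature lengths in bases.
counts = {"STM0001": 10, "ENSG0001": 20, "ENSG0002": 0}
lengths = {"STM0001": 1000, "ENSG0001": 1000, "ENSG0002": 500}

values = tpm(counts, lengths)
for gene, value in values.items():
    print(gene, round(value, 2))
```

Because every sample is rescaled to the same total, TPM values are comparable across samples in a way that raw HTSeq counts are not.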
"""
$workflow.projectDir/bin/split_quant_tables.sh $quant_table $host_annotations $pathogen_annotations quantification_uniquely_mapped_htseq.tsv
pathonen_tab=\$(if [ \$(cat pathogen_quantification_uniquely_mapped_htseq.tsv | wc -l) -gt 1 ]; then echo "true"; else echo "false"; fi)
host_tab=\$(if [ \$(cat host_quantification_uniquely_mapped_htseq.tsv | wc -l) -gt 1 ]; then echo "true"; else echo "false"; fi)
"""

"""
$workflow.projectDir/bin/combine_quant_annotations.py -q $quantification_table -annotations $annotation_table -a $attribute -org pathogen
"""

"""
$workflow.projectDir/bin/combine_quant_annotations.py -q $quantification_table -annotations $annotation_table -a $attribute -org host
"""

"""
python $workflow.projectDir/bin/scatter_plots.py -q $quant_table -a $attribute -org pathogen
"""

"""
python $workflow.projectDir/bin/scatter_plots.py -q $quant_table -a $attribute -org host
"""

"""
python $workflow.projectDir/bin/mapping_stats.py -q_p $quant_table_pathogen -q_h $quant_table_host -a $attribute -star $star_stats -t htseq -o htseq_uniquely_mapped_reads_stats.tsv
"""

"""
python $workflow.projectDir/bin/plot_mapping_stats_htseq.py -i $stats
"""

'''
python !{workflow.projectDir}/bin/RNA_class_content.py -q !{quant_table} -a !{attribute} -annotations !{gene_annotations} -q_tool htseq -org pathogen 2>&1
'''

'''
python !{workflow.projectDir}/bin/RNA_class_content.py -q !{quant_table} -a !{attribute} -annotations !{gene_annotations} -rna !{rna_classes_to_replace} -q_tool htseq -org host 2>&1
'''

"""
python $workflow.projectDir/bin/plot_RNA_class_stats_each.py -i $stats_table
"""

"""
python $workflow.projectDir/bin/plot_RNA_class_stats_each.py -i $stats_table
"""

"""
python $workflow.projectDir/bin/plot_RNA_class_stats_combined.py -i $stats_table -org pathogen
"""

"""
python $workflow.projectDir/bin/plot_RNA_class_stats_combined.py -i $stats_table -org host
"""

"""
multiqc -d --export -f $rtitle $rfilename $custom_config_file .
"""

"""
markdown_to_html.py $output_docs -o results_description.html
"""