The TronFlow alignment pipeline is part of a collection of computational workflows for tumor-normal pair somatic variant calling.
Find the documentation here
This pipeline aligns paired-end and single-end FASTQ files with the BWA `aln` and `mem` algorithms and with BWA-MEM2. For RNA-seq data, STAR is also supported; to increase sensitivity to novel junctions, use `--star_two_pass_mode` (recommended for RNA-seq variant calling). The pipeline also includes an initial read-trimming step using fastp.
How to run it
Run it from GitHub as follows:
nextflow run tron-bioinformatics/tronflow-alignment -profile conda --input_files $input --output $output --algorithm aln --library paired
Otherwise download the project and run as follows:
nextflow main.nf -profile conda --input_files $input --output $output --algorithm aln --library paired
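As a rough illustration of how the options fit together, the sketch below assembles a command line from a handful of settings and prints it instead of running it. The helper name `build_alignment_cmd` and all paths are hypothetical; only flags that appear in this README are used, and `--star_two_pass_mode` is appended only when the algorithm is `star`.

```shell
# build_alignment_cmd TABLE OUTPUT REFERENCE ALGORITHM LIBRARY
# Hypothetical helper: prints a tronflow-alignment command line (dry run).
# Assumes none of the values contain spaces.
build_alignment_cmd() {
    input_table=$1
    output_dir=$2
    reference=$3
    algorithm=$4
    library=$5

    cmd="nextflow run tron-bioinformatics/tronflow-alignment -profile conda"
    cmd="$cmd --input_files $input_table --output $output_dir"
    cmd="$cmd --reference $reference --algorithm $algorithm --library $library"

    # Two-pass mode is a STAR option, so add it for RNA-seq runs only
    if [ "$algorithm" = "star" ]; then
        cmd="$cmd --star_two_pass_mode"
    fi

    printf '%s\n' "$cmd"
}

# Print the command for a paired-end STAR run (placeholder paths)
build_alignment_cmd samples.tsv results /path/to/star_reference star paired
```

Printing the command first makes it easy to review the flags before submitting a long-running job; the printed line can then be pasted into a terminal or a job script.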
Find the help as follows:
$ nextflow run tron-bioinformatics/tronflow-alignment --help
N E X T F L O W ~ version 19.07.0
Launching `main.nf` [intergalactic_shannon] - revision: e707c77d7b
Usage:
nextflow main.nf --input_files input_files [--reference reference.fasta]
Input:
* input_fastq1: the path to a FASTQ file (incompatible with --input_files)
* input_name: name of the sample (only needed if input_fastq1 is used)
* input_files: the path to a tab-separated values file containing in each row the sample name and two paired FASTQs when `--library paired`, or a single FASTQ file when `--library single` (incompatible with --input_fastq1 and --input_fastq2)
Example input file:
name1 fastq1.1 fastq1.2
name2 fastq2.1 fastq2.2
* reference: path to the indexed FASTA genome reference or the star reference folder in case of using star
Optional input:
* input_fastq2: the path to a second FASTQ file (incompatible with --input_files, incompatible with --library single)
* output: the folder where to publish output (default: output)
* algorithm: determines the alignment algorithm, either `aln`, `mem`, `mem2` or `star` (default `aln`)
* library: determines whether the sequencing library is paired or single end, either `paired` or `single` (default `paired`)
* cpus: determines the number of CPUs for each job, with the exception of bwa sampe and samse steps which are not parallelized (default: 8)
* memory: determines the memory required by each job (default: 32g)
* inception: if enabled it uses an inception, only valid for BWA aln, it requires a fast file system such as flash (default: false)
* skip_trimming: skips the read trimming step
* star_two_pass_mode: activates STAR two-pass mode, increasing sensitivity of novel junction discovery, recommended for RNA variant calling (default: false)
* additional_args: additional alignment arguments, only effective in BWA mem, BWA mem 2 and STAR (default: none)
Output:
* A BAM file ${name}.bam and its index
* FASTP read trimming stats report in HTML format ${name}.fastp_stats.html
* FASTP read trimming stats report in JSON format ${name}.fastp_stats.json
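To make the output naming concrete, here is a minimal sketch that lists the files expected for one sample and reports any that are missing from a folder. The function names are hypothetical, and two points are assumptions rather than documented behaviour: that the BAM index is named `${name}.bam.bai` (the default produced by `samtools index` on a BAM file), and that all files for a sample sit directly in the folder passed as the first argument.

```shell
# list_expected_outputs NAME
# Prints the per-sample output file names described in the help text.
# Assumption: the BAM index uses the samtools default name NAME.bam.bai.
list_expected_outputs() {
    printf '%s\n' "$1.bam" "$1.bam.bai" "$1.fastp_stats.html" "$1.fastp_stats.json"
}

# check_outputs DIR NAME
# Returns non-zero and reports on stderr when an expected file is absent.
# Assumes sample names contain no whitespace.
check_outputs() {
    status=0
    for f in $(list_expected_outputs "$2"); do
        if [ ! -e "$1/$f" ]; then
            echo "missing: $1/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example with placeholder names: "output" is the default output folder
check_outputs output sample_1 || echo "some outputs are missing"
```

When a run uses `--skip_trimming`, the two fastp reports are presumably not produced, so the check would need to be relaxed accordingly.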
Input tables
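The table with FASTQ files expects three tab-separated columns without a header (sample name, FASTQ 1 and FASTQ 2) when `--library paired`, or two columns (sample name and FASTQ) when `--library single`.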
| Sample name | FASTQ 1 | FASTQ 2 |
|---|---|---|
| sample_1 | /path/to/sample_1.1.fastq | /path/to/sample_1.2.fastq |
| sample_2 | /path/to/sample_2.1.fastq | /path/to/sample_2.2.fastq |
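A table like the one above can be written and sanity-checked from the shell. The following is a sketch only: the file name `input_files.tsv`, the helper `check_input_table` and the sample paths are placeholders, and the check covers column counts only, not whether the FASTQ files exist.

```shell
# Write a paired-end table: sample name, FASTQ 1, FASTQ 2, tab-separated, no header.
# printf reuses the format for each group of three arguments, giving one row per sample.
printf '%s\t%s\t%s\n' \
    sample_1 /path/to/sample_1.1.fastq /path/to/sample_1.2.fastq \
    sample_2 /path/to/sample_2.1.fastq /path/to/sample_2.2.fastq \
    > input_files.tsv

# check_input_table TABLE LIBRARY
# Succeeds when every row has 3 columns (paired) or 2 columns (single).
check_input_table() {
    case "$2" in
        paired) expected=3 ;;
        single) expected=2 ;;
        *) echo "library must be 'paired' or 'single'" >&2; return 2 ;;
    esac
    awk -F '\t' -v n="$expected" '
        NF != n { printf "line %d: expected %d columns, found %d\n", NR, n, NF; bad = 1 }
        END { exit bad }
    ' "$1"
}

check_input_table input_files.tsv paired && echo "input table OK"
```

Running the check before launching the pipeline catches the common mistake of separating columns with spaces instead of tabs.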
Reference genome
The reference genome has to be provided in FASTA format and it requires two sets of indexes:
- FAI index. Create with `samtools faidx your.fasta`
- BWA indexes. Create with `bwa index your.fasta`

For bwa-mem2 a specific index is needed: `bwa-mem2 index your.fasta`
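Before starting a BWA `aln` or `mem` run it can help to confirm that the indexes sit next to the FASTA file. The sketch below checks for the `.fai` file written by `samtools faidx` and the `.amb`, `.ann`, `.bwt`, `.pac` and `.sa` files written by `bwa index`. The function name `check_bwa_reference` is hypothetical, and the bwa-mem2 index files, which use different suffixes, are deliberately left out.

```shell
# check_bwa_reference FASTA
# Returns non-zero and reports on stderr when an expected index file is absent.
# Covers the FAI index and the classic BWA index only, not bwa-mem2.
check_bwa_reference() {
    missing=0
    for suffix in fai amb ann bwt pac sa; do
        if [ ! -e "$1.$suffix" ]; then
            echo "missing index: $1.$suffix" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example with the placeholder file name used above
check_bwa_reference your.fasta || echo "run samtools faidx and bwa index first"
```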
For STAR, a reference folder prepared with STAR has to be provided. Preparing it requires the reference genome in FASTA format and the gene annotations in GTF format. Run a command as follows:
STAR --runMode genomeGenerate --genomeDir $YOUR_FOLDER --genomeFastaFiles $YOUR_FASTA --sjdbGTFfile $YOUR_GTF
References
- Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler Transform. Bioinformatics, Epub. https://doi.org/10.1093/bioinformatics/btp698
- Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu; fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 1 September 2018, Pages i884–i890, https://doi.org/10.1093/bioinformatics/bty560
- Vasimuddin Md, Sanchit Misra, Heng Li, Srinivas Aluru. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. IEEE Parallel and Distributed Processing Symposium (IPDPS), 2019.
- Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan 1;29(1):15-21. https://doi.org/10.1093/bioinformatics/bts635. Epub 2012 Oct 25. PMID: 23104886; PMCID: PMC3530905.
Code Snippets
    """
    # --input_files needs to be forced, otherwise it is inherited from profile in tests
    fastp \
        --in1 ${fastq1} \
        --in2 ${fastq2} \
        --out1 ${fastq1.baseName}.trimmed.fq.gz \
        --out2 ${fastq2.baseName}.trimmed.fq.gz \
        --json ${name}.fastp_stats.json \
        --html ${name}.fastp_stats.html \
        --thread ${params.cpus}

    echo ${params.manifest} >> software_versions.${task.process}.txt
    fastp --version 2>> software_versions.${task.process}.txt
    """

    """
    # --input_files needs to be forced, otherwise it is inherited from profile in tests
    fastp \
        --in1 ${fastq1} \
        --out1 ${fastq1.baseName}.trimmed.fq.gz \
        --json ${name}.fastp_stats.json \
        --html ${name}.fastp_stats.html \
        --thread ${params.cpus}

    echo ${params.manifest} >> software_versions.${task.process}.txt
    fastp --version 2>> software_versions.${task.process}.txt
    """
    """
    bwa aln -t ${task.cpus} ${params.reference} ${fastq} > ${fastq.baseName}.sai

    echo ${params.manifest} >> software_versions.${task.process}.txt
    echo "bwa=0.7.17" >> software_versions.${task.process}.txt
    """

    """
    bwa sampe ${params.reference} ${sai1} ${sai2} ${fastq1} ${fastq2} | samtools view -uS - | samtools sort - > ${name}.bam

    echo ${params.manifest} >> software_versions.${task.process}.txt
    echo "bwa=0.7.17" >> software_versions.${task.process}.txt
    samtools --version >> software_versions.${task.process}.txt
    """

    """
    bwa samse ${params.reference} ${sai} ${fastq} | samtools view -uS - | samtools sort - > ${name}.bam

    echo ${params.manifest} >> software_versions.${task.process}.txt
    echo "bwa=0.7.17" >> software_versions.${task.process}.txt
    """

    """
    bwa sampe ${params.reference} <( bwa aln -t ${params.cpus} ${params.reference} ${fastq1} ) \
        <( bwa aln -t ${params.cpus} ${params.reference} ${fastq2} ) ${fastq1} ${fastq2} \
        | samtools view -uS - | samtools sort - > ${name}.bam

    echo ${params.manifest} >> software_versions.${task.process}.txt
    echo "bwa=0.7.17" >> software_versions.${task.process}.txt
    samtools --version >> software_versions.${task.process}.txt
    """
    """
    bwa-mem2 mem ${params.additional_args} -t ${task.cpus} ${params.reference} ${fastq1} ${fastq2} | samtools view -uS - | samtools sort - > ${name}.bam

    echo ${params.manifest} >> software_versions.${task.process}.txt
    bwa-mem2 version >> software_versions.${task.process}.txt
    samtools --version >> software_versions.${task.process}.txt
    """

    """
    bwa-mem2 mem ${params.additional_args} -t ${task.cpus} ${params.reference} ${fastq} | samtools view -uS - | samtools sort - > ${name}.bam

    echo ${params.manifest} >> software_versions.${task.process}.txt
    bwa-mem2 version >> software_versions.${task.process}.txt
    samtools --version >> software_versions.${task.process}.txt
    """
    """
    bwa mem ${params.additional_args} -t ${task.cpus} ${params.reference} ${fastq1} ${fastq2} | samtools view -uS - | samtools sort - > ${name}.bam

    echo ${params.manifest} >> software_versions.${task.process}.txt
    echo "bwa=0.7.17" >> software_versions.${task.process}.txt
    samtools --version >> software_versions.${task.process}.txt
    """

    """
    bwa mem ${params.additional_args} -t ${task.cpus} ${params.reference} ${fastq} | samtools view -uS - | samtools sort - > ${name}.bam

    echo ${params.manifest} >> software_versions.${task.process}.txt
    echo "bwa=0.7.17" >> software_versions.${task.process}.txt
    samtools --version >> software_versions.${task.process}.txt
    """
    """
    STAR --genomeDir ${params.reference} ${two_pass_mode_param} ${params.additional_args} \
        --readFilesCommand "gzip -d -c -f" \
        --readFilesIn ${fastq1} ${fastq2} \
        --outSAMmode Full \
        --outSAMattributes Standard \
        --outSAMunmapped None \
        --outReadsUnmapped Fastx \
        --outFilterMismatchNoverLmax 0.02 \
        --runThreadN ${task.cpus} \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix ${name}.

    mv ${name}.Aligned.sortedByCoord.out.bam ${name}.bam

    echo ${params.manifest} >> software_versions.${task.process}.txt
    STAR --version >> software_versions.${task.process}.txt
    """

    """
    STAR --genomeDir ${params.reference} ${two_pass_mode_param} ${params.additional_args} \
        --readFilesCommand "gzip -d -c -f" \
        --readFilesIn ${fastq} \
        --outSAMmode Full \
        --outSAMattributes Standard \
        --outSAMunmapped None \
        --outReadsUnmapped Fastx \
        --outFilterMismatchNoverLmax 0.02 \
        --runThreadN ${task.cpus} \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix ${name}.

    mv ${name}.Aligned.sortedByCoord.out.bam ${name}.bam

    echo ${params.manifest} >> software_versions.${task.process}.txt
    STAR --version >> software_versions.${task.process}.txt
    """
    """
    samtools index -@ ${task.cpus} ${bam}

    echo ${params.manifest} >> software_versions.${task.process}.txt
    samtools --version >> software_versions.${task.process}.txt
    """





