GermlineStructuralV-nf: Comprehensive Structural Variant Identification Pipeline for Human Genome Using Multi-Caller Approach
GermlineStructuralV-nf
Description
GermlineStructuralV-nf is a pipeline for identifying structural variant events in human Illumina short-read whole genome sequence data. It identifies structural variant and copy number events from BAM files using Manta, Smoove, and TIDDIT. Variants are then merged using SURVIVOR and annotated with AnnotSV. The pipeline is written in Nextflow and uses Singularity/Docker to run containerised tools.
Structural variant and copy number detection is challenging. Most structural variant detection tools infer these events from read mapping patterns, which can often resemble sequencing and read alignment artefacts. To address this, GermlineStructuralV-nf employs three general-purpose structural variant callers, each of which supports a combination of detection methods. Manta, Smoove, and TIDDIT draw on the standard detection approaches, which consider:
- Discordant read pair alignments
- Split reads that span a breakpoint
- Read depth profiling
- Local de novo assembly
This multi-caller approach is currently considered best practice for maximising sensitivity with short-read data (Cameron et al. 2019, Mahmoud et al. 2019). By combining tools that employ different methods, we improve our ability to detect variant events of different types and sizes.
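As a toy illustration of one of the signals above: split-read evidence appears in SAM/BAM records as an `SA` (supplementary alignment) tag, defined in the SAM specification. The sketch below fabricates two minimal SAM records, one carrying an `SA` tag, and counts the split reads (the records and coordinates are made up for illustration):

```shell
# fabricate a two-record SAM fragment; r2 carries an SA tag, i.e. a
# supplementary (split) alignment elsewhere on chr1
printf 'r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\tACGT\tIIII\n' > toy.sam
printf 'r2\t0\tchr1\t200\t60\t25M25S\t*\t0\t0\tACGT\tIIII\tSA:Z:chr1,5000,+,25S25M,60,0;\n' >> toy.sam

# count records carrying split-read evidence
grep -c 'SA:Z:' toy.sam
```

On real data the same idea is applied (far more carefully) by the callers themselves to aligned BAMs.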
Diagram
User guide
To run this pipeline, you will need to prepare your input files and reference data, and clone this repository. Before proceeding, ensure Nextflow is installed on the system you are working on.
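The general shape of a run looks like the sketch below. Note that the repository URL, the sample-sheet format, and the `--input`/`--ref` flags here are assumptions for illustration; the pipeline's own README and nextflow.config define the real interface.

```shell
# Quick-start sketch (flags and file formats are assumptions, not the
# pipeline's documented interface -- check the repository README).

# clone the repository (URL assumed)
# git clone https://github.com/Sydney-Informatics-Hub/GermlineStructuralV-nf

# prepare a sample sheet listing each sample's BAM (format assumed)
cat > samples.tsv <<'EOF'
sampleID,bam
sample1,/data/sample1.bam
EOF

# launch with Singularity or Docker (illustrative)
# nextflow run GermlineStructuralV-nf/main.nf \
#     --input samples.tsv \
#     --ref /path/to/GRCh38.fasta \
#     -profile singularity
```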
Code Snippets
```bash
AnnotSV \
    -SVinputFile ${sampleID}_merged.vcf \
    -annotationsDir ${params.annotsvDir} \
    -bedtools bedtools -bcftools bcftools \
    -annotationMode ${mode} \
    -genomeBuild GRCh38 \
    -includeCI 1 \
    -overwrite 1 \
    -outputFile ${outputFile} ${extraArgs}
```
```bash
cat "${params.input}" > samples.txt
```
```bash
# configure manta SV analysis workflow
configManta.py \
    --normalBam ${bam} \
    --referenceFasta ${params.ref} \
    --runDir manta \
    ${intervals} ${extraArgs}

# run SV detection
manta/runWorkflow.py -m local -j ${task.cpus}

# clean up outputs
mv manta/results/variants/candidateSmallIndels.vcf.gz \
    manta/Manta_${sampleID}.candidateSmallIndels.vcf.gz
mv manta/results/variants/candidateSmallIndels.vcf.gz.tbi \
    manta/Manta_${sampleID}.candidateSmallIndels.vcf.gz.tbi
mv manta/results/variants/candidateSV.vcf.gz \
    manta/Manta_${sampleID}.candidateSV.vcf.gz
mv manta/results/variants/candidateSV.vcf.gz.tbi \
    manta/Manta_${sampleID}.candidateSV.vcf.gz.tbi
mv manta/results/variants/diploidSV.vcf.gz \
    manta/Manta_${sampleID}.diploidSV.vcf.gz
mv manta/results/variants/diploidSV.vcf.gz.tbi \
    manta/Manta_${sampleID}.diploidSV.vcf.gz.tbi

# convert multiline inversion BNDs from manta vcf to single line
convertInversion.py \$(which samtools) ${params.ref} \
    manta/Manta_${sampleID}.diploidSV.vcf.gz \
    > manta/Manta_${sampleID}.diploidSV_converted.vcf

# zip and index converted vcf
bgzip manta/Manta_${sampleID}.diploidSV_converted.vcf
tabix manta/Manta_${sampleID}.diploidSV_converted.vcf.gz
```
```bash
# create new header for merged vcf
printf "${sampleID}_manta\n" > ${sampleID}_rehead_manta.txt

# replace sampleID with caller_sample for merging
bcftools reheader \
    Manta_${sampleID}.diploidSV_converted.vcf.gz \
    -s ${sampleID}_rehead_manta.txt \
    -o Manta_${sampleID}.vcf.gz

# gunzip vcf
gunzip Manta_${sampleID}.vcf.gz
```
```bash
smoove call -d --name ${sampleID} \
    --fasta ${params.ref} \
    --outdir smoove \
    --processes ${task.cpus} \
    --genotype ${bam} ${extraArgs}
```
```bash
# create new header for merged vcf
printf "${sampleID}_smoove\n" > ${sampleID}_rehead_smoove.txt

# replace sampleID with caller_sample for merging
bcftools reheader \
    ${sampleID}-smoove.genotyped.vcf.gz \
    -s ${sampleID}_rehead_smoove.txt \
    -o Smoove_${sampleID}.vcf.gz

# gunzip vcf
gunzip Smoove_${sampleID}.vcf.gz

#clean up
#rm -r ${sampleID}_rehead_smoove.txt
```
```bash
echo ${mergeFile} | xargs -n1 > ${sampleID}_survivor.txt

SURVIVOR merge ${sampleID}_survivor.txt \
    ${params.survivorMaxDist} \
    ${params.survivorConsensus} \
    ${params.survivorType} \
    ${params.survivorStrand} \
    0 \
    ${params.survivorSize} \
    ${sampleID}_merged.vcf
```
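SURVIVOR merge takes its thresholds as positional arguments, which are easy to misread. The sketch below builds the per-sample VCF list SURVIVOR expects and annotates the argument order; the file names follow the caller-prefix renaming convention used elsewhere in this pipeline, and the threshold values shown are illustrative assumptions, not the pipeline's defaults.

```shell
# build the VCF list SURVIVOR expects: one path per line
sampleID=sample1
printf '%s\n' \
    "Manta_${sampleID}.vcf" \
    "Smoove_${sampleID}.vcf" \
    "Tiddit_${sampleID}_final.vcf" > "${sampleID}_survivor.txt"

# SURVIVOR merge positional arguments, in order:
#   <vcf list> <max distance between breakpoints> <min supporting callers>
#   <agree on SV type 0/1> <agree on strand 0/1>
#   <estimate distance from SV size 0/1> <min SV size bp> <output vcf>
# e.g. (values illustrative):
# SURVIVOR merge ${sampleID}_survivor.txt 1000 1 1 1 0 30 ${sampleID}_merged.vcf
```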
```bash
SURVIVOR vcftobed ${sampleID}_merged.vcf \
    0 -1 \
    ${sampleID}_merged.bed

SURVIVOR stats ${sampleID}_merged.vcf \
    -1 -1 -1 \
    ${sampleID}_merged.stats.txt
```
```bash
tiddit \
    --cov \
    --bam ${bam} \
    --ref ${params.ref} \
    -o ${sampleID}_cov ${extraArgs}
```
```bash
tiddit \
    --sv \
    -q 20 \
    --bam ${bam} \
    --ref ${params.ref} \
    -o ${sampleID}_sv \
    --threads ${task.cpus} ${extraArgs}

# rename vcf to show it's from tiddit
mv ${sampleID}_sv.vcf \
    Tiddit_${sampleID}_sv.vcf

# filter to PASS-only variants
grep -E "#|PASS" Tiddit_${sampleID}_sv.vcf \
    > Tiddit_${sampleID}_PASSsv.vcf
```
```bash
# bgzip and index tiddit vcf
bgzip Tiddit_${sampleID}_PASSsv.vcf
tabix Tiddit_${sampleID}_PASSsv.vcf.gz

# create new header for merged vcf
printf "${sampleID}_tiddit\n" > ${sampleID}_rehead_tiddit.txt

# replace sampleID with caller_sample for merging
bcftools reheader \
    Tiddit_${sampleID}_PASSsv.vcf.gz \
    -s ${sampleID}_rehead_tiddit.txt \
    -o Tiddit_${sampleID}_final.vcf.gz

# gunzip vcf
gunzip Tiddit_${sampleID}_final.vcf.gz

#clean up
rm -r ${sampleID}_rehead_tiddit.txt
```
Support
- Future updates