
Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
Identifying genomic variants using next-generation sequencing data, such as single nucleotide polymorphisms (SNPs) and DNA insertions and deletions (indels), is an important aspect of scientific discovery. This pipeline was created to help researchers reliably and quickly discover and annotate sequence variants. The pipeline is based on the Broad Institute's recommended best practices for variant discovery analysis and uses the Genome Analysis Toolkit 4 (GATK4) to perform variant calling. SnpEff is employed to annotate and predict the effects of variants after SNPs have been detected. This pipeline is designed for calling variants in clonal samples, or samples from a single person. In these samples, the frequency of variations is estimated to be 1 (for haploids or homozygous diploids) or 0.5 (for heterozygous diploids).
Code Snippets
48 49 50 51 52 53 54 55 56 57 58 59 | """ bwa mem \ -K 100000000 \ -v 3 \ -t ${task.cpus} \ -Y \ -R \"${readGroup}\" \ $ref \ ${reads[0]} \ ${reads[1]} \ > ${pair_id}_aligned_reads.sam """ |
82 83 84 85 86 87 88 89 90 | """ mkdir -p ${params.tmpdir}/${workflow.runName}/${pair_id} gatk --java-options "-Djava.io.tmpdir=${params.tmpdir}/${workflow.runName}/${pair_id}" \ MarkDuplicatesSpark \ -I $aligned_reads \ -M ${pair_id}_dedup_metrics.txt \ -O ${pair_id}_sorted_dedup.bam rm -r ${params.tmpdir}/${workflow.runName}/${pair_id} """ |
111 112 113 114 115 116 117 118 119 120 121 122 123 | """ java -jar \$PICARD_JAR \ CollectAlignmentSummaryMetrics \ R=${params.ref} \ I=${sorted_dedup_reads} \ O=${pair_id}_alignment_metrics.txt java -jar \$PICARD_JAR \ CollectInsertSizeMetrics \ INPUT=${sorted_dedup_reads} \ OUTPUT=${pair_id}_insert_metrics.txt \ HISTOGRAM_FILE=${pair_id}_insert_size_histogram.pdf samtools depth -a ${sorted_dedup_reads} > ${pair_id}_depth_out.txt """ |
147 148 149 150 151 152 | """ gatk HaplotypeCaller \ -R $ref \ -I $input_bam \ -O ${pair_id}_raw_variants_${round}.vcf \ """ |
173 174 175 176 177 178 179 180 181 182 183 184 | """ gatk SelectVariants \ -R $ref \ -V $raw_variants \ -select-type SNP \ -O ${pair_id}_raw_snps_${round}.vcf gatk SelectVariants \ -R $ref \ -V $raw_variants \ -select-type INDEL \ -O ${pair_id}_raw_indels_${round}.vcf """ |
203 204 205 206 207 208 209 210 211 212 213 214 | """ gatk VariantFiltration \ -R $ref \ -V $raw_snps \ -O ${pair_id}_filtered_snps_${round}.vcf \ -filter-name "QD_filter" -filter "QD < 2.0" \ -filter-name "FS_filter" -filter "FS > 60.0" \ -filter-name "MQ_filter" -filter "MQ < 40.0" \ -filter-name "SOR_filter" -filter "SOR > 4.0" \ -filter-name "MQRankSum_filter" -filter "MQRankSum < -12.5" \ -filter-name "ReadPosRankSum_filter" -filter "ReadPosRankSum < -8.0" """ |
233 234 235 236 237 238 239 240 241 | """ gatk VariantFiltration \ -R $ref \ -V $raw_indels \ -O ${pair_id}_filtered_indels_${round}.vcf \ -filter-name "QD_filter" -filter "QD < 2.0" \ -filter-name "FS_filter" -filter "FS > 200.0" \ -filter-name "SOR_filter" -filter "SOR > 10.0" """ |
287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 | """ echo "New Round: " $new_round gatk SelectVariants \ --exclude-filtered \ -V $filtered_snps \ -O ${pair_id}_bqsr_snps.vcf gatk SelectVariants \ --exclude-filtered \ -V $filtered_indels \ -O ${pair_id}_bqsr_indels.vcf gatk BaseRecalibrator \ -R $ref \ -I $input_bam \ --known-sites ${pair_id}_bqsr_snps.vcf \ --known-sites ${pair_id}_bqsr_indels.vcf \ -O ${pair_id}_recal_data.table gatk ApplyBQSR \ -R $ref \ -I $input_bam \ -bqsr ${pair_id}_recal_data.table \ -O ${pair_id}_recal.bam gatk BaseRecalibrator \ -R $ref \ -I ${pair_id}_recal.bam \ --known-sites ${pair_id}_bqsr_snps.vcf \ --known-sites ${pair_id}_bqsr_indels.vcf \ -O ${pair_id}_post_recal_data.table """ |
329 330 331 332 333 334 | """ gatk AnalyzeCovariates \ -before $recal_table \ -after $post_recal_table \ -plots ${pair_id}_recalibration_plots.pdf """ |
351 352 353 354 355 356 | """ java -jar \$SNPEFF_JAR -v \ -dataDir $params.snpeff_data \ $params.snpeff_db \ $filtered_snps > ${pair_id}_filtered_snps.ann.vcf """ |
382 383 384 | """ parse_metrics.sh ${pair_id} > ${pair_id}_report.csv """ |
Support
- Future updates
Related Workflows





