Variant-calling pipeline for identifying strain-specific variants

A reproducible Snakemake workflow for identifying strain-specific variants in Danio rerio (zebrafish). The called variants are intended for upload to the European Variation Archive (https://www.ebi.ac.uk/ena/browser/home). This workflow requires two Snakefiles and a …

Code Snippets

set -o nounset
set -o errexit
set -o pipefail

# Arguments: <output_fastq1> <output_fastq2> <units>
# <units> is a space-separated list of sequencing runs, each "url1;url2:md5_1;md5_2"
if [[ $# -ne 3 ]] ; then
  echo "Incorrect number of arguments" >&2
  exit 1
fi

output_fastq1=$1
output_fastq2=$2
units=$3

# Download a gzipped FASTQ file to <dest>.gz, verify its MD5 checksum,
# then decompress it to <dest>
fetch_fastq() {
  local url=$1 expected_md5=$2 dest=$3 md5
  wget -q --timeout 60 --tries 10 -O "${dest}.gz" "$url"
  md5=$(md5sum "${dest}.gz" | awk '{ print $1 }')
  if [[ "$expected_md5" != "$md5" ]]; then
    echo "Checksum wrong for '$url': '$md5' not '$expected_md5'" >&2
    exit 1
  fi
  gunzip "${dest}.gz"
}

# Iterate over each sequencing run (i.e. unit)
count=0
for unit in $units; do
  ((count+=1))
  # Split on the last ':' so URLs that carry a scheme (e.g. ftp://) stay intact
  urls=${unit%:*}
  md5s=${unit##*:}
  IFS=';' read -r url1 url2 <<< "$urls"
  IFS=';' read -r md51 md52 <<< "$md5s"

  fetch_fastq "$url1" "$md51" "${output_fastq1}_tmp_$count"
  fetch_fastq "$url2" "$md52" "${output_fastq2}_tmp_$count"
done

# Merge the unzipped FASTQ files in run order. Looping over the counter avoids
# the lexical order of `ls | sort`, which would place _tmp_10 before _tmp_2.
for ((i = 1; i <= count; i++)); do cat "${output_fastq1}_tmp_$i"; done > "$output_fastq1"
for ((i = 1; i <= count; i++)); do cat "${output_fastq2}_tmp_$i"; done > "$output_fastq2"
rm -f "${output_fastq1}"_tmp_* "${output_fastq2}"_tmp_*

Shell From line 3 of scripts/download_merge_fastq.sh
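The download script receives each unit as url1;url2:md5_1;md5_2, with units separated by spaces (the rule that calls it passes them as one quoted argument). The sketch below isolates that parsing step so the format can be checked without downloading anything; `parse_unit` and the example URLs and checksums are hypothetical. It splits on the last ':' rather than the first, so a URL that includes a scheme such as ftp:// is not cut in half.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: split one sequencing unit "url1;url2:md5_1;md5_2" into its four fields.
# parse_unit is a hypothetical helper name, not part of the workflow.
parse_unit() {
  local unit=$1 urls md5s url1 url2 md51 md52
  urls=${unit%:*}    # everything before the last ':'
  md5s=${unit##*:}   # everything after the last ':' (hex MD5s contain no ':')
  IFS=';' read -r url1 url2 <<< "$urls"
  IFS=';' read -r md51 md52 <<< "$md5s"
  # One field per line: url1, url2, md5_1, md5_2
  printf '%s\n' "$url1" "$url2" "$md51" "$md52"
}

# Made-up example: two units separated by a space, the first with an ftp:// scheme
units="ftp://host/a_1.fastq.gz;ftp://host/a_2.fastq.gz:aaa111;bbb222 host/b_1.fastq.gz;host/b_2.fastq.gz:ccc333;ddd444"
for unit in $units; do
  parse_unit "$unit"
done
```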

shell:
    "scripts/download_merge_fastq.sh {output.fq1} {output.fq2} '{params.fqs}' &> {log}"

SnakeMake From line 39 of main/Snakefile

shell:
    "(mkdir -p {wildcards.sample}/chunks && "
    "split --suffix-length=3 --additional-suffix=.fastq --numeric-suffixes --lines={params.chunksize} {input.fq1} {wildcards.sample}/chunks/1. && "
    "split --suffix-length=3 --additional-suffix=.fastq --numeric-suffixes --lines={params.chunksize} {input.fq2} {wildcards.sample}/chunks/2.) "
    "&> {log}"

SnakeMake From line 53 of main/Snakefile
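`split --lines` cuts purely at line boundaries, and a FASTQ record spans exactly four lines, so chunksize has to be a multiple of 4. Otherwise records, and with them read pairs, are broken across chunks. A minimal sketch, assuming GNU coreutils `split` and a tiny synthetic FASTQ; `fastq_chunk_lines` is a hypothetical helper that turns reads-per-chunk into a line count.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Convert a number of reads per chunk into the --lines value for split.
# Each FASTQ record is four lines: header, sequence, '+', qualities.
fastq_chunk_lines() {
  echo $(( $1 * 4 ))
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Build a synthetic 10-read FASTQ file
for i in $(seq 1 10); do
  printf '@read%d\nACGT\n+\nIIII\n' "$i"
done > "$workdir/1.fastq"

# Split into 4-read chunks, named like the workflow's chunks: 1.000.fastq, 1.001.fastq, ...
split --suffix-length=3 --additional-suffix=.fastq --numeric-suffixes \
  --lines="$(fastq_chunk_lines 4)" "$workdir/1.fastq" "$workdir/1."

# Each chunk should start with a header line and hold a whole number of records
for chunk in "$workdir"/1.*.fastq; do
  first=$(head -n 1 "$chunk")
  lines=$(wc -l < "$chunk")
  echo "$(basename "$chunk") starts with $first, $(( lines / 4 )) reads"
done
```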

shell:
   "bwa aln  {params.genome} {input} > {output}  "

SnakeMake BWA From line 73 of main/Snakefile

shell:
    "bwa aln  {params.genome} {input} > {output} "

SnakeMake BWA From line 91 of main/Snakefile

shell:
    "(bwa sampe {params[0]} {input.sai1} {input.sai2} {input.fq1} {input.fq2} "
    "| samtools view -bT {params.genome} -o {output} -)  "

SnakeMake SAMtools From line 111 of main/Snakefile
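`bwa sampe` reads the two FASTQ files in lockstep, so each 1.NNN.fastq chunk must be matched with the 2.NNN.fastq chunk carrying the same suffix and holding the same number of records. The sketch below checks this on a chunk directory laid out like the one the split rule produces; `check_chunk_pairs` and the demo files are hypothetical, for illustration only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Check that each read-1 chunk (1.NNN.fastq) has a read-2 partner (2.NNN.fastq)
# with the same number of FASTQ records (4 lines each).
# Prints one line per chunk and returns 1 if any chunk is missing or mismatched.
check_chunk_pairs() {
  local dir=$1 status=0 c1 c2 base id n1 n2
  for c1 in "$dir"/1.*.fastq; do
    [[ -e "$c1" ]] || continue        # glob matched nothing
    base=${c1##*/}                    # e.g. 1.000.fastq
    id=${base#1.}; id=${id%.fastq}    # e.g. 000
    c2="$dir/2.$id.fastq"
    if [[ ! -f "$c2" ]]; then
      echo "MISSING 2.$id.fastq"; status=1; continue
    fi
    n1=$(( $(wc -l < "$c1") / 4 ))
    n2=$(( $(wc -l < "$c2") / 4 ))
    if (( n1 == n2 )); then
      echo "OK $id ($n1 pairs)"
    else
      echo "MISMATCH $id: $n1 vs $n2 reads"; status=1
    fi
  done
  return "$status"
}

# Demo: one matching pair of chunks and one pair whose read-2 chunk is empty
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nACGT\n+\nIIII\n' > "$demo/1.000.fastq"
printf '@r1/2\nTGCA\n+\nIIII\n@r2/2\nTGCA\n+\nIIII\n' > "$demo/2.000.fastq"
printf '@r3/1\nACGT\n+\nIIII\n' > "$demo/1.001.fastq"
: > "$demo/2.001.fastq"
check_chunk_pairs "$demo" || echo "chunk check failed"
```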

shell:
  "samtools merge -o {output} {input} &&  "
  "rm -rf {wildcards.sample}/chunks  &&  "
  "rm -f {wildcards.sample}/1.fastq {wildcards.sample}/2.fastq "

SnakeMake SAMtools From line 127 of main/Snakefile

shell:
  "samtools sort -m {resources.mem}G -n {input} -O BAM -o {output} "

SnakeMake SAMtools From line 143 of main/Snakefile

shell:
  "samtools fixmate {input} - | samtools sort - | bammarkduplicates O={output}"

SnakeMake SAMtools From line 157 of main/Snakefile
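These rules move between name order (`samtools sort -n`, which `samtools fixmate` needs) and coordinate order, which duplicate markers such as biobambam2's `bammarkduplicates` generally expect. A quick way to see which order a BAM claims is the SO field of its @HD header line, e.g. from `samtools view -H`. The sketch below reads that field from SAM header text on stdin; `header_sort_order` is a hypothetical helper, and the inputs are hand-written headers rather than real samtools output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the SO (sort order) value from the @HD line of SAM header text on stdin,
# or "unknown" when there is no @HD line or it has no SO field.
header_sort_order() {
  awk -F'\t' '
    $1 == "@HD" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
    }
    END { if (!found) print "unknown" }
  '
}

# In practice this would be fed by: samtools view -H sample.bam | header_sort_order
# (sample.bam is a placeholder name)
printf '@HD\tVN:1.6\tSO:queryname\n@SQ\tSN:1\tLN:1000\n' | header_sort_order
printf '@HD\tVN:1.6\tSO:coordinate\n' | header_sort_order
printf '@SQ\tSN:1\tLN:1000\n' | header_sort_order
```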

shell:
  "samtools sort -m {resources.mem}G --threads {resources.cpus} -o {output} {input} "

SnakeMake SAMtools From line 172 of main/Snakefile
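`samtools sort -m` sets a per-thread memory limit, so a rule that passes `-m {resources.mem}G` together with `--threads {resources.cpus}` may use roughly mem * cpus in total. If resources.mem is meant as the job's total budget (an assumption, since the resource definitions are not shown here), the per-thread value can be derived first. `per_thread_mem` is a hypothetical helper; it returns MiB so that a small per-thread share is not rounded down to 0G.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Split a total memory budget (GiB) across sort threads and print a value
# for samtools sort -m in MiB (samtools accepts K/M/G suffixes).
per_thread_mem() {
  local total_gib=$1 threads=$2
  if (( threads < 1 )); then threads=1; fi
  echo "$(( total_gib * 1024 / threads ))M"
}

# Example: a 16 GiB budget over 6 threads; only prints the command
# (in.bam and out.bam are placeholder names)
mem=$(per_thread_mem 16 6)
echo "samtools sort -m $mem --threads 6 -o out.bam in.bam"
```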

shell:
  "  gatk  AddOrReplaceReadGroups -I {input} -O {output.addrg} -LB {wildcards.sample} "
  " -PL {wildcards.sample} -PU {wildcards.sample} -SM {wildcards.sample} --VALIDATION_STRINGENCY SILENT ; "
  "samtools index {wildcards.sample}/{wildcards.sample}_addreadgroups.bam  "

SnakeMake SAMtools gatk From line 187 of main/Snakefile
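AddOrReplaceReadGroups writes an @RG header line and tags every read with it; without -ID, GATK uses the default read-group ID "1". In the rule above, -PL (platform) is set to the sample name, whereas the SAM specification expects a sequencing technology such as ILLUMINA. As an illustration only, the hypothetical helper below builds the @RG line one would expect and rejects platforms outside a subset of the PL values listed in the SAM specification; the sample name TU and the ILLUMINA platform are assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a tab-separated @RG header line. The platform check covers only a
# subset of the PL values listed in the SAM specification.
rg_line() {
  local id=$1 sample=$2 platform=$3
  case $platform in
    ILLUMINA|PACBIO|ONT|IONTORRENT|LS454|SOLID|CAPILLARY) ;;
    *) echo "unsupported platform: $platform" >&2; return 1 ;;
  esac
  printf '@RG\tID:%s\tSM:%s\tLB:%s\tPL:%s\tPU:%s\n' \
    "$id" "$sample" "$sample" "$platform" "$sample"
}

# Mirrors the rule's use of the sample name for SM/LB/PU, but with a platform for PL
rg_line 1 TU ILLUMINA
rg_line 1 TU TU || echo "rejected: PL must name a sequencing platform"
```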

shell:
  "(bcftools mpileup --gvcf 10,20 -f {input.ref} {input.bam} | "
  "bcftools call --threads {resources.cpus} -m -Oz -o {output.vcf}) &> {log} ; "
  "tabix {wildcards.sample}/{wildcards.sample}_bcftools.g.vcf.gz "

SnakeMake BCFtools tabix From line 210 of main/Snakefile
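`--gvcf 10,20` asks `bcftools mpileup` to group homozygous-reference sites into gVCF blocks by minimum per-sample depth. Per the bcftools documentation, sites below the first value are written as individual records, sites with depth in [10,20) go into one type of block, and sites with depth 20 or more into another. The function below sketches only that binning rule, not bcftools itself; `gvcf_bin` is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Classify a site's minimum per-sample depth against ascending gVCF thresholds.
# The highest threshold reached identifies the block; below the first threshold
# the site is emitted as a separate record.
# Usage: gvcf_bin DEPTH THRESHOLDS   (THRESHOLDS like "10,20")
gvcf_bin() {
  local dp=$1 thresholds=$2 bin="record" t
  local -a ts
  IFS=',' read -r -a ts <<< "$thresholds"
  for t in "${ts[@]}"; do
    if (( dp >= t )); then
      bin="block>=$t"
    fi
  done
  echo "$bin"
}

for dp in 3 10 15 20 42; do
  echo "DP=$dp -> $(gvcf_bin "$dp" 10,20)"
done
```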

shell:
  "strelka-wrapper configureStrelkaGermlineWorkflow.py --bam={input.bam} "
  "--referenceFasta={input.ref} --runDir={wildcards.sample} --exome ; "
  "strelka-wrapper {wildcards.sample}/runWorkflow.py -m local --quiet -j {resources.cpus} -g {resources.mem} ; "
  "cp {wildcards.sample}/results/variants/genome.S1.vcf.gz {output.vcf} ; "
  "rm -r -f {wildcards.sample}/workspace {wildcards.sample}/results ; "
  "rm {wildcards.sample}/workflow* {wildcards.sample}/runWorkflow* "

SnakeMake From line 231 of main/Snakefile
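The Strelka rule copies results/variants/genome.S1.vcf.gz out of the run directory and removes that directory straight afterwards, so a truncated result would be hard to investigate. One hedged option is to validate the compressed file before copying it. `copy_if_valid_gz` below is a hypothetical helper; it relies on `gzip -t`, which also accepts BGZF files because they are multi-member gzip streams.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Copy a (b)gzipped file to dest only if it exists, is non-empty,
# and decompresses cleanly.
copy_if_valid_gz() {
  local src=$1 dest=$2
  if [[ ! -s "$src" ]]; then
    echo "missing or empty: $src" >&2; return 1
  fi
  if ! gzip -t "$src" 2>/dev/null; then
    echo "corrupt gzip stream: $src" >&2; return 1
  fi
  cp "$src" "$dest"
}

# Demo with a stand-in for genome.S1.vcf.gz and a deliberately truncated copy
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '##fileformat=VCFv4.2\n' | gzip > "$tmp/genome.S1.vcf.gz"
head -c 10 "$tmp/genome.S1.vcf.gz" > "$tmp/truncated.vcf.gz"
copy_if_valid_gz "$tmp/genome.S1.vcf.gz" "$tmp/out.g.vcf.gz" && echo "copied"
copy_if_valid_gz "$tmp/truncated.vcf.gz" "$tmp/bad.g.vcf.gz" || echo "refused"
```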

shell:
  " gatk HaplotypeCaller -R {input.ref} -I {input.bam} -O {output} "
  " -L {input.intervals} -VS SILENT -ERC GVCF --QUIET --create-output-variant-index false "

SnakeMake gatk From line 256 of main/Snakefile
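HaplotypeCaller runs once per interval, and the per-chromosome gVCFs (the 25 zebrafish chromosomes plus an "other" bucket) are then combined with `gatk MergeVcfs`, whose inputs the Snakefile spells out one -I at a time. The same argument list can be generated in a loop. This sketch assumes per-chromosome files named gatkchr_<name>.g.vcf.gz inside the sample directory (the real names are defined in the Snakefile and are not shown here), uses the hypothetical sample name TU and output name TU/TU_gatk.g.vcf.gz, and only prints the command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Emit "-I <file>" argument pairs, one token per line, for chromosomes 1..25
# plus the "other" bucket. The gatkchr_<name>.g.vcf.gz naming is an assumption.
mergevcfs_inputs() {
  local sample=$1 chr
  local -a args=()
  for chr in $(seq 1 25) other; do
    args+=(-I "$sample/gatkchr_${chr}.g.vcf.gz")
  done
  printf '%s\n' "${args[@]}"
}

# Read the tokens back into an array and show the resulting command
mapfile -t inputs < <(mergevcfs_inputs TU)
echo gatk MergeVcfs "${inputs[@]}" -O TU/TU_gatk.g.vcf.gz
```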

shell:
  "gatk MergeVcfs -I {input.chr1} -I {input.chr2} -I {input.chr3} -I {input.chr4} -I {input.chr5} -I {input.chr6} "
  "-I {input.chr7} -I {input.chr8} -I {input.chr9} -I {input.chr10} -I {input.chr11} -I {input.chr12} -I {input.chr13} " 
  "-I {input.chr14} -I {input.chr15} -I {input.chr16} -I {input.chr17} -I {input.chr18} -I {input.chr19} -I {input.chr20} "
  "-I {input.chr21} -I {input.chr22} -I {input.chr23} -I {input.chr24} -I {input.chr25} -I {input.other} "
  "-O {output.vcf} --VALIDATION_STRINGENCY SILENT --QUIET &> {log} ;"
  "rm {wildcards.sample}/gatkchr_* "