An easy-to-use, flexible variant calling pipeline for use on the Biowulf cluster at NIH
GATK4 Whole Exome-sequencing Pipeline. This is the home of the pipeline, exome-seek. Its long-term goals: to accurately call germline and somatic variants, to infer CNVs, and to boldly annotate variants like no pipeline before!
Overview
Welcome to exome-seek! Before getting started, we highly recommend reading through exome-seek's documentation.
The ./exome-seek pipeline is composed of several inter-related sub commands to set up and run the pipeline across different systems. Each of the available sub commands performs a different function (see the example invocation below):
- exome-seek run : Run the GATK4 WES pipeline with your input files.
- exome-seek unlock : Unlock a previous run's output directory.
- exome-seek cache : Cache remote resources locally (coming soon!).
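A minimal, hypothetical sketch of invoking the run sub command is shown below; the flag names are assumptions for illustration only and are not taken from the pipeline's documentation, so consult ./exome-seek run --help for the authoritative options.
# List the usage and available options of the run sub command
./exome-seek run --help
# Hypothetical invocation: the --input/--output flag names are illustrative assumptions
./exome-seek run \
    --input /data/rawdata/*.R?.fastq.gz \
    --output /data/exome-seek_results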
Exome-seek is a comprehensive whole exome-sequencing pipeline following the Broad's set of best practices. It relies on technologies like Singularity [1] to maintain the highest level of reproducibility. The pipeline consists of a series of data processing and quality-control steps orchestrated by Snakemake [2], a flexible and scalable workflow management system, to submit jobs to a cluster or cloud provider.
The pipeline is compatible with data generated from Illumina short-read sequencing technologies. As input, it accepts a set of FastQ or BAM files and can be run locally on a compute instance, on-premise using a cluster, or on the cloud (feature coming soon!). A user can define the method or mode of execution. The pipeline can submit jobs to a cluster using a job scheduler like SLURM, or run on AWS using Tibanna (feature coming soon!). A hybrid approach ensures the pipeline is accessible to all users.
Before getting started, we highly recommend reading through the usage section of each available sub command.
For more information about issues or troubleshooting a problem, please check out our FAQ prior to opening an issue on GitHub.
Dependencies
Requires:
singularity>=3.5
snakemake==6.X
Snakemake and singularity must be installed on the target system. Snakemake orchestrates the execution of each step in the pipeline. To guarantee the highest level of reproducibility, each step relies on versioned images from DockerHub. Snakemake uses singularity to pull these images onto the local filesystem prior to job execution, and as such, snakemake and singularity are the only two dependencies. A quick check of both dependencies is sketched below.
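On the Biowulf cluster both dependencies are typically provided as environment modules; the module names below are an assumption, so verify them with module avail before use.
# Load the two dependencies (module names assumed; check `module avail` on your system)
module load snakemake singularity
# Confirm the installed versions satisfy the requirements listed above
snakemake --version     # expect 6.X
singularity --version   # expect >= 3.5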
Installation
Please clone this repository to your local filesystem using the following command:
# Clone Repository from Github
git clone https://github.com/mtandon09/CCBR_GATK4_Exome_Seq_Pipeline.git
# Change your working directory
cd CCBR_GATK4_Exome_Seq_Pipeline/
Contribute
This site is a living document, created for and by members like you. EXOME-seek is maintained by the members of CCBR and is improved by continuous feedback! We encourage you to contribute new content and make improvements to existing content via a pull request to our repository.
References
1. Kurtzer GM, Sochat V, Bauer MW (2017). Singularity: Scientific containers for mobility of compute. PLoS ONE 12(5): e0177459.
2. Koster J, Rahmann S (2018). Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 34(20): 3600.
Code Snippets
shell: """
myoutdir="$(dirname {output.cnvs})/{params.tumorsample}"
if [ ! -d "$myoutdir" ]; then mkdir -p "$myoutdir"; fi
perl "{params.config_script}" \\
    "$myoutdir" \\
    {params.lengths} \\
    {params.chroms} \\
    {input.tumor} \\
    {input.normal} \\
    {params.pile} \\
    {params.fasta} \\
    {params.snps} \\
    {input.targets}
freec -conf "$myoutdir/freec_exome_config.txt"
cat "{params.sig_script}" | \\
    R --slave \\
    --args $myoutdir/{params.tumorsample}.bam_CNVs \\
    $myoutdir/{params.tumorsample}.bam_ratio.txt
mv $myoutdir/{params.tumorsample}.bam_CNVs.p.value.txt {output.cnvs}
cat "{params.plot_script}" | \\
    R --slave \\
    --args 2 \\
    $myoutdir/{params.tumorsample}.bam_ratio.txt \\
    $myoutdir/{params.tumorsample}.bam_BAF.txt
"""
shell: """
myoutdir="$(dirname {output.fit})/{params.tumorsample}"
if [ ! -d "$myoutdir" ]; then mkdir -p "$myoutdir"; fi
gzip -c "$(dirname {input.freeccnvs})/{params.tumorsample}/{params.normalsample}.bam_minipileup.pileup" \\
    > "$myoutdir/{params.normalsample}.recal.bam_minipileup.pileup.gz"
gzip -c "$(dirname {input.freeccnvs})/{params.tumorsample}/{params.tumorsample}.bam_minipileup.pileup" \\
    > "$myoutdir/{params.tumorsample}.recal.bam_minipileup.pileup.gz"
sequenza-utils bam2seqz \\
    -p \\
    -gc {params.gc} \\
    -n "$myoutdir/{params.normalsample}.recal.bam_minipileup.pileup.gz" \\
    -t "$myoutdir/{params.tumorsample}.recal.bam_minipileup.pileup.gz" \\
    | gzip > "$myoutdir/{params.tumorsample}.seqz.gz"
sequenza-utils seqz_binning \\
    -w 100 \\
    -s "$myoutdir/{params.tumorsample}.seqz.gz" \\
    | tee "$myoutdir/{params.tumorsample}.bin100.seqz" \\
    | gzip > "$myoutdir/{params.tumorsample}.bin100.seqz.gz"
Rscript "{params.run_script}" \\
    "$myoutdir/{params.tumorsample}.bin100.seqz" \\
    "$myoutdir" \\
    "{params.normalsample}+{params.tumorsample}" \\
    {threads}
mv "$myoutdir/{params.normalsample}+{params.tumorsample}_alternative_solutions.txt" "{output.fit}"
rm "$myoutdir/{params.tumorsample}.bin100.seqz"
"""
shell: """
myoutdir="$(dirname {output.cnvs})/{params.tumorsample}"
if [ ! -d "$myoutdir" ]; then mkdir -p "$myoutdir"; fi
perl {params.config_script} \\
    "$myoutdir" \\
    {params.lengths} \\
    {params.chroms} \\
    {input.tumor} \\
    {input.normal} \\
    {params.pile} \\
    {params.fasta} \\
    {params.snps} \\
    {input.targets} \\
    {input.fit}
freec -conf "$myoutdir/freec_exome_config.txt"
cat "{params.sig_script}" | \\
    R --slave \\
    --args $myoutdir/{params.tumorsample}.bam_CNVs \\
    $myoutdir/{params.tumorsample}.bam_ratio.txt
mv $myoutdir/{params.tumorsample}.bam_CNVs.p.value.txt {output.cnvs}
cat "{params.plot_script}" | \\
    R --slave \\
    --args 2 \\
    $myoutdir/{params.tumorsample}.bam_ratio.txt \\
    $myoutdir/{params.tumorsample}.bam_BAF.txt
"""
shell: """
wget https://github.com/mikdio/SOBDetector/releases/download/v1.0.2/SOBDetector_v1.0.2.jar \\
    -O {output.SOBDetector_jar}
"""
shell: """
if [ ! -d "$(dirname {output.pass1_vcf})" ]; then
    mkdir -p "$(dirname {output.pass1_vcf})"
fi
echo "Running SOBDetector..."
# Try/catch for running SOBDetector
# with an empty input VCF file
java -jar {input.SOBDetector_jar} \\
    --input-type VCF \\
    --input-variants "{input.vcf}" \\
    --input-bam {input.bam} \\
    --output-variants {output.pass1_vcf} \\
    --only-passed false || {{
    # Compare length of VCF header to
    # the total length of the file
    header_length=$(grep '^#' "{input.vcf}" | wc -l)
    file_length=$(cat "{input.vcf}" | wc -l)
    if [ $header_length -eq $file_length ]; then
        # VCF file only contains header
        # File contains no variants, catch
        # problem so pipeline can continue
        cat "{input.vcf}" > {output.pass1_vcf}
    else
        # SOBDetector failed for another reason
        echo "SOB Detector Failed... exiting now!" 1>&2
        exit 1
    fi
}}
bcftools query \\
    -f '%INFO/numF1R2Alt\\t%INFO/numF2R1Alt\\t%INFO/numF1R2Ref\\t%INFO/numF2R1Ref\\t%INFO/numF1R2Other\\t%INFO/numF2R1Other\\t%INFO/SOB\\n' \\
    {output.pass1_vcf} \\
    | awk '{{if ($1 != "."){{tum_alt=$1+$2; tum_depth=$1+$2+$3+$4+$5+$6; if (tum_depth==0){{tum_af=1}} else {{tum_af=tum_alt/tum_depth }}; print tum_alt,tum_depth,tum_af,$7}}}}' > {output.pass1_info}
"""
shell: """
echo -e "#TUMOR.alt\\tTUMOR.depth\\tTUMOR.AF\\tSOB\\tFS\\tSOR\\tTLOD\\tReadPosRankSum" > {output.all_info_file}
cat {input.info_files} >> {output.all_info_file}
# Try/catch for calculating the mean and
# standard deviation with a set of
# empty input VCF files
all_length=$(tail -n+2 {output.all_info_file} | wc -l)
if [ $all_length -eq 0 ]; then
    echo 'WARNING: All SOBDetector pass1 samples contained no variants.' \\
        | tee {output.params_file}
else
    # Calculate mean and standard deviation
    grep -v '^#' {output.all_info_file} \\
        | awk '{{ total1 += $1; ss1 += $1^2; total2 += $2; ss2 += $2^2; total3 += $3; ss3 += $3^2; total4 += $4; ss4 += $4^2 }} END {{ print total1/NR,total2/NR,total3/NR,total4/NR; print sqrt(ss1/NR-(total1/NR)^2),sqrt(ss2/NR-(total2/NR)^2),sqrt(ss3/NR-(total3/NR)^2),sqrt(ss4/NR-(total4/NR)^2) }}' > {output.params_file}
fi
"""
shell: """
if [ ! -d "$(dirname {output.pass2_vcf})" ]; then
    mkdir -p "$(dirname {output.pass2_vcf})"
fi
echo "Running SOBDetector..."
# Try/catch for running SOBDetector
# with an empty input VCF file
bcf_annotate_option="-e 'INFO/pArtifact < 0.05' "
java -jar {input.SOBDetector_jar} \\
    --input-type VCF \\
    --input-variants "{input.vcf}" \\
    --input-bam "{input.bam}" \\
    --output-variants "{output.pass2_vcf}" \\
    --only-passed true \\
    --standardization-parameters "{input.params_file}" || {{
    # Compare length of VCF header to
    # the total length of the file
    header_length=$(grep '^#' "{input.vcf}" | wc -l)
    file_length=$(cat "{input.vcf}" | wc -l)
    if [ $header_length -eq $file_length ]; then
        # VCF file only contains header
        # File contains no variants, catch
        # problem so pipeline can continue
        cat "{input.vcf}" > {output.pass2_vcf}
    else
        # SOBDetector failed for another reason
        echo "SOB Detector Failed... exiting now!" 1>&2
        exit 1
    fi
}}
echo "Making info table..."
bcftools query \\
    -f '%INFO/numF1R2Alt\\t%INFO/numF2R1Alt\\t%INFO/numF1R2Ref\\t%INFO/numF2R1Ref\\t%INFO/numF1R2Other\\t%INFO/numF2R1Other\\t%INFO/SOB\\n' \\
    "{output.pass2_vcf}" \\
    | awk '{{if ($1 != "."){{tum_alt=$1+$2; tum_depth=$1+$2+$3+$4+$5+$6; if (tum_depth==0){{tum_af=1}} else {{tum_af=tum_alt/tum_depth }}; print tum_alt,tum_depth,tum_af,$7}}}}' > "{output.pass2_info}"
echo "Filtering out artifacts..."
if [ "{wildcards.vc_outdir}" == "{config[output_params][MERGED_SOMATIC_OUTDIR]}" ]; then
    echo "Adding 'set' annotation back from merged variants..."
    bgzip --threads {threads} -c "{input.vcf}" > "{input.vcf}.gz"
    bcftools index -f -t "{input.vcf}.gz"
    bgzip --threads {threads} -c "{output.pass2_vcf}" > "{output.pass2_vcf}.gz"
    bcftools index -f -t "{output.pass2_vcf}.gz"
    bcftools annotate \\
        -a "{input.vcf}.gz" \\
        -c "INFO/set" \\
        "$bcf_annotate_option" \\
        -Oz \\
        -o {output.filtered_vcf} {output.pass2_vcf}.gz
else
    bcftools filter \\
        "$bcf_annotate_option" \\
        -Oz \\
        -o {output.filtered_vcf} {output.pass2_vcf}
    bcftools index -f -t {output.filtered_vcf}
fi
"""
shell: """
echo -e "#ID\\tDefaultParam\\tCohortParam\\tTotalVariants" > {output.count_table}
echo -e "#SAMPLE_ID\\tParam\\tCHROM\\tPOS\\tnumF1R2Alt\\tnumF2R1Alt\\tnumF1R2Ref\\tnumF2R1Ref\\tnumF1R2Other\\tnumF2R1Other\\tSOB\\tpArtifact\\tFS\\tSOR\\tTLOD\\tReadPosRankSum" > {output.full_metric_table}
P1FILES=({input.pass1_vcf})
P2FILES=({input.pass2_vcf})
for (( i=0; i<${{#P1FILES[@]}}; i++ )); do
    MYID=$(basename -s ".sobdetect.vcf" ${{P1FILES[$i]}})
    echo "Collecting metrics from $MYID..."
    # grep may fail if input files do not contain any variants
    total_count=$(grep -v ^# ${{P1FILES[$i]}} | wc -l) || total_count=0
    count_1p=$(bcftools query -f '%INFO/pArtifact\n' ${{P1FILES[$i]}} | awk '{{if ($1 != "." && $1 < 0.05){{print}}}}' | wc -l)
    count_2p=$(bcftools query -f '%INFO/pArtifact\n' ${{P2FILES[$i]}} | awk '{{if ($1 != "." && $1 < 0.05){{print}}}}' | wc -l)
    echo -e "$MYID\\t$count_1p\\t$count_2p\\t$total_count" >> {output.count_table}
    bcftools query -f '%CHROM\\t%POS\\t%INFO/numF1R2Alt\\t%INFO/numF2R1Alt\\t%INFO/numF1R2Ref\\t%INFO/numF2R1Ref\\t%INFO/numF1R2Other\\t%INFO/numF2R1Other\\t%INFO/SOB\\t%INFO/pArtifact\n' ${{P1FILES[$i]}} | awk -v id=$MYID 'BEGIN{{OFS="\t"}}{{print id,"PASS_1",$0}}' >> {output.full_metric_table}
    bcftools query -f '%CHROM\\t%POS\\t%INFO/numF1R2Alt\\t%INFO/numF2R1Alt\\t%INFO/numF1R2Ref\\t%INFO/numF2R1Ref\\t%INFO/numF1R2Other\\t%INFO/numF2R1Other\\t%INFO/SOB\\t%INFO/pArtifact\n' ${{P2FILES[$i]}} | awk -v id=$MYID 'BEGIN{{OFS="\t"}}{{print id,"PASS_2",$0}}' >> {output.full_metric_table}
done
"""
shell: """
echo "Converting to MAF..."
bash {params.vcf2maf_script} \\
    --vcf {input.filtered_vcf} \\
    --maf {output.maf} \\
    --tid {params.tumorsample} \\
    --genome {params.genome} \\
    --threads {threads} \\
    --vepresourcebundlepath {params.bundle} \\
    --info "set"
echo "Done converting to MAF..."
"""
shell: """
echo "Combining MAFs..."
head -2 {input.mafs[0]} > {output.maf}
awk 'FNR>2 {{print}}' {input.mafs} >> {output.maf}
"""
shell: """
myoutdir="$(dirname {output.gzvcf})"
if [ ! -d "$myoutdir" ]; then mkdir -p "$myoutdir"; fi
gatk --java-options '-Xmx24g' HaplotypeCaller \\
    --reference {params.genome} \\
    --input {input.bam} \\
    --use-jdk-inflater \\
    --use-jdk-deflater \\
    --emit-ref-confidence GVCF \\
    --annotation-group StandardAnnotation \\
    --annotation-group AS_StandardAnnotation \\
    --dbsnp {params.snpsites} \\
    --output {output.gzvcf} \\
    --intervals {params.chrom} \\
    --max-alternate-alleles 3
"""
shell: """
input_str="--variant $(echo "{input.gzvcf}" | sed -e 's/ / --variant /g')"
gatk --java-options '-Xmx24g' CombineGVCFs \\
    --reference {params.genome} \\
    --annotation-group StandardAnnotation \\
    --annotation-group AS_StandardAnnotation \\
    $input_str \\
    --output {output.gzvcf} \\
    --intervals {wildcards.chroms} \\
    --use-jdk-inflater \\
    --use-jdk-deflater
"""
shell: """
myoutdir="$(dirname {output.vcf})"
if [ ! -d "$myoutdir" ]; then mkdir -p "$myoutdir"; fi
gatk --java-options '-Xmx96g' GenotypeGVCFs \\
    --reference {params.genome} \\
    --use-jdk-inflater \\
    --use-jdk-deflater \\
    --annotation-group StandardAnnotation \\
    --annotation-group AS_StandardAnnotation \\
    --dbsnp {params.snpsites} \\
    --output {output.vcf} \\
    --variant {input.gzvcf} \\
    --intervals {params.chr}
"""
shell: """
# Avoids ARG_MAX issue which limits max length of a command
ls --color=never -d $(dirname "{output.clist}")/raw_variants.*.vcf.gz > "{output.clist}"
gatk MergeVcfs \\
    -R {params.genome} \\
    --INPUT {output.clist} \\
    --OUTPUT {output.vcf}
"""
shell: """
gatk SelectVariants \\
    -R {params.genome} \\
    --intervals {params.targets} \\
    --variant {input.vcf} \\
    --sample-name {params.Sname} \\
    --exclude-filtered \\
    --exclude-non-variants \\
    --output {output.vcf}
"""
shell: """
if [ ! -d "$(dirname {output.txt})" ]; then
    mkdir -p "$(dirname {output.txt})"
fi
python {params.get_flowcell_lanes} \\
    {input.r1} \\
    {wildcards.samples} > {output.txt}
"""
shell: """
fastq_screen --conf {params.fastq_screen_config} \\
    --outdir {params.outdir} \\
    --threads {threads} \\
    --subset 1000000 \\
    --aligner bowtie2 \\
    --force \\
    {input.fq1} {input.fq2}
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.localdisk}")
trap 'rm -rf "${{tmp}}"' EXIT
# Copy kraken2 db to local node storage to reduce filesystem strain
cp -rv {params.bacdb} ${{tmp}}
kdb_base=$(basename {params.bacdb})
kraken2 --db ${{tmp}}/${{kdb_base}} \\
    --threads {threads} --report {output.taxa} \\
    --output {output.out} \\
    --gzip-compressed \\
    --paired {input.fq1} {input.fq2}
# Generate Krona Report
cut -f2,3 {output.out} | \\
    ktImportTaxonomy - -o {output.html}
"""
shell: """
fastqc -t {threads} \\
    -f bam \\
    -o {params.outdir} \\
    {input.bam}
"""
shell: """
python {params.script_path_reformat_bed} \\
    --input_bed {input.targets} \\
    --output_bed {output.bed}.temp
python3 {params.script_path_correct_target_bed} {output.bed}.temp {output.bed}
rm -f {output.bed}.temp
"""
shell: """
unset DISPLAY
qualimap bamqc -bam {input.bam} \\
    --java-mem-size=48G \\
    -c -ip \\
    -gff {input.bed} \\
    -outdir {params.outdir} \\
    -outformat HTML \\
    -nt {threads} \\
    --skip-duplicated \\
    -nw 500 \\
    -p NON-STRAND-SPECIFIC
"""
shell: """
samtools flagstat {input.bam} > {output.txt}
"""
shell: """
vcftools --gzvcf {input.vcf} --het --out {params.prefix}
"""
shell: """
java -Xmx24g -jar ${{PICARDJARPATH}}/picard.jar \\
    CollectVariantCallingMetrics \\
    INPUT={input.vcf} \\
    OUTPUT={params.prefix} \\
    DBSNP={params.dbsnp} Validation_Stringency=SILENT
"""
shell: """
bcftools stats {input.vcf} > {output.txt}
"""
shell: """
gatk --java-options '-Xmx12g -XX:ParallelGCThreads={threads}' VariantEval \\
    -R {params.genome} \\
    -O {output.grp} \\
    --dbsnp {params.dbsnp} \\
    --eval {input.vcf}
"""
shell: """
java -Xmx12g -jar $SNPEFF_JAR \\
    -v -canon -c {params.config} \\
    -csvstats {output.csv} \\
    -stats {output.html} \\
    {params.genome} \\
    {input.vcf} > {output.vcf}
"""
shell: """
echo "Extracting sites to estimate ancestry"
somalier extract \\
    -d "$(dirname {output.somalierOut})" \\
    --sites {params.sites_vcf} \\
    -f {params.genomeFasta} \\
    {input.bam}
"""
shell: """
echo "Estimating relatedness"
somalier relate \\
    -o "$(dirname {output.relatedness})/relatedness" \\
    {input.somalier}
echo "Estimating ancestry"
somalier ancestry \\
    -o "$(dirname {output.relatedness})/ancestry" \\
    --labels {params.ancestry_db}/ancestry-labels-1kg.tsv \\
    {params.ancestry_db}/*.somalier ++ \\
    {input.somalier}
Rscript {params.script_path_gender} \\
    {output.relatednessSamples} \\
    {output.finalFileGender}
Rscript {params.script_path_samples} \\
    {output.relatedness} \\
    {output.finalFilePairs}
Rscript {params.script_path_pca} \\
    {output.ancestry} \\
    {output.finalFilePairs} \\
    {output.ancestoryPlot} \\
    {output.pairAncestoryHist}
"""
shell: """
multiqc --ignore '*/.singularity/*' \\
    --ignore '*/*/*/*/*/*/*/*/pyflow.data/*' \\
    --ignore 'slurmfiles/' \\
    -f --interactive \\
    -n {output.report} \\
    {params.workdir}
"""
shell: """
if [ ! -d "$(dirname {output.split_bam})" ]; then
    mkdir -p "$(dirname {output.split_bam})"
fi
samtools view \\
    -b \\
    -o {output.split_bam} \\
    -@ {threads} \\
    {input.bam} {wildcards.chroms}
samtools index \\
    -@ {threads} \\
    {output.split_bam} {output.split_bam_idx}
cp {output.split_bam_idx} {output.split_bam}.bai
"""
shell: """
input_str="--input $(echo "{input.read_orientation_file}" | sed -e 's/ / --input /g')"
gatk LearnReadOrientationModel \\
    --output {output.model} \\
    $input_str
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT
statfiles="--stats $(echo "{input.statsfiles}" | sed -e 's/ / --stats /g')"
gatk MergeMutectStats \\
    $statfiles \\
    -O {output.final}.stats
gatk FilterMutectCalls \\
    -R {params.genome} \\
    -V {input.vcf} \\
    --ob-priors {input.model} \\
    --contamination-table {input.summary} \\
    -O {output.marked_vcf} \\
    --stats {output.final}.stats
gatk SelectVariants \\
    -R {params.genome} \\
    --variant {output.marked_vcf} \\
    --exclude-filtered \\
    --output {output.final}
# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk '{{gsub(/\y[W|K|Y|R|S|M]\y/,"N",$4); OFS = "\t"; print}}' \\
    | sed '/^$/d' > {output.norm}
"""
shell: """
input_str="-I $(echo "{input.vcf}" | sed -e 's/ / -I /g')"
gatk --java-options "-Xmx30g" MergeVcfs \\
    -O "{output.vcf}" \\
    -D {params.genomedict} \\
    $input_str
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT
if [ ! -d "$(dirname {output.mergedvcf})" ]; then
    mkdir -p "$(dirname {output.mergedvcf})"
fi
input_str="--variant $(echo "{input.vcf}" | sed -e 's/ / --variant /g')"
java -Xmx60g -Djava.io.tmpdir=${{tmp}} -jar $GATK_JAR -T CombineVariants \\
    -R {params.genome} \\
    -nt {threads} \\
    --filteredrecordsmergetype KEEP_IF_ANY_UNFILTERED \\
    --genotypemergeoption PRIORITIZE \\
    --rod_priority_list {params.rodprioritylist} \\
    --minimumN 1 \\
    -o {output.mergedvcf} \\
    {params.variantsargs}
"""
shell: """
echo "Converting to MAF..."
bash {params.vcf2maf_script} \\
    --vcf {input.filtered_vcf} \\
    --maf {output.maf} \\
    --tid {params.tumorsample} \\
    --genome {params.genome} \\
    --threads {threads} \\
    --vepresourcebundlepath {params.bundle} \\
    --info "set"
echo "Done converting to MAF..."
"""
shell: """
echo "Combining MAFs..."
head -2 {input.mafs[0]} > {output.maf}
awk 'FNR>2 {{print}}' {input.mafs} >> {output.maf}
"""
shell: """
if [ ! -d "$(dirname {output.vcf})" ]; then mkdir -p "$(dirname {output.vcf})"; fi
gatk Mutect2 \\
    -R {params.genome} \\
    -I {input.tumor} \\
    -I {input.normal} \\
    -normal {params.normalsample} \\
    --panel-of-normals {params.pon} \\
    --germline-resource {params.germsource} \\
    -L {params.chrom} \\
    -O {output.vcf} \\
    --f1r2-tar-gz {output.read_orientation_file} \\
    --independent-mates
"""
shell: """
# Run GetPileupSummaries in bg concurrently for a tumor/normal pair
gatk --java-options '-Xmx48g' GetPileupSummaries \\
    -I {input.tumor} \\
    -V {params.germsource} \\
    -L {input.intervals} \\
    -O {output.tumor_summary} & \\
gatk --java-options '-Xmx48g' GetPileupSummaries \\
    -I {input.normal} \\
    -V {params.germsource} \\
    -L {input.intervals} \\
    -O {output.normal_summary} & \\
wait
"""
shell: """
gatk CalculateContamination \\
    -I {input.tumor} \\
    --matched-normal {input.normal} \\
    -O {output.tumor_summary}
gatk CalculateContamination \\
    -I {input.normal} \\
    -O {output.normal_summary}
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT
workdir={params.basedir}
myoutdir="$(dirname {output.vcf})/{wildcards.samples}/{wildcards.chroms}"
if [ -d "$myoutdir" ]; then rm -r "$myoutdir"; fi
mkdir -p "$myoutdir"
configureStrelkaSomaticWorkflow.py \\
    --ref={params.genome} \\
    --tumor={input.tumor} \\
    --normal={input.normal} \\
    --runDir="$myoutdir" \\
    --exome
cd "$myoutdir"
./runWorkflow.py -m local -j {threads}
java -Xmx12g -Djava.io.tmpdir=${{tmp}} -XX:ParallelGCThreads={threads} \\
    -jar $GATK_JAR -T CombineVariants \\
    -R {params.genome} \\
    --variant results/variants/somatic.snvs.vcf.gz \\
    --variant results/variants/somatic.indels.vcf.gz \\
    --assumeIdenticalSamples \\
    --filteredrecordsmergetype KEEP_UNCONDITIONAL \\
    -o "$(basename {output.vcf})"
cd $workdir
mv "$myoutdir/$(basename {output.vcf})" "{output.vcf}"
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT
gatk SelectVariants \\
    -R {params.genome} \\
    --variant {input.vcf} \\
    --discordance {params.pon} \\
    --exclude-filtered \\
    --output {output.filtered}
echo -e "TUMOR\t{params.tumorsample}\nNORMAL\t{params.normalsample}" > "{output.samplesfile}"
echo "Reheading VCFs with sample names..."
bcftools reheader \\
    -o "{output.final}" \\
    -s "{output.samplesfile}" "{output.filtered}"
# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort \\
    -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk '{{gsub(/\y[W|K|Y|R|S|M]\y/,"N",$4); OFS = "\t"; print}}' \\
    | sed '/^$/d' > {output.norm}
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT
if [ ! -d "$(dirname {output.vcf})" ]; then mkdir -p "$(dirname {output.vcf})"; fi
java -Xmx8g -Djava.io.tmpdir=${{tmp}} -jar ${{MUTECT_JAR}} \\
    --analysis_type MuTect \\
    --reference_sequence {params.genome} \\
    --normal_panel {params.pon} \\
    --vcf {output.vcf} \\
    --cosmic {params.cosmic} \\
    --dbsnp {params.dbsnp} \\
    --disable_auto_index_creation_and_locking_when_reading_rods \\
    --input_file:normal {input.normal} \\
    --input_file:tumor {input.tumor} \\
    --out {output.stats} \\
    -rf BadCigar
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT
gatk SelectVariants \\
    -R {params.genome} \\
    --variant {input.vcf} \\
    --exclude-filtered \\
    --output {output.final}
# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk '{{gsub(/\y[W|K|Y|R|S|M]\y/,"N",$4); OFS = "\t"; print}}' \\
    | sed '/^$/d' > {output.norm}
"""
shell: """
if [ ! -d "$(dirname {output.vcf})" ]; then mkdir -p "$(dirname {output.vcf})"; fi
VarDict \\
    -G {params.genome} \\
    -f 0.05 \\
    -N \"{params.tumorsample}|{params.normalsample}\" \\
    --nosv \\
    -b \"{input.tumor}|{input.normal}\" \\
    -t \\
    -Q 20 \\
    -c 1 \\
    -S 2 \\
    -E 3 {params.targets} \\
    | testsomatic.R \\
    | var2vcf_paired.pl \\
    -S \\
    -Q 20 \\
    -d 10 \\
    -M \\
    -N \"{params.tumorsample}|{params.normalsample}\" \\
    -f 0.05 > {output.vcf}
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT
bcftools filter \\
    --exclude \'STATUS=\"Germline\" | STATUS=\"LikelyLOH\" | STATUS=\"AFDiff\"\' \\
    {input.vcf} > {output.filtered}
gatk SelectVariants \\
    -R {params.genome} \\
    --variant {output.filtered} \\
    --discordance {params.pon} \\
    --exclude-filtered \\
    --output {output.final}
# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk '{{gsub(/\y[W|K|Y|R|S|M]\y/,"N",$4); OFS = "\t"; print}}' \\
    | sed '/^$/d' > {output.norm}
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT
if [ ! -d "$(dirname {output.vcf})" ]; then mkdir -p "$(dirname {output.vcf})"; fi
tumor_purity=$( echo "1-$(printf '%.6f' $(tail -n -1 {input.tumor_summary} | cut -f2 ))" | bc -l)
normal_purity=$( echo "1-$(printf '%.6f' $(tail -n -1 {input.normal_summary} | cut -f2 ))" | bc -l)
varscan_opts="--strand-filter 1 --min-var-freq 0.01 --min-avg-qual 30 --somatic-p-value 0.05 --output-vcf 1 --normal-purity $normal_purity --tumor-purity $tumor_purity"
dual_pileup="samtools mpileup -d 10000 -q 15 -Q 15 -f {params.genome} {input.normal} {input.tumor}"
varscan_cmd="varscan somatic <($dual_pileup) {output.vcf} $varscan_opts --mpileup 1"
eval "$varscan_cmd"
java -Xmx12g -Djava.io.tmpdir=${{tmp}} -XX:ParallelGCThreads={threads} \\
    -jar $GATK_JAR -T CombineVariants \\
    -R {params.genome} \\
    --variant {output.vcf}.snp \\
    --variant {output.vcf}.indel \\
    --assumeIdenticalSamples \\
    --filteredrecordsmergetype KEEP_UNCONDITIONAL \\
    -o {output.vcf}
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT
varscan filter \\
    {input.vcf} \\
    {params.filter_settings} > {output.filtered1}
gatk SelectVariants \\
    -R {params.genome} \\
    --variant {output.filtered1} \\
    --discordance {params.pon} \\
    --exclude-filtered \\
    --output {output.filtered}
samplesFile="{output.samplesfile}"
echo -e "TUMOR\t{params.tumorsample}\nNORMAL\t{params.normalsample}" > "{output.samplesfile}"
bcftools reheader \\
    -o "{output.final}" \\
    -s "{output.samplesfile}" \\
    "{output.filtered}"
# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk '{{gsub(/\y[W|K|Y|R|S|M]\y/,"N",$4); OFS = "\t"; print}}' \\
    | sed '/^$/d' > {output.norm}
"""
shell: """
if [ ! -d "$(dirname {output.vcf})" ]; then
    mkdir -p "$(dirname {output.vcf})"
fi
gatk Mutect2 \\
    -R {params.genome} \\
    -I {input.tumor} \\
    --panel-of-normals {params.pon} \\
    --germline-resource {params.germsource} \\
    -L {wildcards.chroms} \\
    -O {output.vcf} \\
    --f1r2-tar-gz {output.read_orientation_file} \\
    --independent-mates
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT
gatk --java-options "-Xmx10g -Djava.io.tmpdir=${{tmp}}" GetPileupSummaries \\
    -R {params.genome} \\
    -I {input.tumor} \\
    -V {params.germsource} \\
    -L {input.intervals} \\
    -O {output.pileup}
"""
shell: """
gatk CalculateContamination \\
    -I {input.pileup} \\
    -O {output.tumor_summary}
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT
if [ ! -d "$(dirname {output.vcf})" ]; then
    mkdir -p "$(dirname {output.vcf})"
fi
java -Xmx8g -Djava.io.tmpdir=${{tmp}} -jar ${{MUTECT_JAR}} \\
    --analysis_type MuTect \\
    --reference_sequence {params.genome} \\
    --normal_panel {params.pon} \\
    --vcf {output.vcf} \\
    --cosmic {params.cosmic} \\
    --dbsnp {params.dbsnp} \\
    -L {wildcards.chroms} \\
    --disable_auto_index_creation_and_locking_when_reading_rods \\
    --input_file:tumor {input.tumor} \\
    --out {output.stats} \\
    -rf BadCigar
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT
gatk SelectVariants \\
    -R {params.genome} \\
    --variant {input.vcf} \\
    --exclude-filtered \\
    --output {output.final}
# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk '{{gsub(/\y[W|K|Y|R|S|M]\y/,"N",$4); OFS = "\t"; print}}' \\
    | sed '/^$/d' > {output.norm}
"""
shell: """
if [ ! -d "$(dirname {output.vcf})" ]; then
    mkdir -p "$(dirname {output.vcf})"
fi
VarDict \\
    -G {params.genome} \\
    -f 0.05 \\
    -x 500 \\
    --nosv \\
    -b {input.tumor} \\
    -t \\
    -Q 20 \\
    -c 1 \\
    -S 2 \\
    -E 3 {params.targets} \\
    | teststrandbias.R \\
    | var2vcf_valid.pl \\
    -N {wildcards.samples} \\
    -Q 20 \\
    -d 10 \\
    -v 6 \\
    -S \\
    -E \\
    -f 0.05 > {output.vcf}
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT
gatk SelectVariants \\
    -R {params.genome} \\
    --variant {input.vcf} \\
    --discordance {params.pon} \\
    --exclude-filtered \\
    --output {output.final}
# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk '{{gsub(/\y[W|K|Y|R|S|M]\y/,"N",$4); OFS = "\t"; print}}' \\
    | sed '/^$/d' > {output.norm}
"""
shell: """
if [ ! -d "$(dirname {output.vcf})" ]; then
    mkdir -p "$(dirname {output.vcf})"
fi
varscan_opts="--strand-filter 0 --min-var-freq 0.01 --output-vcf 1 --variants 1"
pileup_cmd="samtools mpileup -d 100000 -q 15 -Q 15 -f {params.genome} {input.tumor}"
varscan_cmd="varscan mpileup2cns <($pileup_cmd) $varscan_opts"
eval "$varscan_cmd > {output.vcf}.gz"
eval "bcftools view -U {output.vcf}.gz > {output.vcf}"
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT
varscan filter \\
    {input.vcf} \\
    {params.filter_settings} > {output.filtered1}
gatk SelectVariants \\
    -R {params.genome} \\
    --variant {output.filtered1} \\
    --discordance {params.pon} \\
    --exclude-filtered \\
    --output {output.filtered}
samplesFile="{output.samplesfile}"
echo -e "TUMOR\t{params.tumorsample}\n" > "{output.samplesfile}"
bcftools reheader \\
    -o "{output.final}" \\
    -s "{output.samplesfile}" \\
    "{output.filtered}"
# VarScan can output ambiguous IUPAC bases/codes
# the awk one-liner resets them to N, from:
# https://github.com/fpbarthel/GLASS/issues/23
bcftools sort -T ${{tmp}} "{output.final}" \\
    | bcftools norm --threads {threads} --check-ref s -f {params.genome} -O v \\
    | awk '{{gsub(/\y[W|K|Y|R|S|M]\y/,"N",$4); OFS = "\t"; print}}' \\
    | sed '/^$/d' > {output.norm}
"""
shell: """
# Setups temporary directory for
# intermediate files with built-in
# mechanism for deletion on exit
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT
mkdir -p fastqs
gatk SamToFastq \\
    --INPUT {input.bam} \\
    --FASTQ {output.r1} \\
    --SECOND_END_FASTQ {output.r2} \\
    --UNPAIRED_FASTQ {output.orphans} \\
    --TMP_DIR ${{tmp}} \\
    -R {params.genome}
"""
shell: """
myoutdir="$(dirname {output.one})"
if [ ! -d "$myoutdir" ]; then mkdir -p "$myoutdir"; fi
trimmomatic PE \\
    -threads {threads} \\
    -phred33 \\
    {input.r1} {input.r2} \\
    {output.one} {output.two} \\
    {output.three} {output.four} \\
    ILLUMINACLIP:{params.adapterfile}:3:30:10 \\
    LEADING:10 \\
    TRAILING:10 \\
    SLIDINGWINDOW:4:20 \\
    MINLEN:20
"""
shell: """
myoutdir="$(dirname {output})"
if [ ! -d "$myoutdir" ]; then mkdir -p "$myoutdir"; fi
bwa mem -M \\
    -R \'@RG\\tID:{params.sample}\\tSM:{params.sample}\\tPL:illumina\\tLB:{params.sample}\\tPU:{params.sample}\\tCN:hgsc\\tDS:wes\' \\
    -t {threads} \\
    {params.genome} \\
    {input} | \\
    samblaster -M | \\
    samtools sort -@12 -m 4G - -o {output}
"""
shell: """
samtools index -@ 2 {input.bam} {output.bai}
"""
shell: """
gatk --java-options '-Xmx48g' BaseRecalibrator \\
    --input {input.bam} \\
    --reference {params.genome} \\
    {params.knowns} \\
    --output {output.re} \\
    --intervals {params.intervals}
gatk --java-options '-Xmx48g' ApplyBQSR \\
    --reference {params.genome} \\
    --input {input.bam} \\
    --bqsr-recal-file {output.re} \\
    --output {output.bam} \\
    --use-jdk-inflater \\
    --use-jdk-deflater
"""
shell: """
sample={wildcards.samples}
ID=$sample
PL="ILLUMINA"  # exposed as a config param
LB="na"  # exposed as a config param
# Check if there is no header or any of the info
HEADER=`samtools view -H {input.bam} | grep ^@RG`
if [[ "$HEADER" != "" ]]; then
    t=(${{HEADER//\t/ }})
    echo ${{t[1]}}
    ID=`printf '%s\n' "${{t[@]}}" | grep -P '^ID' | cut -d":" -f2`  #(${{t[1]//:/ }})
    PL=`printf '%s\n' "${{t[@]}}" | grep -P '^PL' | cut -d":" -f2`  #(${{t[3]//:/ }})
    LB=`printf '%s\n' "${{t[@]}}" | grep -P '^LB' | cut -d":" -f2`  #(${{t[2]//:/ }})
    if [[ "$ID" == "$sample" ]]; then
        echo "The header of the BAM file is correct"
    else
        ID=$sample
    fi
fi
gatk AddOrReplaceReadGroups \\
    --INPUT {input.bam} \\
    --OUTPUT {output.bam} \\
    --RGID ${{ID}} \\
    --RGLB ${{LB}} \\
    --RGPL ${{PL}} \\
    --RGSM ${{ID}} \\
    --RGPU na
samtools index -@ 2 {output.bam} {output.bai}
cp {output.bai} {output.bai2}
"""
Support
- Future updates