genome-seek 🔬
Whole Genome Clinical Sequencing Pipeline.
Overview
Welcome to genome-seek! Before getting started, we highly recommend reading through genome-seek's documentation.
The ./genome-seek pipeline is composed of several inter-related sub commands to set up and run the pipeline across different systems. Each of the available sub commands performs a different function:
- genome-seek run: Run the genome-seek pipeline with your input files.
- genome-seek unlock: Unlock a previous run's output directory.
- genome-seek cache: Cache software containers locally.
genome-seek is a comprehensive clinical WGS pipeline that is focused on speed. Each tool in the pipeline was benchmarked and selected for its low run time without sacrificing accuracy or precision. It relies on technologies like Singularity [1] to maintain the highest level of reproducibility. The pipeline consists of a series of data processing and quality-control steps orchestrated by Snakemake [2], a flexible and scalable workflow management system, to submit jobs to a cluster.
The pipeline is compatible with data generated from Illumina short-read sequencing technologies. As input, it accepts a set of FastQ files and can be run locally on a compute instance, on-premise using a cluster, or on the cloud (feature coming soon!). A user can define the method or mode of execution. The pipeline can submit jobs to a cluster using a job scheduler like SLURM, or run on AWS using Tibanna (feature coming soon!). A hybrid approach ensures the pipeline is accessible to all users.
Before getting started, we highly recommend reading through the usage section of each available sub command.
For more information about issues or troubleshooting a problem, please check out our FAQ prior to opening an issue on GitHub.
Dependencies
Requires:
singularity>=3.5
snakemake>=7.8
At the moment, the pipeline uses a mixture of environment modules and docker images; however, this will change soon! In the near future, the pipeline will only use docker images. With that being said, snakemake and singularity must be installed on the target system. Snakemake orchestrates the execution of each step in the pipeline. To guarantee the highest level of reproducibility, each step of the pipeline will rely on versioned images from DockerHub. Snakemake uses singularity to pull these images onto the local filesystem prior to job execution, and as such, snakemake and singularity will be the only two dependencies in the future.
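The version floors above (singularity>=3.5, snakemake>=7.8) can be checked up front before launching a run. A minimal sketch using GNU coreutils' `sort -V` for version-number comparison; `version_ge` and `check_dep` are helper names invented here, and the installed version strings are examples (in practice they would come from `singularity --version` and `snakemake --version`):

```shell
# Helper: true if $1 >= $2 under version-number ordering.
# Relies on GNU coreutils' sort -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# check_dep <tool> <installed-version> <minimum>
check_dep() {
    if version_ge "$2" "$3"; then
        echo "$1 $2 satisfies >=$3"
    else
        echo "$1 $2 is too old (needs >=$3)" 1>&2
        return 1
    fi
}

# Example version strings, not pinned releases
check_dep singularity 3.10.0 3.5
check_dep snakemake 7.19.1 7.8
```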
Installation
Please clone this repository to your local filesystem using the following command:
# Clone Repository from Github
git clone https://github.com/OpenOmics/genome-seek.git
# Change your working directory
cd genome-seek/
# Add dependencies to $PATH
# Biowulf users should run
module load snakemake singularity
# Get usage information
./genome-seek -h
Contribute
This site is a living document, created for and by members like you. genome-seek is maintained by the members of NCBR and is improved by continuous feedback! We encourage you to contribute new content and make improvements to existing content via pull request to our GitHub repository.
References
1. Kurtzer GM, Sochat V, Bauer MW (2017). Singularity: Scientific containers for mobility of compute. PLoS ONE 12(5): e0177459.
2. Köster J, Rahmann S (2018). Snakemake-a scalable bioinformatics workflow engine. Bioinformatics 34(20): 3600.
Code Snippets
shell: """
vcftools \\
    --gzvcf {input.vcf} \\
    --plink \\
    --out {params.intermediate} \\
    --chr {params.peddy_chr}
cut -f1-6 {params.intermediate}.ped \\
    > {output.ped}
peddy -p {threads} \\
    --prefix {params.prefix} \\
    {input.vcf} \\
    {output.ped}
"""
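Why the `cut -f1-6` step exists: the PED file written by vcftools carries trailing genotype columns that peddy's pedigree input does not want, so the rule keeps only the six standard pedigree columns (family, sample, father, mother, sex, phenotype). A toy line (made-up family and sample IDs, not real output) demonstrates the trim:

```shell
# One PED line with two extra genotype columns ("G G", "A T")
printf 'FAM1\tS1\t0\t0\t1\t-9\tG G\tA T\n' > toy.ped

# Keep only the six standard pedigree columns
cut -f1-6 toy.ped > toy.trimmed.ped
cat toy.trimmed.ped
```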
shell: """
# Get biological sex
# predicted by peddy
predicted_sex=$(awk -F ',' \\
    '$8=="{params.sample}" \\
    {{print $7}}' \\
    {input.csv}
)

# Copy over base ploidy
# vcf for predicted sex
# and add name to header
if [ "$predicted_sex" == "male" ]; then
    cp {params.male_ploidy} {output.ploidy}
else
    # prediction is female
    cp {params.female_ploidy} {output.ploidy}
fi
sed -i 's/SAMPLENAME/{params.sample}/g' \\
    {output.ploidy}

# Delete Canvas checkpoints
if [ -d {params.checkpoints} ]; then
    # Forces Canvas to start
    # over from the beginning
    rm -rf '{params.checkpoints}'
fi

# CANVAS in Germline WGS mode
export COMPlus_gcAllowVeryLargeObjects=1
Canvas.dll Germline-WGS \\
    -b {input.bam} \\
    -n {params.sample} \\
    -o {params.outdir} \\
    -r {params.canvas_kmer} \\
    --ploidy-vcf={output.ploidy} \\
    -g {params.canvas_genome} \\
    -f {params.canvas_filter} \\
    --sample-b-allele-vcf={input.vcf}

# Filter predicted CNVs
bcftools filter \\
    --include 'FILTER="PASS" && INFO/SVTYPE="CNV"' \\
    {output.vcf} \\
    > {output.filtered}

# Rank and annotate CNVs
AnnotSV \\
    -annotationsDir {params.annotsv_annot} \\
    -genomeBuild {params.annotsv_build} \\
    -outputDir {params.outdir} \\
    -outputFile {output.annotated} \\
    -SVinputFile {output.filtered} \\
    -snvIndelFiles {input.joint} \\
    -snvIndelSamples {params.sample}

# Check if AnnotSV silently failed
if [ ! -f "{output.annotated}" ]; then
    # AnnotSV failed to process
    # provided SVinputFile file,
    # usually due to passing an
    # empty filtered SV file
    echo "WARNING: AnnotSV silently failed..." 1>&2
    touch {output.annotated}
fi
"""
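The awk one-liner above pulls the predicted sex out of peddy's CSV by matching the sample ID in column 8 and printing column 7. A stand-in CSV shows the mechanics; the column values and sample names below are made up for illustration and do not reproduce peddy's real header:

```shell
# Toy stand-in for a peddy sex-check CSV: column 7 holds the
# prediction, column 8 the sample ID (values are made up)
cat > toy_sex_check.csv <<'EOF'
c1,c2,c3,c4,c5,c6,predicted_sex,sample_id
x,x,x,x,x,x,male,SAMPLE_A
x,x,x,x,x,x,female,SAMPLE_B
EOF

# Same extraction pattern as the rule, with a concrete sample ID
predicted_sex=$(awk -F ',' '$8=="SAMPLE_A" {print $7}' toy_sex_check.csv)
echo "$predicted_sex"    # prints "male"
```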
shell: """
java -Xmx{params.memory}g -cp {params.amber_jar} \\
    com.hartwig.hmftools.amber.AmberApplication \\
    -tumor {params.tumor} {params.normal_name} \\
    -tumor_bam {input.tumor} {params.normal_bam} \\
    -output_dir {params.outdir} \\
    -threads {threads} {params.tumor_flag} \\
    -loci {params.loci_ref}
"""
shell: """
java -Xmx{params.memory}g -cp {params.cobalt_jar} \\
    com.hartwig.hmftools.cobalt.CountBamLinesApplication \\
    -tumor {params.tumor} {params.normal_name} \\
    -tumor_bam {input.tumor} {params.normal_bam} \\
    -output_dir {params.outdir} \\
    -threads {threads} {params.tumor_flag} \\
    -gc_profile {params.gc_profile}
"""
shell: """
# Set output directories
# for Amber and Cobalt
amber_outdir="$(dirname "{input.amber}")"
cobalt_outdir="$(dirname "{input.cobalt}")"
echo "Amber output directory: $amber_outdir"
echo "Cobalt output directory: $cobalt_outdir"

# Run Purple to find CNVs,
# purity and ploidy, and
# cancer driver events
java -Xmx{params.memory}g -jar {params.purple_jar} \\
    -tumor {params.tumor} {params.normal_name} \\
    -output_dir {params.outdir} \\
    -amber "$amber_outdir" \\
    -cobalt "$cobalt_outdir" \\
    -circos circos \\
    -gc_profile {params.gc_profile} \\
    -ref_genome {params.genome} \\
    -ref_genome_version {params.ref_ver} \\
    -run_drivers \\
    -driver_gene_panel {params.panel} \\
    -somatic_hotspots {params.somatic_hotspot} \\
    -germline_hotspots {params.germline_hotspot} \\
    -threads {threads} {params.tumor_flag} \\
    -somatic_vcf {input.vcf} {params.sv_option}
"""
shell: """
java -Xmx{params.memory}g -jar ${{GATK_JAR}} -T RealignerTargetCreator \\
    --use_jdk_inflater \\
    --use_jdk_deflater \\
    -I {input.bam} \\
    -R {params.genome} \\
    -o {output.intervals} \\
    {params.knowns}
java -Xmx{params.memory}g -jar ${{GATK_JAR}} -T IndelRealigner \\
    -R {params.genome} \\
    -I {input.bam} \\
    {params.knowns} \\
    --use_jdk_inflater \\
    --use_jdk_deflater \\
    -targetIntervals {output.intervals} \\
    -o {output.bam}
"""
shell: """
gatk --java-options '-Xmx{params.memory}g' BaseRecalibrator \\
    --input {input.bam} \\
    --reference {params.genome} \\
    {params.knowns} \\
    --output {output.recal} \\
    {params.intervals}
"""
shell: """
# Create GatherBQSR list
find {params.bams} -iname '{params.sample}_recal*_data.grp' \\
    > {output.lsl}

# Gather per sample BQSR results
gatk --java-options '-Xmx{params.memory}g' GatherBQSRReports \\
    --use-jdk-inflater --use-jdk-deflater \\
    -I {output.lsl} \\
    -O {output.recal}
"""
shell: """
gatk --java-options '-Xmx{params.memory}g' ApplyBQSR \\
    --use-jdk-inflater --use-jdk-deflater \\
    --reference {params.genome} \\
    --bqsr-recal-file {input.recal} \\
    --input {input.bam} \\
    --output {output.bam}
samtools index -@ {threads} {output.bam} {output.bai}
"""
shell: """
# Sets up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT

run_deepvariant \\
    --model_type=WGS \\
    --ref={params.genome} \\
    --reads={input.bam} \\
    --output_gvcf={output.gvcf} \\
    --output_vcf={output.vcf} \\
    --num_shards={threads} \\
    --intermediate_results_dir=${{tmp}}
"""
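The mktemp/trap pair above is a general cleanup idiom worth isolating: the temporary directory is removed when the shell exits, whether the job succeeds or fails. A self-contained sketch (a subshell is used here so the EXIT trap fires immediately; the paths are throwaway):

```shell
# Parent directory standing in for {params.tmpdir}
parent=$(mktemp -d)
tmpfile_record="$parent/record"

(
    # Create a per-job scratch dir and arrange for its removal
    tmp=$(mktemp -d -p "$parent")
    trap 'rm -rf "$tmp"' EXIT
    echo "$tmp" > "$tmpfile_record"
    : # ... job would run here, writing intermediates into "$tmp" ...
)   # subshell exits -> trap fires -> scratch dir is deleted

created=$(cat "$tmpfile_record")
if [ -d "$created" ]; then echo "still there"; else echo "cleaned up"; fi
# prints "cleaned up"
```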
shell: """
# Sets up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit.
# The GLnexus tmpdir should NOT exist
# prior to running; if it does,
# GLnexus will immediately error out.
if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
tmp_parent=$(mktemp -d -p "{params.tmpdir}")
tmp_dne=$(echo "${{tmp_parent}}" | sed 's@$@/GLnexus.DB@')
trap 'rm -rf "${{tmp_parent}}"' EXIT

# Avoids ARG_MAX issue which will
# limit max length of a command
find {params.gvcfdir} -iname '*.g.vcf.gz' \\
    > {output.gvcfs}
glnexus_cli \\
    --dir ${{tmp_dne}} \\
    --config DeepVariant_unfiltered \\
    --list {output.gvcfs} \\
    --threads {threads} \\
    --mem-gbytes {params.memory} \\
    > {output.bcf}

bcftools norm \\
    -m - \\
    -Oz \\
    --threads {threads} \\
    -f {params.genome} \\
    -o {output.norm} \\
    {output.bcf}
bcftools view \\
    -Oz \\
    --threads {threads} \\
    -o {output.jvcf} \\
    {output.bcf}
bcftools index \\
    -f -t \\
    --threads {threads} \\
    {output.norm}
bcftools index \\
    -f -t \\
    --threads {threads} \\
    {output.jvcf}
"""
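The `find ... > list` step above sidesteps ARG_MAX, the kernel's limit on the total length of a command line: GLnexus receives one list file instead of thousands of gVCF paths as arguments. A toy reproduction of the pattern (directory and file names invented for the demo):

```shell
# Stand-in for {params.gvcfdir} with a few empty placeholder gVCFs
mkdir -p toy_gvcfs
for i in 1 2 3; do : > "toy_gvcfs/sample${i}.g.vcf.gz"; done

# Write matching paths to a list file instead of expanding them
# on the command line (sort only to make the output deterministic)
find toy_gvcfs -iname '*.g.vcf.gz' | sort > gvcfs.list
wc -l < gvcfs.list    # three paths, one per line
```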
shell: """
gatk --java-options '-Xmx{params.memory}g -XX:ParallelGCThreads={threads}' SelectVariants \\
    -R {params.genome} \\
    --variant {input.vcf} \\
    --sample-name {params.sample} \\
    --exclude-non-variants \\
    --output {output.vcf}
"""
shell: """
singularity exec -B {params.bind} {params.sif} \\
    HLA-LA.pl \\
    --BAM {input.bam} \\
    --graph PRG_MHC_GRCh38_withIMGT \\
    --sampleID sample \\
    --maxThreads {threads} \\
    --workingDir {params.outdir}
"""
shell: """
gatk --java-options '-Xmx{params.memory}g -XX:ParallelGCThreads={threads}' SelectVariants \\
    -R {params.genome} \\
    --variant {input.vcf} \\
    -L {params.chrom} \\
    --exclude-non-variants \\
    --output {output.vcf}
tabix --force \\
    -p vcf {output.vcf}
"""
shell: """
# Environment variable for modules dir
export OC_MODULES="{params.module}"
oc run \\
    -t vcf \\
    -x \\
    --newlog \\
    --cleanrun \\
    --system-option "modules_dir={params.module}" \\
    -a {params.annot} \\
    -n {params.prefix} \\
    -l {params.genome} \\
    -d {params.outdir} \\
    --mp {threads} \\
    {input.vcf}
"""
shell: """
# Create first filtering script,
# Filters based on AF
mkdir -p {params.scripts}
cp {input.db} {output.filter_1}
cat << EOF > {params.filter_1}
import sqlite3
import os
maf = str({params.maf_thres})
mafc = str(1 - float({params.maf_thres}))
so = "{params.so}"
conn = sqlite3.connect("{output.filter_1}")
conn.isolation_level = None
cursor = conn.cursor()
conn.execute('CREATE TABLE variant2 AS SELECT * FROM variant WHERE (base__so IN (' + '"' + '", "'.join(so.split(',')) + '"' + ')) AND ((gnomad__af IS NULL AND gnomad3__af IS NULL AND thousandgenomes__af IS NULL) OR (gnomad__af <= ' + maf + ' OR gnomad__af >= ' + mafc + ' OR gnomad3__af <= ' + maf + ' OR gnomad3__af >= ' + mafc + ' OR thousandgenomes__af <= ' + maf + ' OR thousandgenomes__af >= ' + mafc + '))')
conn.execute('DROP TABLE variant')
conn.execute('ALTER TABLE variant2 RENAME TO variant')
conn.execute('DELETE from sample WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('DELETE FROM gene WHERE base__hugo NOT IN (SELECT base__hugo FROM variant)')
conn.execute('DELETE FROM mapping WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('UPDATE info SET colval = (SELECT COUNT(*) FROM variant) WHERE colkey == "Number of unique input variants"')
conn.execute('VACUUM')
conn.close()
EOF

# Create second filtering script,
# Filters based on filters in config
cp {output.filter_1} {output.filter_2}
cat << EOF > {params.filter_2}
import pandas as pd
import sqlite3
import sys
import os
conn = sqlite3.connect("{output.filter_2}")
conn.isolation_level = None
cursor = conn.cursor()
filter = {params.secondary}

def keep(dd, used_annotators):
    final_annotators = {{annotator: dd[annotator] for annotator in used_annotators}}
    return final_annotators

def filtercol(dd, annot):
    if dd['relation'] == 'IN':
        return annot + '__' + dd['col'] + ' ' + dd['relation'] + ' ("' + '", "'.join(dd['value'].split(',')) + '")'
    elif dd['relation'] == 'IS' or dd['relation'] == 'IS NOT':
        return annot + '__' + dd['col'] + ' ' + dd['relation'] + ' ' + dd['value']
    else:
        return annot + '__' + dd['col'] + ' ' + dd['relation'] + dd['value']

def filterunit(annot):
    dd = filter[annot]
    if dd['col'].lower() == 'multiple':
        cols = dd['cols'].split(',')
        relations = dd['relation'].split(',')
        values = dd['value'].split(',')
        # Index with i so each column gets its own relation/value
        return ' OR '.join([filtercol({{'col': cols[i], 'relation': relations[i], 'value': values[i]}}, annot) for i in range(len(cols))])
    else:
        return filtercol(dd, annot)

def filterunit_null(annot):
    dd = filter[annot]
    if dd['col'].lower() == 'multiple':
        cols = dd['cols'].split(',')
        return ' AND '.join([annot + '__' + cols[i] + ' IS NULL' for i in range(len(cols))])
    else:
        return annot + '__' + dd['col'] + ' IS NULL'

# Find the intersect of annotators listed
# in colnames of the SQLite variants table
# and annotators in filter config. Must run
# this step: if a module in OpenCRAVAT does
# not have any annotations for a provided set
# of variants, then that module's annotations
# may not exist in the SQLite table.
df = pd.read_sql_query("SELECT * FROM variant", conn)
table_var_annotators = set([col for col in df.columns])
filter_annotators = []
column2module = {{}}
for ann in set(filter.keys()):
    try:
        # Multiple column filters
        col_names = filter[ann]['cols']
        col_names = [c.strip() for c in col_names.split(',')]
    except KeyError:
        # One column filter
        col_names = [filter[ann]['col'].strip()]
    for col in col_names:
        coln = '{{}}__{{}}'.format(ann, col)
        filter_annotators.append(coln)
        column2module[coln] = ann
filter_annotators = set(filter_annotators)
tmp_annotators = table_var_annotators.intersection(filter_annotators)
keep_annotators = set([column2module[ann] for ann in tmp_annotators])

# Sanity check
if len(keep_annotators) == 0:
    print('WARNING: No filter annotators were provided that match oc run annotators.', file=sys.stderr)
    print('WARNING: The next filtering step may fail.', file=sys.stderr)

# Filter to avoid SQL filtering issues
filter = keep(filter, keep_annotators)
print('Apply final filters to SQLite: ', filter)
filter_query_nonnull = ' OR '.join([filterunit(annot) for annot in filter.keys()])
filter_query_null = ' AND '.join([filterunit_null(annot) for annot in filter.keys()])
filter_query = filter_query_nonnull + ' OR (' + filter_query_null + ')'
print(filter_query)
conn.execute('CREATE TABLE variant2 AS SELECT * FROM variant WHERE (' + filter_query + ')')
conn.execute('DROP TABLE variant')
conn.execute('ALTER TABLE variant2 RENAME TO variant')
conn.execute('DELETE from sample WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('DELETE FROM gene WHERE base__hugo NOT IN (SELECT base__hugo FROM variant)')
conn.execute('DELETE FROM mapping WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('UPDATE info SET colval = (SELECT COUNT(*) FROM variant) WHERE colkey == "Number of unique input variants"')
conn.execute('VACUUM')
conn.close()
EOF

# Create fix column script,
# Fixes columns to a min and max AF
cp {output.filter_2} {output.fixed}
cat << EOF > {params.fixed}
import sqlite3
import os
import pandas as pd
conn = sqlite3.connect("{output.fixed}")
conn.isolation_level = None
cursor = conn.cursor()
depth = pd.read_sql_query('SELECT base__uid,vcfinfo__tot_reads,vcfinfo__af from variant', conn, index_col = 'base__uid')
tot_read = depth.vcfinfo__tot_reads.str.split(';', expand = True).astype('float')
af = depth.vcfinfo__af.str.split(';', expand = True).apply(pd.to_numeric)
depth['vcfinfo__Max_read'] = tot_read.max(axis = 1)
depth['vcfinfo__Min_read'] = tot_read.min(axis = 1)
depth['vcfinfo__Max_af'] = af.max(axis = 1)
depth['vcfinfo__Min_af'] = af.min(axis = 1)
depth.to_sql('tmp', conn, if_exists = 'replace', index = True)
conn.execute('alter table variant add column vcfinfo__Max_read numeric(50)')
conn.execute('alter table variant add column vcfinfo__Min_read numeric(50)')
conn.execute('alter table variant add column vcfinfo__Max_af numeric(50)')
conn.execute('alter table variant add column vcfinfo__Min_af numeric(50)')
qry = 'update variant set vcfinfo__Max_read = (select vcfinfo__Max_read from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Max_read is NULL'
conn.execute(qry)
qry = 'update variant set vcfinfo__Min_read = (select vcfinfo__Min_read from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Min_read is NULL'
conn.execute(qry)
qry = 'update variant set vcfinfo__Max_af = (select vcfinfo__Max_af from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Max_af is NULL'
conn.execute(qry)
qry = 'update variant set vcfinfo__Min_af = (select vcfinfo__Min_af from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Min_af is NULL'
conn.execute(qry)
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Max_read','{{"index": null, "name": "vcfinfo__Max_read", "title": "Max reads", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Min_read','{{"index": null, "name": "vcfinfo__Min_read", "title": "Min reads", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Max_af','{{"index": null, "name": "vcfinfo__Max_af", "title": "Max AF", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Min_af','{{"index": null, "name": "vcfinfo__Min_af", "title": "Min AF", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.commit()
conn.execute('drop table tmp')
conn.execute('VACUUM')
conn.close()
EOF

echo 'Running first filtering script'
python3 {params.filter_1}
echo 'Running secondary filtering script'
python3 {params.filter_2}
echo 'Running column fixing script'
python3 {params.fixed}
"""
shell: """
oc util mergesqlite \\
    -o {output.merged} \\
    {input.dbs}
"""
shell: """
# Environment variable for modules dir
export OC_MODULES="{params.module}"
oc run \\
    -t vcf \\
    -x \\
    --newlog \\
    --cleanrun \\
    --system-option "modules_dir={params.module}" \\
    -a {params.annot} \\
    -n {params.prefix} \\
    -l {params.genome} \\
    -d {params.outdir} \\
    --mp {threads} \\
    {input.vcfs}
"""
shell: """
# Create first filtering script,
# Filters based on AF
mkdir -p {params.scripts}
cp {input.db} {output.filter_1}
cat << EOF > {params.filter_1}
import sqlite3
import os
maf = str({params.maf_thres})
mafc = str(1 - float({params.maf_thres}))
so = "{params.so}"
conn = sqlite3.connect("{output.filter_1}")
conn.isolation_level = None
cursor = conn.cursor()
conn.execute('CREATE TABLE variant2 AS SELECT * FROM variant WHERE (base__so IN (' + '"' + '", "'.join(so.split(',')) + '"' + ')) AND ((gnomad__af IS NULL AND gnomad3__af IS NULL AND thousandgenomes__af IS NULL) OR (gnomad__af <= ' + maf + ' OR gnomad__af >= ' + mafc + ' OR gnomad3__af <= ' + maf + ' OR gnomad3__af >= ' + mafc + ' OR thousandgenomes__af <= ' + maf + ' OR thousandgenomes__af >= ' + mafc + '))')
conn.execute('DROP TABLE variant')
conn.execute('ALTER TABLE variant2 RENAME TO variant')
conn.execute('DELETE from sample WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('DELETE FROM gene WHERE base__hugo NOT IN (SELECT base__hugo FROM variant)')
conn.execute('DELETE FROM mapping WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('UPDATE info SET colval = (SELECT COUNT(*) FROM variant) WHERE colkey == "Number of unique input variants"')
conn.execute('VACUUM')
conn.close()
EOF

# Create second filtering script,
# Filters based on filters in config
cp {output.filter_1} {output.filter_2}
cat << EOF > {params.filter_2}
import pandas as pd
import sqlite3
import sys
import os
conn = sqlite3.connect("{output.filter_2}")
conn.isolation_level = None
cursor = conn.cursor()
filter = {params.secondary}

def keep(dd, used_annotators):
    final_annotators = {{annotator: dd[annotator] for annotator in used_annotators}}
    return final_annotators

def filtercol(dd, annot):
    if dd['relation'] == 'IN':
        return annot + '__' + dd['col'] + ' ' + dd['relation'] + ' ("' + '", "'.join(dd['value'].split(',')) + '")'
    elif dd['relation'] == 'IS' or dd['relation'] == 'IS NOT':
        return annot + '__' + dd['col'] + ' ' + dd['relation'] + ' ' + dd['value']
    else:
        return annot + '__' + dd['col'] + ' ' + dd['relation'] + dd['value']

def filterunit(annot):
    dd = filter[annot]
    if dd['col'].lower() == 'multiple':
        cols = dd['cols'].split(',')
        relations = dd['relation'].split(',')
        values = dd['value'].split(',')
        # Index with i so each column gets its own relation/value
        return ' OR '.join([filtercol({{'col': cols[i], 'relation': relations[i], 'value': values[i]}}, annot) for i in range(len(cols))])
    else:
        return filtercol(dd, annot)

def filterunit_null(annot):
    dd = filter[annot]
    if dd['col'].lower() == 'multiple':
        cols = dd['cols'].split(',')
        return ' AND '.join([annot + '__' + cols[i] + ' IS NULL' for i in range(len(cols))])
    else:
        return annot + '__' + dd['col'] + ' IS NULL'

# Find the intersect of annotators listed
# in colnames of the SQLite variants table
# and annotators in filter config. Must run
# this step: if a module in OpenCRAVAT does
# not have any annotations for a provided set
# of variants, then that module's annotations
# may not exist in the SQLite table.
df = pd.read_sql_query("SELECT * FROM variant", conn)
table_var_annotators = set([col for col in df.columns])
filter_annotators = []
column2module = {{}}
for ann in set(filter.keys()):
    try:
        # Multiple column filters
        col_names = filter[ann]['cols']
        col_names = [c.strip() for c in col_names.split(',')]
    except KeyError:
        # One column filter
        col_names = [filter[ann]['col'].strip()]
    for col in col_names:
        coln = '{{}}__{{}}'.format(ann, col)
        filter_annotators.append(coln)
        column2module[coln] = ann
filter_annotators = set(filter_annotators)
tmp_annotators = table_var_annotators.intersection(filter_annotators)
keep_annotators = set([column2module[ann] for ann in tmp_annotators])

# Sanity check
if len(keep_annotators) == 0:
    print('WARNING: No filter annotators were provided that match oc run annotators.', file=sys.stderr)
    print('WARNING: The next filtering step may fail.', file=sys.stderr)

# Filter to avoid SQL filtering issues
filter = keep(filter, keep_annotators)
print('Apply final filters to SQLite: ', filter)
filter_query_nonnull = ' OR '.join([filterunit(annot) for annot in filter.keys()])
filter_query_null = ' AND '.join([filterunit_null(annot) for annot in filter.keys()])
filter_query = filter_query_nonnull + ' OR (' + filter_query_null + ')'
print(filter_query)
conn.execute('CREATE TABLE variant2 AS SELECT * FROM variant WHERE (' + filter_query + ')')
conn.execute('DROP TABLE variant')
conn.execute('ALTER TABLE variant2 RENAME TO variant')
conn.execute('DELETE from sample WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('DELETE FROM gene WHERE base__hugo NOT IN (SELECT base__hugo FROM variant)')
conn.execute('DELETE FROM mapping WHERE base__uid NOT IN (SELECT base__uid FROM variant)')
conn.execute('UPDATE info SET colval = (SELECT COUNT(*) FROM variant) WHERE colkey == "Number of unique input variants"')
conn.execute('VACUUM')
conn.close()
EOF

# Create fix column script,
# Fixes columns to a min and max AF
cp {output.filter_2} {output.fixed}
cat << EOF > {params.fixed}
import sqlite3
import os
import pandas as pd
conn = sqlite3.connect("{output.fixed}")
conn.isolation_level = None
cursor = conn.cursor()
depth = pd.read_sql_query('SELECT base__uid,vcfinfo__tot_reads,vcfinfo__af from variant', conn, index_col = 'base__uid')
tot_read = depth.vcfinfo__tot_reads.str.split(';', expand = True).astype('float')
af = depth.vcfinfo__af.str.split(';', expand = True).apply(pd.to_numeric)
depth['vcfinfo__Max_read'] = tot_read.max(axis = 1)
depth['vcfinfo__Min_read'] = tot_read.min(axis = 1)
depth['vcfinfo__Max_af'] = af.max(axis = 1)
depth['vcfinfo__Min_af'] = af.min(axis = 1)
depth.to_sql('tmp', conn, if_exists = 'replace', index = True)
conn.execute('alter table variant add column vcfinfo__Max_read numeric(50)')
conn.execute('alter table variant add column vcfinfo__Min_read numeric(50)')
conn.execute('alter table variant add column vcfinfo__Max_af numeric(50)')
conn.execute('alter table variant add column vcfinfo__Min_af numeric(50)')
qry = 'update variant set vcfinfo__Max_read = (select vcfinfo__Max_read from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Max_read is NULL'
conn.execute(qry)
qry = 'update variant set vcfinfo__Min_read = (select vcfinfo__Min_read from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Min_read is NULL'
conn.execute(qry)
qry = 'update variant set vcfinfo__Max_af = (select vcfinfo__Max_af from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Max_af is NULL'
conn.execute(qry)
qry = 'update variant set vcfinfo__Min_af = (select vcfinfo__Min_af from tmp where tmp.base__uid = variant.base__uid) where vcfinfo__Min_af is NULL'
conn.execute(qry)
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Max_read','{{"index": null, "name": "vcfinfo__Max_read", "title": "Max reads", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Min_read','{{"index": null, "name": "vcfinfo__Min_read", "title": "Min reads", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Max_af','{{"index": null, "name": "vcfinfo__Max_af", "title": "Max AF", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.execute('''INSERT INTO variant_header (col_name, col_def) VALUES ('vcfinfo__Min_af','{{"index": null, "name": "vcfinfo__Min_af", "title": "Min AF", "type": "float", "categories": [], "width": 70, "desc": null, "hidden": false, "category": null, "filterable": true, "link_format": null, "genesummary": false, "table": false}}')''')
conn.commit()
conn.execute('drop table tmp')
conn.execute('VACUUM')
conn.close()
EOF

echo 'Running first filtering script'
python3 {params.filter_1}
echo 'Running secondary filtering script'
python3 {params.filter_2}
echo 'Running column fixing script'
python3 {params.fixed}
"""
shell: """
python3 {params.get_flowcell_lanes} \\
    {input.r1} \\
    {wildcards.name} \\
    > {output.txt}
"""
shell: """
fastqc \\
    {input.r1} \\
    {input.r2} \\
    -t {threads} \\
    -o {params.outdir}
"""
shell: """
fastq_screen --conf {params.fastq_screen_config} \\
    --outdir {params.outdir} \\
    --threads {threads} \\
    --subset 1000000 \\
    --aligner bowtie2 \\
    --force \\
    {input.fq1} \\
    {input.fq2}
"""
shell: """
fastqc -t {threads} \\
    -f bam \\
    --contaminants {params.adapters} \\
    -o {params.outdir} \\
    {input.bam}
"""
shell: """
unset DISPLAY
qualimap bamqc -bam {input.bam} \\
    --java-mem-size=92G \\
    -c -ip --gd HUMAN \\
    -outdir {params.outdir} \\
    -outformat HTML \\
    -nt {threads} \\
    --skip-duplicated \\
    -nw 500 \\
    -p NON-STRAND-SPECIFIC
"""
shell: """
samtools flagstat --threads {threads} \\
    {input.bam} \\
    > {output.txt}
"""
shell: """
bcftools stats \\
    {input.vcf} \\
    > {output.txt}
"""
shell: """
gatk --java-options '-Xmx{params.memory}g -XX:ParallelGCThreads={threads}' VariantEval \\
    -R {params.genome} \\
    -O {output.grp} \\
    --dbsnp {params.dbsnp} \\
    --eval {input.vcf}
"""
shell: """
java -Xmx{params.memory}g -jar ${{SNPEFF_JAR}} \\
    -v -canon -c {params.config} \\
    -csvstats {output.csv} \\
    -stats {output.html} \\
    {params.genome} \\
    {input.vcf} > {output.vcf}
"""
340 341 342 343 344 345 | shell: """ vcftools \\ --gzvcf {input.vcf} \\ --het \\ --out {params.prefix} """ |
372 373 374 375 376 377 378 379 | shell: """ java -Xmx{params.memory}g -jar ${{PICARDJARPATH}}/picard.jar \\ CollectVariantCallingMetrics \\ INPUT={input.vcf} \\ OUTPUT={params.prefix} \\ DBSNP={params.dbsnp} \\ Validation_Stringency=SILENT """ |
410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 | shell: """ echo "Extracting sites to estimate ancestry." somalier extract \\ -d {params.outdir} \\ --sites {params.sites} \\ -f {params.genome} \\ {input.vcf} # Check if pedigree file exists, # pulled from patient database pedigree_option="" if [ -f "{params.ped}" ]; then # Use PED with relate command pedigree_option="-p {params.ped}" fi echo "Estimating relatedness with pedigree $pedigree_option" somalier relate "$pedigree_option" \\ -i -o {params.outdir}/relatedness \\ {output.somalier} echo "Estimating ancestry." somalier ancestry \\ --n-pcs=10 \\ -o {params.outdir}/ancestry \\ --labels {params.ancestry_db}/ancestry-labels-1kg.tsv \\ {params.ancestry_db}/*.somalier ++ \\ {output.somalier} || {{ # Somalier ancestry error, # usually due to not finding # any sites compared to the # its references, expected # with sub-sampled datasets echo "WARNING: Somalier ancestry failed..." 1>&2 touch {output.ancestry} }} """ |
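The somalier step above tolerates an expected failure by warning and creating an empty placeholder output, so downstream rules still find the file they declare as input. The pattern in isolation (with `false` standing in for the command that may fail, and a hypothetical output name) looks like:

```shell
#!/usr/bin/env bash
# Fallback pattern: if an optional step fails, emit a warning on
# stderr and create an empty placeholder output file instead of
# letting the whole rule fail.
outfile="ancestry.tsv"
false || {
    echo "WARNING: optional step failed, writing empty output" 1>&2
    touch "$outfile"
}
```

Because the `|| { ...; }` branch ends with a successful `touch`, the script exits 0 and the workflow manager treats the rule as complete.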
```bash
shell: """
multiqc --ignore '*/.singularity/*' \\
    --ignore 'slurmfiles/' \\
    --exclude peddy \\
    -f --interactive \\
    -n {output.report} \\
    {params.workdir}
"""
```

```bash
shell: """
mkdir -p '{params.tmppath}'
octopus --threads {threads} \\
    -C cancer \\
    --working-directory {params.wd} \\
    --temp-directory-prefix {params.tmpdir} \\
    -R {params.genome} \\
    -I {input.normal} {input.tumor} {params.normal_option} \\
    -o {output.vcf} \\
    --forest-model {params.g_model} \\
    --somatic-forest-model {params.s_model} \\
    --annotations AC AD DP \\
    -T {params.chunk}
"""
```

```bash
shell: """
# Create a list of chunks to merge
find {params.octopath} -iname '{params.tumor}.vcf.gz' \\
> {output.lsl}

# Merge octopus chunk calls, which
# contain both germline and
# somatic variants
bcftools concat \\
    --threads {threads} \\
    -d exact \\
    -a \\
    -f {output.lsl} \\
    -o {output.raw} \\
    -O v

# Filter the Octopus callset for
# variants with the SOMATIC tag
grep -E "#|CHROM|SOMATIC" {output.raw} \\
> {output.vcf}
"""
```

```bash
shell: """
octopus --threads {threads} \\
    --working-directory {params.wd} \\
    --temp-directory-prefix {params.tmpdir} \\
    -R {params.genome} \\
    -I {input.normal} \\
    -o {output.vcf} \\
    --forest-model {params.model} \\
    --annotations AC AD AF DP \\
    -T {params.chroms}
"""
```

```bash
shell: """
gatk Mutect2 \\
    -R {params.genome} \\
    -I {input.tumor} {params.i_option} {params.normal_option} \\
    --panel-of-normals {params.pon} \\
    --germline-resource {params.germsource} \\
    -L {params.chrom} \\
    -O {output.vcf} \\
    --f1r2-tar-gz {output.orien} \\
    --independent-mates
"""
```

```bash
shell: """
# Set up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT

java -Xmx{params.memory}g -Djava.io.tmpdir=${{tmp}} \\
    -XX:ParallelGCThreads={threads} -jar $GATK_JAR -T CombineVariants \\
    --use_jdk_inflater --use_jdk_deflater \\
    -R {params.genome} \\
    --filteredrecordsmergetype KEEP_UNCONDITIONAL \\
    --assumeIdenticalSamples \\
    -o {output.vcf} \\
    {params.multi_variant_option}
"""
```
```bash
shell: """
# Gather Mutect2 stats
gatk MergeMutectStats \\
    {params.multi_stats_option} \\
    -O {output.stats} &

# Learn the read orientation model
# for artifact filtering
gatk LearnReadOrientationModel \\
    --output {output.orien} \\
    {params.multi_orien_option} &
wait
"""
```

```bash
shell: """
gatk --java-options '-Xmx{params.memory}g' GetPileupSummaries \\
    -I {input.tumor} \\
    -V {params.gsnp} \\
    -L {params.gsnp} \\
    -O {output.summary}
"""
```

```bash
shell: """
gatk --java-options '-Xmx{params.memory}g' GetPileupSummaries \\
    -I {input.normal} \\
    -V {params.gsnp} \\
    -L {params.gsnp} \\
    -O {output.summary}
"""
```

```bash
shell: """
gatk CalculateContamination \\
    -I {input.tumor} {params.normal_option} \\
    -O {output.summary}
"""
```

```bash
shell: """
# Mutect2 orientation bias filter
gatk FilterMutectCalls \\
    -R {params.genome} \\
    -V {input.vcf} \\
    --ob-priors {input.orien} \\
    --contamination-table {input.summary} \\
    -O {output.vcf} \\
    --stats {input.stats}
"""
```

```bash
shell: """
# Prefilter and calculate position-
# specific summary statistics
MuSE call \\
    -n {threads} \\
    -f {params.genome} \\
    -O {params.tumor} \\
    {input.tumor} {params.normal_option}

# Calculate cutoffs from a
# sample-specific error model
MuSE sump \\
    -n {threads} \\
    -G \\
    -I {output.txt} \\
    -O {output.vcf} \\
    -D {params.dbsnp}

# Rename TUMOR/NORMAL in the VCF
# with real sample names
echo -e "TUMOR\\t{params.rename}{params.normal_header}" \\
> {output.header}
bcftools reheader \\
    -o {output.final} \\
    -s {output.header} \\
    {output.vcf}
"""
```

```bash
shell: """
# Set up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT

# Delete any previous attempt's output
# directory to ensure a hard restart
if [ -d "{params.outdir}" ]; then
    rm -rf "{params.outdir}"
fi

# Configure the Strelka somatic workflow
configureStrelkaSomaticWorkflow.py \\
    --referenceFasta {params.genome} \\
    --tumorBam {input.tumor} {params.normal_option} \\
    --runDir {params.outdir} \\
    --callRegions {params.regions}

# Call somatic variants with Strelka
echo "Starting Strelka workflow..."
{params.workflow} \\
    -m local \\
    -j {threads} \\
    -g {params.memory}

# Combine and filter results
echo "Running CombineVariants..."
java -Xmx{params.memory}g -Djava.io.tmpdir=${{tmp}} \\
    -XX:ParallelGCThreads={threads} -jar $GATK_JAR -T CombineVariants \\
    --use_jdk_inflater --use_jdk_deflater \\
    -R {params.genome} \\
    --variant {output.snps} \\
    --variant {output.indels} \\
    --assumeIdenticalSamples \\
    --filteredrecordsmergetype KEEP_UNCONDITIONAL \\
    -o {output.vcf}

# Rename TUMOR/NORMAL in the
# VCF with real sample names
echo -e "TUMOR\\t{params.tumor}{params.normal_header}" \\
> {output.header}
bcftools reheader \\
    -o {output.final} \\
    -s {output.header} \\
    {output.vcf}
"""
```

```bash
shell: """
# Normalize the VCF prior to SelectVariants,
# which needs multi-allelic sites
# to be split before running
echo "Running bcftools norm..."
bcftools norm \\
    -c w \\
    -m - \\
    -Ov \\
    --threads {threads} \\
    -f {params.genome} \\
    -o {output.norm} \\
    {input.vcf}

# Remove filtered sites and output
# variants not called in the PON
echo "Running SelectVariants..."
gatk --java-options "-Xmx{params.memory}g" SelectVariants \\
    -R {params.genome} \\
    --variant {output.norm} \\
    --discordance {params.pon} \\
    --exclude-filtered \\
    --output {output.filt}

# Fix format Number metadata: gatk
# SelectVariants converts Number
# metadata incorrectly when it
# is set to Number=R
sed -i 's/Number=R/Number=./g' \\
    {output.filt}
"""
```

```bash
shell: """
# Filter the call set to tumor sites
bcftools view \\
    -c1 \\
    -Oz \\
    -s '{params.sample}' \\
    -o {output.tmp} \\
    {input.vcf}

# Rename the sample in the VCF
# to contain the caller name
echo -e "{params.sample}\\t{params.rename}" \\
> {output.header}
bcftools reheader \\
    -o {output.tumor} \\
    -s {output.header} \\
    {output.tmp}

# Create a VCF index for the intersection
bcftools index \\
    -f \\
    --tbi \\
    {output.tumor}
"""
```

```bash
shell: """
# Filter the call set to normal sites
bcftools view \\
    --force-samples \\
    -c1 \\
    -Oz \\
    -s '{params.sample}' \\
    -o {output.tmp} \\
    {input.vcf}

# Rename the sample in the VCF
# to contain the caller name
echo -e "{params.sample}\\t{params.rename}" \\
> {output.header}
bcftools reheader \\
    -o {output.normal} \\
    -s {output.header} \\
    {output.tmp}

# Create a VCF index for the intersection
bcftools index \\
    -f \\
    --tbi \\
    {output.normal}
"""
```

```bash
shell: """
# Delete any previous attempt's output
# directory to ensure a hard restart
if [ -d "{params.isec_dir}" ]; then
    rm -rf "{params.isec_dir}"
fi

# Intersect the somatic callsets to find
# variants called by at least two callers
bcftools isec \\
    -Oz \\
    -n+2 \\
    -c none \\
    -p {params.isec_dir} \\
    {input.tumors}

# Create a list of files to merge
find {params.isec_dir} \\
    -name '*.vcf.gz' \\
    | sort \\
> {output.lsl}

# Merge variants found in at
# least two callers
bcftools merge \\
    -Oz \\
    -o {output.merged} \\
    -l {output.lsl}

# Create a VCF index for the merged set
bcftools index \\
    -f \\
    --tbi \\
    {output.merged}
"""
```

```bash
shell: """
# vcf2maf needs an uncompressed VCF file
zcat {input.vcf} \\
> {output.vcf}

# Run VEP and convert the VCF into a MAF file
vcf2maf.pl \\
    --input-vcf {output.vcf} \\
    --output-maf {output.maf} \\
    --vep-path ${{VEP_HOME}} \\
    --vep-data {params.vep_data} \\
    --cache-version {params.ref_version} \\
    --ref-fasta {params.genome} \\
    --vep-forks {threads} \\
    --tumor-id {params.tumor} {params.normal_option} \\
    --ncbi-build {params.vep_build} \\
    --species {params.vep_species}
"""
```

```bash
shell: """
echo "Combining MAFs..."
head -2 {input.mafs[0]} > {output.maf}
awk 'FNR>2 {{print}}' {input.mafs} >> {output.maf}
"""
```
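The MAF-combining rule above keeps the two header lines from the first file only, then appends data rows from every file via `awk 'FNR>2'` (`FNR` resets per input file, so the per-file headers are skipped). A toy reproduction with made-up two-column MAFs:

```shell
#!/usr/bin/env bash
# Each MAF carries a 2-line header (version pragma + column names);
# keep the header once, then append only data rows from all inputs.
printf '#version 2.4\nHugo_Symbol\tChromosome\n' >  a.maf
printf 'TP53\tchr17\n'                           >> a.maf
printf '#version 2.4\nHugo_Symbol\tChromosome\n' >  b.maf
printf 'BRCA1\tchr17\n'                          >> b.maf

head -2 a.maf > merged.maf                      # header from first file only
awk 'FNR>2 {print}' a.maf b.maf >> merged.maf   # data rows from every file
```

The merged file ends up with one header block and one data row per input, regardless of how many per-sample MAFs are supplied.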
```bash
shell: """
Rscript {params.script} \\
    {params.wdir} \\
    {input.maf} \\
    {output.summary} \\
    {output.oncoplot}
"""
```

```bash
shell: """
# The SigProfiler input directory must
# only contain the input MAF
mkdir -p "{params.wdir}"
ln -sf {input.maf} {params.wdir}
python3 {params.script} \\
    -i {params.wdir}/ \\
    -o {params.odir}/ \\
    -p {params.sample} \\
    -r {params.genome}
"""
```

```bash
shell: """
# Merge SigProfiler PDFs
pdfunite {input.pdfs} \\
    {output.pdf}
"""
```

```bash
shell: """
# Delete any previous attempt's output
# directory to ensure a hard restart
if [ -d "{params.outdir}" ]; then
    rm -rf "{params.outdir}"
fi

# Configure the Manta germline SV workflow
configManta.py \\
    --callRegions {params.regions} \\
    --bam {input.bam} \\
    --referenceFasta {params.genome} \\
    --runDir {params.outdir}

# Call germline SVs with the Manta workflow
echo "Starting Manta workflow..."
{params.workflow} \\
    -j {threads} \\
    -g {params.memory}
"""
```

```bash
shell: """
# Delete any previous attempt's output
# directory to ensure a hard restart
if [ -d "{params.outdir}" ]; then
    rm -rf "{params.outdir}"
fi

# Configure the Manta somatic SV workflow
configManta.py {params.normal_option} \\
    --callRegions {params.regions} \\
    --tumorBam {input.tumor} \\
    --referenceFasta {params.genome} \\
    --runDir {params.outdir} \\
    --outputContig

# Call somatic SVs with the Manta workflow
echo "Starting Manta workflow..."
{params.workflow} \\
    -j {threads} \\
    -g {params.memory}
"""
```

```bash
shell: """
fastp -w {threads} \\
    --detect_adapter_for_pe \\
    --in1 {input.r1} \\
    --in2 {input.r2} \\
    --out1 {output.r1} \\
    --out2 {output.r2} \\
    --json {output.json} \\
    --html {output.html}
"""
```

```bash
shell: """
# Set up a temporary directory for
# intermediate files with a built-in
# mechanism for deletion on exit
if [ ! -d "{params.tmpdir}" ]; then mkdir -p "{params.tmpdir}"; fi
tmp=$(mktemp -d -p "{params.tmpdir}")
trap 'rm -rf "${{tmp}}"' EXIT

bwa-mem2 mem \\
    -t {threads} \\
    -K 100000000 \\
    -M \\
    -R \'@RG\\tID:{params.sample}\\tSM:{params.sample}\\tPL:illumina\\tLB:{params.sample}\\tPU:{params.sample}\\tCN:ncbr\\tDS:wgs\' \\
    {params.genome} \\
    {input.r1} \\
    {input.r2} \\
    | samblaster -M \\
    | samtools sort -@{params.sort_threads} \\
        -T ${{tmp}} \\
        --write-index \\
        -m 10G - \\
        -o {output.bam}##idx##{output.bai}
"""
```
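The alignment rule above pipes `bwa-mem2` through `samblaster` into `samtools sort`. By default a shell reports only the last command's exit status for a pipeline, so an aligner crash upstream could go unnoticed; enabling `pipefail` (which Snakemake does in its default shell prefix, and which the sketch below demonstrates with `false` standing in for a failing stage) surfaces such failures:

```shell
#!/usr/bin/env bash
# A pipeline's default exit status is the LAST command's, so an
# upstream failure (simulated with 'false') is masked by default.
false | cat
echo "default status: $?"  > status.txt   # 0 -- the failure is masked

set -o pipefail
false | cat
echo "pipefail status: $?" >> status.txt  # 1 -- the failure surfaces
```

Combined with `set -e`, a surfaced non-zero pipeline status also aborts the rule instead of producing a truncated BAM.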