Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output, operation, topic
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
This Snakemake workflow automatically generates all results and figures from the initial Varlociraptor paper focusing on somatic variant calling .
Rerunning the workflow requires a lot of computation time and some unavoidable external resources that have to be manually deployed . We therefore hope that the Snakemake report in the supplementary material of the paper , providing all results together with comprehensive provenance information (workflow steps, parameters, software versions, code) will already yield sufficient information in most of the cases. If you nevertheless intend to rerun the analysis, feel free to follow the steps below, and please inform us about any potential issues.
General Requirements
Any 64-bit Linux installation with GLIBC 2.5 or newer (i.e. any Linux distribution that is newer than CentOS 6). Note that the restriction of this workflow to Linux is purely a design decision (to save space and ensure reproducibility) and not related to Conda/Bioconda. Bioconda packages are available for both Linux and MacOS in general.
Usage
This workflow can be used to recreate all results found in the paper.
Step 1: Setup system
Variant a: Installing Miniconda on your system
If you are on a Linux system with GLIBC 2.5 or newer (i.e. any Linux distribution that is newer than CentOS 6), you can simply install Miniconda3 with
curl -o /tmp/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash /tmp/miniconda.sh
Make sure to answer
yes
to the question whether your PATH variable shall be modified.
Afterwards, open a new shell/terminal.
Variant b: Use a Docker container
Otherwise, e.g., on MacOS or if you don't want to modify your system setup, install Docker , run
docker run -it continuumio/miniconda3 /bin/bash
and execute all the following steps within that container.
Variant c: Use an existing Miniconda installation
If you want to use an existing Miniconda installation, please be aware that this is only possible if it uses Python 3 by default. You can check this via
python --version
Further, ensure it is up to date with
conda update --all
Step 2: Setup Bioconda channel
Setup Bioconda with
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
Step 3: Install Snakemake
Install Snakemake >=5.4.0 with
conda install snakemake
If you already have an older version of Snakemake, please make sure it is updated to >=5.4.0.
Step 4: Download the workflow
First, create a working directory:
mkdir varlociraptor-workflow
cd varlociraptor-workflow
Then, download the workflow archive from https://doi.org/10.5281/zenodo.3361700 and unpack it with
tar -xf workflow.tar.gz
Step 5: Obtain additional resources
In this special case there are unfortunately unavoidable additional requirements, due to licensing restrictions and data size.
-
The required real data has to be obtained from EGA (EGAD00001002142) . After downloading it, edit the config.yaml in order to point to the right paths here and below .
-
The required simulated data has to be obtained from Zenodo: https://doi.org/10.5281/zenodo.1421298. Download it, convert back to BAM, and edit the config.yaml in order to point to the right paths here and here .
-
The synthetic data can be obtained by running a separate workflow: https://doi.org/10.5281/zenodo.3630241.
Once all data is obtained, edit the
config.yaml
under
datasets:
to point to the right file paths in your system.
Step 6: Run the workflow
Execute the analysis workflow with Snakemake
snakemake --use-conda
Please wait a few minutes for the analysis to finish.
Results can be found in the folder
figs/
.
If you have been running the workflow in the docker container (see above),
you can obtain the results with
docker cp <container-id>:/bioconda-workflow/figs .
whith
<container-id>
being the ID of the container.
Known errors
-
If you see an error like
ImportError: No module named 'appdirs'
when starting Snakemake, you are likely suffering from a bug in an older conda version. Make sure to update your conda installation with
conda update --all
and then reinstall the
appdirs
andsnakemake
package withconda install -f appdirs snakemake
-
If you see an error like
ImportError: Missing required dependencies ['numpy']
you are likely suffering from a bug in an older conda version. Make sure to update your conda installation with
conda update --all
and then reinstall the
snakemake
package withconda install -f snakemake
Code Snippets
1 2 3 4 5 6 7 8 9 10 11 12 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "koester@jimmy.harvard.edu" __license__ = "MIT" from snakemake.shell import shell shell( "bcftools concat {snakemake.params} -o {snakemake.output[0]} " "{snakemake.input}") |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "koester@jimmy.harvard.edu" __license__ = "MIT" from snakemake.shell import shell try: exclude = "-x " + snakemake.input.exclude except AttributeError: exclude = "" extra = snakemake.params.get("extra", "") log = snakemake.log_fmt_shell(stdout=True, stderr=True) shell( "OMP_NUM_THREADS={snakemake.threads} delly call {extra} " "{exclude} -t {snakemake.params.vartype} -g {snakemake.input.ref} " "-o {snakemake.output[0]} {snakemake.input.samples} {log}") |
1 2 3 4 5 6 7 8 9 10 11 12 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "koester@jimmy.harvard.edu" __license__ = "MIT" from snakemake.shell import shell shell( "bcftools concat {snakemake.params} -o {snakemake.output[0]} " "{snakemake.input}") |
1 2 3 4 5 6 7 8 9 10 11 12 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "koester@jimmy.harvard.edu" __license__ = "MIT" from snakemake.shell import shell shell( "bcftools view {snakemake.params} {snakemake.input[0]} " "-o {snakemake.output[0]}") |
1 2 3 4 5 6 7 8 9 10 11 12 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "koester@jimmy.harvard.edu" __license__ = "MIT" from snakemake.shell import shell shell( "bcftools concat {snakemake.params} -o {snakemake.output[0]} " "{snakemake.input.calls}") |
1 2 3 4 5 6 7 8 9 10 11 12 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "koester@jimmy.harvard.edu" __license__ = "MIT" from snakemake.shell import shell shell( "bcftools view {snakemake.params} {snakemake.input[0]} " "-o {snakemake.output[0]}") |
1 2 3 4 5 6 7 8 9 10 11 12 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "koester@jimmy.harvard.edu" __license__ = "MIT" from snakemake.shell import shell shell("picard MarkDuplicates {snakemake.params} INPUT={snakemake.input} " "OUTPUT={snakemake.output.bam} METRICS_FILE={snakemake.output.metrics} " "&> {snakemake.log}") |
1 2 3 4 5 6 7 8 9 10 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "koester@jimmy.harvard.edu" __license__ = "MIT" from snakemake.shell import shell shell("samtools index {snakemake.params} {snakemake.input[0]} {snakemake.output[0]}") |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "koester@jimmy.harvard.edu" __license__ = "MIT" import os from snakemake.shell import shell prefix = os.path.splitext(snakemake.output[0])[0] shell( "samtools sort {snakemake.params} -@ {snakemake.threads} -o {snakemake.output[0]} " "-T {prefix} {snakemake.input[0]}") |
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | __author__ = "Julian de Ruiter" __copyright__ = "Copyright 2017, Julian de Ruiter" __email__ = "julianderuiter@gmail.com" __license__ = "MIT" from snakemake.shell import shell extra = snakemake.params.get("extra", "") region = snakemake.params.get("region", "") log = snakemake.log_fmt_shell(stdout=False, stderr=True) shell("samtools stats {extra} {snakemake.input}" " {region} > {snakemake.output} {log}") |
9 10 | script: "../scripts/adhoc-calling.py" |
14 15 16 | shell: "(break-point-inspector -vcf {input.manta} -ref {input.samples[1]} -tumor {input.samples[0]} " "-output_vcf {output}) > {log} 2>&1" |
29 30 | wrapper: "0.22.0/bio/bcftools/view" |
40 41 | wrapper: "0.22.0/bio/bcftools/view" |
16 17 | wrapper: "0.17.4/bio/delly" |
27 28 | wrapper: "0.17.4/bio/bcftools/concat" |
34 35 36 37 | run: with open(output[0], "w") as out: print("tumor", "tumor", file=out) print("normal", "control", file=out) |
54 55 56 57 | shell: "delly filter -m 0 -r 1.0 --samples {input.samples} " "-o {params.tmp} {input.bcf}; " "bcftools view -i INFO/SOMATIC -f PASS -Ob {params.tmp} > {output}" |
14 15 | script: "../scripts/annotate-truth.py" |
43 44 | shell: "bcftools view {params.regions} {input.calls} | rbt vcf-match {params.match} {input.truth} > {output}" |
59 60 | shell: "bcftools view {params.regions} {input.calls} | rbt vcf-match {params.match} {input.truth} > {output}" |
70 71 | shell: "rbt vcf-to-txt --genotypes --info SOMATIC SVLEN SVTYPE TAF NAF < {input} > {output}" |
108 109 | shell: "rbt vcf-to-txt {params.gt} {params.tags} --info MATCHING < {input} > {output}" |
122 123 | shell: "rbt vcf-to-txt {params.gt} {params.tags} < {input} > {output}" |
136 137 | shell: "rbt vcf-to-txt {params.gt} {params.tags} --info MATCHING < {input} > {output}" |
154 155 | script: "../scripts/obtain-tp-fp.py" |
211 212 | script: "../scripts/plot-precision-recall.py" |
228 229 | script: "../scripts/plot-allelefreq-recall.py" |
245 246 | script: "../scripts/plot-fdr-control.py" |
260 261 | script: "../scripts/plot-allelefreq-estimation.py" |
275 276 | script: "../scripts/plot-allelefreq-scatter.py" |
289 290 | script: "../scripts/plot-score-dist.py" |
300 301 | script: "../scripts/bam-stats.py" |
315 316 | shell: "rbt vcf-match {params.match} {params.bcfs[1]} < {params.bcfs[0]} > {output}" |
336 337 | script: "../scripts/aggregate-concordance.py" |
350 351 | shell: "rbt vcf-to-txt {params.gt} {params.tags} --info MATCHING < {input} > {output}" |
364 365 | script: "../scripts/plot-concordance-upset.R" |
382 383 | script: "../scripts/plot-concordance.py" |
393 394 | script: "../scripts/plot-freqdist.py" |
8 9 10 11 12 13 14 15 16 17 18 | shell: """ cd resources curl -L https://github.com/nygenome/lancet/archive/{params.commit}.tar.gz | tar -xz cd lancet-{params.commit} sed -i 's/-Wl,-rpath,$(ABS_BAMTOOLS_DIR)\/lib\///' Makefile make clean prefix=$(dirname $(dirname $(which $CXX))) make CXX=$CXX INCLUDES="-I$prefix/include/ -I$prefix/include/bamtools -L$prefix/lib" cp lancet .. """ |
47 48 49 50 51 | shell: "LD_LIBRARY_PATH=$CONDA_PREFIX/lib " "resources/lancet --tumor {input.bams[0]} --normal {input.bams[1]} " "--ref {input.ref} --reg {params.region} --num-threads {threads} " "{params.extra} > {output} 2> {log}" |
62 63 | shell: "sed -r 's/MS\=[0-9]+[ACGT]+/MS/g' {input.vcf} | bcftools annotate -o {output} -h {input.header} -" |
73 74 | shell: "bcftools concat -Ob {input} > {output}" |
87 88 | wrapper: "0.19.3/bio/bcftools/view" |
20 21 22 23 24 | shell: "rm -rf {params.dir}; " "(configManta.py {params.extra} --tumorBam {input.samples[0]} --normalBam {input.samples[1]} " "--referenceFasta {input.ref} --runDir {params.dir}; " "{params.dir}/runWorkflow.py -m local -j {threads}) > {log} 2>&1" |
34 35 | wrapper: "0.22.0/bio/bcftools/view" |
45 46 | wrapper: "0.22.0/bio/bcftools/view" |
59 60 | wrapper: "0.22.0/bio/bcftools/view" |
15 16 | wrapper: "0.22.0/bio/samtools/sort" |
28 29 | shell: "samtools bam2fq {input} -1 {output.m1} -2 {output.m2} -0 {output.mixed}" |
41 42 | shell: "bwa index -p {params.prefix} {input}" |
72 73 74 | shell: "(resources/bwa mem -t {threads} {params.extra} {params.index} {input.sample} | " "samtools view -Sb - > {output}) 2> {log}" |
85 86 | wrapper: "0.22.0/bio/samtools/sort" |
99 100 | wrapper: "0.22.0/bio/picard/markduplicates" |
108 109 | wrapper: "0.22.0/bio/samtools/index" |
8 9 | shell: "faidx --transform bed {input} > {output}" |
29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | shell: """ (preprocess.py --mode call --reference {input.ref} \ --region_bed {input.bed} --tumor_bam {input.bams[0]} \ --normal_bam {input.bams[1]} --work {output.workdir} \ --min_mapq 10 --num_threads {threads} \ --scan_alignments_binary /opt/neusomatic/neusomatic/bin/scan_alignments call.py --candidates_tsv {output.workdir}/dataset/*/candidates*.tsv \ --reference {input.ref} --out {output.workdir} \ --checkpoint /opt/neusomatic/neusomatic/models/NeuSomatic_v0.1.4_standalone_SEQC-WGS-Spike.pth \ --num_threads {threads} \ --batch_size 100 python `which postprocess.py` --reference {input.ref} --tumor_bam {input.bams[0]} \ --pred_vcf {output.workdir}/pred.vcf \ --candidates_vcf {output.workdir}/work_tumor/filtered_candidates.vcf \ --output_vcf {output.vcf} \ --work {output.workdir}) 2> {log} """ |
59 60 | shell: "bcftools annotate -Ob -o {output} -h {input.header} {input.vcf}" |
74 75 | wrapper: "0.19.3/bio/bcftools/view" |
15 16 | shell: "run_novobreak {input.ref} {input.bams[0]} {input.bams[1]} {threads} {output.workdir} > {log} 2>&1" |
26 27 | shell: "bcftools view -Ob {input}/novoBreak.pass.flt.vcf > {output}" |
37 38 | shell: "bcftools concat -Ob {input}/split/*.sp.vcf > {output}" |
11 12 | wrapper: "0.22.0/bio/samtools/stats" |
24 25 | shell: "samtools depth {input} | gzip > {output}" |
20 21 22 23 24 | shell: "rm -rf {params.dir}; " "(configureStrelkaSomaticWorkflow.py {params.extra} --tumorBam {input.samples[0]} --normalBam {input.samples[1]} " "--referenceFasta {input.ref} --runDir {params.dir} --indelCandidates {input.manta}; " "{params.dir}/runWorkflow.py -m local -j {threads}) > {log} 2>&1" |
35 36 | wrapper: "0.22.0/bio/bcftools/concat" |
49 50 | wrapper: "0.22.0/bio/bcftools/view" |
8 9 | shell: "cairosvg {input} -o {output}" |
39 40 41 42 43 44 45 | shell: "bcftools view -Ou {input.calls} {params.chrom_prefix} | " "varlociraptor call variants {input.ref} " "{config[caller][varlociraptor][params]} {params.caller} " "tumor-normal {input.bams} " "--purity {params.purity} " "> {output} 2> {log}" |
55 56 | wrapper: "0.19.1/bio/bcftools/concat" |
66 67 | shell: "varlociraptor filter-calls posterior-odds positive --events SOMATIC_TUMOR < {input} > {output}" |
77 78 79 80 | shell: "varlociraptor filter-calls control-fdr {input} --events SOMATIC_TUMOR --var {wildcards.type} " "--minlen {wildcards.minlen} --maxlen {wildcards.maxlen} " "--fdr {wildcards.fdr} > {output}" |
92 93 | wrapper: "0.22.0/bio/bcftools/view" |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | from cyvcf2 import VCF, Writer import numpy as np def get_sample_name(tissue): ds = snakemake.config["runs"][snakemake.wildcards.run]["dataset"] return snakemake.config["datasets"][ds][tissue]["name"] tumor, normal = map(get_sample_name, ["tumor", "normal"]) bcf_in = VCF(snakemake.input.vcf) bcf_out = Writer(snakemake.output[0], bcf_in) for rec in bcf_in: if rec.FILTER: continue gt = rec.genotypes tumor_gt = gt[0][:2] normal_gt = gt[1][:2] if (np.any(tumor_gt) and not np.any(normal_gt) and not np.any(np.isnan(normal_gt))): # somatic variant bcf_out.write_record(rec) bcf_out.close() |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 | from common import load_variants import networkx as nx import pandas as pd import numpy as np vartype = snakemake.wildcards.vartype index_cols = ["CHROM", "POS", "SVLEN"] if vartype == "INS" or vartype == "DEL" else ["CHROM", "POS", "ALT"] all_variants = [load_variants(f, vartype=vartype) for f in snakemake.input.calls] G = nx.Graph() for calls, (i, j) in zip(all_variants, snakemake.params.dataset_combinations): calls["component"] = None for call in calls.itertuples(): a = (i, call.Index) G.add_node(a) if call.MATCHING >= 0: b = (j, call.MATCHING) G.add_node(b) G.add_edge(a, b) # get a set of calls for each dataset (we don't need all pairwise comparisons for that) representatives = {snakemake.params.dataset_combinations[i][0]: calls for i, calls in enumerate(all_variants)} if snakemake.wildcards.mode != "varlociraptor": varlociraptor_variants = [load_variants(f, vartype=vartype) for f in snakemake.input.varlociraptor_calls] for calls in varlociraptor_variants: calls.set_index(index_cols, inplace=True) varlociraptor_representatives = {snakemake.params.dataset_combinations[i][0]: calls for i, calls in enumerate(varlociraptor_variants)} # annotate calls with their component, i.e. their equivalence class for component_id, component in enumerate(nx.connected_components(G)): for i, k in component: representatives[i].loc[k, "component"] = component_id for calls in representatives.values(): calls["component"] = calls["component"].astype(np.float32) calls.set_index("component", inplace=True) # join calls based on their equivalence class aggregated = None suffix = "_{}".format dataset_name = lambda i: snakemake.params.datasets[i] is_varlociraptor = False for dataset_id, calls in representatives.items(): cols = list(index_cols) if "CASE_AF" in calls.columns: cols.extend(["CASE_AF", "PROB_SOMATIC_TUMOR"]) is_varlociraptor = True calls = calls[cols] if snakemake.wildcards.mode != "varlociraptor": idx_calls = calls.set_index(cols, drop=False) caseaf = idx_calls.join(varlociraptor_representatives[dataset_id][["CASE_AF"]], how="left")["CASE_AF"] caseaf = caseaf[~caseaf.index.duplicated()] calls = calls[~idx_calls.index.duplicated()] calls["CASE_AF"] = caseaf.values calls.columns = [c + suffix(dataset_name(dataset_id)) for c in calls.columns] if aggregated is None: aggregated = calls else: aggregated = aggregated.join(calls, how="outer", lsuffix="", rsuffix="") # Forget the component id. Otherwise, we might run into errors with duplicate elements # in the index below. These can occur if there are multiple ambiguous calls. aggregated.reset_index(inplace=True, drop=True) pos_cols = aggregated.columns[aggregated.columns.str.startswith("POS_")] is_called = (~aggregated[pos_cols].isnull()).astype(int) is_called.columns = pos_cols.str.replace("POS_", "") aggregated = aggregated.join(is_called, lsuffix="", rsuffix="") aggregated.insert(len(aggregated.columns), "concordance_count", is_called.sum(axis=1)) aggregated["max_case_af"] = aggregated[aggregated.columns[aggregated.columns.str.startswith("CASE_AF")]].max(axis=1) if is_varlociraptor: aggregated["max_prob_somatic_tumor"] = aggregated[aggregated.columns[aggregated.columns.str.startswith("PROB_SOMATIC")]].min(axis=1) aggregated.to_csv(snakemake.output[0], sep="\t", index=False) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | from cyvcf2 import VCF, Writer import numpy as np def subclone_vaf(gt): """Calculate subclone allele frequency""" if np.all(gt[:2] == [1, 1]): return 1.0 elif (np.all(gt[:2] == [0, 1]) or np.all(gt[:2] == [1, 0]) or np.all(gt[:2] == [-1, 1]) or np.all(gt[:2] == [1, -1])): return 0.5 else: return 0.0 # Reader vcf_in = VCF(snakemake.input[0]) # Setup subclone information subclones = ["Som{}".format(i) for i in range(1, 5)] fractions = [1/3, 1/3, 1/4, 1/12] # Prepare writer vcf_in.add_info_to_header({"ID": "TAF", "Number": "1", "Description": "True tumor allele frequency", "Type": "Float"}) vcf_in.add_info_to_header({"ID": "NAF", "Number": "1", "Description": "True normal allele frequency", "Type": "Float"}) bcf_out = Writer(snakemake.output[0], vcf_in) for rec in vcf_in: if len(rec.ALT) > 1: raise ValueError("multiallelic sites are not supported at the moment") try: # get VAFs from VCF tumor_vaf = rec.INFO["TAF"] normal_vaf = rec.INFO["NAF"] except KeyError: # calculate VAFs subclone_idx = [vcf_in.samples.index(s) for s in subclones] control_idx = vcf_in.samples.index("Control") tumor_vaf = sum(fraction * subclone_vaf(rec.genotypes[idx]) for idx, fraction in zip(subclone_idx, fractions)) normal_vaf = subclone_vaf(rec.genotypes[control_idx]) rec.INFO["TAF"] = tumor_vaf rec.INFO["NAF"] = normal_vaf # only keep somatic variants if normal_vaf == 0.0 and tumor_vaf > 0.0: bcf_out.write_record(rec) bcf_out.close() |
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 | import sys import numpy as np import pandas as pd import pysam import matplotlib matplotlib.use("agg") import matplotlib.pyplot as plt import seaborn as sns from functools import partial tumor = pysam.AlignmentFile(snakemake.input[0], "rb") normal = pysam.AlignmentFile(snakemake.input[1], "rb") softclips = [] for i, rec in enumerate(normal): if rec.is_supplementary or rec.is_unmapped: continue is_first_read = rec.pos < rec.mpos get_clip = lambda c: c[1] if c[0] == 4 else None clip_left = get_clip(rec.cigartuples[0]) if clip_left is not None: softclips.append([clip_left, True, is_first_read]) clip_right = get_clip(rec.cigartuples[-1]) if clip_right is not None: softclips.append([clip_right, False, is_first_read]) if i == 10000000: break softclips = pd.DataFrame(softclips, columns=["len", "left", "first_in_pair"]) def plot(*args, **kwargs): softclips = args[0] plt.hist(softclips, normed=True) q95 = np.percentile(softclips, 99) plt.plot([q95, q95], [0, 1.0], "--k") m = max(softclips) plt.plot([m, m], [0, 1.0], ":k") plt.text(m, 1, "max={}".format(m), horizontalalignment="right", verticalalignment="top") g = sns.FacetGrid(softclips, col="left", row="first_in_pair") g = g.map(plot, "len") plt.savefig(snakemake.output[0]) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | import pandas as pd import numpy as np from common import load_variants minlen = int(snakemake.wildcards.minlen) maxlen = int(snakemake.wildcards.maxlen) vartype = snakemake.wildcards.vartype if snakemake.wildcards.mode == "varlociraptor": score = snakemake.config["caller"]["varlociraptor"]["score"] # calls are already filtered by FDR control step minlen = None maxlen = None elif snakemake.wildcards.mode == "default": score = snakemake.config["caller"][snakemake.wildcards.caller]["score"] else: score = None calls = load_variants(snakemake.input.calls, vartype=vartype, minlen=minlen, maxlen=maxlen) calls["is_tp"] = calls["MATCHING"] >= 0 calls["score"] = calls[score] if score else np.nan calls.to_csv(snakemake.output[0], sep="\t") |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 | from itertools import product import math import matplotlib matplotlib.use("agg") from matplotlib import pyplot as plt import seaborn as sns import pandas as pd import common import numpy as np MIN_CALLS = 10 vartype = snakemake.wildcards.vartype colors = common.get_colors(snakemake.config) truth = common.load_variants(snakemake.input.truth, vartype=vartype) def props(callers): return product(callers, snakemake.params.len_ranges) def plot_len_range(minlen, maxlen): def plot(calls, colors): calls = calls[calls.is_tp] true_af = truth.loc[calls.MATCHING].reset_index().TAF calls = calls.reset_index() calls["error"] = calls.CASE_AF - true_af if calls.empty: return calls["true_af"] = true_af true_af = pd.Series(calls["true_af"].unique()).sort_values() # standard deviation when sampling in binomial process from allele freq # this is the expected sampling error within the correctly mapped fragments # sd = true_af.apply(lambda af: 1 / 40 * math.sqrt(40 * af * (1 - af))) # x = np.arange(len(true_af)) # offsets = [-0.5, 0.5] # y_upper = np.array([v for v in sd for o in offsets]) # y_lower = np.maximum(-y_upper, [-f for f in true_af for o in offsets]) # plt.fill_between([v + o for v in x for o in offsets], y_lower, y_upper, color="#EEEEEE", zorder=-5) calls["true_af"] = calls["true_af"].apply("{:.3f}".format) size = 1 if maxlen == 30 else 2 sns.stripplot("true_af", "error", hue="caller", data=calls, palette=colors, dodge=True, jitter=True, alpha=0.5, size=size, rasterized=True) sns.boxplot("true_af", "error", hue="caller", data=calls, color="white", fliersize=0, linewidth=1) handles, labels = plt.gca().get_legend_handles_labels() n = len(calls.caller.unique()) plt.ylim((-1,1)) plt.grid(axis="y", linestyle=":", color="grey") sns.despine() plt.xticks(rotation="vertical") ax = plt.gca() ax.legend().remove() return ax, handles[n:] all_calls, all_colors = load_calls(minlen, maxlen) return plot(all_calls, all_colors) def load_calls(minlen, maxlen): all_calls = [] all_colors = [] for calls, (caller, len_range) in zip(snakemake.input.varlociraptor_calls, props(snakemake.params.varlociraptor_callers)): if len_range[0] != minlen and len_range[1] != maxlen: continue label = "varlociraptor+{}".format(caller) calls = pd.read_table(calls) calls["caller"] = label if not calls.empty: all_calls.append(calls) all_colors.append(colors[caller]) all_calls = pd.concat(all_calls) return all_calls, all_colors common.plot_ranges( snakemake.params.len_ranges, plot_len_range, xlabel="true allele frequency", ylabel="predicted - truth") plt.savefig(snakemake.output[0], bbox_inches="tight") |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 | from itertools import product import matplotlib matplotlib.use("agg") from matplotlib import pyplot as plt import seaborn as sns import pandas as pd import common import numpy as np import math from matplotlib.lines import Line2D MIN_CALLS = 10 vartype = snakemake.wildcards.vartype colors = common.get_colors(snakemake.config) def props(callers): return product(callers, snakemake.params.len_ranges) def plot_len_range(minlen, maxlen): truth = common.load_variants( snakemake.input.truth, minlen, maxlen, vartype=vartype) afs = pd.Series(truth.TAF.unique()).sort_values() def plot(calls, label, color, varlociraptor=True, style="-.", markersize=4): calls = pd.read_table(calls, index_col=0) if len(calls) < 10: return if varlociraptor: phred = lambda p: -10 * math.log10(p) def calc_recall(p): c = calls[calls.score <= phred(p)] return [common.recall(c, truth[truth.TAF >= af]) for af in afs] return plt.fill_between( afs, calc_recall(0.98 if maxlen > 30 else 0.99), calc_recall(0.9), color=color, label=label, alpha=0.6) else: recall = [common.recall(calls, truth[truth.TAF >= af]) for af in afs] # plot a white background first to increase visibility plt.plot(afs, recall, "-", color="white", alpha=0.8) return plt.plot( afs, recall, style, color=color, label=label)[0] handles = [] def register_handle(handle): if handle is not None: handles.append(handle) for calls, (caller, len_range) in zip(snakemake.input.varlociraptor_calls, props(snakemake.params.varlociraptor_callers)): if len_range[0] != minlen and len_range[1] != maxlen: continue label = "varlociraptor+{}".format(caller) handle = plot(calls, label, colors[caller], varlociraptor=True) register_handle(handle) #handles.append(Line2D([0], [0], color=colors[caller], label=label)) for calls, (caller, len_range) in zip(snakemake.input.adhoc_calls, props(snakemake.params.adhoc_callers)): if len_range[0] != minlen and len_range[1] != maxlen: continue color = colors[caller] handle = plot(calls, caller, color, style=":", varlociraptor=False) register_handle(handle) #handles.append(Line2D([0], [0], linestyle=":", color=color, label=caller)) sns.despine() ax = plt.gca() return ax, handles common.plot_ranges( snakemake.params.len_ranges, plot_len_range, xlabel="allele frequency", ylabel="recall") plt.savefig(snakemake.output[0], bbox_inches="tight") |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 | import math import matplotlib matplotlib.use("agg") from matplotlib import pyplot as plt import seaborn as sns import pandas as pd import common import numpy as np MIN_COUNT = 20 MAX_DEPTH = 60 vartype = snakemake.wildcards.vartype colors = common.get_colors(snakemake.config) truth = common.load_variants(snakemake.input.truth, vartype=vartype) all_calls = [] for caller, calls in zip(snakemake.params.callers, snakemake.input.calls): calls = pd.read_table(calls) calls.loc[:, "caller"] = caller all_calls.append(calls) all_calls = pd.concat(all_calls) def plot(af, _): constrain_lower = lambda error: np.maximum(error, -af) constrain_upper = lambda error: np.minimum(error, 1.0 - af) dp = all_calls["TUMOR_DP"] calls = all_calls[all_calls.is_tp] true_af = truth.loc[calls.MATCHING].reset_index().TAF calls = calls.reset_index() calls["true_af"] = true_af calls = calls[calls["true_af"] == af] calls["error"] = calls.CASE_AF - true_af sns.kdeplot(calls["TUMOR_DP"], calls["error"], cmap="Blues", n_levels=50, shade=True, alpha=0.7, shade_lowest=False) #alpha=0.5, clip=((0.0, 1.0), (0.0, af))) plt.plot(calls["TUMOR_DP"], calls["error"], ",", color="k", lw=0, alpha=1.0, rasterized=True) by_depth = calls.groupby("TUMOR_DP")["error"].describe().reset_index() by_depth["-std"] = constrain_lower(-by_depth["std"]) by_depth["std"] = constrain_upper(by_depth["std"]) by_depth = by_depth[by_depth["count"] >= MIN_COUNT] plt.plot(by_depth.TUMOR_DP, by_depth["std"], "--", color="k") plt.plot(by_depth.TUMOR_DP, by_depth["-std"], "--", color="k") plt.plot(by_depth.TUMOR_DP, by_depth["mean"], "-", color="k") depths = np.arange(0, MAX_DEPTH) # standard deviation when sampling in binomial process from allele freq # this is the expected sampling error within the correctly mapped fragments sd = np.array([1.0 / depth * math.sqrt(depth * af * (1.0 - af)) for depth in depths]) plt.fill_between(depths, constrain_lower(-sd), constrain_upper(sd), color="grey", alpha=0.5) sns.despine() plt.xticks(rotation="vertical") ax = plt.gca() ax.legend().remove() handles, labels = ax.get_legend_handles_labels() plt.ylim((-1.0, 1.0)) plt.xlim((0, MAX_DEPTH)) return ax, [] afs = [(af, af) for af in truth.TAF.sort_values().unique()] common.plot_ranges( afs, plot, "depth", "predicted - truth") plt.savefig(snakemake.output[0], bbox_inches="tight") |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 | from itertools import product import matplotlib matplotlib.use("agg") from matplotlib import pyplot as plt import seaborn as sns import pandas as pd import common import numpy as np import math from matplotlib.lines import Line2D from matplotlib.colors import to_rgba class NotEnoughObservationsException(Exception): pass MIN_CALLS = 20 MAX_LEN = 1000 vartype = snakemake.wildcards.vartype colors = common.get_colors(snakemake.config) varlociraptor_calls_low = [pd.read_table(f) for f in snakemake.input.varlociraptor_calls_low] varlociraptor_calls_high = [pd.read_table(f) for f in snakemake.input.varlociraptor_calls_high] adhoc_calls = [pd.read_table(f) for f in snakemake.input.adhoc_calls] def expected_count(af, effective_mutation_rate): """Calculate the expected number of somatic variants greater than a given allele frequency given an effective mutation rate, according to the model of Williams et al. Nature Genetics 2016""" return effective_mutation_rate * (1.0 / af - 1.0) def expected_counts(afs, effective_mutation_rate): return [expected_count(af, effective_mutation_rate) for af in afs] def calc_concordance(calls): n = len(calls) return (calls["concordance_count"] > 1).sum() / n def plot_len_range(minlen, maxlen, yfunc=None, yscale=None, upper_bound=None): handles_varlociraptor = [] handles_adhoc = [] for i, caller in enumerate(snakemake.params.callers): def plot_calls(calls, label, color, style, calls_lower=None): def get_xy(calls, caseafs=None): svlen = calls.loc[:, calls.columns.str.startswith("SVLEN")].abs() # at least one of the calls has a valid svlen valid = ((svlen >= minlen) & (svlen <= maxlen)).sum(axis=1) >= 1 calls = calls[valid] if caseafs is None: caseafs = calls["max_case_af"].dropna().unique() y = [] _caseafs = [] for caseaf in sorted(caseafs): _calls = calls[calls["max_case_af"] >= caseaf] if upper_bound is not None: _calls = _calls[_calls["max_case_af"] <= caseaf + upper_bound] if len(_calls) < MIN_CALLS: continue _caseafs.append(caseaf) y.append(yfunc(_calls)) return _caseafs, y x, y = get_xy(calls) if not x: raise NotEnoughObservationsException() if calls_lower is not None: _, y2 = get_xy(calls_lower, caseafs=x) return plt.fill_between(x, y, y2, label=label, edgecolor=color, facecolor=to_rgba(color, alpha=0.2)) else: if style != "-": plt.plot(x, y, "-", color="white", alpha=0.8) return plt.plot(x, y, style, label=label, color=color)[0] color = colors[snakemake.params.callers[i]] try: handles_varlociraptor.append( plot_calls( varlociraptor_calls_high[i], "varlociraptor+{}".format(caller), color=color, style="-", calls_lower=varlociraptor_calls_low[i])) except NotEnoughObservationsException: # skip plot pass try: handles_adhoc.append(plot_calls(adhoc_calls[i], caller, color=color, style=":")) except NotEnoughObservationsException: # skip plot pass handles = handles_varlociraptor + handles_adhoc sns.despine() ax = plt.gca() if yscale is not None: ax.set_yscale(yscale) return ax, handles plt.figure(figsize=(10, 4)) plt.subplot(121) plot_len_range(1, MAX_LEN, yfunc=calc_concordance) plt.xlabel("$\geq$ tumor allele frequency") plt.ylabel("concordance") plt.subplot(122) for effective_mutation_rate in 10 ** np.linspace(1, 5, 7): afs = np.linspace(0.0, 1.0, 100, endpoint=False) plt.semilogy(afs, expected_counts(afs, effective_mutation_rate), "-", color="grey", alpha=0.4) ax, handles = plot_len_range(1, MAX_LEN, yfunc=lambda calls: len(calls), yscale="log") plt.xlabel("$\geq$ tumor allele frequency") plt.ylabel("# of calls") ax.legend(handles=handles, loc="upper left", bbox_to_anchor=(1.0, 1.0)) plt.tight_layout() plt.savefig(snakemake.output[0], bbox_inches="tight") |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | library(UpSetR) library(plyr) library(ggplot2) library(grid) library(rlist) calls <- read.table(snakemake@input[[1]], row.names = NULL, sep = "\t", header = TRUE, check.names = FALSE) #print(dim(calls)) calls <- calls[calls$max_case_af >= 0.1, ] #calls <- calls[calls$max_prob_somatic_tumor < 0.07]_ #calls$max_case_af_log10 <- log10(calls$max_case_af) queries <- list() for(ds in snakemake@params[["datasets"]]) { queries <- list.append(queries, list(query=intersects, params=list(ds))) } svg(file = snakemake@output[[1]]) if(!empty(calls)) { upset(calls, order.by = "freq", sets = snakemake@params[["datasets"]], queries = queries ) } dev.off() |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 | from itertools import product import matplotlib matplotlib.use("agg") from matplotlib import pyplot as plt import seaborn as sns import pandas as pd import common import numpy as np MIN_CALLS = 100 colors = common.get_colors(snakemake.config) props = product(snakemake.params.callers, snakemake.params.len_ranges, snakemake.params.fdrs) calls = [] for _calls, (caller, len_range, fdr) in zip(snakemake.input.varlociraptor_calls, props): calls.append({"caller": caller, "len_range": len_range, "fdr": float(fdr), "calls": _calls}) calls = pd.DataFrame(calls) calls = calls.set_index("caller", drop=False) def plot_len_range(minlen, maxlen): def plot(caller): color = colors[caller] label = "varlociraptor+{}".format(caller) fdrs = [] alphas = [] calls_ = calls.loc[caller] calls_ = calls_[calls_["len_range"].map(lambda r: r == [minlen, maxlen])] calls_ = calls_.sort_values("fdr") for e in calls_.itertuples(): c = pd.read_table(e.calls) n = c.shape[0] if n < MIN_CALLS: continue true_fdr = 1.0 - common.precision(c) if fdrs and fdrs[-1] == true_fdr: continue fdrs.append(true_fdr) alphas.append(e.fdr) plt.plot(alphas, fdrs, ".-", color=color, label=label) for caller in calls.index.unique(): plot(caller) plt.plot([0, 1], [0, 1], ":", color="grey") sns.despine() ax = plt.gca() handles, _ = ax.get_legend_handles_labels() return ax, handles common.plot_ranges( snakemake.params.len_ranges, plot_len_range, xlabel="FDR threshold", ylabel="true FDR") plt.savefig(snakemake.output[0], bbox_inches="tight") |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 | from pysam import VariantFile import seaborn as sns import matplotlib.pyplot as plt from itertools import islice import pandas as pd sns.set_style("ticks") calls = VariantFile(snakemake.input[0]) def freq(sample): ref, alt = sample.get("AD") if alt == 0: return 0 return alt / (ref + alt) normal_freqs = [] tumor_freqs = [] vartypes = [] print("collecting calls") for record in calls: if not record.info.get("SHARED"): continue normal_freqs.append(freq(record.samples.get("normal"))) tumor_freqs.append(freq(record.samples.get("tumor"))) vartypes.append(record.info.get("TYPE")) d = pd.DataFrame({"normal": normal_freqs, "tumor": tumor_freqs, "type": vartypes}) print("subsampling") # sample d to not get overwhelmed d = d.sample(10000, random_state=245746) print("plotting and density estimation") g = sns.FacetGrid(col="type", data=d) g.map(sns.kdeplot, "normal", "tumor", shade=True, clip=(0, 1), shade_lowest=False) g.map(sns.scatterplot, "normal", "tumor", size=1, alpha=0.5, marker=".") plt.savefig(snakemake.output[0], bbox_inches="tight") |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 | from itertools import product from functools import partial import matplotlib matplotlib.use("agg") from matplotlib import pyplot as plt import seaborn as sns import pandas as pd import common import numpy as np import math from matplotlib.lines import Line2D vartype = snakemake.wildcards.vartype colors = common.get_colors(snakemake.config) def props(callers): return product(callers, snakemake.params.len_ranges) def plot_len_range(minlen, maxlen, min_precision=0.0): truth = common.load_variants( snakemake.input.truth, minlen, maxlen, vartype=vartype) def plot(calls, label, color, line=True, style="-", invert=False, markersize=4, endmarker=False): calls = pd.read_table(calls, index_col=0) if len(calls) < 10: return if line: thresholds = calls.score.quantile(np.linspace(0.0, 1.0, 50)) precision = [] recall = [] for t in thresholds: if invert: c = calls[calls.score >= t] else: c = calls[calls.score <= t] p = common.precision(c) r = common.recall(c, truth) print(label, t, c.shape[0], p, r) if len(c) < 10: print("skipping threshold: too few calls", c) continue precision.append(p) recall.append(r) if len(precision) <= 2: print("skipping curve because we have too few values") return else: precision = [common.precision(calls)] recall = [common.recall(calls, truth)] style = "." print(label, calls.shape[0], precision, recall) plt.plot( recall, precision, style, color=color, label=label, markersize=markersize ) if endmarker: plt.plot(recall[-1], precision[-1], "s", color=color, markersize=markersize) handles = [] for calls, (caller, len_range) in zip(snakemake.input.varlociraptor_calls, props(snakemake.params.varlociraptor_callers)): if len_range[0] != minlen and len_range[1] != maxlen: continue label = "varlociraptor+{}".format(caller) plot(calls, label, colors[caller], endmarker=True) handles.append(Line2D([0], [0], color=colors[caller], label=label)) for calls, (caller, len_range) in zip(snakemake.input.default_calls, props(snakemake.params.default_callers)): if len_range[0] != minlen and len_range[1] != maxlen: continue color = colors[caller] plot( calls, caller, color, style=":", invert=snakemake.config["caller"][caller].get("invert", False)) if caller in snakemake.params.adhoc_callers: handles.append(Line2D([0], [0], markersize=10, markerfacecolor=color, markeredgecolor=color, color=color, label=caller, marker=".", linestyle=":")) else: handles.append(Line2D([0], [0], color=color, label=caller, linestyle=":")) for calls, (caller, len_range) in zip(snakemake.input.adhoc_calls, props(snakemake.params.adhoc_callers)): if len_range[0] != minlen and len_range[1] != maxlen: continue color = colors[caller] plot(calls, caller, color, markersize=10, line=False) if caller not in snakemake.params.default_callers: handles.append(Line2D([0], [0], markersize=10, markerfacecolor=color, markeredgecolor=color, label=caller, marker=".", lw=0)) sns.despine() ax = plt.gca() plt.ylim((min_precision, 1.01 if min_precision == 0.0 else 1.001)) return ax, handles plot = plot_len_range fig_height = None legend_outside = snakemake.params.legend_outside if snakemake.wildcards.zoom == "zoom": plot = partial(plot_len_range, min_precision=0.99 if vartype == "INS" else 0.95) fig_height = 3 legend_outside = True common.plot_ranges( snakemake.params.len_ranges, plot, xlabel="recall", ylabel="precision", fig_height=fig_height, legend_outside=legend_outside, ) plt.savefig(snakemake.output[0], bbox_inches="tight") |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 | from itertools import product import matplotlib matplotlib.use("agg") from matplotlib import pyplot as plt import seaborn as sns import pandas as pd import common import numpy as np import math vartype = snakemake.wildcards.vartype colors = common.get_colors(snakemake.config) def props(callers): return product(callers, snakemake.params.len_ranges) phred_to_log_factor = -0.23025850929940456 log_to_phred_factor = -4.3429448190325175 def plot_len_range(minlen, maxlen): for calls, (caller, len_range) in zip(snakemake.input.varlociraptor_calls, props(snakemake.params.varlociraptor_callers)): if len_range[0] != minlen and len_range[1] != maxlen: continue label = "varlociraptor+{}".format(caller) calls = pd.read_table(calls) calls["caller"] = label if not calls.empty: color = colors[caller] sns.kdeplot(calls[calls.is_tp].PROB_SOMATIC_TUMOR.map(np.log), color=color, label=label) sns.kdeplot(calls[~calls.is_tp].PROB_SOMATIC_TUMOR.map(np.log), color=color, linestyle=":", label="") ax = plt.gca() fmt_ticks = lambda ticks: ["{:.1g}".format(np.exp(t)) for t in ticks] ax.set_xticklabels(fmt_ticks(plt.xticks()[0])) ax.legend().remove() handles, _ = ax.get_legend_handles_labels() sns.despine() return ax, handles common.plot_ranges( snakemake.params.len_ranges, plot_len_range, xlabel=r"$-10 \log_{10}$ Pr(somatic)", ylabel="density") plt.savefig(snakemake.output[0], bbox_inches="tight") |
114 115 | shell: "bcftools index {input}" |
125 126 127 128 129 130 | run: calls = pd.read_table(input[0], header=[0, 1]) calls = calls["VARIANT"] calls = calls[(calls.MATCHING >= 0 if params.type else calls.MATCHING < 0)] calls.sort_values("PROB_SOMATIC_TUMOR", ascending=not params.type, inplace=True) calls[["CHROM", "POS", "PROB_SOMATIC_TUMOR", "MATCHING"]].to_csv(output[0], sep="\t") |
154 155 156 157 158 | shell: "bcftools view {input.bcf} {wildcards.varpos} > {output.vcf}; " "samtools view -b {input.bams[0]} {params.region} > {output.tumor}; " "samtools view -b {input.bams[1]} {params.region} > {output.normal}; " "samtools index {output.tumor}; samtools index {output.normal}" |
Support
- Future updates
Related Workflows





