OpenEBench TCGA Cancer Driver Genes benchmarking workflow
Description
The workflow takes an input file with Cancer Driver Genes predictions (i.e. the results provided by a participant), computes a set of metrics, and compares them against the data currently stored in OpenEBench for the TCGA community. Two assessment metrics are computed for the predictions, and optional plots that visualize the tool's performance can also be generated. The workflow consists of three standard steps, defined by OpenEBench. The tools needed to run these steps are containerized in three Docker images, whose recipes are available in the TCGA_benchmarking_dockers repository; the images are stored in the INB GitLab container registry. Separate instances are spawned from these images for each step:
- Validation: the input file format is checked and, if required, the content of the file is validated (e.g. checking whether the submitted gene IDs exist)
- Metrics Generation: the predictions are compared with the 'Gold Standards' provided by the community, which results in two performance metrics - precision (Positive Predictive Value) and recall (True Positive Rate); the formulas are illustrated in the sketch after this list.
- Consolidation: the benchmark itself is performed by merging the tool's metrics with the rest of the TCGA data. The results are provided in JSON format and in SVG format (scatter plot).
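For reference, the two metrics follow their standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN), where the gold standard defines the true driver genes. The sketch below is a minimal illustration of these formulas on gene sets; it is not the community's actual compute_metrics.py implementation, and the sample gene IDs are made up.

```python
# Minimal sketch of the two assessment metrics on gene sets.
# Not the actual compute_metrics.py logic; sample IDs are illustrative.

def precision_recall(predicted: set, gold_standard: set) -> tuple:
    """Return (precision, recall) for a set of predicted driver genes."""
    true_positives = len(predicted & gold_standard)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold_standard) if gold_standard else 0.0
    return precision, recall

predicted = {"TP53", "KRAS", "BRAF", "ACTB"}      # participant's predictions
gold = {"TP53", "KRAS", "BRAF", "EGFR", "PTEN"}   # community gold standard
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")        # precision=0.75 recall=0.60
```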
Code Snippets
```bash
python /app/validation.py -i $input_file -r $ref_dir -com $community_id -c $cancer_types -p $tool_name -o validation.json
```
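The validation command above is what the workflow runs inside the first container; the actual checks live in validation.py in the TCGA_benchmarking_dockers repository. The sketch below only illustrates the kind of gene-ID membership check the step performs, assuming a one-gene-per-line submission format (the real format may differ) and a hypothetical reference set.

```python
# Illustrative sketch of a validation check; NOT the real validation.py.
# Assumes a one-gene-per-line prediction format (the real format may differ).

def validate_gene_ids(lines, known_gene_ids):
    """Return error messages for gene IDs missing from the reference set."""
    errors = []
    for line_no, line in enumerate(lines, start=1):
        gene_id = line.strip()
        if gene_id and gene_id not in known_gene_ids:
            errors.append(f"line {line_no}: unknown gene ID '{gene_id}'")
    return errors

# In the real step, the reference IDs would come from $ref_dir.
sample_submission = ["TP53", "KRAS", "NOT_A_GENE"]
print(validate_gene_ids(sample_submission, {"TP53", "KRAS", "BRAF"}))
# -> ["line 3: unknown gene ID 'NOT_A_GENE'"]
```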
```bash
python /app/compute_metrics.py -i $input_file -c $cancer_types -m $gold_standards_dir -p $tool_name -com $community_id -o assessment.json
```
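The metrics step writes its results to assessment.json, which the consolidation step then merges. As a loose illustration of what serializing two metric values might look like, the sketch below writes a JSON file; the field names here are placeholders, not the official OpenEBench assessment data model.

```python
# Hedged sketch of writing metrics to JSON; the field names below are
# placeholders, NOT the official OpenEBench assessment data model.
import json

assessment = [
    {"metric_id": "precision", "value": 0.75, "tool": "example_tool"},
    {"metric_id": "recall", "value": 0.60, "tool": "example_tool"},
]
with open("assessment.json", "w") as handle:
    json.dump(assessment, handle, indent=2)
```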
```bash
cp -Lpr $benchmark_data augmented_benchmark_data
python /app/manage_assessment_data.py -b augmented_benchmark_data -p $assessment_out -o aggregation_dir
python /app/merge_data_model_files.py -p $validation_out -m $assessment_out -a aggregation_dir -o data_model_export.json
```
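The consolidation step pairs each tool's precision and recall, which is what the SVG scatter plot visualizes. Below is a minimal matplotlib sketch of such a plot; the tool names and values are made up, and the real plotting code is part of the consolidation container, not this snippet.

```python
# Minimal sketch of a precision-vs-recall scatter plot saved as SVG.
# Tool names and values are illustrative, not real TCGA results.
import matplotlib.pyplot as plt

results = {"tool_A": (0.75, 0.60), "tool_B": (0.55, 0.80), "tool_C": (0.40, 0.45)}

fig, ax = plt.subplots()
for name, (precision, recall) in results.items():
    ax.scatter(recall, precision)
    ax.annotate(name, (recall, precision))
ax.set_xlabel("Recall (True Positive Rate)")
ax.set_ylabel("Precision (Positive Predictive Value)")
ax.set_title("Benchmark scatter plot (illustrative)")
fig.savefig("aggregation.svg")
```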
Support
- Future updates