A computational method to generate causal explanations for proteomic profiles using prior mechanistic knowledge in the literature, as recorded in cellular pathway maps.

This is a tool for pathway analysis of proteomic and phosphoproteomic datasets. CausalPath aims to identify mechanistic pathway relations that can explain observed correlations in experiments.

Additional information about CausalPath can be found at https://github.com/PathwayAndDataAnalysis/causalpath

A work-in-progress manuscript describing this method is available here.

Usage

Step 1: Install workflow

If you simply want to use this workflow, download and extract the latest release. If you intend to modify and further develop this workflow, fork this repository. Please consider providing any generally applicable modifications via a pull request.

In any case, if you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this repository and, once available, its DOI.

Step 2: Configure workflow

Configure the workflow according to your needs by editing the file omic_config.yaml.
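The contents of omic_config.yaml are not shown here, but the scripts under Code Snippets read a fixed set of values from snakemake.params. As a rough sketch only, the check below validates a dictionary of such values before a run. The parameter names mirror the snakemake.params attributes; the real config key names may differ, the value ranges are assumptions, and check_params is a hypothetical helper that is not part of the workflow.

```python
# Names mirror the snakemake.params attributes read by the workflow scripts.
# Assumption: the actual key names in omic_config.yaml may differ.
REQUIRED = ("meta", "phospho_prot", "condition", "permutations",
            "fdr", "site_match", "site_effect")


def check_params(params):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing parameter: {name}" for name in REQUIRED
                if name not in params]
    # Assumption: the FDR threshold is a fraction in (0, 1].
    if "fdr" in params and not 0 < float(params["fdr"]) <= 1:
        problems.append("fdr should be a fraction in (0, 1]")
    # Assumption: the permutation count is a positive integer.
    if "permutations" in params and int(params["permutations"]) <= 0:
        problems.append("permutations should be a positive integer")
    return problems


# Placeholder values for illustration only; none come from the workflow.
example = {
    "meta": "data/meta.tsv",
    "phospho_prot": "data/phospho_prot.tsv",
    "condition": "treatment",
    "permutations": 1000,
    "fdr": 0.1,
    "site_match": 0,
    "site_effect": 0,
    "rna_file": None,  # optional; the script skips RNAseq when this is None
}
print(check_params(example))
```

An empty list means every parameter the scripts read is present and within the assumed ranges.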

Step 3: Execute workflow

All you need to execute this workflow is to install Snakemake via the Conda package manager. Software needed by this workflow is automatically deployed into isolated environments by Snakemake.

Test your configuration by performing a dry-run via

snakemake --use-conda -n

Execute the workflow locally via

snakemake --use-conda --cores $N

using $N cores. Alternatively, it can be run in cluster or cloud environments (see the docs for details).

If you want to fix not only the software stack but also the underlying OS, use

snakemake --use-conda --use-singularity

in combination with any of the modes above.

Step 4: Investigate results

After successful execution, you can create a self-contained report with all results via:

snakemake --report report.html

Code Snippets

script:
    "../scripts/partition_data.py"        

SnakeMake From line 16 of rules/causal.smk

script:
    "../scripts/partition_data_causal.py"

SnakeMake From line 32 of rules/causal.smk

shell:
    "java -jar resources/causalpath/target/causalpath.jar results/{wildcards.transform}/{wildcards.type}/{wildcards.cond}"

SnakeMake From line 42 of rules/causal.smk

shell:
    "java -jar resources/causalpath/target/causalpath.jar results/correlation/{wildcards.condition}"

SnakeMake From line 52 of rules/causal.smk
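The two shell directives above pass a results directory to the CausalPath jar, with Snakemake filling in the {wildcards.…} placeholders. Purely as an illustration (Snakemake performs this substitution itself), the same expansion can be reproduced with Python's str.format. The expand helper is hypothetical and the wildcard values are made-up placeholders.

```python
from types import SimpleNamespace

# Command templates copied from the shell directives in rules/causal.smk.
CONTRAST_CMD = (
    "java -jar resources/causalpath/target/causalpath.jar "
    "results/{wildcards.transform}/{wildcards.type}/{wildcards.cond}"
)
CORRELATION_CMD = (
    "java -jar resources/causalpath/target/causalpath.jar "
    "results/correlation/{wildcards.condition}"
)


def expand(template, **wildcards):
    """Resolve attribute-style {wildcards.name} placeholders in a template."""
    return template.format(wildcards=SimpleNamespace(**wildcards))


# Placeholder wildcard values, not taken from the workflow.
print(expand(CONTRAST_CMD, transform="some-transform", type="phospho", cond="A_B"))
print(expand(CORRELATION_CMD, condition="A"))
```

Each printed line is the command that would run for one combination of wildcards, which shows that the directory given to CausalPath is the same one the partition scripts populate.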

from utils import ensure_dir, generate_data_files_causal, generate_proteomics_data, generate_parameter_file
import pandas as pd
import os

# Sample metadata table; all values are compared as strings.
meta_file = snakemake.params.meta
meta = pd.read_csv(meta_file, sep='\t', index_col=0)
meta = meta.astype(str)

# Analysis settings passed in from the Snakemake rule.
condition_id = snakemake.params.condition
permutations = snakemake.params.permutations
fdr = snakemake.params.fdr
site_match = snakemake.params.site_match
site_effect = snakemake.params.site_effect

phospho_prot_file = snakemake.params.phospho_prot
phospho_prot = pd.read_csv(phospho_prot_file, sep='\t')

# The output path is expected to have four components; the middle two
# give the analysis type and the condition.
correlation, cond = snakemake.output[0].split('/')[1:-1]
causal_relnm = os.path.join(os.getcwd(), 'results', 'correlation', cond)
ensure_dir(causal_relnm)
kwargs = {condition_id: [str(cond)]}
print(kwargs)

# Write the data file and the parameter file into the analysis directory.
sub_data, baseline, contrast = generate_data_files_causal(phospho_prot, meta, condition_id, **kwargs)
generate_proteomics_data(sub_data, causal_relnm)
generate_parameter_file(
    relnm=causal_relnm,
    test_samps=contrast,
    control_samps=baseline,
    value_transformation='correlation',
    fdr_threshold=fdr,
    site_match=site_match,
    site_effect=site_effect,
    permutations=permutations,
    ctype='correlation',
)

Python Pandas utils From line 1 of scripts/partition_data_causal.py
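The script above derives the condition from the rule's output path rather than from a separate parameter. A minimal sketch of that parsing step, runnable outside Snakemake, is given below. The function name, the example path (including its file name) and the metadata column name are assumptions made for illustration.

```python
def parse_correlation_output(output_path, condition_id):
    """Recover the condition level from an output path, as the script does."""
    # Drop the first path component and the trailing file name.
    parts = output_path.split('/')[1:-1]
    if len(parts) != 2:
        raise ValueError(
            f"expected <root>/correlation/<cond>/<file>, got {output_path!r}"
        )
    correlation, cond = parts
    # A single condition level is passed on as a one-element list of strings.
    kwargs = {condition_id: [str(cond)]}
    return correlation, cond, kwargs


# Hypothetical output path and metadata column name.
print(parse_correlation_output("results/correlation/treated/parameters.txt", "treatment"))
```

The explicit length check is an addition in this sketch: the original tuple unpacking raises a less descriptive ValueError when the path has a different depth.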

from utils import ensure_dir, generate_data_files, generate_proteomics_data, generate_parameter_file, generate_rna_data
import pandas as pd
import os

# Sample metadata table; all values are compared as strings.
meta_file = snakemake.params.meta
meta = pd.read_csv(meta_file, sep='\t', index_col=0)
meta = meta.astype(str)

# Proteomic/phosphoproteomic table: drop duplicate rows and upper-case the IDs.
phospho_prot_file = snakemake.params.phospho_prot
phospho_prot = pd.read_csv(phospho_prot_file, sep='\t').drop_duplicates()
phospho_prot.ID = phospho_prot.ID.str.upper()

# Analysis settings passed in from the Snakemake rule.
condition_id = snakemake.params.condition
permutations = snakemake.params.permutations
fdr = snakemake.params.fdr
site_match = snakemake.params.site_match
site_effect = snakemake.params.site_effect
ds_thresh = snakemake.params.ds_thresh
rna_file = snakemake.params.rna_file

# The output path is expected to have five components; the middle three
# give the value transformation, the comparison type and the condition.
transform, ctype, cond = snakemake.output[0].split('/')[1:-1]
relnm = os.path.join(os.getcwd(), 'results', transform, ctype, cond)
ensure_dir(relnm)
# Condition levels are encoded in the directory name, separated by underscores.
kwargs = {condition_id: list(map(str, cond.split('_')))}
sub_data, baseline, contrast = generate_data_files(phospho_prot, meta, condition_id, **kwargs)

generate_proteomics_data(sub_data, relnm)

if rna_file is not None:
    print('Incorporating RNAseq into causal relations')

    rna_frame = pd.read_csv(rna_file, sep='\t', index_col=0)
    print('total RNAseq expression matrix of shape {},{}'.format(rna_frame.shape[0], rna_frame.shape[1]))
    print(rna_frame.head())
    # Align RNA columns with sub_data, then skip its first three columns.
    sub_rna = rna_frame.reindex(sub_data.columns, axis=1).iloc[:, 3:]
    print(sub_rna.head())
    generate_rna_data(sub_rna, relnm)

generate_parameter_file(ds_thresh=ds_thresh, relnm=relnm, test_samps=contrast, control_samps=baseline, ctype=ctype, value_transformation=transform, fdr_threshold=fdr, site_match=site_match, site_effect=site_effect, permutations=permutations)
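In this script the condition levels to compare are encoded in the output directory name and recovered with cond.split('_'). The sketch below isolates that step so its behaviour can be checked on its own. The function name, paths and column name are hypothetical, and how generate_data_files assigns the resulting levels to baseline and contrast is not shown in the source, so it is not modelled here.

```python
def parse_contrast_output(output_path, condition_id):
    """Split an output path into transform, comparison type and condition levels."""
    # Drop the first path component and the trailing file name.
    parts = output_path.split('/')[1:-1]
    if len(parts) != 3:
        raise ValueError(
            f"expected <root>/<transform>/<type>/<cond>/<file>, got {output_path!r}"
        )
    transform, ctype, cond = parts
    # Every underscore separates two levels, exactly as in the script.
    levels = list(map(str, cond.split('_')))
    return transform, ctype, {condition_id: levels}


# Hypothetical paths; transform, type and level names are placeholders.
print(parse_contrast_output(
    "results/some-transform/phospho/treated_control/parameters.txt", "treatment"))
# A level name that itself contains an underscore is split into extra levels.
print(parse_contrast_output(
    "results/some-transform/phospho/drug_a_control/parameters.txt", "treatment"))
```

The second call shows a consequence of this encoding: because every underscore is treated as a separator, condition levels in the metadata should not contain underscores if they are to survive the round trip through the directory name.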