A computational method to generate causal explanations for proteomic profiles using prior mechanistic knowledge in the literature, as recorded in cellular pathway maps.
This is a tool for pathway analysis of proteomic and phosphoproteomic datasets. CausalPath aims to identify mechanistic pathway relations that can explain observed correlations in experiments.
Additional information about CausalPath can be found at https://github.com/PathwayAndDataAnalysis/causalpath
A work-in-progress manuscript describing this method is available here.
Usage
Step 1: Install workflow
If you simply want to use this workflow, download and extract the latest release. If you intend to modify and further develop this workflow, fork this repository. Please consider contributing any generally applicable modifications via a pull request.
In any case, if you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this repository and, once available, its DOI.
Step 2: Configure workflow
Configure the workflow according to your needs by editing the file omic_config.yaml.
Step 3: Execute workflow
All you need to execute this workflow is to install Snakemake via the Conda package manager. Software needed by this workflow is automatically deployed into isolated environments by Snakemake.
Test your configuration by performing a dry-run via
snakemake --use-conda -n
Execute the workflow locally via
snakemake --use-conda --cores $N
using $N cores. Alternatively, it can be run in cluster or cloud environments (see the docs for details).
If you not only want to fix the software stack but also the underlying OS, use
snakemake --use-conda --use-singularity
in combination with any of the modes above.
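The invocation modes above differ only in the flags passed to snakemake. As a minimal sketch, the helper below assembles the argument list for each mode. The function name build_snakemake_cmd and its parameters are hypothetical and not part of this workflow; only flags mentioned in this README are used.

```python
import shlex


def build_snakemake_cmd(cores=None, dry_run=False, singularity=False):
    """Assemble a snakemake command line from the flags described above.

    Hypothetical helper for illustration; not part of the workflow itself.
    """
    cmd = ["snakemake", "--use-conda"]
    if singularity:
        # Additionally fix the underlying OS via a container.
        cmd.append("--use-singularity")
    if dry_run:
        # -n performs a dry-run: jobs are listed but not executed.
        cmd.append("-n")
    elif cores is not None:
        cmd += ["--cores", str(cores)]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_snakemake_cmd(dry_run=True)))
    print(shlex.join(build_snakemake_cmd(cores=4, singularity=True)))
```

The returned list could be handed to subprocess.run(cmd, check=True) from the workflow directory, which avoids shell quoting issues compared with building a single string.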
Step 4: Investigate results
After successful execution, you can create a self-contained report with all results via:
snakemake --report report.html
Code Snippets
script:
    "../scripts/partition_data.py"

script:
    "../scripts/partition_data_causal.py"

shell:
    "java -jar resources/causalpath/target/causalpath.jar results/{wildcards.transform}/{wildcards.type}/{wildcards.cond}"

shell:
    "java -jar resources/causalpath/target/causalpath.jar results/correlation/{wildcards.condition}"

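The shell directives above pass CausalPath a results directory whose path is built from the rule's wildcards. The sketch below imitates that substitution with plain str.format so the resulting command can be inspected. It is an approximation rather than Snakemake's own formatting logic, and the wildcard values and the render_command helper are hypothetical placeholders.

```python
from types import SimpleNamespace

# Shell template copied from the rule shown above.
TEMPLATE = (
    "java -jar resources/causalpath/target/causalpath.jar "
    "results/{wildcards.transform}/{wildcards.type}/{wildcards.cond}"
)


def render_command(template, **wildcard_values):
    """Fill a '{wildcards.name}' template using plain str.format."""
    wildcards = SimpleNamespace(**wildcard_values)
    return template.format(wildcards=wildcards)


# Hypothetical wildcard values, chosen only for illustration.
example = render_command(
    TEMPLATE, transform="some-transform", type="some-type", cond="A_B"
)
print(example)
```

Each combination of wildcard values therefore maps to its own directory under results/, which is the directory the partitioning scripts populate before CausalPath runs.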
from utils import ensure_dir, generate_data_files, generate_data_files_causal, generate_proteomics_data, generate_parameter_file
import pandas as pd
import os
from itertools import combinations

# The `snakemake` object is injected by Snakemake's script directive.
meta_file = snakemake.params.meta
meta = pd.read_csv(meta_file, sep='\t', index_col=0)
meta = meta.astype(str)

condition_id = snakemake.params.condition
permutations = snakemake.params.permutations
fdr = snakemake.params.fdr
site_match = snakemake.params.site_match
site_effect = snakemake.params.site_effect
phospho_prot_file = snakemake.params.phospho_prot
phospho_prot = pd.read_csv(phospho_prot_file, sep='\t')

# The output path has the form <top>/correlation/<cond>/<file>.
correlation, cond = snakemake.output[0].split('/')[1:-1]
causal_relnm = os.path.join(os.getcwd(), 'results', 'correlation', cond)
ensure_dir(causal_relnm)

# Select the samples belonging to the single condition `cond`.
kwargs = {condition_id: list(map(str, [cond]))}
print(kwargs)
sub_data, baseline, contrast = generate_data_files_causal(phospho_prot, meta, condition_id, **kwargs)

# Write the CausalPath input data and parameter file into the results directory.
generate_proteomics_data(sub_data, causal_relnm)
generate_parameter_file(relnm=causal_relnm, test_samps=contrast, control_samps=baseline,
                        value_transformation='correlation', fdr_threshold=fdr,
                        site_match=site_match, site_effect=site_effect,
                        permutations=permutations, ctype='correlation')

from utils import ensure_dir, generate_data_files, generate_proteomics_data, generate_parameter_file, generate_rna_data
import pandas as pd
import os
from itertools import combinations

# The `snakemake` object is injected by Snakemake's script directive.
meta_file = snakemake.params.meta
meta = pd.read_csv(meta_file, sep='\t', index_col=0)
meta = meta.astype(str)

phospho_prot_file = snakemake.params.phospho_prot
phospho_prot = pd.read_csv(phospho_prot_file, sep='\t').drop_duplicates()
phospho_prot.ID = phospho_prot.ID.str.upper()

condition_id = snakemake.params.condition
permutations = snakemake.params.permutations
fdr = snakemake.params.fdr
site_match = snakemake.params.site_match
site_effect = snakemake.params.site_effect
ds_thresh = snakemake.params.ds_thresh
rna_file = snakemake.params.rna_file

# The output path has the form <top>/<transform>/<ctype>/<cond>/<file>.
transform, ctype, cond = snakemake.output[0].split('/')[1:-1]
relnm = os.path.join(os.getcwd(), 'results', transform, ctype, cond)
ensure_dir(relnm)

# `cond` joins the compared condition levels with underscores.
kwargs = {condition_id: list(map(str, cond.split('_')))}
sub_data, baseline, contrast = generate_data_files(phospho_prot, meta, condition_id, **kwargs)
generate_proteomics_data(sub_data, relnm)

if rna_file is not None:
    print('Incorporating RNAseq into causal relations')
    rna_frame = pd.read_csv(rna_file, sep='\t', index_col=0)
    print('total RNAseq expression matrix of shape {},{}'.format(rna_frame.shape[0], rna_frame.shape[1]))
    print(rna_frame.head())
    # Align RNA columns with the proteomics table, then drop its first three columns.
    sub_rna = rna_frame.reindex(sub_data.columns, axis=1).iloc[:, 3:]
    print(sub_rna.head())
    generate_rna_data(sub_rna, relnm)

generate_parameter_file(ds_thresh=ds_thresh, relnm=relnm, test_samps=contrast,
                        control_samps=baseline, ctype=ctype, value_transformation=transform,
                        fdr_threshold=fdr, site_match=site_match, site_effect=site_effect,
                        permutations=permutations)
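Both scripts recover their settings from the Snakemake output path instead of receiving them as separate parameters. The following sketch isolates that parsing step so it can be tried without Snakemake. The example path, its file name, and the column name "condition" are hypothetical placeholders, and parse_output_path is an illustrative helper that does not exist in the workflow.

```python
def parse_output_path(output_path):
    """Return the path components between the first and last element."""
    return output_path.split("/")[1:-1]


# Hypothetical output path; the file name and values are placeholders.
transform, ctype, cond = parse_output_path(
    "results/some-transform/some-type/A_B/parameters.txt"
)

# Hypothetical metadata column name standing in for snakemake.params.condition.
condition_id = "condition"

# Same construction as in the script: split the condition pair on underscores.
kwargs = {condition_id: list(map(str, cond.split("_")))}
print(transform, ctype, cond, kwargs)
```

One consequence of this design is that a condition level containing an underscore would be split into several levels, so condition names in the metadata should avoid underscores when this parsing is used.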