Reproducible Gene-Level Association Studies Workflow for Multi-Omic Analysis
This repository contains a complete end-to-end workflow for reproducing the gene-level association studies described in:
Zhou, D., Jiang, Y., Zhong, X. et al. A unified framework for joint-tissue transcriptome-wide association and Mendelian randomization analysis. Nat Genet 52 (2020). https://doi.org/10.1038/s41588-020-0706-2
The workflow is defined with and executed by Snakemake, a workflow management system for scalable, reproducible analysis pipelines.
Requirements
- Python 3.9
- conda (https://docs.conda.io/en/latest/)
- R >= 4.0, including the `tidyverse` and `bookdown` packages (the latter for creating `summary.pdf`)
Running workflow
Clone this repository to reproduce the associations and generate `summary.pdf`. Association results are generated by `SPrediXcan.py` and saved under `results/{ukbb_id}-{model}_{tissue}.csv`.
$ git clone https://github.com/manzt/zhou-et-al-natgen-2020 && cd zhou-et-al-natgen-2020
$ conda env create --file environment.yml
$ snakemake --cores all # downloads all data, runs associations, and generates `summary.pdf`
Running the complete workflow downloads and organizes all the input data necessary to reproduce the gene-level association results described in Table S7:
- GWAS summary statistics from UKBB
- Pretrained transcriptome prediction model databases (JTI, PrediXcan, UTMOST)
- Covariance matrices of SNPs within each gene model
The data are organized in the following directory structure, and associations are generated using `SPrediXcan.py` from MetaXcan.
data/
├── covariances/
├── GWAS/
├── weights/
└── supplementary_tables.xlsx
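Before launching a long run, it can help to confirm that the downloads landed where the tree above says they should. The following is a minimal sketch, not part of the repository: the helper name `missing_inputs` is hypothetical, and it only checks the four top-level entries shown in the tree, not the files inside them.

```python
from pathlib import Path

# Top-level entries copied from the directory tree above.
EXPECTED = ("covariances", "GWAS", "weights", "supplementary_tables.xlsx")


def missing_inputs(data_dir="data"):
    """Return the expected top-level entries that are absent from data_dir.

    Hypothetical helper for illustration; the workflow itself does not
    provide or require it.
    """
    root = Path(data_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("missing from data/:", ", ".join(missing))
    else:
        print("data/ contains all expected top-level entries")
```

An empty list means every top-level entry is present; it says nothing about whether individual downloads inside those directories completed.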
Individual associations
Running all the associations can take some time. To run an individual association, request a single output file from Snakemake: any target matching the wildcard pattern `results/{ukbb_id}-{model}_{tissue}.csv` runs `SPrediXcan.py` only for that combination of GWAS summary statistics and model/tissue-specific weights.
For example,
$ snakemake --cores all results/30740_irnt-UTMOST_Muscle_Skeletal.csv
will only perform the steps necessary to produce the gene-level association output for Glucose (`30740_irnt`) using the pretrained UTMOST weights for Muscle Skeletal tissue.
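The target naming scheme can be sketched in Python as a pair of helpers that build a target path and split one back into its three parts. This is an illustration only: `Target`, `build_target`, and `parse_target` are hypothetical names, and the split assumes that `ukbb_id` contains no hyphen and `model` contains no underscore (true for the example above, but not guaranteed by the pattern itself). How Snakemake resolves the wildcards depends on the Snakefile.

```python
from typing import NamedTuple


class Target(NamedTuple):
    ukbb_id: str
    model: str
    tissue: str


def build_target(ukbb_id, model, tissue):
    """Format a path following results/{ukbb_id}-{model}_{tissue}.csv."""
    return f"results/{ukbb_id}-{model}_{tissue}.csv"


def parse_target(path):
    """Split a results target into (ukbb_id, model, tissue).

    Assumption: the first '-' ends ukbb_id and the first '_' after it
    ends model; tissue keeps any remaining underscores.
    """
    prefix, suffix = "results/", ".csv"
    if not (path.startswith(prefix) and path.endswith(suffix)):
        raise ValueError(f"not a results target: {path}")
    stem = path[len(prefix):-len(suffix)]
    ukbb_id, _, rest = stem.partition("-")
    model, _, tissue = rest.partition("_")
    if not (ukbb_id and model and tissue):
        raise ValueError(f"cannot split target into three parts: {path}")
    return Target(ukbb_id, model, tissue)


if __name__ == "__main__":
    print(parse_target("results/30740_irnt-UTMOST_Muscle_Skeletal.csv"))
```

The string returned by `build_target` is what would be passed to `snakemake --cores all` as the requested file.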
Code Snippets
shell:
    """
    cd notebooks
    Rscript -e "bookdown::render_book('index.Rmd', 'bookdown::pdf_book')"
    cd ..
    mv notebooks/_book/_main.pdf {output}
    rmdir notebooks/_book
    """
shell:
    """
    wget https://github.com/hakyimlab/MetaXcan/zipball/{} -O tmp.zip
    unzip tmp.zip
    rm tmp.zip
    """.format(METAXCAN_HASH)
shell: "wget https://zenodo.org/record/3842289/files/{wildcards.sample}.db -O {output}"
shell: "wget https://zenodo.org/record/3842289/files/{wildcards.sample}.txt.gz -O {output}"
shell: "paste <(cut -f 2- {input}) <(wget -qO - {params} | gunzip) > {output}"
shell: "wget https://broad-ukb-sumstats-us-east-1.s3.amazonaws.com/round2/annotations/variants.tsv.bgz -O {output}"
shell: "wget https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-020-0706-2/MediaObjects/41588_2020_706_MOESM3_ESM.xlsx -O {output}"
shell:
    """
    gunzip -c {input} | head -n 1 | cut -f -6 > {output} || true
    sort <(gunzip -c {input} | cut -f -6 | sed 1d) >> {output} || true
    """
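The snippet above keeps the first six tab-separated columns of a gzipped file, writes the header line first, and appends the remaining rows in sorted order. A rough Python equivalent is sketched below for readers who want to follow the logic outside the shell. The function name `first_columns_sorted` is hypothetical. It reads the whole file into memory, and Python sorts by code point, so row order may differ from `sort` under some locales.

```python
import gzip


def first_columns_sorted(gz_path, out_path, n_cols=6):
    """Write the header, then the sorted data rows, keeping n_cols columns.

    Illustrative stand-in for the gunzip/cut/sort pipeline; not the
    workflow's own code.
    """
    with gzip.open(gz_path, "rt") as src:
        # Equivalent of `cut -f -6`: keep the leading tab-separated fields.
        rows = ["\t".join(line.rstrip("\n").split("\t")[:n_cols]) for line in src]

    with open(out_path, "w") as dst:
        if not rows:
            return  # empty input produces an empty output file
        header, body = rows[0], sorted(rows[1:])
        dst.write(header + "\n")
        for row in body:
            dst.write(row + "\n")
```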
shell:
    """
    python {input.SPrediXcan} --model_db_path {input.model_db} \
        --covariance {input.covariance} \
        --gwas_file {input.gwas_file} \
        --snp_column rsid \
        --effect_allele_column alt \
        --non_effect_allele_column ref \
        --chromosome_column chr \
        --position_column pos \
        --beta_column beta \
        --se_column se \
        --pvalue_column pval \
        --freq_column minor_AF \
        --output_file {output}
    """
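To run the same association step by hand, outside Snakemake, the command above can be assembled as an argument list. This is a sketch under stated assumptions: the flags and column names are copied from the snippet, but `spredixcan_argv` is a hypothetical helper and every path in the usage example is a placeholder, not a location the workflow is known to use.

```python
import shlex

# Column mappings copied from the shell command above.
GWAS_COLUMNS = {
    "--snp_column": "rsid",
    "--effect_allele_column": "alt",
    "--non_effect_allele_column": "ref",
    "--chromosome_column": "chr",
    "--position_column": "pos",
    "--beta_column": "beta",
    "--se_column": "se",
    "--pvalue_column": "pval",
    "--freq_column": "minor_AF",
}


def spredixcan_argv(script, model_db, covariance, gwas_file, output):
    """Build the argument list for one SPrediXcan.py run (hypothetical helper)."""
    argv = [
        "python", script,
        "--model_db_path", model_db,
        "--covariance", covariance,
        "--gwas_file", gwas_file,
    ]
    for flag, column in GWAS_COLUMNS.items():
        argv += [flag, column]
    argv += ["--output_file", output]
    return argv


if __name__ == "__main__":
    # All paths below are placeholders for illustration only.
    argv = spredixcan_argv(
        "path/to/SPrediXcan.py",
        "path/to/model.db",
        "path/to/covariance.txt.gz",
        "path/to/gwas.tsv",
        "results/example.csv",
    )
    print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would execute it. Building a list rather than a single string avoids shell quoting problems when paths contain spaces.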