Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
Use snakefile as main pipeline and defines paths for files and directives
Conda environment
Run pipeline in Conda specified env
-
Install the conda environment found under
envs/rbio.yaml
(conda has to be installed on the system):
conda env create -f envs/rbio.yaml
- Activate env after installation
conda activate rbio
-
Start R terminal
-
Install 1) devtools + MatrixEQTL and 2) annotation template
# matrixEQTL
install.packages("devtools")
devtools::install_github("andreyshabalin/MatrixEQTL")
# annotation package
BiocManager::install("IlluminaHumanMethylationEPICanno.ilm10b2.hg19")
Configuration
You may change the thresholds and cut-offs at
configs/workflow.json
-
maf
: the minor allele frequency threshold
Converting impute2 genotypes to dosages.
-
post_maf
: the minor allele frequency threshold to be applied after matrixEQTL has been run.
Useful to define different summaries.
-
cisDist
: the maximal distance between SNP-CpGs pairs until which it counts as a 'cis-pairs'.
Cis-pairs will be reported in a separate file.
-
pvThresholdCis
: The P value threshold to be applied for cis-meQTLs.
Only associations with P value < pvThresholdCis will be reported
-
pvThresholdTrans
: The p-value threshold to be applied for trans-meQTLs.
Only associations with P value < pvThresholdTrans will be reported
-
significant_pv_cutoff
: For final results files.
P value significant threshold e.g.: 1e-14
Execution
Execute
snakemake
in the project directory to run the full pipeline
Manually call the
all_dosage_to_matrixEQTL
and the
all_methylation_to_matrixEQTL
rules
Run snakemake with parallel task processing
./scripts/run_pipeline.sh <num-parallel-tasks>
Configure the script above to your VM capacity with:
[scripts/run_pipeline.sh](scripts/run_pipeline.sh)
Covariates
meth ~ geno + sex + age + bmi + wbc + PCs1..20 + houseman estimates (except gran) + plate + batch
Code Snippets
38 39 40 41 | shell: """ cat {input.snp_pos} > {output} """ |
72 73 74 75 76 77 | shell: """ zcat {input} | ./scripts/impute2dosage.py -m {params.maf} |\ sed -E 's/^[a-zA-Z0-9-]+\t/{wildcards.chr}\t/' |\ bgzip -c > {output} """ |
101 102 | script: "scripts/annotate_snps_maf.R" |
127 128 | script: "scripts/prepare_covariates.R" |
145 146 | script: "scripts/methylation_to_matrixEQTL.R" |
156 157 158 159 | shell: """ cat {input.cpg_pos} | grep -v "start" > {output} """ |
181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 | shell: """ echo "Extracting SNP ids." zcat {input.dosage_file} | cut -f 2 > {output.snp_ids} echo "Getting SNP locations" zcat {input.dosage_file} | cut -d "\t" -f 2,3 | \ awk '{{OFS="\t"}} {{print $1,"{wildcards.chr}",$2}}' >> \ {output.snp_positions} echo "Remove additional columns and order by given ordering, add SNP ids" key=$(cat {input.ordering} | tr '\n' ':' | sed 's/.$//') zcat {input.dosage_file} | cut -f 1,2,3,4,5 --complement | \ awk -v key="$key" -f scripts/reorder.awk | \ paste -d "\t" {output.snp_ids} - > {output.genotypes_matrixEQTL} echo "Splitting matrixEQTL input files into chunks" split -l {params.chunk_size} {output.genotypes_matrixEQTL} {params.geno_split_prefix} split -l {params.chunk_size} {output.snp_positions} {params.snp_pos_split_prefix} """ |
229 230 | script: "scripts/run_matrixEQTL.R" |
264 265 266 267 268 269 270 271 272 273 274 | shell: """ # reset the matrixEQTL header, change 'gene' to 'cpg' header='SNP\tcpg\tbeta\tt-stat\tp-value' pattern={params.input_dir}{wildcards.chr} (echo -e ${{header}} ; awk '{{if($5+0 <= {params.pv_cutoff}) print;}}' ${{pattern}}_*_cis_associations.out) > {output.cis} (echo -e ${{header}} ; awk '{{if($5+0 <= {params.pv_cutoff}) print;}}' ${{pattern}}_*_trans_associations.out) > {output.trans} """ |
283 284 285 286 287 | shell: """ cat {input.cis} > {output.cis} cat {input.trans} > {output.trans} """ |
301 302 303 304 305 306 307 308 | shell: """ header="SNP\tcpg\tbeta\tt-stat\tp-value" (echo -e ${{header}} ; fgrep -w -f {input.st1} {input.cis} ) > {output.cis_in_st1} (echo -e ${{header}} ; fgrep -w -f {input.st1} {input.trans} ) > {output.trans_in_st1} (echo -e ${{header}} ; grep -h -v SNP {output.cis_in_st1} {output.trans_in_st1} ) > {output.combined} """ |
322 323 324 325 326 327 328 | shell: """ header="SNP\tcpg\tbeta\tt-stat\tp-value" (echo -e ${{header}} ; awk '{{ if($5+0 <= {params.pv_cutoff} ) print; }}' {input.cis} {input.trans} ) > {output.combined} """ |
337 338 339 340 341 342 | shell: """ cut -f 2 {input.common_probes} > {output.temp_probes} header="SNP\tcpg\beta\t-stat\p-value" ( echo -e ${{header}} ; fgrep -v -w -f {output.temp_probes} {input.combined} ) > {output.epic_specific} """ |
368 369 | script: "scripts/create_full_result_table.R" |
381 382 | script: "scripts/extract_common_probes.R" |
409 410 | script: "scripts/compare_450k_to_EPIC.R" |
429 430 431 432 433 434 435 | shell: """ fgrep -w -h -f {input.snps} {params.geno_pattern} > {output.geno} fgrep -w -h -f {input.snps} {params.snp_pos_pattern} > {output.snp_pos} fgrep -w -h -f {input.cpgs} {params.meth_pattern} > {output.meth} fgrep -w -h -f {input.cpgs} {params.cpg_pos_pattern} > {output.cpg_pos} """ |
462 463 | script: "scripts/run_matrixEQTL.R" |
479 480 481 482 483 484 485 | shell: """ (head -n 1 {input.cis} ; \ fgrep -w -f {input.cis_pairs} {input.cis} ; \ fgrep -w -f {input.longrange_pairs} {input.longrange} ; \ fgrep -w -f {input.trans_pairs} {input.trans} ) > {output} """ |
512 513 | script: "scripts/create_full_result_table.R" |
533 534 | script: "scripts/create_effect_comparison_plot.R" |
545 546 547 548 549 550 | shell: """ zcat {input} | sort -k1,1 -k3,3n | bgzip > {output} # index using tabix, 1: chr; 3: pos tabix -s 1 -b 3 -e 3 {output} """ |
567 568 | script: "scripts/residualize_methylation.R" |
593 594 | script: "scripts/match_epic_with_sentinels.R" |
609 610 611 612 613 614 615 616 | shell: """ fgrep -w -h -f {input.cosmo_pairs} {input.epic_associations_cis} {input.epic_associations_trans} > {output.matched} cut -f 1,2 {output.matched} > {output.matched_pairs} fgrep -w -v -f {output.matched_pairs} {input.cosmo_pairs} > {output.missing} """ |
Support
- Future updates
Related Workflows





