Snakemake Workflow for Autoimmune Disease Data Extraction from GWAS and PGS Catalogs

This repository contains a Snakemake workflow that was used to extract autoimmune disease-related data from the GWAS Catalog and the PGS Catalog according to Experimental Factor Ontology (EFO) IDs. It regenerates the supplementary tables and summary figures reported in:

Rochi Saurabh, Cesaire Fouodo, Inke R. König, Hauke Busch and Inken Wohlers.
A survey of genome-wide association studies (GWAS), polygenic scores (PGS) and UK Biobank (UKB) highlights resources for autoimmune disease genetics.
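
The core extraction step, selecting catalog rows by EFO ID, can be sketched as a small standalone function. This is a minimal illustration modelled on the snippets listed below, not the workflow's own code: the function name `filter_by_efo`, the helper `make_row`, and the toy rows and EFO IDs are assumptions made for the example, and the column index 35 is simply taken from the snippets.

```python
HEADER_PREFIX = "DATE ADDED"   # start of the header row, as tested in the snippets
MAPPED_TRAIT_COL = 35          # zero-based column index used in the snippets


def filter_by_efo(lines, efo_ids):
    """Yield the header plus rows whose single mapped trait ends in a wanted EFO ID."""
    wanted = set(efo_ids)
    for line in lines:
        if line.startswith(HEADER_PREFIX):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= MAPPED_TRAIT_COL:
            continue  # skip short or malformed rows instead of raising IndexError
        trait = fields[MAPPED_TRAIT_COL]
        # Rows annotated with several traits hold comma-separated entries; skip them.
        if "," not in trait and trait.split("/")[-1] in wanted:
            yield line


def make_row(trait):
    """Build a toy 38-column row (hypothetical data, for illustration only)."""
    fields = ["x"] * 38
    fields[MAPPED_TRAIT_COL] = trait
    return "\t".join(fields) + "\n"


# Toy input: a header, one single-trait match, one multi-trait row, one non-match.
header = "DATE ADDED TO CATALOG\t" + "\t".join(["col"] * 37) + "\n"
row_single = make_row("http://www.ebi.ac.uk/efo/EFO_0000685")
row_multi = make_row(
    "http://www.ebi.ac.uk/efo/EFO_0000685, http://www.ebi.ac.uk/efo/EFO_0003767"
)
row_other = make_row("http://www.ebi.ac.uk/efo/EFO_0000001")

kept = list(filter_by_efo([header, row_single, row_multi, row_other], ["EFO_0000685"]))
print(len(kept))
```

Using a set for the wanted IDs keeps each membership test constant-time, which may matter when the catalog export has many rows.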

Code Snippets

shell: "head -n 1 {input} > {output}"

SnakeMake From line 40 of main/Snakefile

run:
    # Read the EFO IDs of interest, one per line.
    efo_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            efo_ids.append(line.strip("\n"))
    # Keep the header and rows with a single mapped trait (index 35) matching one of the IDs.
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split("\t")
            mapped_trait = s[35]
            if ("," not in mapped_trait and mapped_trait.split("/")[-1] in efo_ids) or line[:10] == "DATE ADDED":
                f_out.write(line)

SnakeMake From line 50 of main/Snakefile

shell: "cat {input} | cut -f 36 | sort | uniq -c | sort -k1 -n -r > {output}"

SnakeMake From line 66 of main/Snakefile
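
The `cut | sort | uniq -c | sort -k1 -n -r` idiom used in this and several later rules tallies the values of one column and lists the most frequent first. A rough Python counterpart is sketched below; `count_column` is a hypothetical helper, not part of the workflow, and it differs from the shell pipeline in two small ways: rows with too few columns are skipped, and ties may come out in a different order.

```python
from collections import Counter


def count_column(lines, column, sep="\t"):
    """Count the values of a 1-based column and return (value, count) pairs,
    most frequent first."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if column <= len(fields):
            counts[fields[column - 1]] += 1
    return counts.most_common()


# Toy three-row table (hypothetical data): column 2 holds the value to tally.
toy_rows = ["a\tx\n", "b\ty\n", "c\tx\n"]
tally = count_column(toy_rows, 2)
print(tally)
```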

run:
    # Keep the header and rows whose mapped trait (index 35) matches the wildcard EFO ID.
    with open(input[0], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split("\t")
            mapped_trait = s[35]
            if mapped_trait.split("/")[-1] == wildcards.efo_id or line[:10] == "DATE ADDED":
                f_out.write(line)

SnakeMake From line 72 of main/Snakefile

shell: "cat {input} | cut -f 37 | sort | uniq -c | sort -k1 -n -r > {output}"

SnakeMake From line 88 of main/Snakefile

shell: "cat {input} | cut -f 24 | sort | uniq -c | sort -k1 -n -r > {output}"

SnakeMake From line 95 of main/Snakefile

shell: "cat {input} | cut -f 22 | sort | uniq -c | sort -k1 -n -r > {output}"

SnakeMake From line 102 of main/Snakefile

shell: "cat {input} | cut -f 14 | sort | uniq -c | sort -k1 -n -r > {output}"

SnakeMake From line 109 of main/Snakefile

run:
    # Read the EFO IDs of interest, one per line.
    efo_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            efo_ids.append(line.strip("\n"))
    # Keep the header and rows with a single mapped trait (index 13) matching one of the IDs.
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split("\t")
            mapped_trait = s[13]
            if ("," not in mapped_trait and mapped_trait.split("/")[-1] in efo_ids) or line[:10] == "DATE ADDED":
                f_out.write(line)

SnakeMake From line 123 of main/Snakefile

run:
    # Keep the header and rows whose mapped trait (index 13) matches the wildcard EFO ID.
    with open(input[0], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split("\t")
            mapped_trait = s[13]
            if mapped_trait.split("/")[-1] == wildcards.efo_id or line[:10] == "DATE ADDED":
                f_out.write(line)

SnakeMake From line 139 of main/Snakefile

shell: "cat {input} | cut -f 15 | sort | uniq -c | sort -k1 -n -r > {output}"

SnakeMake From line 152 of main/Snakefile

run:
    # Read the EFO IDs of interest, one per line.
    efo_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            efo_ids.append(line.strip("\n"))
    # Keep the header and scores annotated with exactly one EFO ID (index 4) from the list.
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            pgs_efo = s[4]
            if ("|" not in pgs_efo and pgs_efo in efo_ids) or line[:15] == "Polygenic Score":
                f_out.write(line)

SnakeMake From line 162 of main/Snakefile

run:
    # Keep the header and scores whose EFO ID (index 4) equals the wildcard EFO ID.
    with open(input[0], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            mapped_trait = s[4]
            if mapped_trait == wildcards.efo_id or line[:15] == "Polygenic Score":
                f_out.write(line)

SnakeMake From line 181 of main/Snakefile
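
The PGS rules above split each line on `","`, which shifts the column indices whenever a field contains a quoted comma. Should that occur in the input files, Python's standard `csv` module handles the quoting. The sketch below is an assumption-laden alternative, not the workflow's method: `filter_pgs_rows`, the toy header names after the first column, and the toy score ID are all made up for illustration.

```python
import csv
import io


def filter_pgs_rows(handle, efo_id, trait_col=4):
    """Return the header plus rows whose trait column equals efo_id,
    parsing quoted fields correctly."""
    reader = csv.reader(handle)
    kept = [next(reader)]  # the first row is the header
    for row in reader:
        if len(row) > trait_col and row[trait_col] == efo_id:
            kept.append(row)
    return kept


# Toy CSV (hypothetical): the third field contains a quoted comma.
toy_csv = (
    "Polygenic Score (PGS) ID,name,reported_trait,trait_label,trait_efo_id\n"
    'PGS000001,score_a,"Arthritis, rheumatoid",label_a,EFO_0000685\n'
    "PGS000002,score_b,Other trait,label_b,EFO_0000001\n"
)

rows = filter_pgs_rows(io.StringIO(toy_csv), "EFO_0000685")

# For comparison: a plain split puts a different field at index 4 for the quoted row.
naive_fields = toy_csv.splitlines()[1].split(",")
print(rows[1][4], naive_fields[4])
```

The function returns parsed rows rather than raw lines; writing them back out with `csv.writer` would preserve the quoting.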

shell: "cat {input} | grep -v 'Polygenic Score (PGS) ID' | cut -f 1 -d ',' | sort | uniq > {output} || true"

SnakeMake From line 192 of main/Snakefile

shell: "cat {input} | grep -v 'Polygenic Score (PGS) ID' | cut -f 12 -d ',' | sort | uniq > {output} || true"

SnakeMake From line 197 of main/Snakefile

shell: "cat {input} | grep -v 'PGS Performance Metric (PPM) ID' | cut -f 4 -d ',' | sort | uniq > {output} || true"

SnakeMake From line 202 of main/Snakefile

shell: "cat {input[0]} | grep -v 'Polygenic Score (PGS) ID' | cut -f 12 -d ',' > {output}.tmp || true; " + \
       "cat {input[1]} | grep -v 'PGS Performance Metric (PPM) ID' | cut -f 4 -d ',' >> {output}.tmp || true; " + \
       "cat {output}.tmp | sort | uniq -c > {output}; " + \
       "rm {output}.tmp; "

SnakeMake From line 208 of main/Snakefile

run:
    # Read the PGS IDs of interest, one per line.
    pgs_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            pgs_ids.append(line.strip("\n"))
    # Keep the header and rows whose PGS ID (index 0) is in the list.
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            pgs_id = s[0]
            if pgs_id in pgs_ids or line[:15] == "Polygenic Score":
                f_out.write(line)

SnakeMake From line 217 of main/Snakefile

run:
    # Read the PGS IDs of interest, one per line.
    pgs_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            pgs_ids.append(line.strip("\n"))
    # Keep the header and performance-metric rows whose PGS ID (index 1) is in the list.
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            pgs_id = s[1]
            if pgs_id in pgs_ids or line[:22] == "PGS Performance Metric":
                f_out.write(line)

SnakeMake From line 233 of main/Snakefile

run:
    # Read the PGS IDs of interest, one per line.
    pgs_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            pgs_ids.append(line.strip("\n"))
    # Keep the header and sample-set rows whose PGS ID (index 1) is in the list.
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            pgs_id = s[1]
            if pgs_id in pgs_ids or line[:14] == "PGS Sample Set":
                f_out.write(line)

SnakeMake From line 249 of main/Snakefile

shell: "cat {input} | cut -f 1 -d ',' | sort | uniq -c > {output}"

SnakeMake From line 264 of main/Snakefile

shell: "cat {input} | cut -f 12 | sort | uniq -c > {output}"

SnakeMake From line 269 of main/Snakefile

shell: "cat {input} | cut -f 5 -d ',' | sort | uniq -c | sort -k1 -n -r > {output}"

SnakeMake From line 275 of main/Snakefile

shell: "cat {input} | cut -f 1 -d ',' | sort | uniq > {output}"

SnakeMake From line 281 of main/Snakefile

run:
    # Read the PGS IDs of interest, one per line.
    pgs_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            pgs_ids.append(line.strip("\n"))
    # Keep the header and rows whose PGS ID (index 0) is in the list.
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            pgs_id = s[0]
            if pgs_id in pgs_ids or line[:15] == "Polygenic Score":
                f_out.write(line)

SnakeMake From line 288 of main/Snakefile

run:
    # Read the PGS IDs of interest, one per line.
    pgs_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            pgs_ids.append(line.strip("\n"))
    # Keep the header and performance-metric rows whose PGS ID (index 1) is in the list.
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            pgs_id = s[1]
            if pgs_id in pgs_ids or line[:15] == "PGS Performance":
                f_out.write(line)

SnakeMake From line 305 of main/Snakefile

run:
    # Read the PGS IDs of interest, one per line.
    pgs_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            pgs_ids.append(line.strip("\n"))
    # Keep the header and sample-set rows whose PGS ID (index 1) is in the list.
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.split(",")
            pgs_id = s[1]
            if pgs_id in pgs_ids or line[:14] == "PGS Sample Set":
                f_out.write(line)

SnakeMake From line 322 of main/Snakefile

run:
    efo_ids = []
    with open(input[0], "r") as f_in:
        for line in f_in:
            efo_ids.append(line.strip("\n"))
    with open(input[1], "r") as f_in, open(output[0], "w") as f_out:
        for line in f_in:
            s = line.strip("\n").split("\t")
            assert len(s) == 7
            ukb_efo = s[2]
            if ("|" not in ukb_efo and ukb_efo in efo_ids) or line[:5] == "ZOOMA":