Code_for_Genetic_Diversity_Sampling
How to re-run the 'genetic variation' analysis described in Madupe et al. 2023. The details of the analysis are described in the supplementary material of the paper.
Download and Installation
First, clone this repository.
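For example, a minimal setup sketch, assuming the GitHub URL listed in the workflow metadata below and a conda/bioconda environment for the command-line tools (the package names are assumptions, not part of the original instructions):

git clone https://github.com/johnpatramanis/Code_for_Genetic_Diversity_Sampling.git
cd Code_for_Genetic_Diversity_Sampling
# The Python scripts use Biopython and matplotlib; the workflow rules call snakemake, bcftools, bgzip and Ensembl VEP.
# Package names below are assumptions - adjust to your environment.
pip install biopython matplotlib
conda install -c bioconda snakemake bcftools ensembl-vep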
Code Snippets
## Diversity-metrics / random-sampling script (run in the workflow rules below as Do_Random_Sampling.py)

import os
import os.path
from os import listdir
from os.path import isfile, join
import sys
import random
from itertools import combinations
import statistics
import matplotlib.pyplot as plt

##################################################################################################
## Set up
GENOTYPE_FILE=open(sys.argv[1],'r')
SAMPLE_NAME=sys.argv[1].split('.gntp')[0]
OUTPUT=open(F'{SAMPLE_NAME}.metric','w')
PROTEIN_COVERAGE_FILE=open('Protein_Coverage.txt','r')

########### Data structures
GENOTYPES=[]          ## List of lists of lists - paired genotypes for each locus
FLAT_GENOTYPES=[]     ## List of lists - genotypes for each locus
EXPECTED_HETEROZ=[]   ## List of floats

################# Prepare data
###### Protein data: get the total number of amino acids covered by this analysis and use it to calculate the length of the underlying sequence
Number_of_AA=[]
for line in PROTEIN_COVERAGE_FILE:
    line=line.strip().split()
    if len(line)>1:
        AA=line[1].split(',')
        for K in AA:
            Number_of_AA.append(K)

Number_of_AA=len(Number_of_AA)
Length_of_Sequence=3*Number_of_AA

###### Prep genotypes
count=1
for LINE in GENOTYPE_FILE:
    LINE=LINE.strip().split('\t')

    ##### N of dataset
    NUMBER_OF_SAMPLES=len(LINE)   ## How many samples

    ##### Get genotypes
    GENOTYPES_HERE=[ x.split('/') for x in LINE ]
    FLAT_GENOTYPES_HERE=[ l for x in GENOTYPES_HERE for l in x]
    FREQ_GENOTYPES_HERE=[int(x) for x in FLAT_GENOTYPES_HERE if x!='.']
    FREQ=statistics.mean(FREQ_GENOTYPES_HERE)
    print(F'\nVariant {count} with frequency: {FREQ}\n')
    count+=1

    GENOTYPES.append(GENOTYPES_HERE)             ## List of lists of pairs of genotypes
    FLAT_GENOTYPES.append(FLAT_GENOTYPES_HERE)   ## All genotypes in one list

    ##### Calculate pi - expected heterozygosity
    ALLELE_NUMBER=len([x for x in FLAT_GENOTYPES_HERE if x!='.'])
    Q_FREQ=FLAT_GENOTYPES_HERE.count('0')/ALLELE_NUMBER
    P_FREQ=FLAT_GENOTYPES_HERE.count('1')/ALLELE_NUMBER
    EXPECTED_HETEROZ.append(2*Q_FREQ*P_FREQ)

EXPECTED_HETEROZ=sum(EXPECTED_HETEROZ)

print(F'\nFound {len(GENOTYPES)} potentially polymorphic sites for {NUMBER_OF_SAMPLES} individuals\n')

##### Calculate pi - average (observed) heterozygosity
OBSERVED_HETEROZ=[]
for IND in range(0,NUMBER_OF_SAMPLES):
    IND_HETEROZ=[]
    for GENOTYPES_HERE in GENOTYPES:
        SAMPLE=GENOTYPES_HERE[IND]
        if (SAMPLE.count('0')==1) and (SAMPLE.count('1')==1):
            IND_HETEROZ.append(1)
        if (SAMPLE.count('0')==2) or (SAMPLE.count('1')==2):
            IND_HETEROZ.append(0)
    OBSERVED_HETEROZ.append(sum(IND_HETEROZ))

OBSERVED_HETEROZ=sum(OBSERVED_HETEROZ)/len(OBSERVED_HETEROZ)

##### Calculate Wattersons estimate
##### Check that sites are indeed polymorphic in the population
SEG_SITES=0
FIXED_SITES=0
for J in range(0,len(FLAT_GENOTYPES)):
    FLAT_GENOTYPES_HERE=[int(x) for x in FLAT_GENOTYPES[J] if x != '.']
    FLAT_GENOTYPES_HERE=list(set(FLAT_GENOTYPES_HERE))
    if len(FLAT_GENOTYPES_HERE)>1:
        SEG_SITES+=1
    if len(FLAT_GENOTYPES_HERE)==1:
        FIXED_SITES+=1

HARMONIC=sum([ 1/x for x in range(1,SEG_SITES*2)])   #### Harmonic for number of segregating sites * 2 for diploidy
if HARMONIC!=0:
    WTRSNS_E=SEG_SITES/HARMONIC
else:
    WTRSNS_E=0

print(F'\nNumber of Segregating sites: {SEG_SITES}\n')
print(F'Number of Fixed sites: {FIXED_SITES}\n')
print(F'Harmonic Number of Samples: {HARMONIC}\n')
print(F'\nThis results in a Wattersons Estimate of: {WTRSNS_E}\n')

###### Random sampling loop
### Preselect quartets of individuals for the loop
MAX_LOOP=1000

if NUMBER_OF_SAMPLES >= 4*MAX_LOOP:   ### Sample without replacement
    SAMPLINGS=random.sample(range(NUMBER_OF_SAMPLES), 4*MAX_LOOP)
    random.shuffle(SAMPLINGS)

if NUMBER_OF_SAMPLES < 4*MAX_LOOP:    ### Sample with replacement
    SAMPLINGS=[]
    for k in range(0,MAX_LOOP):
        CHOICES=random.choices(range(NUMBER_OF_SAMPLES), k=4)
        for j in CHOICES:
            SAMPLINGS.append(j)
    random.shuffle(SAMPLINGS)

##### These will be used to get the sampling metrics across the quartets
SAMPLING_SUCCESS_OR_NOT=[]
HOMOZYGOTE_SUCCESS_OR_NOT=[]
TOTAL_VARIANT_SUCCESS_OR_NOT=[]
TWO_ALTERNATIVES_SUCCESS_OR_NOT=[]

##### These will be used to get the average diversity metrics for the quartets
WTRSNS_E_QUARTET=[]
OBSERVED_HETEROZ_QUARTET=[]
EXPECTED_HETEROZ_QUARTET=[]

#### Sampling loop: sample 4 individuals and check all segregating sites - do you find variation in any of them?
for LOOP in range(0,MAX_LOOP):

    #### These will be used to get the sampling metrics for this quartet
    VARIANT_SPOTTED=0
    TOTAL_VARIANT_SPOTTED=0
    HOMOZYG_SPOTTED=0
    TWO_ALTERNATIVES_SPOTTED=0
    SAMPLING_NUMBER=SAMPLINGS[LOOP*4:LOOP*4+4]

    ### These will be used for the diversity metrics of this quartet
    SEG_SITES_LOCAL=0
    WTRSNS_E_LOCAL=0
    OBSERVED_HETEROZ_LOCAL=[]
    EXPECTED_HETEROZ_LOCAL=[]

    #### For each site
    for SNP in range(0,len(GENOTYPES)):

        #### Load site data from matrix
        GENOTYPES_HERE=GENOTYPES[SNP]

        ##### Sample diploid individuals
        SAMPLING=[ GENOTYPES_HERE[x] for x in SAMPLING_NUMBER]   #### Get 4 individuals (diploid)
        SAMPLING_DIPLOID=SAMPLING                                #### Keep the 4 individuals as genotype pairs
        SAMPLING=[l for x in SAMPLING for l in x]                ### Flatten into a list of up to 8 alleles
        SAMPLING=[x for x in SAMPLING if x!='.']                 ### Drop missing alleles
        SAMPLING_UNIQ=list(set(SAMPLING))                        #### Either ['0'], ['1'] or ['0','1']

        if len(SAMPLING_UNIQ)>1:      ##### Check if any variation exists
            VARIANT_SPOTTED=1         ##### Count successful test
            TOTAL_VARIANT_SPOTTED+=1  ##### Count how many times a variant has been spotted within this quartet
            SEG_SITES_LOCAL+=1
            if (['1','1'] in SAMPLING_DIPLOID) and (['0','0'] in SAMPLING_DIPLOID):   #### Check if homozygous individuals for the variant exist
                HOMOZYG_SPOTTED=1
            if (SAMPLING.count('1'))>=2:   ### Check if more than 1 alternative allele exists
                TWO_ALTERNATIVES_SPOTTED=1

        #### Calculate expected heterozygosity for this SNP in the quartet
        ALLELE_NUMBER_LOCAL_SNP=len(SAMPLING)   ## Number of non-missing alleles
        if ALLELE_NUMBER_LOCAL_SNP>0:
            Q_FREQ=SAMPLING.count('0')/ALLELE_NUMBER_LOCAL_SNP   ## Frequency of allele 1
            P_FREQ=SAMPLING.count('1')/ALLELE_NUMBER_LOCAL_SNP   ## Frequency of allele 2
            EXPECTED_HETEROZ_LOCAL.append(2*Q_FREQ*P_FREQ)

    SAMPLING_SUCCESS_OR_NOT.append(VARIANT_SPOTTED)
    TOTAL_VARIANT_SUCCESS_OR_NOT.append(TOTAL_VARIANT_SPOTTED)
    TWO_ALTERNATIVES_SUCCESS_OR_NOT.append(TWO_ALTERNATIVES_SPOTTED)
    HOMOZYGOTE_SUCCESS_OR_NOT.append(HOMOZYG_SPOTTED)

    ### Calculate expected heterozygosity for the quartet
    EXPECTED_HETEROZ_LOCAL=sum(EXPECTED_HETEROZ_LOCAL)

    ### Calculate Wattersons estimate for the quartet
    HARMONIC_LOCAL=sum([ 1/x for x in range(1,SEG_SITES_LOCAL*2)])   #### Harmonic for number of segregating sites * 2 for diploidy
    if HARMONIC_LOCAL==0:
        WTRSNS_E_LOCAL=0
    if HARMONIC_LOCAL!=0:
        WTRSNS_E_LOCAL=SEG_SITES_LOCAL/HARMONIC_LOCAL

    #### Calculate observed heterozygosity for the quartet
    for IND in SAMPLING_NUMBER:
        IND_HETEROZ=[]
        for GENOTYPES_HERE in GENOTYPES:
            SAMPLE=GENOTYPES_HERE[IND]
            if (SAMPLE.count('0')==1) and (SAMPLE.count('1')==1):
                IND_HETEROZ.append(1)
            if (SAMPLE.count('0')==2) or (SAMPLE.count('1')==2):
                IND_HETEROZ.append(0)
        OBSERVED_HETEROZ_LOCAL.append(sum(IND_HETEROZ))
    OBSERVED_HETEROZ_LOCAL=sum(OBSERVED_HETEROZ_LOCAL)/len(OBSERVED_HETEROZ_LOCAL)

    WTRSNS_E_QUARTET.append(WTRSNS_E_LOCAL)
    OBSERVED_HETEROZ_QUARTET.append(OBSERVED_HETEROZ_LOCAL)
    EXPECTED_HETEROZ_QUARTET.append(EXPECTED_HETEROZ_LOCAL)

###### Sampling metric averages across the loop
AT_LEAST_ONE_VARIANT=sum(SAMPLING_SUCCESS_OR_NOT)/len(SAMPLING_SUCCESS_OR_NOT)
ONE_OR_MORE_VARIANT=sum(TOTAL_VARIANT_SUCCESS_OR_NOT)/len(TOTAL_VARIANT_SUCCESS_OR_NOT)
AT_LEAST_ONE_HOMOZ=sum(HOMOZYGOTE_SUCCESS_OR_NOT)/len(HOMOZYGOTE_SUCCESS_OR_NOT)
AT_LEAST_TWO_ALTERNATIVES=sum(TWO_ALTERNATIVES_SUCCESS_OR_NOT)/len(TWO_ALTERNATIVES_SUCCESS_OR_NOT)

###### Diversity metric averages across quartets
WTRSNS_E_QUARTET=sum(WTRSNS_E_QUARTET)/len(WTRSNS_E_QUARTET)
OBSERVED_HETEROZ_QUARTET=sum(OBSERVED_HETEROZ_QUARTET)/len(OBSERVED_HETEROZ_QUARTET)
EXPECTED_HETEROZ_QUARTET=sum(EXPECTED_HETEROZ_QUARTET)/len(EXPECTED_HETEROZ_QUARTET)

print(F'Expected Heterozygosity: {EXPECTED_HETEROZ}\nObserved Heterozygosity: {OBSERVED_HETEROZ}\nWattersons Estimator {WTRSNS_E}\n\n\n')
print(F'Average Expected Heterozygosity per Quartet: {EXPECTED_HETEROZ_QUARTET}\nAverage Observed Heterozygosity per Quartet: {OBSERVED_HETEROZ_QUARTET}\nAverage Wattersons Estimator per Quartet {WTRSNS_E_QUARTET}\n\n\n')
print(F'Probability of at least one variant: {AT_LEAST_ONE_VARIANT}\n')
print(F'Average number of successes per individual test: {ONE_OR_MORE_VARIANT}\n')
print(F'Probability of at least one homozygous individual: {AT_LEAST_ONE_HOMOZ}\n')
print(F'Probability of at least 2 alternative alleles: {AT_LEAST_TWO_ALTERNATIVES}\n\n\n')
print(F'Total length of underlying sequence: {Length_of_Sequence}\n')

plt.hist(TOTAL_VARIANT_SUCCESS_OR_NOT)
plt.savefig('Variants_Histogram.pdf',format='pdf')
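As a rough illustration of the inputs this script expects (not part of the original repository): it reads a Protein_Coverage.txt file (normally produced by the coverage script further below) and a tab-separated genotype matrix with one row per variant and one '/'-separated genotype per sample, matching the bcftools query -f '[%GT\t]\n' output generated by the workflow rules below. A hypothetical toy run, with made-up file names and values:

# Hypothetical toy inputs - protein name, positions and genotypes are made up for illustration
printf 'AMELX\t10,11,12\n' > Protein_Coverage.txt
printf '0/0\t0/1\t1/1\t./.\n0/1\t0/0\t0/0\t0/0\n' > Toy.gntp
python3 Do_Random_Sampling.py Toy.gntp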
## VEP-output filtering script (run in the workflow rules below as Extract_Info_From_VEP_Output.py)

import os
import os.path
from os import listdir
from os.path import isfile, join
import sys

################################################################### Extract and filter variants from VEP output

PROTEIN_POSITIONS_COVERAGE=open('Protein_Coverage.txt','r')
VEP_OUTPUT=open(sys.argv[1],'r')
SAMPLE_NAME=sys.argv[1].split('_VEP.VEP')[0]
OUTPUT=open(F'{SAMPLE_NAME}_Processed_Variants.PV','w')

##### Get which parts of the protein are covered by our ancient samples
COVERAGE={}
for line in PROTEIN_POSITIONS_COVERAGE:
    line=line.strip().split()
    Protein_Name=line[0]
    if len(line)>1:
        Positions=line[1].split(',')
    else:
        Positions=[]
    COVERAGE[Protein_Name]=[int(x) for x in Positions]

print(COVERAGE)

### Go through the VEP output
FILTERED=[]
for LINE in VEP_OUTPUT:

    if LINE[0]=='#' and LINE[1]!='#':
        LABELS=LINE.strip().split()

    if LINE[0]!='#':
        LINE=LINE.strip().split()
        Uploaded_variation=LINE[LABELS.index('#Uploaded_variation')]
        Location=LINE[LABELS.index('Location')]
        Gene=LINE[LABELS.index('Gene')]                      ### Ensembl ID
        Feature=LINE[LABELS.index('Feature')]                ### Ensembl ID
        Feature_type=LINE[LABELS.index('Feature_type')]      ## e.g. 'Transcript'
        VARIANT_CLASS=LINE[LABELS.index('VARIANT_CLASS')]    ## e.g. SNV or insertion
        SYMBOL=LINE[LABELS.index('SYMBOL')]                  ### Which protein
        Consequence=LINE[LABELS.index('Consequence')]        ## Important! Can be multiple things, including: synonymous_variant, upstream_variant, missense_variant, splice_donor_variant(?), frameshift_variant
        IMPACT=LINE[LABELS.index('IMPACT')]                  #### HIGH, LOW, other
        CANONICAL=LINE[LABELS.index('CANONICAL')]
        Protein_position=LINE[LABELS.index('Protein_position')]

        print(Uploaded_variation,Location,Feature_type,VARIANT_CLASS,SYMBOL,Consequence,IMPACT,Protein_position,CANONICAL)

        ### Filter based on criteria here
        if (VARIANT_CLASS=='SNV') and (Consequence=='missense_variant') and (CANONICAL=='YES'):
            FILTERED.append([SYMBOL,Location,Protein_position])

print(LABELS)

for SNP in FILTERED:
    Loc=SNP[1]
    Loc=Loc.split(':')
    Loc='\t'.join(Loc)
    Prot=SNP[0]
    Prot_Pos=int(SNP[2])
    ## COVERAGE[Prot] holds all positions covered by all 4 samples
    print(Loc,Prot,Prot_Pos,COVERAGE[Prot])
    if Prot_Pos in COVERAGE[Prot]:
        OUTPUT.write(Loc+'\n')
## Protein-coverage script (run in the workflow rules below as Get_Protein_Coverage.py)

import os
import os.path
from os import listdir
from os.path import isfile, join
import sys
from Bio import SeqIO

FILES_IN_FOLDER=[f for f in os.listdir('.') if os.path.isfile(f)]
FASTA_FILE_LIST=[f for f in FILES_IN_FOLDER if '.fa' in f]

OUTPUT=open('Protein_Coverage.txt','w')

COVERAGE={}

for FILE in FASTA_FILE_LIST:
    fasta_sequences=SeqIO.parse(open(FILE),'fasta')
    for fasta in fasta_sequences:
        name, sequence = fasta.id, str(fasta.seq)
        name=name.split('/')[0]
        name=name.split('_')
        sample='_'.join(name[0:len(name)-1])
        protein=name[len(name)-1]
        if 'Paranthropus' in sample:
            if sample not in COVERAGE.keys():
                COVERAGE[sample]={}
            counter=0
            for POS in sequence:
                counter+=1   ## 1-based residue position
                if POS!='?':
                    if protein not in COVERAGE[sample].keys():
                        COVERAGE[sample][protein]=[]
                    COVERAGE[sample][protein].append(counter)
            # print(sample,protein)

TOTAL_COVERAGE={}
for SMPL in COVERAGE.keys():
    for PRTN in COVERAGE[SMPL].keys():
        if PRTN not in TOTAL_COVERAGE.keys():
            TOTAL_COVERAGE[PRTN]=[]
        for PSTN in COVERAGE[SMPL][PRTN]:
            TOTAL_COVERAGE[PRTN].append(PSTN)

####### For getting positions covered by ALL 4 P.rob samples
for PRTN in TOTAL_COVERAGE.keys():
    TOTAL_COVERAGE[PRTN]=sorted(TOTAL_COVERAGE[PRTN])
    TOTAL_COVERAGE_UNIQUE=sorted(set(TOTAL_COVERAGE[PRTN]))   ### Each site only once, so we can loop through them
    ### Only select positions that are counted 4 times
    TOTAL_COVERAGE[PRTN]=[ str(SITE) for SITE in TOTAL_COVERAGE_UNIQUE if TOTAL_COVERAGE[PRTN].count(SITE)==4 ]
    print(PRTN,TOTAL_COVERAGE[PRTN])
    POSITIONS=','.join(TOTAL_COVERAGE[PRTN])
    OUTPUT.write(F'{PRTN}\t{POSITIONS}\n')

####### For getting positions covered by at least one P.rob sample!
# for PRTN in TOTAL_COVERAGE.keys():
#     TOTAL_COVERAGE[PRTN]=sorted(list(set(TOTAL_COVERAGE[PRTN])))
#     TOTAL_COVERAGE[PRTN]=[str(x) for x in TOTAL_COVERAGE[PRTN]]
#     print(PRTN,TOTAL_COVERAGE[PRTN])
#     POSITIONS=','.join(TOTAL_COVERAGE[PRTN])
#     OUTPUT.write(F'{PRTN}\t{POSITIONS}\n')
run:
    shell(F"bcftools index {input.VCF_FILE} -f --threads {threads}")
run:
    shell(F"python3 Get_Protein_Coverage.py")
run:
    shell(F"bcftools view {input.VCF_FILE} -R {input.GENE_LOCATIONS} --threads {threads} -O v -o {output.GENE_FILTERED_VCF}")
run:
    shell(F"vep --i {input.GENE_FILTERED_VCF} --tab --species homo_sapiens --offline --dir_cache VEP_Cache/ --output_file {output.VEP_OUTPUT} --force_overwrite --everything")
run:
    shell(F"python3 Extract_Info_From_VEP_Output.py {input.VEP_OUTPUT}")
run:
    shell(F"bgzip -i -k -f --threads {threads} {input.GENE_FILTERED_VCF}")
    shell(F"bcftools index {output.GENE_FILTERED_GZVCF} -f --threads {threads}")
run:
    shell(F"bcftools view {input.GENE_FILTERED_GZVCF} -R {input.PROCESSED_VARIANTS} --threads {threads} -O v -o {output.SECOND_FILTERING_VCF}")
run:
    shell(F"bcftools query -f '[%GT\t]\n' {input.SECOND_FILTERING_VCF} > {output.GENOTYPES}")
    shell(F"bcftools query -f '%CHROM %POS %ID %REF %ALT\n' {input.SECOND_FILTERING_VCF} > {output.SNP_LOCATIONS}")
run:
    shell(F"python3 Do_Random_Sampling.py {input.GENOTYPES}")
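The rule bodies above are fragments of the repository's Snakefile and run in order from VCF indexing through to the random-sampling script. A minimal launch sketch, assuming the Snakefile sits at the repository root and that four cores are available (both assumptions, not stated in the original):

# Dry run first to list the planned jobs, then execute the workflow
snakemake -n
snakemake --cores 4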
Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/johnpatramanis/Code_for_Genetic_Diversity_Sampling
Name: code_for_genetic_diversity_sampling
Version: 1
Downloaded: 0
Copyright: Public Domain
License: None