African Demography Review
This repository provides a Snakemake workflow used for the data analysis presented in Pfennig et al., Evolutionary genetics and admixture in African populations, Genome Biology and Evolution, 2023; https://doi.org/10.1093/gbe/evad054.
Datasets
We combine and harmonize genomic datasets from:
- 20 European, Middle Eastern, and African populations from the Simons Genome Diversity Project (SGDP) [1]
- 36 African populations from Crawford et al. (2017) [2] and Scheinfeldt et al. (2019) [3] that were genotyped on the Illumina 1M array (these datasets were merged by Matt Hansen, unpublished work, and kindly provided to us)
- 16 Sudanese populations from Hollfelder et al. (2017) [4]
- 8 North African populations from Arauna et al. (2017) [5]
- 14 Sahelian populations from Fortes-Lima et al. (2022) [6]
- 10 Southern African populations from Schlebusch et al. (2012) [7]
Populations with more than 2 samples are downsampled to 2 individuals to avoid effects of variable sample sizes. If a population was included in more than one dataset, only samples from one dataset were retained for that population. We merge the datasets on biallelic common SNPs. Conflicting SNPs are strand-flipped once, and SNPs that remain in conflict are excluded. A minor allele frequency threshold of 0.05, a per-SNP missing genotype rate threshold of 0.1, and LD pruning (max r2=0.05) were applied (these parameters can be changed in config/config.yaml). Ancestry proportions were estimated using ADMIXTURE [8] for different values of K. For visualization, the ADMIXTURE results are aligned across all values of K using CLUMPAK [9] prior to plotting (Figure 2). Ancestry proportions for the best K were then interpolated in space using the ordinary Kriging method, and effective migration rates were estimated using FEEMS [10] (Figure 3).
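For orientation, the relevant entries in config/config.yaml might look like the following sketch. The key names mirror the {params.*} placeholders in the rule snippets further below, but they are illustrative, not verbatim; consult the file shipped with the repository for the exact schema:

```yaml
# Illustrative excerpt of config/config.yaml -- key names are assumptions, not verbatim
bcftools: /path/to/bcftools   # path to bcftools v1.13 (see Software dependencies)
maf: 0.05                     # minor allele frequency threshold
geno: 0.1                     # maximum per-SNP missing genotype rate
max_r2: 0.05                  # r^2 cutoff for LD pruning
```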
Software dependencies
- bcftools v1.13 (path can be updated in config/config.yaml)
- All other software is provided through conda environments, which are automatically created and used by Snakemake.
Data dependencies
- The datasets from Crawford et al. [2] (dbGaP: phs001396.v1.p1) and Scheinfeldt et al. [3] (dbGaP: phs001780.v1.p1) need to be manually downloaded, merged, and placed in the data directory (see the sketch after this list). The plink prefix of the files can be updated in config/config.yaml. We provide the coordinates of the populations used in our analysis in data/coordinates_crawford_et_al_and_scheinfeldt_et_al.txt. Importantly, we only used samples that were genotyped on the Illumina 1M array.
- The dataset from Fortes-Lima et al. [6] needs to be downloaded separately from https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-12243?query=E-MTAB-12243 (the dataset was kindly made available by the authors prior to publication). Population coordinates were extracted from Table S2 of the corresponding publication and are stored in data/coordinates_fortes_lima_et_al.txt. The plink prefix can be updated in config/config.yaml.
- Geographic coordinates of the 18 Sudanese populations from Hollfelder et al. (2017) [4] were kindly made available by the original authors. We provide the coordinates in data/coordinates_hollfelder_et_al.txt.
- If city information was available for populations in Arauna et al. (2017) [5], the coordinates of the city were used. For the populations from Libya and Egypt, the center of the respective country was taken (see data/coordinates_arauna_et_al.txt). The data file was kindly made available by the original authors.
- All other required data is retrieved automatically.
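As a sketch, once the manual downloads are in place the data directory could look like this. Only the coordinate file names are fixed by the repository; the plink prefixes are placeholders and must match the entries in config/config.yaml:

```
data/
├── coordinates_crawford_et_al_and_scheinfeldt_et_al.txt   # shipped with the repository
├── coordinates_fortes_lima_et_al.txt                      # shipped with the repository
├── coordinates_hollfelder_et_al.txt                       # shipped with the repository
├── coordinates_arauna_et_al.txt                           # shipped with the repository
├── crawford_scheinfeldt.{bed,bim,fam}                     # placeholder prefix, set in config/config.yaml
└── fortes_lima.{bed,bim,fam}                              # placeholder prefix, set in config/config.yaml
```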
Comments
The Kriging plot of the admixture results currently only works for K<=4, as we observed the lowest CV error at K=4. To compare CV errors across the different values of K, run:
grep CV {results_path}/{prefix}.*.log
(the CV error is written to the per-K ADMIXTURE log files, not the .Q files).
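Each ADMIXTURE log contains a single line of this form; the error values below are invented for illustration:

```
CV error (K=3): 0.48213
CV error (K=4): 0.47102
CV error (K=5): 0.47518
```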
As of now, this Snakemake workflow is not written in a general way that allows downloading and merging data from arbitrary datasets; it only supports those mentioned above.
Executing the Snakemake workflow
After ensuring that the software and data requirements are satisfied, the Snakemake workflow can be executed using the following command:
snakemake -c<threads> --use-conda
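For example, to check the execution plan first and then run on 8 cores (the core count is arbitrary):

```bash
snakemake -n --use-conda    # dry run: list the jobs that would be executed
snakemake -c8 --use-conda   # run the workflow with 8 cores and conda-managed environments
```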
Citation
Pfennig et al., Evolutionary genetics and admixture in African populations, Genome Biology and Evolution, 2023; https://doi.org/10.1093/gbe/evad054
References
1. Mallick S, Li H, Lipson M, et al. (2016) The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538:201–206. https://doi.org/10.1038/nature18964
2. Crawford NG, Kelly DE, Hansen MEB, et al. (2017) Loci associated with skin pigmentation identified in African populations. Science 358:eaan8433. https://doi.org/10.1126/science.aan8433
3. Scheinfeldt LB, et al. (2019) Genomic evidence for shared common ancestry of East African hunting-gathering populations and insights into local adaptation. Proceedings of the National Academy of Sciences 116:4166–4175. https://doi.org/10.1073/pnas.1817678116
4. Hollfelder N, Schlebusch CM, Günther T, et al. (2017) Northeast African genomic variation shaped by the continuity of indigenous groups and Eurasian migrations. PLoS Genetics 13(8):e1006976. https://doi.org/10.1371/journal.pgen.1006976
5. Arauna LR, Mendoza-Revilla J, Mas-Sandoval A, et al. (2017) Recent historical migrations have shaped the gene pool of Arabs and Berbers in North Africa. Molecular Biology and Evolution 34(2). https://doi.org/10.1093/molbev/msw218
6. Fortes-Lima C, Tříska P, Čížková M, et al. (2022) Demographic and selection histories of populations across the Sahel/Savannah belt. Molecular Biology and Evolution 39(10):msac209. https://doi.org/10.1093/molbev/msac209
7. Schlebusch CM, et al. (2012) Genomic variation in seven Khoe-San groups reveals adaptation and complex African history. Science 338:374–379. https://doi.org/10.1126/science.1227721
8. Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Research 19:1655–1664. https://doi.org/10.1101/gr.094052.109
9. Kopelman NM, Mayzel J, Jakobsson M, Rosenberg NA, Mayrose I (2015) CLUMPAK: a program for identifying clustering modes and packaging population structure inferences across K. Molecular Ecology Resources 15(5):1179–1191. https://doi.org/10.1111/1755-0998.12387
10. Marcus J, Ha W, Barber RF, Novembre J (2021) Fast and flexible estimation of effective migration surfaces. eLife 10:e61927. https://doi.org/10.7554/eLife.61927
Code Snippets
Lines 2–66 of scripts/plot_pca.py (the PCA plotting script called by the rule further below):

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import sys
import argparse
import pandas as pd
import numpy as np
import os


### This is for purely explorative purposes. The labelling is roughly done on geographic coordinates, but

def visualize_pcs(data, outfile, pc_a=1, pc_b=2):
    """
    Visualize PCA done with plink
    :param data: df, DataFrame with eigenvectors and geographic coordinates
    :param outfile: str, where to save figure to
    :param pc_a: int, first PC to plot
    :param pc_b: int, second PC to plot
    """
    np.random.seed(42)
    # assign rough regional labels based on geographic coordinates
    colors = np.where((data.Longitude >= -16.7) & (data.Longitude <= 17) &
                      (data.Latitude >= 2.5) & (data.Latitude <= 20), 'green', 0)
    colors = np.where((data.Longitude > 17) & (data.Latitude >= -5.0) & (data.Latitude <= 20), 'blue', colors)
    colors = np.where((data.Latitude <= -5), 'orange', colors)
    colors = np.where((data.Latitude > 15) & (data.Longitude >= 35), 'pink', colors)
    colors = np.where((data.Latitude > 27) & (data.Longitude < 15), 'red', colors)
    color_dict = {'Middle East': 'pink', "West Africa": "green", "East Africa": "blue",
                  "South Africa": "orange", "Europe": "red"}
    data['colors'] = colors
    fig, ax = plt.subplots()
    ax.scatter(data.loc[:, f'PC{pc_a}'], data.loc[:, f'PC{pc_b}'].values, c=data.colors, alpha=0.5)
    ax.set_xlabel(f'PC{pc_a}')
    ax.set_ylabel(f'PC{pc_b}')
    legend_elements = [Line2D([0], [0], color=col, marker='o', ls='', label=pop)
                       for pop, col in color_dict.items()]
    ax.legend(handles=legend_elements, loc='upper center', ncol=4, bbox_to_anchor=(0.5, -.14))
    fig.savefig(outfile, bbox_inches='tight')
    plt.close()


def main(argv):
    parser = argparse.ArgumentParser(description='Visualize PCA performed with plink2')
    parser.add_argument('--eigenvec', help='.eigenvec file that was generated by plink.')
    parser.add_argument('--coords', help='file containing geographic coordinates of sample locations.')
    parser.add_argument('--plot_dir', help='Directory where to save plots')
    args = parser.parse_args()
    if args.plot_dir.endswith('/'):
        plot_dir = args.plot_dir
    else:
        plot_dir = args.plot_dir + '/'
    if not os.path.isdir(plot_dir):
        os.makedirs(plot_dir)
    eigenvectors = pd.read_csv(args.eigenvec, sep='\t')
    eigenvectors.set_index('IID', inplace=True)
    coordinates = pd.read_csv(args.coords, sep='\t', header=None,
                              names=['FID', "IID", "Latitude", "Longitude"])
    # join eigenvectors with geographic coordinates
    data = eigenvectors.join(coordinates.set_index('IID'))
    outfile = plot_dir + args.eigenvec.split('/')[-1].split('.eigenvec')[0] + "_pca.png"
    visualize_pcs(data, outfile)


if __name__ == '__main__':
    main(sys.argv[1:])
```
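A hypothetical command line, inferred from the argparse options above (all paths are placeholders; the workflow invokes the script via the rule shown further below):

```bash
scripts/plot_pca.py --eigenvec results/merged.eigenvec \
    --coords data/population_geo_coords.tab \
    --plot_dir results/plots/
```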
Lines 7–124 of scripts/run_feems.py:

```python
import numpy as np
import pkg_resources
from sklearn.impute import SimpleImputer
from pandas_plink import read_plink
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from feems.utils import prepare_graph_inputs
from feems import SpatialGraph, Viz
from feems.cross_validation import run_cv
import sys
import argparse
import pandas as pd

# change matplotlib fonts
plt.rcParams["font.family"] = "Arial"
plt.rcParams["font.sans-serif"] = "Arial"


def create_spatial_graph(data_path, prefix, coords_file):
    """
    Create SpatialGraph object
    :param data_path: str, data directory
    :param prefix: str, prefix for plink files in data directory
    :param coords_file: str, path to file with sampling coordinates
    :return: SpatialGraph
    """
    data_path_feems = pkg_resources.resource_filename("feems", "data/")
    # read the genotype data and mean impute missing data
    (bim, fam, G) = read_plink("{}/{}".format(data_path, prefix))
    imp = SimpleImputer(missing_values=np.nan, strategy="mean")
    genotypes = imp.fit_transform((np.array(G)).T)
    # set up graph
    coords = pd.read_csv(coords_file, sep='\t', header=None,
                         names=['fid_c', 'iid_c', 'latitude', 'longitude'])
    coords = fam.set_index('iid').join(coords.set_index('iid_c')).loc[:, ["longitude", "latitude"]].values
    grid_path = "{}/grid_100.shp".format(data_path_feems)  # path to discrete global grid
    # graph input files
    _, edges, grid, _ = prepare_graph_inputs(coord=coords, ggrid=grid_path, translated=False, buffer=0)
    # construct spatial graph object
    sp_graph = SpatialGraph(genotypes, coords, grid, edges, scale_snps=True)
    return sp_graph


def cross_validate(sp_graph):
    """
    Perform cross-validation to find the optimal lambda
    :param sp_graph: SpatialGraph
    :return: float, lambda
    """
    # reverse the order of lambdas and alphas for warmstart
    lamb_grid = np.geomspace(1e-6, 1e2, 20)[::-1]
    # run cross-validation
    cv_err = run_cv(sp_graph, lamb_grid, n_folds=sp_graph.n_observed_nodes, factr=1e10)
    # average over folds
    mean_cv_err = np.mean(cv_err, axis=0)
    # argmin of cv error
    lamb_cv = float(lamb_grid[np.argmin(mean_cv_err)])
    return lamb_cv


def fit_final_graph(sp_graph, lamb_cv, prefix, output_path):
    """
    Fit final SpatialGraph using the optimal lambda inferred from cross-validation
    :param sp_graph: SpatialGraph
    :param lamb_cv: float, lambda
    :param prefix: str, prefix for the output figure name
    :param output_path: str, directory where to save the output figure named {prefix}_feems_plot.pdf
    """
    sp_graph.fit(lamb_cv)
    projection = ccrs.PlateCarree()
    # plot the FEEMS result
    fig = plt.figure(figsize=(9, 16), dpi=600)
    ax = fig.add_subplot(1, 1, 1, projection=projection)
    ax.set_extent([-25, 60, -40, 50], crs=ccrs.PlateCarree())
    v = Viz(ax, sp_graph, projection=projection, edge_width=.5, edge_alpha=1, edge_zorder=100,
            sample_pt_size=20, obs_node_size=7.5, sample_pt_color="black",
            cbar_font_size=5, cbar_ticklabelsize=4)
    v.draw_map()
    v.draw_edges(use_weights=True)
    v.draw_obs_nodes(use_ids=False)
    v.draw_edge_colorbar()
    fig.savefig(output_path + prefix + '_feems_plot.pdf', bbox_inches='tight', dpi=600)


def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--data_path', help='Data directory. default=./data/', default='./data/')
    parser.add_argument('-p', '--prefix', help='Prefix of plink files')
    parser.add_argument('-c', '--coords', help='Path to file with coordinates of samples. '
                                               'default=data/population_geo_coords.tab',
                        default='data/population_geo_coords.tab')
    parser.add_argument('-o', '--output_path', help='Output directory. Will save a figure called '
                                                    '{prefix}_feems_plot.pdf in this directory. '
                                                    'default=./results/', default='./results/')
    args = parser.parse_args()
    data_path = args.data_path
    prefix = args.prefix
    coords = args.coords
    output_path = args.output_path
    if not data_path.endswith('/'):
        data_path += '/'
    if not output_path.endswith('/'):
        output_path += '/'
    sp_graph = create_spatial_graph(data_path, prefix, coords)
    lamb_cv = cross_validate(sp_graph)
    fit_final_graph(sp_graph, lamb_cv, prefix, output_path)


if __name__ == '__main__':
    main(sys.argv[1:])
```
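A purely illustrative invocation; the plink prefix "merged" is a placeholder, and the defaults correspond to the argparse setup above:

```bash
scripts/run_feems.py -d data/ -p merged -c data/population_geo_coords.tab -o results/
```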
Lines 2–403 of scripts/visualize_admixture.py:

```python
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import glob
import sys
import argparse
import seaborn as sns
import numpy.ma as ma
from pykrige.ok import OrdinaryKriging
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import cartopy.io.shapereader as shpreader
import shapely.geometry as sgeom
from shapely.ops import unary_union
from shapely.prepared import prep
from itertools import permutations


def is_land(x, y, land):
    """
    Determine whether a given point is on land or not
    :param x: float, longitude
    :param y: float, latitude
    :param land: prepared land geometry
    :return: boolean
    """
    return land.contains(sgeom.Point(x, y))


def build_mask(x, y, land):
    """
    Create a mask of points that are on land and should be plotted
    :param x: np.array, longitude values
    :param y: np.array, latitude values
    :param land: prepared land geometry
    :return: np.array, boolean
    """
    # mesh
    xv, yv = np.meshgrid(x, y, indexing='xy')
    # initialize mask
    mask = np.zeros(xv.shape, dtype=bool)
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            # check if point is land
            mask[j, i] = is_land(xv[j, i], yv[j, i], land)
    return mask


def sort_individuals(df, K):
    """
    Sort individuals by modal ancestry component and its intensity
    :param df: pd.DataFrame, data frame containing ancestry proportions
    :param K: int, K used to run admixture
    :return: pd.DataFrame, sorted df
    """
    # get mean ancestry per component for each population
    mean_ancestry_pops = df.groupby('population').mean()
    # identify modal component for each population, i.e., the component with the highest ancestry proportion
    modal_clusters_pops = mean_ancestry_pops.idxmax(axis=1).sort_values()
    individual_orders = []
    # iterate over all k
    for i in range(K):
        # get all populations for which the current component is modal
        pops = modal_clusters_pops[modal_clusters_pops == i].index.values.tolist()
        # sort the populations by their mean ancestry proportion of the modal component
        pops_sorted = mean_ancestry_pops.loc[pops, i].sort_values(ascending=False).index.values
        inds = []
        # sort the individuals within a population by ancestry proportion
        for pop in pops_sorted:
            inds.extend(df.loc[df.population == pop, i].sort_values(ascending=False).index.values.tolist())
        individual_orders.extend(inds)
    df = df.loc[individual_orders, :]
    return df


def get_best_k_and_plot_best_k(qfiles, Kvals, fam, log_dir, colors, prefix):
    """
    Find best K across different ADMIXTURE runs using different K and plot admixture plot of best K separately
    :param qfiles: list, list of paths to aligned .Q files
    :param Kvals: list-like, K values that were used
    :param fam: df, pd.DataFrame of plink .fam file
    :param log_dir: str, path to directory with log files of ADMIXTURE runs
    :param colors: list, colors used for plotting
    :param prefix: str, prefix of output figure; "_admixture_plot_best_K.pdf" will be appended
    :return: array-like, cross-validation error per K
    """
    min_k = min(Kvals)
    # get CV error of ADMIXTURE runs
    cv_error = np.zeros(len(Kvals))
    logfiles = glob.glob(log_dir + '*.log')
    # read CV error
    for log in logfiles:
        k = int(log.split('.')[-2])
        with open(log, 'r') as fp:
            for line in fp:
                if line.startswith("CV error"):
                    cv_error[k - min_k] = float(line.split(':')[1])
                    break
    # get best k
    best_k = Kvals[np.argmin(cv_error)]
    df = pd.read_csv(qfiles[np.argmin(cv_error)], sep=' ', header=None, index_col=None)
    df = df.iloc[:, 6:]
    df.columns = [i for i in range(best_k)]
    df['population'] = fam.fid.values
    # sort individuals
    df_sorted = sort_individuals(df, best_k)
    plot_best_k(df_sorted, best_k, min(cv_error), colors, prefix)
    return cv_error


def sort_individuals_specified_K(qfile, K, fam):
    """
    Sort individuals based on estimated ancestry proportions given a K
    :param qfile: str, path to aligned Q file
    :param K: int, K value
    :param fam: df, pd.DataFrame of plink .fam file
    :return: array-like, order of individuals
    """
    df = pd.read_csv(qfile, sep=' ', header=None, index_col=None)
    df = df.iloc[:, 6:]
    df.columns = [i for i in range(K)]
    df['population'] = fam.fid.values
    # sort individuals and get their order
    df_sorted = sort_individuals(df, K)
    individual_order = df_sorted.index.values
    return individual_order


def plot_krigin(qfile, K, path_to_coords, vmin, fam, colors, prefix):
    """
    Interpolate admixture proportions in space using the Kriging method
    :param qfile: str, path to qfile corresponding to K
    :param K: int, K used for admixture run
    :param path_to_coords: str, path to file with population coordinates
    :param vmin: float, minimum admixture proportion required for Kriging method
    :param fam: df, pd.DataFrame of plink .fam file
    :param colors: list, colors used for plotting
    :param prefix: str, prefix for figure
    """
    # The parameters are tailored to our specific case and may not give meaningful results in other cases
    if K > 4:
        raise AttributeError("Only K<=4 is supported")
    land_shp_fname = shpreader.natural_earth(resolution='50m', category='physical', name='land')
    land_geom = unary_union(list(shpreader.Reader(land_shp_fname).geometries()))
    # create land object
    land = prep(land_geom)
    # read data
    coords = pd.read_csv(path_to_coords, sep='\t', header=None, names=['fid', 'lat', 'long'], usecols=[0, 2, 3])
    coords.drop_duplicates('fid', inplace=True)
    df = pd.read_csv(qfile, sep=' ', header=None, index_col=None)
    df = df.iloc[:, 6:]
    df.columns = [i for i in range(K)]
    df['population'] = fam.fid.values
    df = df.join(coords.set_index('fid'), on='population')
    # longitude and latitude coords of interest
    long_coords = np.arange(-25, 60, 0.15, dtype=float)
    lat_coords = np.arange(-40, 50, 0.15, dtype=float)
    # color maps
    cmaps = [sns.color_palette(f"blend:white,{colors[0]}", as_cmap=True),
             sns.color_palette(f"blend:white,{colors[1]}", as_cmap=True),
             sns.color_palette(f"blend:white,{colors[2]}", as_cmap=True),
             sns.color_palette(f"blend:white,{colors[3]}", as_cmap=True)]
    projection = ccrs.PlateCarree()
    fig = plt.figure(figsize=(9, 16))
    ax = fig.add_subplot(1, 1, 1, projection=projection)
    ax.set_extent([-25, 60, -40, 50], crs=ccrs.PlateCarree())
    # run kriging for each component
    perms = list(set(permutations([i for i in range(K - 1)])))
    perms = [perm + (K - 1,) for perm in perms]
    for perm in perms:
        for i in perm:
            cmap = cmaps[i]
            proportions = df.loc[:, i].values
            ok = OrdinaryKriging(df.long.values, df.lat.values, proportions, variogram_model='linear',
                                 coordinates_type='geographic', nlags=2)
            z, ss = ok.execute('grid', long_coords, lat_coords)
            # get mask
            mask = build_mask(long_coords, lat_coords, land)
            # normalize data
            data = (z.data - z.data.min()) / (z.data.max() - z.data.min())
            # update mask
            mask[data < vmin] = False
            masked_data = ma.masked_array(data, ~mask)
            # plot
            ax.imshow(np.flip(masked_data, 0), extent=[-25, 60, -40, 50], cmap=cmap, interpolation='none',
                      vmax=1, vmin=vmin, alpha=1 / len(perms))
    # formatting
    ax.add_feature(cfeature.COASTLINE, color='black')
    ax.add_feature(cfeature.BORDERS, linestyle=':', color='black')
    # plot populations
    ax.scatter(coords.long, coords.lat, color='black', s=8)
    ax.axis('off')
    fig.savefig(f"{prefix}_admixture_kriging_K{K}.pdf", bbox_inches='tight', dpi=600)


def plot_best_k(df, K, cv, colors, prefix):
    """
    Create horizontal bar plot of ADMIXTURE results
    :param df: pd.DataFrame, sorted DataFrame with ancestry proportions
    :param K: int, K used for admixture run
    :param cv: float, CV error of ADMIXTURE run
    :param colors: list, list of colors to use
    :param prefix: str, prefix for figure
    """
    fig, ax = plt.subplots(figsize=(2, 8))
    populations = df.population.values
    y_coords = [0]
    y_ticks = []
    y_labels = []
    prev_pop = populations[0]
    pop_coord_start = 0
    cumulative_padding = 0
    padding = 0.15
    for i, pop in enumerate(populations[1:], 1):
        if prev_pop != pop:
            # get ytick positions and labels
            y_ticks.append((pop_coord_start + i + cumulative_padding - 1) / 2)
            y_labels.append(prev_pop)
            prev_pop = pop
            # add white space between populations
            cumulative_padding += padding
            pop_coord_start = i + cumulative_padding
        # determine y coords --> add white space between populations
        y_coords.append(i + cumulative_padding)
    y_ticks.append((pop_coord_start + i + cumulative_padding) / 2)
    y_labels.append(prev_pop)
    prev_k = []
    # do plotting
    for i in range(K):
        if i > 0:
            ax.barh(y_coords, df.loc[:, i], left=df.loc[:, prev_k].sum(axis=1), height=1, color=colors[i])
        else:
            ax.barh(y_coords, df.loc[:, i], height=1, color=colors[i])
        prev_k.append(i)
    # formatting
    ax.set_yticks(y_ticks)
    ax.set_yticklabels([x.replace('Sabue', 'Chabu').replace('Xun', '!Xun').replace('_ERR_', "_Errachidia_")
                        .replace('Juhoansi', "Ju|'Hoan").replace('SEBantu', 'BantuSAfrica')
                        .replace('SWBantu', 'Herero').replace('Haihom', 'Hai||om').replace("_TIZ_", "_Tiznit_")
                        .replace('_Sen_', '_Sened_').replace('_Chen_', '_Chenini_').replace('_', " ")
                        .replace("BER", "Berber") for x in y_labels], fontsize=5)
    ax.set_xticks([])
    ax.set_xticklabels([])
    ax.invert_yaxis()
    ax.set_xlabel(f"K={K}\nCV={round(cv, 4)}", fontsize=8)
    ax.set_ylim([len(populations) + cumulative_padding - 0.5, -0.5])
    ax.set_xlim([0, 1])
    fig.savefig(f"{prefix}_admixture_plot_best_K.pdf", bbox_inches='tight', dpi=600)


def plot_all_k(qfiles, Kvals, cv_error, fam, individual_order, colors, prefix):
    """
    Plot admixture proportions across all K
    :param qfiles: list, list of paths to aligned .Q files
    :param Kvals: list-like, K values that were used
    :param cv_error: array-like, cross-validation errors of individual runs
    :param fam: pd.DataFrame, df of plink .fam file
    :param individual_order: list, order of individuals in which to plot them to guarantee consistency
    :param colors: list, colors used for plotting
    :param prefix: str, prefix of output figure; "_admixture_plots.pdf" will be appended
    """
    fig, ax = plt.subplots(1, len(qfiles), figsize=(8, 11))
    plt.subplots_adjust(wspace=0.15)
    min_k = min(Kvals)
    for qfile, K, cv in zip(qfiles, Kvals, cv_error):
        df = pd.read_csv(qfile, sep=' ', header=None, index_col=None)
        df = df.iloc[:, 6:]
        df.columns = [i for i in range(K)]
        df['population'] = fam.fid.values
        df_sorted = df.loc[individual_order]
        plot_admixture_proportions(df_sorted, K, ax[K - min_k], cv, colors, min_k)
    fig.savefig(prefix + "_admixture_plots.pdf", bbox_inches='tight', dpi=600)
    plt.close()


def plot_admixture_proportions(df, K, ax, cv, colors, min_k):
    """
    Helper to plot admixture proportions for a specific value of K
    :param df: pd.DataFrame, sorted DataFrame with ancestry proportions
    :param K: int, K used for admixture run
    :param ax: ax object to plot on
    :param cv: float, CV error of ADMIXTURE run
    :param colors: list, list of colors to use
    :param min_k: int, minimum k that was used in any run
    :return: ax
    """
    populations = df.population.values
    y_coords = [0]
    y_ticks = []
    y_labels = []
    prev_pop = populations[0]
    pop_coord_start = 0
    cumulative_padding = 0
    padding = 0.15
    for i, pop in enumerate(populations[1:], 1):
        if prev_pop != pop:
            # get ytick positions and labels
            y_ticks.append((pop_coord_start + i + cumulative_padding - 1) / 2)
            y_labels.append(prev_pop)
            prev_pop = pop
            # add white space between populations
            cumulative_padding += padding
            pop_coord_start = i + cumulative_padding
        # determine y coords --> add white space between populations
        y_coords.append(i + cumulative_padding)
    y_ticks.append((pop_coord_start + i + cumulative_padding) / 2)
    y_labels.append(prev_pop)
    prev_k = []
    # do plotting
    for i in range(K):
        if i > 0:
            ax.barh(y_coords, df.loc[:, i], left=df.loc[:, prev_k].sum(axis=1), height=1, color=colors[i])
        else:
            ax.barh(y_coords, df.loc[:, i], height=1, color=colors[i])
        prev_k.append(i)
    # formatting
    if K > min_k:
        ax.set_yticks([])
    else:
        ax.set_yticks(y_ticks)
        ax.set_yticklabels([x.replace('Sabue', 'Chabu').replace('Xun', '!Xun').replace('_ERR_', "_Errachidia_")
                            .replace('Juhoansi', "Ju|'Hoan").replace('SEBantu', 'BantuSAfrica')
                            .replace('SWBantu', 'Herero').replace('Haihom', 'Hai||om').replace("_TIZ_", "_Tiznit_")
                            .replace('_Sen_', '_Sened_').replace('_Chen_', '_Chenini_').replace('_', " ")
                            .replace("BER", "Berber") for x in y_labels], fontsize=5)
    ax.set_xticks([])
    ax.set_xticklabels([])
    ax.invert_yaxis()
    ax.set_xlabel(f"K={K}\nCV={round(cv, 4)}", fontsize=5)
    ax.set_ylim([len(populations) + cumulative_padding - 0.5, -0.5])
    ax.set_xlim([0, 1])
    return ax


def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--input_dir',
                        help='Input directory of "converted" files, i.e., .Q files aligned with CLUMPAK')
    parser.add_argument('-l', '--logfile_dir',
                        help='Directory containing log files from admixture runs to extract CV error.')
    parser.add_argument('-f', '--fam', help='Plink fam file to extract individual and family IDs')
    parser.add_argument('-c', '--coords', help='File with individual geographic coordinates')
    parser.add_argument('--vmin', type=float,
                        help='Only admixture proportions greater than vmin are considered for the Kriging '
                             'method. default=0.5', default=0.5)
    parser.add_argument('-k', '--K_kriging', type=int, help='K to use for plotting Kriging')
    parser.add_argument('-o', '--output_prefix', help='Output prefix')
    args = parser.parse_args()
    input_dir = args.input_dir
    log_dir = args.logfile_dir
    if not input_dir.endswith('/'):
        input_dir += '/'
    if not log_dir.endswith('/'):
        log_dir += '/'
    # get input q files
    aligned_q_files = glob.glob(input_dir + "*.converted")
    # read fam file
    fam = pd.read_csv(args.fam, sep='\t', names=['fid', 'iid'], usecols=[0, 1])
    # get k values
    Kvals = [int(f.split('.')[-3]) for f in aligned_q_files]
    # sort based on K
    aligned_q_files = np.array(aligned_q_files)[np.argsort(Kvals)]
    Kvals = np.sort(Kvals)
    max_k = max(Kvals)
    # initialize colors
    colors = ["purple", "orange", "blue", "green"]
    # pad colors with random ones if more components are needed
    additional_colors = []
    while max_k > len(colors) + len(additional_colors):
        additional_colors.append((np.random.random(), np.random.random(), np.random.random()))
        additional_colors = list(set(additional_colors))
    colors.extend(additional_colors)
    cv_error = get_best_k_and_plot_best_k(aligned_q_files, Kvals, fam, log_dir, colors, args.output_prefix)
    individual_order = sort_individuals_specified_K(aligned_q_files[np.where(Kvals == args.K_kriging)[0][0]],
                                                    args.K_kriging, fam)
    plot_all_k(aligned_q_files, Kvals, cv_error, fam, individual_order, colors, args.output_prefix)
    plot_krigin(aligned_q_files[np.where(Kvals == args.K_kriging)[0][0]], args.K_kriging, args.coords,
                args.vmin, fam, colors, args.output_prefix)


if __name__ == '__main__':
    main(sys.argv[1:])
```
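A hypothetical invocation mirroring the workflow rule shown further below (paths and the "merged" prefix are placeholders; -k 4 matches the K with the lowest CV error reported in the Comments section):

```bash
scripts/visualize_admixture.py -i results/clumpak/aligned.files/ -l results/ \
    -f data/merged.fam -c data/population_geo_coords.tab -k 4 -o results/merged
```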
Shell commands of the individual Snakemake rules (the line ranges in the comments refer to their position in the workflow definition):

```
# lines 53-56
shell:
    "wget -q {params.url}; unzip {params.zip}; rm {params.zip}; mv {output} {output}_tmp; "
    "unzip -d {output}_tmp {output}_tmp/26_03_2015_CLUMPAK.zip; mv {output}_tmp/26_03_2015_CLUMPAK/CLUMPAK .;"
    "rm -r {output}_tmp; chmod +x {output}/distruct/*; chmod +x {output}/CLUMPP/*; chmod +x {output}/mcl/*"

# lines 64-65
shell:
    "wget -q -P {params.path} {params.url}"

# lines 74-76
shell:
    "wget -q -O- {params.url} | awk -F \"\\t\" 'BEGIN{{OFS=\"\\t\"}}{{print $3, $5, $6, $7, $8, $12, $13, $2}}' | "
    "grep -E \"{params.pops_of_interest}\" > {output}"

# lines 85-86
shell:
    "mkdir -p {params.dir}; tar xvf {input} -C {params.dir}"

# lines 103-104
shell:
    "{params.bcftools} merge -0 -r {params.chrom} -Oz -o {output} --threads {threads} {input}"

# lines 116-118
shell:
    "plink2 --vcf {input} --snps-only --max-alleles 2 --make-bed --set-all-var-ids @:# "
    "--out {params.base} --threads {threads}"

# lines 126-128
shell:
    "cut -f2 {input.fam} | tr \"\n\" \"|\" | xargs -I {{}} grep -Ew {{}} {input.metadata} | "
    "awk -F \"\\t\" 'BEGIN{{OFS=\"\\t\"}}{{print 0, $NF}}' > {output};"

# lines 143-144
shell:
    "plink2 --bfile {params.input_base} --keep {input.samples} --make-bed --out {params.output_base} --threads {threads}"

# lines 152-154
shell:
    "cut -f2 {input.old} | tr \"\n\" \"|\" | xargs -I {{}} grep -E {{}} {input.metadata} | "
    "awk -F \"\\t\" 'BEGIN{{OFS=\"\\t\"}}{{print $3, $2}}' | paste {input.old} - > {output}"

# lines 170-172
shell:
    'plink2 --bfile {params.input_base} --update-ids {input.ids} --make-bed --out {params.output_base} '
    '--threads {threads}'

# lines 181-183
shell:
    "cut -f1 {input.fam} | sort | uniq | xargs -I {{}} bash -c \"grep {{}} {input.fam} | cut -f1,2 | "
    "shuf -n{params.n_samples}\" bash > {output}"

# lines 197-198
shell:
    "plink2 --bfile {params.input_base} --keep {input[0]} --make-bed --out {params.output_base} --threads {threads}"

# lines 206-207
shell:
    "cut -f1,2 {input.coords} > {output.to_keep}"

# lines 221-222
shell:
    "plink2 --bfile {params.input_base} --keep {input.keep} --make-bed --out {params.output_base} --threads {threads}"

# lines 230-233
shell:
    "cut -f2 {input.fam} | xargs -I {{}} grep {{}} {input.meta} | "
    "awk -F \"\\t\" 'BEGIN{{OFS=\"\\t\"}}{{print $3, $2}}' | sed -e 's/\///g' | "
    "paste <(cut -f1,2 {input.fam} | sed 's/\\s/\\t/g') - > {output}"

# lines 247-249
shell:
    "plink2 --bfile {params.input_base} --update-ids {input.ids} --make-bed --threads {threads} "
    "--out {params.output_base}"

# lines 259-262
shell:
    "cat {input.samples} | sed -e 's/\///g' -e 's/|/|\\\/' | "
    "cut -f3 | sort | uniq | xargs -I {{}} bash -c \"grep -w \\\"{{}}\\\" <( cat {input.samples} | sed -e 's/\///g') | "
    "shuf -n {params.n_samples} | cut -f3,4 >> {output}\" sh"

# lines 276-278
shell:
    "plink2 --bfile {params.input_base} --keep {input.samples} --make-bed --out {params.output_base} "
    "--threads {threads}"

# lines 286-287
shell:
    "wget -q -P {params.data} {params.url}"

# lines 296-297
shell:
    "tar xvzf {input} -C {params.data_path}"

# lines 307-309
shell:
    "cut -d ' ' -f1 {input.fam} | sort | uniq | xargs -I {{}} bash -c \"grep {{}} {input.fam} | cut -d ' ' -f1,2 | "
    "shuf -n {params.n_samples}\" bash | grep -v -E \"{params.exclude}\" > {output}"

# lines 323-325
shell:
    "plink2 --bfile {params.input_base} --keep {input.to_keep} --make-bed --set-all-var-ids @:# "
    "--out {params.output_base} --threads {threads}"

# lines 332-333
shell:
    "cut -f2 {input} | sort | uniq -d > {output}"

# lines 347-349
shell:
    "plink2 --bfile {params.input_base} --exclude {input.ids} --make-bed --out {params.outputbase} "
    "--threads {threads}"

# lines 361-365
shell:
    "wget -q -P {params.data_path} {params.url_fam}; mv {params.fam_file} {output.fam}; "
    "wget -q -P {params.data_path} {params.url_tped}; "
    "tar -xzvf {params.tped_file}.tar.gz; mv {params.tped_file} {output.tped}; "
    "rm {params.tped_file}.tar.gz; "

# lines 377-378
shell:
    "plink --tfile {params.base} --recode --make-bed --out {params.base} --threads {threads}"

# lines 388-390
shell:
    "cut -d ' ' -f1 {input.fam} | sort | uniq | xargs -I {{}} bash -c \"grep {{}} {input.fam} | cut -d ' ' -f1,2 | "
    "shuf -n {params.n_samples}\" bash | grep -v -E \"{params.exclude}\" > {output}"

# lines 404-406
shell:
    "plink2 --bfile {params.input_base} --keep {input.to_keep} --make-bed --set-all-var-ids @:# "
    "--snps-only --max-alleles 2 --out {params.output_base} --threads {threads}"

# lines 419-422
shell:
    "wget -q {params.url_bed}; mv {params.name_bed} {params.data_path}arauna_et_al_2017.bed; "
    "wget -q {params.url_bim}; mv {params.name_bim} {params.data_path}arauna_et_al_2017.bim; "
    "wget -q {params.url_fam}; mv {params.name_fam} {params.data_path}arauna_et_al_2017.fam;"

# lines 433-435
shell:
    "cut -d ' ' -f1 {input.coords} | tail -n+2 | xargs -I {{}} bash -c \"grep -w {{}} {input.fam} | "
    "shuf -n{params.n_samples} | cut -d ' ' -f1,2\" > {output}"

# lines 449-451
shell:
    "plink2 --bfile {params.input_base} --keep {input.to_keep} --make-bed --set-all-var-ids @:# "
    "--out {params.output_base} --threads {threads}"

# lines 478-479
shell:
    "cut -f2 {input} | sort > {output}"

# lines 493-494
shell:
    "comm -12 {input} > {output}"

# lines 509-510
shell:
    "plink2 --bfile {params.input_base} --extract {input.snps} --make-bed --out {params.output_base}"

# lines 539-568
shell:
    "if [[ -f {params.output_base}_trial1-merge.missnp ]]; "  # second trial --> flip SNPs once
    "then "
    "plink --bfile {params.base_b} --flip {params.output_base}_trial1-merge.missnp --make-bed --out "
    "{params.base_b}_tmp --threads {threads}; "  # flip once
    "rm {params.output_base}_trial1-merge.missnp; "
    "plink --bfile {params.base_a} --bmerge {params.base_b}_tmp --make-bed --geno {params.geno} "
    "--maf {params.maf} --out {params.output_base}_trial2 --threads {threads}; "
    "mv {params.output_base}_trial2.bed {params.output_base}.bed; "
    "mv {params.output_base}_trial2.bim {params.output_base}.bim; "
    "mv {params.output_base}_trial2.fam {params.output_base}.fam; "
    "elif [[ -f {params.output_base}_trial2-merge.missnp ]]; "  # last trial --> exclude SNPs that caused problems
    "then "
    "plink --bfile {params.base_a} --exclude {params.output_base}_trial2-merge.missnp --make-bed "
    "--out {params.base_a}_tmp --threads {threads}; "
    "plink --bfile {params.base_b}_tmp --exclude {params.output_base}_trial2-merge.missnp --make-bed "
    "--out {params.base_b}_tmp1 --threads {threads}; "
    "plink --bfile {params.base_a}_tmp --bmerge {params.base_b}_tmp1 --make-bed --geno {params.geno} "
    "--maf {params.maf} --out {params.output_base} --threads {threads}; "
    "rm {params.base_a}_tmp.*; "
    "rm {params.base_b}_tmp.*; "
    "rm {params.base_b}_tmp1.*; "
    "rm {params.output_base}_trial2-merge.missnp; "
    "else "  # first pass
    "plink --bfile {params.base_a} --bmerge {params.base_b} --make-bed --geno {params.geno} "
    "--maf {params.maf} --out {params.output_base}_trial1 --threads {threads}; "
    "mv {params.output_base}_trial1.bed {params.output_base}.bed; "
    "mv {params.output_base}_trial1.bim {params.output_base}.bim; "
    "mv {params.output_base}_trial1.fam {params.output_base}.fam; "
    "fi"

# lines 575-579
shell:
    "awk -F '\\t' '{{if ($5 == \"A\" && $6 != \"T\" || $5 != \"A\") print $0}}' {input} | "
    "awk -F '\\t' '{{if ($5 == \"T\" && $6 != \"A\" || $5 != \"T\") print $0}}' | "
    "awk -F '\\t' '{{if ($5 == \"C\" && $6 != \"G\" || $5 != \"C\") print $0}}' | "
    "awk -F '\\t' '{{if ($5 == \"G\" && $6 != \"C\" || $5 != \"G\") print $2}}' | sort > {output}"

# lines 593-595
shell:
    "plink2 --bfile {params.input_base} --extract {input[0]} --make-bed --out {params.output_base} "
    "--threads {threads}"

# lines 602-603
shell:
    "awk -F '\\t' 'BEGIN{{OFS=\"\\t\"}}{{if ($5 == \".\" || $6 == \".\") print $2}}' {input} > {output}"

# lines 611-612
shell:
    "grep -v -w -f {input.exclude} {input.bim} | cut -f2 | sort > {output}"

# lines 620-621
shell:
    "comm -12 {input} > {output}"

# lines 668-669
shell:
    'cut -f2 {input} | sort > {output}'

# lines 676-677
shell:
    'cut -f2 {input} | sort > {output}'

# lines 685-686
shell:
    "comm -12 {input} > {output}"

# lines 731-732
shell:
    "cut -f2 {input} | sort > {output}"

# lines 739-740
shell:
    "cut -f2 {input} | sort > {output}"

# lines 748-749
shell:
    "comm -12 {input} > {output}"

# lines 799-800
shell:
    "cut -f2 {input} | sort > {output}"

# lines 807-808
shell:
    "cut -f2 {input} | sort > {output}"

# lines 816-817
shell:
    "comm -12 {input} > {output}"

# lines 874-876
shell:
    "plink2 --bfile {params.input_base} --indep-pairwise 50 1 {params.max_r2} --out {params.input_base} "
    "--threads {threads}"

# lines 891-893
shell:
    "plink2 --bfile {params.input_base} --extract {input.extract} --make-bed --out {params.output_base} "
    "--threads {threads}"

# lines 906-907
shell:
    "plink2 --bfile {params.base} --pca 20 --out {params.base} --threads {threads}"

# lines 917-918
shell:
    "scripts/plot_pca.py --eigenvec {input.eigenvec} --plot_dir {params.output_dir} --coords {input.coords}"

# lines 937-950
shell:
    "cut -f2 {input.sgdp_fam} | xargs -I {{}} bash -c \"grep {{}} {input.meta_sgdp} | cut -f6,7\" | "
    "paste <(cut -f1,2 {input.sgdp_fam}) - > {output}; "
    "cut -f2 {input.crawford_and_scheinfeldt_fam} | "
    "xargs -I {{}} bash -c \"grep {{}} {input.meta_crawford_and_scheinfeldt}\" | cut -f4,5 | "
    "paste <(cut -f1,2 {input.crawford_and_scheinfeldt_fam}) - >> {output}; "
    "cut -f1 {input.hollfelder_fam} | xargs -I {{}} bash -c \"grep {{}} {input.meta_hollfelder}\" | cut -f6,7 | "
    "paste <(cut -f1,2 {input.hollfelder_fam}) - >> {output}; "
    "cut -f1 {input.arauna_fam} | xargs -I {{}} bash -c \"grep -w {{}} {input.meta_arauna}\" | cut -d ' ' -f2,3 | "
    "paste <(cut -f1,2 {input.arauna_fam}) - | sed -e 's/\\s/\\t/g' >> {output}; "
    "cut -f1 {input.fortes_lima_fam} | xargs -I {{}} bash -c \"grep -w {{}} {input.meta_fortes_lima}\" | "
    "cut -d ' ' -f2,3 | paste <(cut -f1,2 {input.fortes_lima_fam}) - | sed -e 's/\\s/\\t/g' >> {output}; "
    "cut -f1 {input.schlebusch_fam} | xargs -I {{}} bash -c \"grep -w {{}} {input.meta_schlebusch}\" | "
    "cut -d ' ' -f2,3 | paste <(cut -f1,2 {input.schlebusch_fam}) - | sed -e 's/\\s/\\t/g' >> {output}; "

# lines 965-968
shell:
    "admixture --cv {input} {wildcards.K} -j{threads} > {params.prefix}.{wildcards.K}.log; "
    "mkdir -p {params.results}; "
    "mv {params.prefix}.{wildcards.K}.* {params.results}"

# lines 979-980
shell:
    "cd {params.results_path}; zip {params.zip} {params.files}"

# lines 987-988
shell:
    "cut -f1 {input} > {output}"

# lines 999-1002
shell:
    "cd {input.clumpak_dir}; cpanm List::Permutor; "
    "perl distructForManyKs.pl --id 1 --dir ../{output} --file ../{input.qfiles} "
    "--inputtype admixture --indtopop ../{input.pops}"

# lines 1019-1021
shell:
    "scripts/visualize_admixture.py -i {input.clumpak}/aligned.files/ -f {input.fam} -l {params.results_path} "
    "-c {input.coords} -k {params.K_kriging} -o {params.output_prefix}"

# lines 1036-1037
shell:
    "scripts/run_feems.py -d {params.data_path} -p {params.prefix} -c {input.pop_coords} -o {params.output_path}"
```