This Snakemake workflow automatically generates all results and figures from the Bioconda paper.
Requirements
Any 64-bit Linux installation with GLIBC 2.5 or newer (i.e. any Linux distribution that is newer than CentOS 6). Note that the restriction of this workflow to Linux is purely a design decision (to save space and ensure reproducibility) and not related to Conda/Bioconda. Bioconda packages are available for both Linux and MacOS in general.
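If you are unsure which GLIBC version your system provides, you can usually check it with
ldd --version
which reports the version of the installed C library in its first output line.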
Usage
This workflow can be used to recreate all results found in the Bioconda paper.
Step 1: Setup system
Variant a: Installing Miniconda on your system
If you are on a Linux system with GLIBC 2.5 or newer (i.e. any Linux distribution that is newer than CentOS 6), you can simply install Miniconda3 with
curl -o /tmp/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash /tmp/miniconda.sh
Make sure to answer yes when asked whether your PATH variable shall be modified.
Afterwards, open a new shell/terminal.
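To verify that conda is now available on your PATH, you can, for example, run
conda --version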
Variant b: Use a Docker container
Otherwise, e.g., on MacOS or if you don't want to modify your system setup, install Docker, run
docker run -it continuumio/miniconda3 /bin/bash
and execute all the following steps within that container.
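As an alternative to copying results out of the container afterwards (see Step 5), you can bind-mount the current host directory when starting the container. The following is an optional variant of the command above; the mount point matches the working directory created in Step 4:
docker run -it -v "$(pwd)":/bioconda-workflow continuumio/miniconda3 /bin/bash
If you use it, skip the mkdir in Step 4 and change into /bioconda-workflow directly.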
Variant c: Use an existing Miniconda installation
If you want to use an existing Miniconda installation, please be aware that this is only possible if it uses Python 3 by default. You can check this via
python --version
Further, ensure it is up to date with
conda update --all
Step 2: Setup Bioconda channel
Setup Bioconda with
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
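Note that each of these commands prepends the given channel, so bioconda ends up with the highest priority, followed by conda-forge and defaults. You can inspect the resulting channel order with
conda config --get channels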
Step 3: Install bioconda-utils and Snakemake
Install bioconda-utils and Snakemake >=4.6.0 with
conda install bioconda-utils snakemake
If you already have an older version of Snakemake, please make sure it is updated to >=4.6.0.
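You can verify the installed Snakemake version with
snakemake --version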
Step 4: Download the workflow
First, create a working directory:
mkdir bioconda-workflow
cd bioconda-workflow
Then, download the workflow archive from https://doi.org/10.5281/zenodo.1068297 and unpack it with
tar -xf bioconda-paper-workflow.tar.gz
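If you prefer to download the archive on the command line, something along the following lines should work. Note that the direct file URL is an assumption derived from the Zenodo record behind the DOI; if it does not resolve, download the archive via the DOI landing page instead:
curl -L -o bioconda-paper-workflow.tar.gz https://zenodo.org/record/1068297/files/bioconda-paper-workflow.tar.gz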
Step 5: Run the workflow
Execute the analysis workflow with Snakemake
snakemake --use-conda
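Optionally, you can first perform a dry run, which only lists the jobs that would be executed, and allow Snakemake to use several cores in parallel (both are standard Snakemake flags):
snakemake -n
snakemake --use-conda --cores 4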
Please wait a few minutes for the analysis to finish.
Results can be found in the folder figs/.
If you have been running the workflow in the Docker container (see above),
you can obtain the results with
docker cp <container-id>:/bioconda-workflow/figs .
with <container-id> being the ID of the container.
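If you do not know the ID of the container, you can list all containers (including stopped ones) with
docker ps -a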
Known errors
- If you see an error like
  ImportError: No module named 'appdirs'
  when starting Snakemake, you are likely suffering from a bug in an older conda version. Make sure to update your conda installation with
  conda update --all
  and then reinstall the appdirs and snakemake packages with
  conda install -f appdirs snakemake
- If you see an error like
  ImportError: Missing required dependencies ['numpy']
  you are likely suffering from a bug in an older conda version. Make sure to update your conda installation with
  conda update --all
  and then reinstall the snakemake package with
  conda install -f snakemake
Code Snippets
import os
from operator import itemgetter
from itertools import filterfalse

import pandas as pd
from github import Github

authors = pd.read_table(snakemake.input[0], index_col=0)
authors["commits"] = 0

def add(commit):
    if commit.author and commit.author.login in authors.index:
        authors.loc[commit.author.login, "commits"] += 1

# add commits
github = Github(os.environ["GITHUB_TOKEN"])
repo = github.get_repo("bioconda/bioconda-recipes")
utils_repo = github.get_repo("bioconda/bioconda-utils")
for commit in repo.get_commits():
    add(commit)
for commit in utils_repo.get_commits():
    add(commit)

# order by commits
authors.sort_values("commits", inplace=True, ascending=False)

first_authors = ["bgruening", "daler"]
core_authors = ["chapmanb", "jerowe", "tomkinsc", "rvalieris", "druvus"]
last_author = ["johanneskoester"]

# put core into the right order
authors = pd.concat([
    authors.loc[first_authors],
    authors.loc[core_authors].sort_values("commits", ascending=False),
    authors.loc[~authors.index.isin(first_authors + core_authors + last_author)],
    authors.loc[last_author]])

authors.to_csv(snakemake.output.table, sep="\t")
import json
import pandas as pd

packages = []
ndownloads = []
ecosystem = []
versions = []
deps = []
for path in snakemake.input:
    with open(path) as f:
        meta = json.load(f)
    ndownloads.append(sum(f["ndownloads"] for f in meta["files"]))
    name = meta["full_name"].split("/")[1]
    assert name not in packages, "duplicate package: {}".format(name)
    packages.append(name)
    versions.append(len(meta["versions"]))
    if name.startswith("bioconductor-"):
        ecosystem.append("Bioconductor")
    elif name.startswith("r-"):
        ecosystem.append("R")
    else:
        def check_for_dep(dep):
            for f in meta["files"]:
                for d in f["dependencies"]["depends"]:
                    if d["name"] == dep:
                        return True
            return False
        if check_for_dep("python"):
            ecosystem.append("Python")
        elif check_for_dep("perl") or check_for_dep("perl-threaded"):
            ecosystem.append("Perl")
        else:
            ecosystem.append("Other")
    # stores number of dependencies based on the first (0) recipe
    deps.append(len(meta['files'][0]['attrs']['depends']))

packages = pd.DataFrame({
    "package": packages,
    "downloads": ndownloads,
    "ecosystem": ecosystem,
    "versions": versions,
    "deps": deps
}, columns=["package", "ecosystem", "downloads", "versions", "deps"])
packages.sort_values("downloads", ascending=False, inplace=True)

packages.to_csv(snakemake.output[0], sep="\t", index=False)
import os
import pandas as pd
from github import Github

github = Github(os.environ["GITHUB_TOKEN"])
repo = github.get_repo("bioconda/bioconda-recipes")

prs = []
titles = []
files = []
spans = []
for pr in repo.get_pulls(state="closed"):
    print(pr)
    if pr.merged:
        prs.append(pr.id)
        titles.append(pr.title)
        files.append(pr.changed_files)
        spans.append(pr.merged_at - pr.created_at)

prs = pd.DataFrame({
    "id": prs,
    "title": titles,
    "changed_files": files,
    "span": spans
})
prs.to_csv(snakemake.output[0], sep="\t", index=False)
import os
import pandas
from bioconda_utils import utils

repo_dir = os.path.dirname(os.path.dirname(snakemake.input[0]))
recipes = list(utils.get_recipes(os.path.join(repo_dir, 'recipes')))
config = os.path.join(repo_dir, 'config.yml')
df = []
for r in recipes:
    meta = next(utils.load_all_meta(r, config))
    d = dict(
        not_bio_related=" ",
        summary=meta.get('about', {}).get('summary', "").replace('\n', ''),
        name=meta['package']['name'],
        url=meta.get('about', {}).get('home', ""),
    )
    df.append(d)
df = pandas.DataFrame(df).drop_duplicates('name')
df = df.sort_values('name')
df = df[['not_bio_related', 'name', 'summary', 'url']]
df.to_csv(snakemake.output[0], sep='\t', index=False)
from snakemake.shell import shell
import matplotlib
matplotlib.use("agg")
import pandas as pd
import seaborn as sns
import networkx as nx
import glob
import os
from networkx.drawing.nx_pydot import read_dot, graphviz_layout
from matplotlib.colors import rgb2hex
import matplotlib.pyplot as plt
from matplotlib.ticker import NullLocator

packages = pd.read_table(snakemake.input.pkg, index_col=0)
packages.loc[packages.ecosystem == 'Bioconductor', 'ecosystem'] = 'Bioconductor/R'
packages.loc[packages.ecosystem == 'R', 'ecosystem'] = 'Bioconductor/R'
lookup = packages['ecosystem'].to_dict()
colors = dict(zip(['Bioconductor/R', 'Other', 'Python', 'Perl'],
                  sns.color_palette('colorblind')))

g = read_dot(snakemake.input.dag)
# reduce to largest connected component
g = max(nx.weakly_connected_component_subgraphs(g), key=len)

pkg = snakemake.wildcards.pkg
# obtain dependencies
deps = set(nx.ancestors(g, pkg))
sub = deps | {pkg}

pos = graphviz_layout(g, prog='neato')
plt.figure(figsize=(6, 6))
# draw DAG
nx.draw_networkx_edges(g, pos, edge_color='#777777', alpha=0.5, arrows=False)
nx.draw_networkx_nodes(g, pos, node_color='#333333', alpha=0.5, node_size=6)
# draw induced subdag
nx.draw_networkx_edges(g, pos,
                       edgelist=[(u, v) for u, v in g.edges(sub)
                                 if u in sub and v in sub],
                       edge_color='k', width=3.0, arrows=False)
nx.draw_networkx_nodes(g, pos, nodelist=deps,
                       node_color=[rgb2hex(colors[lookup[v]]) for v in deps],
                       linewidths=0, node_size=120)
nx.draw_networkx_nodes(g, pos, nodelist=[pkg],
                       node_color=rgb2hex(colors[lookup[pkg]]),
                       linewidths=0, node_size=120, node_shape='s')

xs = [x for x, y in pos.values()]
ys = [y for x, y in pos.values()]
plt.xlim((min(xs) - 10, max(xs) + 10))
plt.ylim((min(ys) - 10, max(ys) + 10))

# remove whitespace
plt.axis('off')
plt.gca().xaxis.set_major_locator(NullLocator())
plt.gca().yaxis.set_major_locator(NullLocator())
plt.savefig(snakemake.output[0], bbox_inches='tight')
from svgutils.compose import *
from common import label

Figure(
    "22cm", "6cm",
    Panel(SVG(snakemake.input.ecosystems), label("a")),
    Panel(SVG(snakemake.input.downloads), label("b")).move(285, 0),
    Panel(SVG(snakemake.input.comp).scale(0.9).move(10, 0), label("c")).move(560, 0),
    Panel(SVG(snakemake.input.age).scale(0.9).move(19, 0)).move(560, 90),
    # Grid(40, 40)
).save(snakemake.output[0])
from svgutils.compose import *
from common import label

Figure(
    "24cm", "6.1cm",
    Panel(SVG(snakemake.input.contributions), label("a")),
    Panel(SVG(snakemake.input.add_del), label("b").move(0, -10)).move(0, 90),
    Panel(SVG(snakemake.input.dag).scale(0.6), label("c")).move(285, 0),
    Panel(SVG(snakemake.input.workflow).scale(0.5), label("d")).move(505, 0),
    Panel(SVG(snakemake.input.turnaround).scale(0.9).move(5, 0), label("e")).move(505, 50),
    Panel(SVG(snakemake.input.usage).scale(0.5), label("f")).move(505, 130)
    # Grid(40, 40)
).save(snakemake.output[0])
import pandas
import os
import datetime

infile = snakemake.input[0]
outfile = snakemake.output[0]

class chunk(object):
    def __init__(self, block):
        commit, author, time = block[0].split('\t')
        self.author = author
        self.time = datetime.datetime.strptime(time.split('T')[0], "%Y-%m-%d")
        self._block = block
        self.recipes = self._parse_recipes(block[1:])

    def _parse_recipes(self, block):
        recipes = []
        for i in block:
            if not i.startswith('recipes/'):
                continue
            if os.path.basename(i) != 'meta.yaml':
                continue
            recipes.append(os.path.dirname(i.replace('recipes/', '')))
        return set(recipes)

def gen():
    lines = []
    for line in open(infile):
        line = line.strip()
        if len(line) == 0:
            yield chunk(lines)
            lines = []
            continue
        lines.append(line.strip())
    yield chunk(lines)

dfs = []
cumulative_recipes = set()
cumulative_authors = set()
for i in sorted(gen(), key=lambda x: x.time):
    if len(i.recipes) == 0:
        continue
    unique_recipes = i.recipes.difference(cumulative_recipes)
    if len(unique_recipes) > 0:
        dfs.append(
            {
                'time': i.time,
                'author': i.author,
                'recipes': unique_recipes,
                'nadded': len(unique_recipes),
                'new_author': i.author not in cumulative_authors
            },
        )
    cumulative_recipes.update(i.recipes)
    cumulative_authors.update([i.author])

df = pandas.DataFrame(dfs)
df['cumulative_authors'] = df.new_author.astype(int).cumsum()
df['cumulative_recipes'] = df.nadded.cumsum()
df["time"] = pandas.to_datetime(df["time"])
df.to_csv(outfile, sep='\t')
import os
import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt
import seaborn as sns
from github import Github
import matplotlib.dates as mdates

import common

github = Github(os.environ["GITHUB_TOKEN"])
repo = github.get_repo("bioconda/bioconda-recipes")

weeks = []
additions = []
deletions = []
print(repo.get_stats_participation().all)
for freq in repo.get_stats_code_frequency():
    weeks.append(freq.week)
    additions.append(freq.additions)
    deletions.append(abs(freq.deletions))

plt.figure(figsize=(4, 1.2))
plt.semilogy(weeks, additions, "-", label="additions")
plt.semilogy(weeks, deletions, "-", label="deletions")
plt.ylabel("count per week")
plt.legend(bbox_to_anchor=(0.68, 0.65))
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
plt.xticks(rotation=45, ha="right")
sns.despine()
plt.savefig(snakemake.output[0], bbox_inches="tight")
import matplotlib
matplotlib.use("agg")
from matplotlib import pyplot as plt
import datetime
import numpy as np
import seaborn as sns
import pandas as pd

import common

try:
    log = snakemake.input.log
    pkg = snakemake.input.pkg
    outfile = snakemake.output[0]
except NameError:
    # run in the scripts dir for interactive clicking of points
    log = '../git-log/parsed-log.tsv'
    pkg = '../package-data/all.tsv'
    outfile = None

c = pd.read_table(log)
d = pd.read_table(pkg)

s = c.apply(lambda x: pd.Series(list(eval(x['recipes']))),
            axis=1).stack().reset_index(level=1, drop=True)
s.name = 'recipe'
cc = c.join(s)[['recipe', 'time']]
cc['package'] = cc.recipe.apply(lambda x: x.split('/')[0])
e = cc.groupby('package')['time'].agg('min')

df = d.set_index('package')
df['time'] = pd.to_datetime(e)
df['time'] -= pd.Timestamp(datetime.datetime.now())
df['days'] = df.dropna().time.apply(lambda x: -x.days)
df['log10 downloads'] = np.log10(df['downloads'] + 1)

# note we have to dropna ahead of time so that when interactively picking
# points, the event ind matches the df ind
df = df.dropna()

def callback(event):
    print(df.iloc[event.ind])

fig = plt.figure()
ax = fig.add_subplot(111)
sns.regplot('days', 'log10 downloads', df, ax=ax,
            scatter_kws=dict(picker=5, s=2, color='k', alpha=0.6))
plt.gca().set_xlabel('Package age (days)')
sns.despine()

if outfile:
    plt.savefig(outfile)
else:
    plt.gcf().canvas.mpl_connect('pick_event', callback)
    plt.show()
import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from datetime import datetime
import matplotlib.dates as mdates
import numpy as np

import common

summary = pd.read_table(snakemake.input[0])
bio_related = summary.shape[0] - (summary["not_bio_related"] == "x").sum()

# Counts from October 2017
data = pd.DataFrame.from_dict({
    "Bioconda": [bio_related, "2015-09"],
    "Debian Med": [882, "2002-05"],
    "Gentoo Science": [480, "2005-10"],  # category sci-biology
    "EasyBuild": [371, "2012-03"],  # moduleclass bio
    "Biolinux": [308, "2006"],
    "Homebrew Science": [297, "2009-10"],  # tag bioinformatics
    "GNU Guix": [254, "2014-12"],  # category bioinformatics
    "BioBuilds": [118, "2015-11"]}, orient="index").reset_index()
data.columns = ["source", "count", "date"]
data["date"] = pd.to_datetime(data["date"])
# age in years
data["age"] = pd.to_timedelta(datetime.now() - data["date"]).astype('timedelta64[M]') / 12

plt.figure(figsize=(4, 1))
sns.barplot(x="source", y="count", data=data)
plt.gca().set_xticklabels([])
plt.xlabel("")
plt.ylabel("Number of explicitly\nbio-related packages")
# set maximum tick to be that of bioconda
yticks = plt.gca().get_yticks()
yticks[-1] = bio_related
plt.gca().set_yticks(yticks)
sns.despine()
plt.savefig(snakemake.output.counts, bbox_inches="tight")

plt.figure(figsize=(4, 1))
sns.barplot(x="source", y="age", data=data)
plt.xlabel("")
plt.ylabel("\nage in years")
plt.xticks(rotation=45, ha="right")
#plt.gca().yaxis.set_major_formatter(mdates.AutoDateFormatter(mdates.AutoDateLocator()))
sns.despine()
plt.savefig(snakemake.output.age, bbox_inches="tight")

# store results as csv
data[["source", "count", "age"]].to_csv(snakemake.output.csv, sep="\t", index=False)
import os
import matplotlib
matplotlib.use("agg")
from matplotlib import pyplot as plt
import seaborn as sns
import datetime
import pandas as pd
import matplotlib.dates as mdates

import common

infile = snakemake.input[0]
outfile = snakemake.output[0]

df = pd.read_table(infile)
df["time"] = pd.to_datetime(df["time"])

fig = plt.figure(figsize=(4, 1))
plt.semilogy('time', 'cumulative_authors', data=df, label="contributors")
plt.semilogy('time', 'cumulative_recipes', data=df, label="recipes")
plt.legend()
plt.ylabel("count")
plt.xlabel("")
# deactivate xticks because we have them in the plot below in the figure
plt.xticks([])
sns.despine()
fig.savefig(outfile, bbox_inches="tight")
import os
import glob
import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

import common

plt.figure(figsize=(4, 2))

packages = pd.read_table(snakemake.input[0])
total_downloads = packages["downloads"].sum()
packages.loc[packages.ecosystem == 'Bioconductor', 'ecosystem'] = 'Bioconductor/R'
packages.loc[packages.ecosystem == 'R', 'ecosystem'] = 'Bioconductor/R'

# In case we want to filter downloads by whether or not a current recipe exists
recipes = set(map(os.path.basename, glob.glob('bioconda-recipes/recipes/*')))

sns.boxplot(x="ecosystem", y="downloads", data=packages, color="white",
            whis=False, showfliers=False,
            order=['Bioconductor/R', 'Other', 'Python', 'Perl'])
sns.stripplot(x="ecosystem", y="downloads", data=packages, jitter=True,
              alpha=0.5, order=['Bioconductor/R', 'Other', 'Python', 'Perl'])
plt.gca().set_yscale("log")
plt.ylabel("downloads (total: {:,})".format(total_downloads))
sns.despine()
plt.savefig(snakemake.output[0], bbox_inches="tight")

# Violin plots to see a little more structure (e.g., 3 tiers of downloads in
# Perl, BioC, R) and lower-limits (e.g., all BioC downloaded at least once, but
# some Perl, Python, R never downloaded).
#
# Take the log10 ahead of time so the KDE works well.
packages['log10 downloads'] = np.log10(packages.downloads + 1)
fig = plt.figure(figsize=(4, 3))
ax = fig.add_subplot(1, 1, 1)
sns.violinplot(x="ecosystem", y="log10 downloads", alpha=0.5, cut=0,
               data=packages, ax=ax,
               order=['Bioconductor/R', 'Other', 'Python', 'Perl'])
ax.text(x=0.5, y=1.0, s="Total downloads: {:,}".format(total_downloads),
        horizontalalignment="center", verticalalignment="top",
        transform=ax.transAxes)
ax.set_xlabel('')
plt.ylabel("downloads")
ax.set_yticklabels(["$10^{{{:.0f}}}$".format(y) for y in ax.get_yticks()])
# make a little room for the "total" text
ax.axis(ymax=6)
fig.tight_layout()
sns.despine()
plt.savefig(snakemake.output[1], bbox_inches="tight")
import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import glob
import os

import common

plt.figure(figsize=(4, 2))

summary = pd.read_table(snakemake.input.bio)
not_bio_related = summary['name'][summary.not_bio_related == 'x']

packages = pd.read_table(snakemake.input.pkg_data)
packages.loc[packages.ecosystem == 'Bioconductor', 'ecosystem'] = 'Bioconductor/R'
packages.loc[packages.ecosystem == 'R', 'ecosystem'] = 'Bioconductor/R'

recipes = set(map(os.path.basename, glob.glob('bioconda-recipes/recipes/*')))
packages['has_current_recipe'] = packages['package'].isin(recipes)
packages['not_bio_related'] = packages['package'].isin(not_bio_related)

fig = plt.figure(figsize=(4, 3))
ax = fig.add_subplot(1, 1, 1)

all_cnts = packages.ecosystem.value_counts()
bio_cnts = packages[~packages.not_bio_related].ecosystem.value_counts()
non_cnts = packages[packages.not_bio_related].ecosystem.value_counts()

x = range(len(all_cnts))
ax.bar(x=x, height=bio_cnts, color=sns.color_palette())
ax.bar(x=x, height=non_cnts, bottom=bio_cnts,
       color=sns.color_palette(sns.color_palette(), desat=0.5))
ax.set_ylabel('Available packages')
ax.set_xticks(x)
ax.set_xticklabels(list(all_cnts.index))
ax.set_ylabel("count")
ax.text(x=0.5, y=1, s="Total packages: {}".format(packages.shape[0]),
        horizontalalignment="center", verticalalignment="top",
        transform=ax.transAxes)
sns.despine()
fig.tight_layout()
plt.savefig(snakemake.output[0], bbox_inches="tight")
import glob
import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt
import seaborn as sns
import json
import pandas as pd

import common

plt.figure(figsize=(4, 2))

packages = pd.read_table(snakemake.input[0])
deps = packages["deps"]
plt.hist(deps, range(0, 30), lw=1)
plt.xlim([0, 30])
plt.grid()
plt.xlabel("Package degree", fontsize=16)
plt.savefig(snakemake.output[0], bbox_inches="tight")
from datetime import timedelta
import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

import common

# Default palette was a little too dark for the text to show up in the last
# block; increasing available colors lets us stay on the lighter side of the
# palette.
sns.set_palette("Greys", n_colors=8)

prs = pd.read_table(snakemake.input[0])
prs.span = pd.to_timedelta(prs.span)

categories = pd.Series([timedelta(minutes=0), timedelta(minutes=30),
                        timedelta(hours=1), timedelta(hours=5),
                        timedelta(days=1), timedelta(days=365)])
labels = [r"$\leq 30$ min", r"$\leq 1$ hour", r"$\leq 5$ hours",
          r"$\leq 1$ day", r"$ > 1$ day"]
binning = pd.cut(prs.span.dt.total_seconds(),
                 categories.dt.total_seconds(), labels=labels)
counts = binning.value_counts()
# fix order
counts = counts[labels]
perc = counts / counts.sum()

fig = plt.figure(figsize=(5, 1.1))
ax = fig.add_subplot(1, 1, 1)
left = 0
for label, x in perc.items():
    ax.barh(y=0, width=x, left=left, label=label)
    ax.text(left + x / 2, 0, label,
            horizontalalignment='center', verticalalignment='center')
    left += x
sns.despine(top=True, left=True, right=True, trim=True)
ax.set_xlabel('Fraction of pull requests merged')
ax.yaxis.set_visible(False)
fig.tight_layout()
fig.subplots_adjust(top=0.9)
#plt.pie(counts, shadow=False, labels=counts.index, autopct="%.0f%%")
plt.savefig(snakemake.output[0], bbox_inches="tight")
import os
import pandas as pd
import glob
import csv

packages = pd.read_table(snakemake.input.pkg)

# restrict to existing recipes
recipes = set(map(os.path.basename, glob.glob('bioconda-recipes/recipes/*')))
packages['has_current_recipe'] = packages['package'].isin(recipes)
packages = packages[packages.has_current_recipe]

with open(snakemake.output[0], "w") as out:
    out = csv.writer(out, delimiter="\t")
    out.writerow(["downloads", packages["downloads"].sum()])
    out.writerow(["versions", packages["versions"].sum()])
    out.writerow(["packages", packages.shape[0]])
shell:
    "curl -X GET --header 'Accept: application/json' "
    "https://api.anaconda.org/package/bioconda/{wildcards.package} "
    "> {output} && sleep 1"

script:
    "scripts/collect-pkg-data.py"

shell:
    "rm -rf bioconda-recipes; "
    "git clone https://github.com/bioconda/bioconda-recipes.git bioconda-recipes; "
    "cd bioconda-recipes; "
    "git reset --hard d819a66147566d31316198f89e7744b7a36356fe"

shell:
    '(cd bioconda-recipes && '
    'git log '
    '--pretty=format:'
    '"%h\t%aN\t%aI" '
    '--name-only '

shell:
    "cd bioconda-recipes; "
    "bioconda-utils dag --hide-singletons --format dot "
    "recipes config.yml > ../{output}"

script:
    "scripts/parse-log.py"

script:
    "scripts/collect-pr-data.py"

script:
    "scripts/collect-summaries-and-urls.py"

script:
    "scripts/plot-add-del.py"

script:
    "scripts/plot-package-degrees.py"

shell:
    "set +o pipefail; ccomps -zX#0 {input} | neato -Tsvg -o {output} "
    '-Nlabel="" -Nstyle=filled -Nfillcolor="#1f77b4" '
    '-Ecolor="#3333335f" -Nwidth=0.2 -LC10 -Gsize="12,12" '
    "-Nshape=circle -Npenwidth=0"

script:
    'scripts/color-dag.py'

script:
    "scripts/plot-downloads.py"

script:
    "scripts/plot-ecosystems.py"

script:
    "scripts/plot-comparison.py"

script:
    "scripts/plot-contributions.py"

script:
    "scripts/plot-age-vs-downloads.py"

script:
    "scripts/plot-turnaround.py"

script:
    "scripts/stats.py"

script:
    "scripts/author-list.py"

script:
    "scripts/author-tex.py"

script:
    "scripts/fig1.py"

script:
    "scripts/fig2.py"

shell:
    "cairosvg -f {wildcards.fmt} {input} -o {output}"