A Snakemake workflow for easy visualization of genome browser tracks of aligned BAM files powered by the wrapper gtracks for the package pyGenomeTracks.

public 1yr ago Version: v0.1.0 0 bookmarks

View Workflow

Genome Browser Track Visualization Snakemake Workflow Powered by pyGenomeTracks for Aligned BAM Files

A Snakemake workflow for easy visualization of genome browser tracks of aligned BAM files (e.g., RNA-seq, ATAC-seq,...) powered by the wrapper gtracks for the package pyGenomeTracks .

This workflow adheres to the module specifications of MR.PARETO , an effort to augment research by modularizing (biomedical) data science. For more details and modules check out the project's repository.

If you use this workflow in a publication, don't forget to give credits to the authors by citing the URL of this (original) repository (and its DOI, see Zenodo badge above -> coming soon).

Workflow Rulegraph

Authors
Software
Methods
Features
Usage
Configuration
Examples
Links
Resources
Publications

Authors

Stephan Reichl

Software

This project wouldn't be possible without the following software and their dependencies:

Software	Reference (DOI)
deeptools	https://doi.org/10.1093/nar/gkw257
gtracks	https://gitlab.com/salk-tm/gtracks
pygenometracks	https://doi.org/10.1093/bioinformatics/btaa692
samtools	https://doi.org/10.1093/bioinformatics/btp352

Methods

This is a template for the Methods section of a scientific publication and is intended to serve as a starting point. Only retain paragraphs relevant to your analysis. References [ref] to the respective publications are curated in the software table above. Versions (ver) have to be read out from the respective conda environment specifications (workflow/envs/*.yaml file) or post execution in the result directory (/envs/genome_tracks/*.yaml). Parameters that have to be adapted depending on the data or workflow configurations are denoted in squared brackets e.g., [X].

Processing. Aligned (and filtered) BAM files were merged by [group] using samtools (ver) [ref]. Each BAM file was converted to a bigWig file for dowmnstream analysis and visualization using bamCoverage from the command line tool deepTools (ver) [ref]. Finally, we extracted coordinates, extended start and end by [base_buffer] bases, and number of isoforms of all relevant genes/genomic regions [gene_list] from the 12 column BED file genome [genome] annotation [genome_bed].

Visualization. Visualizations for each relevant gene/genomic region and [category] were generated by using the prepared bigWig files and vertically stacking genome browser tracks with their annotation at the [x_axis] and each track scaled by [y_max] reads. Additionally, a visualization including all categories per gene/genomic region was generated. The plotting was performed using the python wrapper gtracks (ver) [ref] for the package pyGenomeTracks (ver) [ref].

The processing and visualizations described here were performed using a publicly available Snakemake (ver) [ref] workflow [ref - cite this workflow here].

Features

The workflow performs the following steps that produce the outlined results:

Processing
1. BAM files of the same group are merged and indexed using samtools. (merged_bams/{group}.bam)
2. A bigWig file per merged BAM file is generated using deepTools::bamCoverage. (bigwigs/{group}.bw)
3. Information per requested gene from the 12-column BED file is retrieved (not necessary for genomic regions).
  - coordinates from the 12-column BED file are extracted and extended at start/end by the parameter base_buffer.
  - the number of isoforms i.e. number of lines in the BED file is determined ( only for genes, for genomic regions it is hardcoded to 1 ) to plot below the tracks.
Visualization
- generate one plot per category of bigWigs and gene/region with the before determined gene-parameters i.e, coordinates and gene-rows (tracks/{category}_{gene/region}.svg).
- generate one plot including ALL categories per gene/region (tracks/ALL_{gene/region}.svg)

Usage

Here are some tips for the usage of this workflow:

Start with the 5-10 most interesting genes (e.g., the most differentially expressed between conditions) and few/relevant samples for a test run.
Set y-max to auto (i.e., '') for the first run to get a feeling of the magnitudes.
Merging BAM files and generating bigWig files takes longest, but is performed only once. The plot generation for different genes/genomic regions afterward is very fast. Therefore, it is recommended to get the workflow going with a few samples and then increase the number of samples.

Configuration

Detailed specifications can be found here ./config/README.md

Examples

--- COMING SOON ---

Resources

UCSC Genome Browser annotation track database
- recommended source for the required 12 column BED file annotation of the respective genome.

Publications

The following publications successfully used this module for their analyses.

Code Snippets

shell:
    """
    conda env export > {output}
    """

SnakeMake From line 18 of rules/envs_export.smk

run:
    with open(output["configs"], 'w') as outfile:
        json.dump(config, outfile)

SnakeMake From line 38 of rules/envs_export.smk

shell:
    """
    cp {input} {output}
    """

SnakeMake From line 59 of rules/envs_export.smk

shell:
    """
    cp {input} {output}
    """

SnakeMake From line 81 of rules/envs_export.smk

shell:
    """
    samtools merge {output.merged_bam} {input}

    samtools index -b {output.merged_bam}
    """

SnakeMake SAMtools From line 18 of rules/prepare.smk

shell:
    """
    bamCoverage --bam {input.merged_bam} \
        -p max --binSize 10  --normalizeUsing RPGC \
        --effectiveGenomeSize {params.genome_size} {params.extendReads} \
        -o "{output.bigwig}" > "{output.bigwig}.log" 2>&1;
    """

SnakeMake DeepTools From line 44 of rules/prepare.smk

shell:
    """
    export GTRACKS_GENES_PATH={params.genome_bed}

    gtracks {params.coordinates} \
        {input} \
        {output.genome_track} \
        --genes {params.genome_bed} {params.ymax} \
        --gene-rows {params.gene_rows} \
        --genes-height {params.gene_rows} \
        --x-axis {params.xaxis} \
        --width {params.width}
    """