A Snakemake workflow for easy visualization of genome browser tracks of aligned BAM files powered by the wrapper gtracks for the package pyGenomeTracks.
Genome Browser Track Visualization Snakemake Workflow Powered by pyGenomeTracks for Aligned BAM Files
A Snakemake workflow for easy visualization of genome browser tracks of aligned BAM files (e.g., RNA-seq, ATAC-seq,...) powered by the wrapper gtracks for the package pyGenomeTracks .
This workflow adheres to the module specifications of MR.PARETO , an effort to augment research by modularizing (biomedical) data science. For more details and modules check out the project's repository.
If you use this workflow in a publication, don't forget to give credits to the authors by citing the URL of this (original) repository (and its DOI, see Zenodo badge above -> coming soon).
Table of contents
Authors
Software
This project wouldn't be possible without the following software and their dependencies:
Software | Reference (DOI) |
---|---|
deeptools | https://doi.org/10.1093/nar/gkw257 |
gtracks | https://gitlab.com/salk-tm/gtracks |
pygenometracks | https://doi.org/10.1093/bioinformatics/btaa692 |
samtools | https://doi.org/10.1093/bioinformatics/btp352 |
Methods
This is a template for the Methods section of a scientific publication and is intended to serve as a starting point. Only retain paragraphs relevant to your analysis. References [ref] to the respective publications are curated in the software table above. Versions (ver) have to be read out from the respective conda environment specifications (workflow/envs/*.yaml file) or post execution in the result directory (/envs/genome_tracks/*.yaml). Parameters that have to be adapted depending on the data or workflow configurations are denoted in squared brackets e.g., [X].
Processing. Aligned (and filtered) BAM files were merged by [group] using samtools (ver) [ref]. Each BAM file was converted to a bigWig file for dowmnstream analysis and visualization using bamCoverage from the command line tool deepTools (ver) [ref]. Finally, we extracted coordinates, extended start and end by [base_buffer] bases, and number of isoforms of all relevant genes/genomic regions [gene_list] from the 12 column BED file genome [genome] annotation [genome_bed].
Visualization. Visualizations for each relevant gene/genomic region and [category] were generated by using the prepared bigWig files and vertically stacking genome browser tracks with their annotation at the [x_axis] and each track scaled by [y_max] reads. Additionally, a visualization including all categories per gene/genomic region was generated. The plotting was performed using the python wrapper gtracks (ver) [ref] for the package pyGenomeTracks (ver) [ref].
The processing and visualizations described here were performed using a publicly available Snakemake (ver) [ref] workflow [ref - cite this workflow here].
Features
The workflow performs the following steps that produce the outlined results:
-
Processing
-
BAM files of the same group are merged and indexed using samtools. (merged_bams/{group}.bam)
-
A bigWig file per merged BAM file is generated using deepTools::bamCoverage. (bigwigs/{group}.bw)
-
Information per requested gene from the 12-column BED file is retrieved (not necessary for genomic regions).
-
coordinates from the 12-column BED file are extracted and extended at start/end by the parameter base_buffer.
-
the number of isoforms i.e. number of lines in the BED file is determined ( only for genes, for genomic regions it is hardcoded to 1 ) to plot below the tracks.
-
-
-
Visualization
-
generate one plot per category of bigWigs and gene/region with the before determined gene-parameters i.e, coordinates and gene-rows (tracks/{category}_{gene/region}.svg).
-
generate one plot including ALL categories per gene/region (tracks/ALL_{gene/region}.svg)
-
Usage
Here are some tips for the usage of this workflow:
-
Start with the 5-10 most interesting genes (e.g., the most differentially expressed between conditions) and few/relevant samples for a test run.
-
Set y-max to auto (i.e., '') for the first run to get a feeling of the magnitudes.
-
Merging BAM files and generating bigWig files takes longest, but is performed only once. The plot generation for different genes/genomic regions afterward is very fast. Therefore, it is recommended to get the workflow going with a few samples and then increase the number of samples.
Configuration
Detailed specifications can be found here ./config/README.md
Examples
--- COMING SOON ---
Links
Resources
-
UCSC Genome Browser annotation track database
- recommended source for the required 12 column BED file annotation of the respective genome.
Publications
The following publications successfully used this module for their analyses.
- ...
Code Snippets
18 19 20 21 | shell: """ conda env export > {output} """ |
38 39 40 | run: with open(output["configs"], 'w') as outfile: json.dump(config, outfile) |
59 60 61 62 | shell: """ cp {input} {output} """ |
81 82 83 84 | shell: """ cp {input} {output} """ |
18 19 20 21 22 23 | shell: """ samtools merge {output.merged_bam} {input} samtools index -b {output.merged_bam} """ |
44 45 46 47 48 49 50 | shell: """ bamCoverage --bam {input.merged_bam} \ -p max --binSize 10 --normalizeUsing RPGC \ --effectiveGenomeSize {params.genome_size} {params.extendReads} \ -o "{output.bigwig}" > "{output.bigwig}.log" 2>&1; """ |
35 36 37 38 39 40 41 42 43 44 45 46 47 | shell: """ export GTRACKS_GENES_PATH={params.genome_bed} gtracks {params.coordinates} \ {input} \ {output.genome_track} \ --genes {params.genome_bed} {params.ymax} \ --gene-rows {params.gene_rows} \ --genes-height {params.gene_rows} \ --x-axis {params.xaxis} \ --width {params.width} """ |
Support
- Future updates
Related Workflows





