muChip Analysis Workflow: Trackhub Creation and Sample Visualization
Snakemake workflow to analyze muChip experiments. It creates a trackhub with called peaks and domains, as well as fragment-pileup and control tracks, for each sample. It also creates average, TSS, and heat plots for each sample.
Usage
If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).
Step 1: Obtain a copy of this workflow
- Create a new GitHub repository, using this workflow as a template.
- Clone the newly created repository to your local system, into the place where you want to perform the data analysis.
Step 2: Configure workflow
Configure the workflow according to your needs by editing the files in the `config/` folder: adjust `config.yaml` to configure the workflow execution, and `samples.tsv` to specify your sample setup. See `config/samples.tsv` for an example using local files and `config/public_samples.tsv` for an example using SRA accessions.
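The exact sample-sheet layout is defined by the workflow, so treat the following as a sketch only: a quick pandas check along these lines (the column names `sample_name` and `fastq` are illustrative assumptions, not the workflow's documented schema) can catch typos and missing files before the first dry-run.

```python
# Sketch: sanity-check the sample sheet before running the workflow.
# The column names "sample_name" and "fastq" are assumptions for
# illustration; see config/samples.tsv for the authoritative layout.
from pathlib import Path

import pandas as pd

samples = pd.read_csv("config/samples.tsv", sep="\t")

# Duplicate sample names would collide in the trackhub and plots.
assert samples["sample_name"].is_unique, "duplicate sample names"

# Report read files referenced in the sheet but missing on disk.
for fastq in samples["fastq"]:
    if not Path("results/reads", fastq).exists():
        print("missing read file:", fastq)
```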
Step 3: Get the read files
Move any local read files to the `results/reads` folder. The file names should end with `_R1_001.fastq.gz` (`_R2_001.fastq.gz` for mate-pair files).
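If your reads follow a different naming scheme, renaming them up front is the simplest fix. A minimal sketch, assuming inputs named like `sample_1.fastq.gz` / `sample_2.fastq.gz` (adapt the pattern to your data):

```python
# Sketch: rename reads from an assumed "sample_1.fastq.gz" scheme to
# the "sample_R1_001.fastq.gz" suffix the workflow expects.
from pathlib import Path

reads = Path("results/reads")
for mate in ("1", "2"):
    for src in reads.glob(f"*_{mate}.fastq.gz"):
        sample = src.name[: -len(f"_{mate}.fastq.gz")]
        dst = reads / f"{sample}_R{mate}_001.fastq.gz"
        print(src.name, "->", dst.name)
        src.rename(dst)
```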
Step 4: Install Snakemake
Install Snakemake using conda:

```
conda create -c bioconda -c conda-forge -n snakemake snakemake
```
Step 5: Execute workflow
Activate the conda environment:

```
conda activate snakemake
```

Test your configuration by performing a dry-run via

```
snakemake --use-conda -n
```

Execute the workflow locally via

```
snakemake --use-conda --cores $N
```

using `$N` cores, or run it in a cluster environment via

```
snakemake --use-conda --cluster qsub --jobs 100
```
Step 6: Investigate results
After successful execution, you can create a self-contained interactive HTML report with all results via:

```
snakemake --report report.html
```

This report can, e.g., be forwarded to your collaborators. An example (using some trivial test data) can be seen here.
A trackhub folder is created in `results/trackhub/`, which can be used to view the tracks on https://genome-euro.ucsc.edu/cgi-bin/hgHubConnect.
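Note that the UCSC browser fetches hub files over HTTP, so the trackhub folder must be hosted somewhere the browser can reach, for example an institutional web server. As a rough illustration only (not part of the workflow), on a publicly reachable host Python's built-in server would suffice; the port is an arbitrary choice:

```python
# Sketch: serve the trackhub over HTTP for the UCSC browser. The
# serving host must be reachable from genome-euro.ucsc.edu; the
# directory and port here are illustrative.
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

handler = partial(SimpleHTTPRequestHandler, directory="results/trackhub")
HTTPServer(("0.0.0.0", 8000), handler).serve_forever()
```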
Figures are placed in `results/plots/joined/`.
Code Snippets
```
# Download chromosome sizes from UCSC
shell: "wget https://hgdownload.soe.ucsc.edu/goldenPath/{wildcards.species}/database/chromInfo.txt.gz -O {output}"

# Keep only the chromosomes matched by the workflow's chromosome_grep pattern
shell: "z%s {input} > {output}" % chromosome_grep

# Download the refGene annotation from UCSC
rule download_genes:
    output:
        "results/{species}/data/refGene.txt.gz"
    shell: "wget https://hgdownload.soe.ucsc.edu/goldenPath/{wildcards.species}/database/refGene.txt.gz -O {output}"

# Convert the refGene table to BED6 (chrom, start, end, name, score, strand)
shell: """z%s {input} | awk '{{OFS="\t"}}{{print $3, $5, $6, ".", ".", $4}}' | uniq > {output}""" % chromosome_grep

# Extract a 1-bp, strand-aware TSS interval for each gene
shell: """awk '{{OFS="\t"}}{{if ($6=="+") {{print $1,$2,$2+1,".",".","+"}} else {{print $1, $3-1, $3,".",".","-"}}}}' {input} | uniq > {output}"""

# Download and unpack a prebuilt Bowtie2 index
shell: "wget https://genome-idx.s3.amazonaws.com/bt/{wildcards.species}.zip -O {output} -o {log}"

shell: "unzip {input} -d {wildcards.path}"

# Read quality control
wrapper: "0.67.0/bio/fastqc"
```
```
# MACS2 broad-peak calling on single-end data
shell: "macs2 callpeak -t {input.treatment} -c {input.control} -g {params.gs} --bdg --broad --outdir results/{wildcards.species}/se_broadpeakcalling -n {wildcards.celltype}_{wildcards.condition} > {log}"

# MACS2 broad-peak calling on paired-end data (-f BAMPE)
shell: "macs2 callpeak -t {input.treatment} -c {input.control} -g {params.gs} --bdg --broad --outdir results/{wildcards.species}/pe_broadpeakcalling -n {wildcards.celltype}_{wildcards.condition} -f BAMPE > {log}"

# Merge broad peaks closer than 5 kb into domains
shell: "bedtools merge -d 5000 -i {input} > {output}"
```
```
# Paired-end pipeline: trimming, mapping, sorting, deduplication, filtering
wrapper: "0.67.0/bio/trimmomatic/pe"

wrapper: "0.49.0/bio/bwa/mem"

wrapper: "0.67.0/bio/bowtie2/align"

wrapper: "0.67.0/bio/samtools/sort"

wrapper: "0.64.0/bio/picard/markduplicates"

wrapper: "0.50.4/bio/samtools/view"

# Keep pairs with absolute template length (TLEN) > 50, plus header lines
shell: """samtools view -h {input} | awk '($9>=0 ? $9 : -$9) > 50 || $1 ~ /^@/' | samtools view -bS - > {output}"""

# Collate pairs, convert to fragment BED and sort
shell: "samtools collate -Ou {input} | bedtools bamtobed -i - | chiptools pairbed | bedtools sort > {output}"

# Keep fragments longer than 50 bp
shell: "awk '{{if (($3-$2)>50) print}}' {input} > {output}"

# Fragment pileup
wrapper: "0.64.0/bio/bedtools/genomecov"
```
```
# Heat, TSS and average plots per sample
shell: "bdgplot heat {input.bedgraph} {input.regions} -o {output[0]} -od {output[1]}"

shell: "bdgplot tss {input.bedgraph} {input.regions} -o {output[0]} -od {output[1]}"

shell: "bdgplot average {input.bedgraph} {input.regions} -o {output[0]} -od {output[1]}"

# Join per-sample figures into one figure per comparison group
shell: "bdgtools joinfigs {wildcards.plottype} {input} -o {output} --name {wildcards.comparisongroup}"
```
```
# Single-end pipeline: trimming, mapping, sorting, filtering, deduplication
wrapper: "0.67.0/bio/trimmomatic/se"

wrapper: "0.49.0/bio/bwa/mem"

wrapper: "0.67.0/bio/bowtie2/align"

wrapper: "0.67.0/bio/samtools/sort"

wrapper: "0.50.4/bio/samtools/view"

wrapper: "0.64.0/bio/picard/markduplicates"

# Convert alignments to BED, then sort
shell: "bedtools bamtobed -i {input} | gzip > {output}"

shell: "zcat {input} | bedtools sort | gzip > {output}"
```
```
# Fetch a run from SRA, then extract FASTQ (paired-end and single-end)
shell: "prefetch {wildcards.sra} -o {output}"

shell: "fastq-dump --split-files --gzip {input} -O results/dumped_fastq/"

shell: "fastq-dump --gzip {input} -O results/dumped_fastq/"

shell: "mv {input} {output}"

shell: "mv {input} {output}"
```
```
# Write genomes.txt with one stanza per species
run:
    text = "\n".join(
        f"genome {species}\ntrackDb {species}/trackDb.txt\n"
        for species in set(samples["species"]))
    open(output[0], "w").write(text)

# Write hub.txt (excerpt)
run:
    name = config.get("name", "mu-chip")
    mail = config.get("mail", "example@mail.com")
    open(output[0], "w").write(f"""hub {name}

# Generate trackDb.txt
script: "../scripts/trackhub.py"

# Sort the bedGraph for bedGraphToBigWig
shell: "LC_COLLATE=C sort -k1,1 -k2,2n {input} -T {params.tmp} > {output.bdg}"

wrapper: "0.50.3/bio/ucsc/bedGraphToBigWig"

# Cap the BED score column at 1000, as required by bedToBigBed
shell: """awk '{{OFS="\t"}}{{$5=($5>1000?1000:$5);print}}' {input} > {output}"""

# Convert peaks and domains to bigBed
shell: "bedToBigBed -type=bed6+3 {input.peaks} {input.sizes} {output}"

shell: "bedToBigBed {input.domains} {input.sizes} {output}"
```
```python
from pathlib import PurePath
import colorsys
import trackhub


def get_color(i, n_colors):
    # Evenly spaced hues, returned as "R,G,B" strings.
    h = i / float(n_colors)
    s = 0.5  # color_intensity[filetype]
    color = [255 * v for v in colorsys.hsv_to_rgb(h, 0.5, 0.5)]
    return ",".join([str(int(c)) for c in list(color)])


def histone_track(name, color="200,0,0"):
    # One composite track per sample: a signal view (pileup + control lambda)
    # and a regions view (domains).
    composite = trackhub.CompositeTrack(
        name=name + "_composite",
        short_label=name,
        tracktype='bigWig',
        visibility='full',
        color=color)
    signal_view = trackhub.ViewTrack(
        name=name + "_signal",
        view='signal',
        visibility='full',
        tracktype='bigWig',
        short_label=name + '_signal',
        autoScale="on")
    regions_view = trackhub.ViewTrack(
        name=name + '_region',
        view='regions',
        visibility='dense',
        tracktype='bigWig',
        short_label=name + '_regions')
    composite.add_view(signal_view)
    composite.add_view(regions_view)
    for signal_type in ["treat_pileup", "control_lambda"]:
        track = trackhub.Track(
            tracktype='bigWig',
            name=name + "_" + signal_type,
            url="%s_%s.bw" % (name, signal_type),
            short_label=signal_type,
            autoScale="on")
        signal_view.add_tracks(track)
    for region_type in ["domains"]:
        track = trackhub.Track(
            name=name + "_" + region_type,
            url="%s_%s.bb" % (name, region_type),
            short_label=region_type,
            tracktype='bigBed')
        regions_view.add_tracks(track)
    return composite


names = [PurePath(name).stem.replace("_domains", "")
         for name in snakemake.input.domains]
colors = [get_color(i, len(names)) for i in range(len(names))]
hub, genomes_file, genome, trackdb = trackhub.default_hub(
    hub_name="testing", genome="hg38", email="knutdrand@gmail.com")
trackdb.add_tracks([histone_track(*pair) for pair in zip(names, colors)])
open(snakemake.output[0], "w").write(str(trackdb))
```
```python
__author__ = "Johannes Köster, Julian de Ruiter"
__copyright__ = "Copyright 2016, Johannes Köster and Julian de Ruiter"
__email__ = "koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
    1,
    2,
}:
    raise ValueError("input must have 1 (single-end) or " "2 (paired-end) elements")

if sort_order not in {"coordinate", "queryname"}:
    raise ValueError("Unexpected value for sort_order ({})".format(sort_order))

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"

elif sort == "samtools":
    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"

elif sort == "picard":
    # Sort alignments using picard SortSam.
    pipe_cmd = (
        "picard SortSam {sort_extra} INPUT=/dev/stdin"
        " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
    )

else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

shell(
    "(bwa mem"
    " -t {snakemake.threads}"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.reads}"
    " | " + pipe_cmd + ") {log}"
)
```
```python
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell(
    "bedGraphToBigWig {extra}"
    " {snakemake.input.bedGraph} {snakemake.input.chromsizes}"
    " {snakemake.output} {log}"
)
```
```python
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

shell("samtools view {snakemake.params} {snakemake.input[0]} > {snakemake.output[0]}")
```
```python
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

genome = ""
input_file = ""

if (os.path.splitext(snakemake.input[0])[-1]) == ".bam":
    input_file = "-ibam " + snakemake.input[0]

if len(snakemake.input) > 1:
    if (os.path.splitext(snakemake.input[0])[-1]) == ".bed":
        input_file = "-i " + snakemake.input.get("bed")

    genome = "-g " + snakemake.input.get("ref")

shell(
    "(genomeCoverageBed"
    " {snakemake.params}"
    " {input_file}"
    " {genome}"
    " > {snakemake.output[0]}) {log}"
)
```
```python
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "picard MarkDuplicates {snakemake.params} INPUT={snakemake.input} "
    "OUTPUT={snakemake.output.bam} METRICS_FILE={snakemake.output.metrics} "
    "{log}"
)
```
```python
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."

if n == 1:
    reads = "-U {}".format(*snakemake.input.sample)
else:
    reads = "-1 {} -2 {}".format(*snakemake.input.sample)

shell(
    "(bowtie2 --threads {snakemake.threads} {snakemake.params.extra} "
    "-x {snakemake.params.index} {reads} "
    "| samtools view -Sbh -o {snakemake.output[0]} -) {log}"
)
```
```python
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path
from tempfile import TemporaryDirectory

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""
    base = path.basename(file_path)
    split_ind = 2 if base.endswith(".fastq.gz") else 1
    base = ".".join(base.split(".")[:-split_ind])
    return base


# Run fastqc, since there can be race conditions if multiple jobs
# use the same fastqc dir, we create a temp dir.
with TemporaryDirectory() as tempdir:
    shell(
        "fastqc {snakemake.params} --quiet -t {snakemake.threads} "
        "--outdir {tempdir:q} {snakemake.input[0]:q}"
        " {log:q}"
    )

    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")

    if snakemake.output.html != html_path:
        shell("mv {html_path:q} {snakemake.output.html:q}")

    if snakemake.output.zip != zip_path:
        shell("mv {zip_path:q} {snakemake.output.zip:q}")
```
```python
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


import os
from snakemake.shell import shell

prefix = os.path.splitext(snakemake.output[0])[0]

# Samtools takes additional threads through its option -@
# One thread for samtools
# Other threads are *additional* threads passed to the argument -@
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)

shell(
    "samtools sort {snakemake.params} {threads} -o {snakemake.output[0]} "
    "-T {prefix} {snakemake.input[0]}"
)
```
```python
__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


# Distribute available threads between trimmomatic itself and any potential pigz instances
def distribute_threads(input_files, output_files, available_threads):
    gzipped_input_files = sum(1 for file in input_files if file.endswith(".gz"))
    gzipped_output_files = sum(1 for file in output_files if file.endswith(".gz"))
    potential_threads_per_process = available_threads // (
        1 + gzipped_input_files + gzipped_output_files
    )
    if potential_threads_per_process > 0:
        # decompressing pigz creates at most 4 threads
        pigz_input_threads = (
            min(4, potential_threads_per_process) if gzipped_input_files != 0 else 0
        )
        pigz_output_threads = (
            (available_threads - pigz_input_threads * gzipped_input_files)
            // (1 + gzipped_output_files)
            if gzipped_output_files != 0
            else 0
        )
        trimmomatic_threads = (
            available_threads
            - pigz_input_threads * gzipped_input_files
            - pigz_output_threads * gzipped_output_files
        )
    else:
        # not enough threads for pigz
        pigz_input_threads = 0
        pigz_output_threads = 0
        trimmomatic_threads = available_threads
    return trimmomatic_threads, pigz_input_threads, pigz_output_threads


def compose_input_gz(filename, threads):
    if filename.endswith(".gz") and threads > 0:
        return "<(pigz -p {threads} --decompress --stdout {filename})".format(
            threads=threads, filename=filename
        )
    return filename


def compose_output_gz(filename, threads, compression_level):
    if filename.endswith(".gz") and threads > 0:
        return ">(pigz -p {threads} {compression_level} > {filename})".format(
            threads=threads, compression_level=compression_level, filename=filename
        )
    return filename


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)

# Distribute threads
input_files = [snakemake.input.r1, snakemake.input.r2]
output_files = [
    snakemake.output.r1,
    snakemake.output.r1_unpaired,
    snakemake.output.r2,
    snakemake.output.r2_unpaired,
]

trimmomatic_threads, input_threads, output_threads = distribute_threads(
    input_files, output_files, snakemake.threads
)

input_r1, input_r2 = [
    compose_input_gz(filename, input_threads) for filename in input_files
]

output_r1, output_r1_unp, output_r2, output_r2_unp = [
    compose_output_gz(filename, output_threads, compression_level)
    for filename in output_files
]

shell(
    "trimmomatic PE -threads {trimmomatic_threads} {extra} "
    "{input_r1} {input_r2} "
    "{output_r1} {output_r1_unp} "
    "{output_r2} {output_r2_unp} "
    "{trimmer} "
    "{log}"
)
```
```python
__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


# Distribute available threads between trimmomatic itself and any potential pigz instances
def distribute_threads(input_file, output_file, available_threads):
    gzipped_input_files = 1 if input_file.endswith(".gz") else 0
    gzipped_output_files = 1 if output_file.endswith(".gz") else 0
    potential_threads_per_process = available_threads // (
        1 + gzipped_input_files + gzipped_output_files
    )
    if potential_threads_per_process > 0:
        # decompressing pigz creates at most 4 threads
        pigz_input_threads = (
            min(4, potential_threads_per_process) if gzipped_input_files != 0 else 0
        )
        pigz_output_threads = (
            (available_threads - pigz_input_threads * gzipped_input_files)
            // (1 + gzipped_output_files)
            if gzipped_output_files != 0
            else 0
        )
        trimmomatic_threads = (
            available_threads
            - pigz_input_threads * gzipped_input_files
            - pigz_output_threads * gzipped_output_files
        )
    else:
        # not enough threads for pigz
        pigz_input_threads = 0
        pigz_output_threads = 0
        trimmomatic_threads = available_threads
    return trimmomatic_threads, pigz_input_threads, pigz_output_threads


def compose_input_gz(filename, threads):
    if filename.endswith(".gz") and threads > 0:
        return "<(pigz -p {threads} --decompress --stdout {filename})".format(
            threads=threads, filename=filename
        )
    return filename


def compose_output_gz(filename, threads, compression_level):
    if filename.endswith(".gz") and threads > 0:
        return ">(pigz -p {threads} {compression_level} > {filename})".format(
            threads=threads, compression_level=compression_level, filename=filename
        )
    return filename


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)

# Distribute threads
trimmomatic_threads, input_threads, output_threads = distribute_threads(
    snakemake.input[0], snakemake.output[0], snakemake.threads
)

# Collect files
input = compose_input_gz(snakemake.input[0], input_threads)
output = compose_output_gz(snakemake.output[0], output_threads, compression_level)

shell(
    "trimmomatic SE -threads {trimmomatic_threads} {extra} {input} {output} {trimmer} {log}"
)
```