Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output, operation
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
Introduction
This repository contains a Snakemake workflow for performing a basic ATAC-seq analysis pipeline.
This workflow performs the following steps:
-
Optional: Retrieval of publically available sequencing data from N
Code Snippets
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "koester@jimmy.harvard.edu" __license__ = "MIT" from snakemake.shell import shell extra = snakemake.params.get("extra", "") log = snakemake.log_fmt_shell(stdout=True, stderr=True) n = len(snakemake.input.sample) assert ( n == 1 or n == 2 ), "input->sample must have 1 (single-end) or 2 (paired-end) elements." if n == 1: reads = "-U {}".format(*snakemake.input.sample) else: reads = "-1 {} -2 {}".format(*snakemake.input.sample) shell( "(bowtie2 --threads {snakemake.threads} {extra} " "-x {snakemake.params.index} {reads} " "| samtools view -Sbh -o {snakemake.output[0]} -) {log}" ) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | __author__ = "Daniel Standage" __copyright__ = "Copyright 2020, Daniel Standage" __email__ = "daniel.standage@nbacc.dhs.gov" __license__ = "MIT" from snakemake.shell import shell extra = snakemake.params.get("extra", "") log = snakemake.log_fmt_shell(stdout=True, stderr=True) indexbase = snakemake.output[0].replace(".1.bt2", "") shell( "bowtie2-build --threads {snakemake.threads} {snakemake.params.extra} " "{snakemake.input.reference} {indexbase}" ) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 | __author__ = "Antonie Vietor" __copyright__ = "Copyright 2020, Antonie Vietor" __email__ = "antonie.v@gmx.de" __license__ = "MIT" import os import sys from snakemake.shell import shell log = snakemake.log_fmt_shell(stdout=True, stderr=True) in_contr = snakemake.input.get("control") params = "{}".format(snakemake.params) opt_input = "" out_dir = "" ext = "_peaks.xls" out_file = [o for o in snakemake.output if o.endswith(ext)][0] out_name = os.path.basename(out_file[: -len(ext)]) out_dir = os.path.dirname(out_file) if in_contr: opt_input = "-c {contr}".format(contr=in_contr) if out_dir: out_dir = "--outdir {dir}".format(dir=out_dir) if any(out.endswith(("_peaks.narrowPeak", "_summits.bed")) for out in snakemake.output): if any( out.endswith(("_peaks.broadPeak", "_peaks.gappedPeak")) for out in snakemake.output ): sys.exit( "Output files with _peaks.narrowPeak and/or _summits.bed extensions cannot be created together with _peaks.broadPeak and/or _peaks.gappedPeak extended output files.\n" "For usable extensions please see https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/macs2/callpeak.html.\n" ) else: if " --broad" in params: sys.exit( "If --broad option in params is given, the _peaks.narrowPeak and _summits.bed files will not be created. \n" "Remove --broad option from params if these files are needed.\n" ) if any( out.endswith(("_peaks.broadPeak", "_peaks.gappedPeak")) for out in snakemake.output ): if "--broad " not in params and not params.endswith("--broad"): params += " --broad " if any( out.endswith(("_treat_pileup.bdg", "_control_lambda.bdg")) for out in snakemake.output ): if all(p not in params for p in ["--bdg", "-B"]): params += " --bdg " else: if any(p in params for p in ["--bdg", "-B"]): sys.exit( "If --bdg or -B option in params is given, the _control_lambda.bdg and _treat_pileup.bdg extended files must be specified in output. \n" ) shell( "(macs2 callpeak " "-t {snakemake.input.treatment} " "{opt_input} " "{out_dir} " "-n {out_name} " "{params}) {log}" ) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | __author__ = "Jan Forster" __copyright__ = "Copyright 2021, Jan Forster" __email__ = "j.forster@dkfz.de" __license__ = "MIT" import os from snakemake.shell import shell in_file = snakemake.input[0] extra = snakemake.params.get("extra", "") log = snakemake.log_fmt_shell(stdout=False, stderr=True) if in_file.endswith(".sam") and ("-S" not in extra or "--sam-input" not in extra): extra += " --sam-input" shell( "sambamba view {extra} -t {snakemake.threads} " "{snakemake.input[0]} > {snakemake.output[0]} " "{log}" ) |
1 2 3 4 5 6 7 8 9 10 11 | __author__ = "Antonie Vietor" __copyright__ = "Copyright 2020, Antonie Vietor" __email__ = "antonie.v@gmx.de" __license__ = "MIT" from snakemake.shell import shell log = snakemake.log_fmt_shell(stdout=False, stderr=True) shell("samtools idxstats {snakemake.input.bam} > {snakemake.output[0]} {log}") |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "koester@jimmy.harvard.edu" __license__ = "MIT" from snakemake.shell import shell log = snakemake.log_fmt_shell(stdout=True, stderr=True) # Samtools takes additional threads through its option -@ # One thread for samtools merge # Other threads are *additional* threads passed to the '-@' argument threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1) shell( "samtools index {threads} {snakemake.params} {snakemake.input[0]} {snakemake.output[0]} {log}" ) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "koester@jimmy.harvard.edu" __license__ = "MIT" from snakemake.shell import shell log = snakemake.log_fmt_shell(stdout=True, stderr=True) # Samtools takes additional threads through its option -@ # One thread for samtools merge # Other threads are *additional* threads passed to the '-@' argument threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1) shell( "samtools merge {threads} {snakemake.params} " "{snakemake.output[0]} {snakemake.input} " "{log}" ) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "koester@jimmy.harvard.edu" __license__ = "MIT" import os from snakemake.shell import shell extra = snakemake.params.get("extra", "") log = snakemake.log_fmt_shell(stdout=True, stderr=True) out_name, out_ext = os.path.splitext(snakemake.output[0]) tmp_dir = snakemake.params.get("tmp_dir", "") if tmp_dir: prefix = os.path.join(tmp_dir, os.path.basename(out_name)) else: prefix = out_name # Samtools takes additional threads through its option -@ # One thread for samtools # Other threads are *additional* threads passed to the argument -@ threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1) shell( "samtools sort {extra} {threads} -o {snakemake.output[0]} " "-T {prefix} {snakemake.input[0]} " "{log}" ) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | __author__ = "Antonie Vietor" __copyright__ = "Copyright 2020, Antonie Vietor" __email__ = "antonie.v@gmx.de" __license__ = "MIT" import tempfile from snakemake.shell import shell log = snakemake.log_fmt_shell(stdout=True, stderr=True) extra = snakemake.params.get("extra", "") # optional input files and directories fasta = snakemake.input.get("fasta", "") chr_names = snakemake.input.get("chr_names", "") r_path = snakemake.params.get("r_path", "") if fasta: extra += " -G {}".format(fasta) if chr_names: extra += " -A {}".format(chr_names) if r_path: extra += " --Rpath {}".format(r_path) with tempfile.TemporaryDirectory() as tmpdir: shell( "featureCounts" " -T {snakemake.threads}" " -a {snakemake.input.annotation}" " {extra}" " --tmpDir {tmpdir}" " -o {snakemake.output[0]}" " {snakemake.input.samples}" " {log}" ) |
9 10 | script: "../scripts/fasterq-dump.py" |
21 22 | script: "../scripts/fasterq-dump.py" |
33 34 | shell: "cat {input} > {output} 2> {log}" |
48 49 | wrapper: "v1.1.0/bio/bowtie2/align" |
62 63 | wrapper: "v1.1.0/bio/samtools/sort" |
76 77 | wrapper: "v1.1.0/bio/samtools/index" |
14 15 | shell: "bamCoverage --bam {input.bam} -o {output} -p {threads} {params.extra}" |
26 27 | wrapper: "v1.1.0/bio/samtools/merge" |
40 41 | wrapper: "v1.1.0/bio/samtools/index" |
54 55 | shell: "bamCoverage --bam {input.bam} -o {output} -p {threads} {params.extra}" |
65 66 | script: "../scripts/zscore_normalize_bw.R" |
75 76 | script: "../scripts/zscore_normalize_bw.R" |
14 15 | wrapper: "v1.1.0/bio/macs2/callpeak" |
30 31 | wrapper: "v1.1.0/bio/macs2/callpeak" |
47 48 | wrapper: "v1.1.0/bio/macs2/callpeak" |
61 62 | script: "../scripts/extend_peak_summits.R" |
18 19 | wrapper: "v1.3.2/bio/subread/featurecounts" |
35 36 | script: "../scripts/DEseq2.R" |
52 53 | script: "../scripts/DEseq2_results.R" |
9 10 | wrapper: "v1.1.0/bio/samtools/idxstats" |
23 24 | wrapper: "v1.1.0/bio/sambamba/view" |
37 38 | wrapper: "v1.1.0/bio/samtools/index" |
48 49 | wrapper: "v1.1.0/bio/samtools/idxstats" |
61 62 | shell: "samtools view -bh -L {input.keep_chroms} --output-fmt BAM -o {output} {input.bam} 2>> {log}" |
74 75 | wrapper: "v1.1.0/bio/sambamba/view" |
89 90 | wrapper: "v1.1.0/bio/samtools/index" |
100 101 | wrapper: "v1.1.0/bio/samtools/idxstats" |
12 13 | shell: "curl {params.link} > {output} 2> {log}" |
24 25 | shell: "mv {input} {output} 2> {log}" |
40 41 42 43 | shell: "seqkit grep -f {input.keep_chroms} {input.genome}" " | seqkit fx2tab -nil" " | awk -v OFS='\t' '{{print $1, 1, $2}}' > {output}" |
58 59 | wrapper: "v1.1.0/bio/bowtie2/build" |
10 11 12 13 14 | shell: """ mv {input[0]} {output[0]} 2> {log} mv {input[1]} {output[1]} 2>> {log} """ |
27 28 | wrapper: "v1.1.0/bio/sambamba/view" |
41 42 | wrapper: "v1.1.0/bio/sambamba/view" |
55 56 | wrapper: "v1.1.0/bio/samtools/index" |
69 70 | wrapper: "v1.1.0/bio/samtools/index" |
14 15 | shell: "NGmerge -a -e 20 -u 41 -n 8 -v -1 {input[0]} -2 {input[1]} -o {params.prefix} 2> {log}" |
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | suppressPackageStartupMessages(library(DESeq2)) suppressPackageStartupMessages(library(tidyverse)) # import count table ----------------------------------------------------------- count_table <- read_tsv(snakemake@input[[1]], comment = "#") %>% select(-c(2:6)) %>% column_to_rownames(var = "Geneid") colnames(count_table) <- gsub("_small.bam", "", basename(colnames(count_table))) # create colData table --------------------------------------------------------- sample_table <- read_tsv(snakemake@params[["samples"]]) %>% filter(experiment == snakemake@wildcards[["experiment"]]) coldata <- tibble(sample_name = colnames(count_table)) %>% left_join(sample_table, by = "sample_name") # run DESeq2 ------------------------------------------------------------------- dds <- DESeqDataSetFromMatrix(as.matrix(count_table), coldata, design = as.formula(snakemake@params[["model"]])) # filter out low count genes keep <- rowSums(counts(dds)) >= snakemake@params[["count_threshold"]] dds2 <- dds[keep,] # test for differential gene expression dds2 <- DESeq(dds2) # write dds2 object to Rdata file ---------------------------------------------- saveRDS(dds2, file = snakemake@output[[1]]) |
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | suppressPackageStartupMessages(library(DESeq2)) suppressPackageStartupMessages(library(tidyverse)) suppressPackageStartupMessages(library(rtracklayer)) # import dds object ------------------------------------------------------------ ## import dds2 object dds <- readRDS(snakemake@input[["dds"]]) # get DEseq results for all contrasts ------------------------------------------ contrast <- c("condition", snakemake@params[["contrast"]]) results <- results(dds, contrast = contrast, alpha = snakemake@params[["padj_cutoff"]]) %>% as.data.frame() %>% arrange(padj) %>% rownames_to_column(var = "peak_id") # add additional information to results table ---------------------------------- # #read in peak file peaks <- rtracklayer::import(snakemake@input[["peaks"]]) %>% as.data.frame() %>% dplyr::select(seqnames, start, end, name, signalValue, pValue, qValue) %>% dplyr::rename(peak_chrom = seqnames, peak_start = start, peak_end = end, peak_id = name, MACS2_enrichment = signalValue, MACS2_pValue = pValue, MACS2_qValue = qValue) # add gene symbol to results out_table <- dplyr::left_join(results, peaks, by = "peak_id") ## add column indicating if gene is differentially expressed with padj < 0.05 and FC > 2 out_table <- out_table %>% dplyr::mutate(is_diff = (padj < snakemake@params[["padj_cutoff"]] & (abs(log2FoldChange) > snakemake@params[["FC_cutoff"]]))) %>% replace_na(list(is_diff = FALSE)) # write output file ------------------------------------------------------------ write_tsv(out_table, snakemake@output[[1]]) |
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 | suppressPackageStartupMessages(library(tidyverse)) suppressPackageStartupMessages(library(rtracklayer)) suppressPackageStartupMessages(library(GenomicRanges)) # define functions ------------------------------------------------------------- # function to extract peak summits from narrowPeak file # peak start and end values will be replaced with the summit location extract_summits <- function(gr, extend_width = 1L) { # verify input is a GRanges object if (!inherits(gr, "GRanges") ) { stop("x must be a GRanges object") } # verify that gr object is in narrowPeak format if ( !all(names(mcols(gr)) %in% c("name", "score", "signalValue", "pValue", "qValue", "peak"))) { stop(strwrap("GRanges object does not appear to be in narrowPeak format. Object should contain the following metadata columns: name, score, signalValue, pValue, qValue and peak")) } if (all(gr$peak == -1)) { stop("All values for 'peak' column == -1, indicating that summits were not called. Verify that your peak caller called peak summits") } # replace peak start and end values with summit location summit <- start(gr) + gr$peak - 1 start(gr) <- summit end(gr) <- summit # resize peaks to desired width gr <- resize(gr, width = extend_width, fix = "center") # remove now meaningless peak column gr$peak <- NULL return(gr) } # read peaks ------------------------------------------------------------------- peak_fn <- snakemake@input[[1]] peaks <- rtracklayer::import(peak_fn) # extend peak summits ---------------------------------------------------------- extended_summits <- peaks %>% extract_summits(extend_width = as.integer(snakemake@params[["extend_width"]])) # export filtered peaks to file ----------------------------------------------- extended_summits %>% as.data.frame() %>% select(1:3, 6:7, 5, 8:10) %>% mutate(strand = ".") %>% mutate(peak = -1) %>% write_tsv(snakemake@output[[2]], col_names = FALSE) rtracklayer::export(extended_summits, snakemake@output[[1]]) # export SAF file for featureCounts -------------------------------------------- out_peaks.df <- as.data.frame(extended_summits) out_peaks.df$strand <- "." peaks.saf <- out_peaks.df %>% select(c("name", "seqnames", "start", "end", "strand")) names(peaks.saf) <- c( "GeneID", "Chr", "Start", "End", "Strand") write_tsv(peaks.saf, snakemake@output[[3]]) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 | __author__ = "Johannes Köster, Derek Croote" __copyright__ = "Copyright 2020, Johannes Köster" __email__ = "johannes.koester@uni-due.de" __license__ = "MIT" import os import tempfile from snakemake.shell import shell from snakemake_wrapper_utils.snakemake import get_mem log = snakemake.log_fmt_shell(stdout=True, stderr=True) extra = snakemake.params.get("extra", "") # Parse memory mem_mb = get_mem(snakemake, "MiB") # Outdir outdir = os.path.dirname(snakemake.output[0]) if outdir: outdir = f"--outdir {outdir}" # Output compression compress = "" mem = f"-m{mem_mb}" if mem_mb else "" for output in snakemake.output: out_name, out_ext = os.path.splitext(output) if out_ext == ".gz": compress += f"pigz -p {snakemake.threads} {out_name}; " elif out_ext == ".bz2": compress += f"pbzip2 -p{snakemake.threads} {mem} {out_name}; " with tempfile.TemporaryDirectory() as tmpdir: mem = f"--mem {mem_mb}M" if mem_mb else "" shell( "(fasterq-dump --temp {tmpdir} --threads {snakemake.threads} {mem} " "{extra} {outdir} {snakemake.wildcards.accession}; " "{compress}" ") {log}" ) |
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 | suppressPackageStartupMessages(library(tidyverse)) suppressPackageStartupMessages(library(rtracklayer)) suppressPackageStartupMessages(library(GenomicRanges)) zscore_bw <- function(bw) { require(tidyverse) require(rtracklayer) require(GenomicRanges) # import bigwig file to Granges if (typeof(bw) == "character") { message("reading bigwig file") bw <- import(bw) } # if using a spike-in, filter the seqlevels to only the reference genome if (snakemake@config[["use_spikeIn"]]) { message("removing spikeIn chromosomes") ref_chroms <- seqlevels(bw)[!grepl("spikeIn_", seqlevels(bw))] bw <- keepSeqlevels(bw, ref_chroms, pruning.mode = "coarse") } if (snakemake@config[["filter_chroms"]]) { message("filtering reference chromosomes") keep_chroms <- read_tsv(snakemake@config[["keep_chroms"]], col_names = c("chromosome")) ref_chroms <- seqlevels(bw)[seqlevels(bw) %in% keep_chroms$chromosome] bw <- keepSeqlevels(bw, ref_chroms, pruning.mode = "coarse") } # for large regions with the same score, expand into equal sized bins message("binning genome") min_binsize <- min(width(bw)) all_bins <- tileGenome(seqinfo(bw), tilewidth=min_binsize,cut.last.tile.in.chrom=TRUE) message("getting scores for all bins") # add the coverage/score for both input and IP all_bins <- subsetByOverlaps(all_bins, bw) overlaps <- findOverlaps(all_bins, bw) all_bins$score[overlaps@from] <- bw$score[overlaps@to] # perform z-score normalization message("performing z-score normalization") all_bins$zscore <- scale(all_bins$score)[,1] all_bins$score <- NULL all_bins$score <- all_bins$zscore all_bins$zscore <- NULL # collapse adjacent bins with same score collapsed <- unlist(GenomicRanges::reduce(split(all_bins, ~score))) collapsed$score <- as.numeric(names(collapsed)) names(collapsed) <- NULL all_bins <- collapsed #set seqinfo for z-score normalized version seqinfo(all_bins) <- seqinfo(bw) return(all_bins) } # perform z-score normalization and write new bigwig files --------------------- zscore.gr <- zscore_bw(snakemake@input[[1]]) export(zscore.gr, snakemake@output[[1]]) |
Support
Do you know this workflow well? If so, you can
request seller status , and start supporting this workflow.
Created: 1yr ago
Updated: 1yr ago
Maitainers:
public
URL:
https://github.com/tjgibson/NGS-workflow-ATACseq
Name:
ngs-workflow-atacseq
Version:
1
Downloaded:
0
Copyright:
Public Domain
License:
None
Keywords:
- Future updates
Related Workflows

ENCODE pipeline for histone marks developed for the psychENCODE project
psychip pipeline is an improved version of the ENCODE pipeline for histone marks developed for the psychENCODE project.
The o...

Near-real time tracking of SARS-CoV-2 in Connecticut
Repository containing scripts to perform near-real time tracking of SARS-CoV-2 in Connecticut using genomic data. This pipeli...

snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...

ATLAS - Three commands to start analyzing your metagenome data
Metagenome-atlas is a easy-to-use metagenomic pipeline based on snakemake. It handles all steps from QC, Assembly, Binning, t...
raw sequence reads
Genome assembly
Annotation track
checkm2
gunc
prodigal
snakemake-wrapper-utils
MEGAHIT
Atlas
BBMap
Biopython
BioRuby
Bwa-mem2
cd-hit
CheckM
DAS
Diamond
eggNOG-mapper v2
MetaBAT 2
Minimap2
MMseqs
MultiQC
Pandas
Picard
pyfastx
SAMtools
SemiBin
Snakemake
SPAdes
SqueezeMeta
TADpole
VAMB
CONCOCT
ete3
gtdbtk
h5py
networkx
numpy
plotly
psutil
utils
metagenomics

RNA-seq workflow using STAR and DESeq2
This workflow performs a differential gene expression analysis with STAR and Deseq2. The usage of this workflow is described ...

This Snakemake pipeline implements the GATK best-practices workflow
This Snakemake pipeline implements the GATK best-practices workflow for calling small germline variants. The usage of thi...