Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output, operation, topic
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
Yet another snakemake workflow for ATAC-seq data processing. This pipeline was created from code developed by:
-
Crazy Hot Tommy! 's many instructional guides
-
TOBIAS ATAC-seq footprinting Snakemake workflow
For SLURM setup we reference:
-
jdblischak/smk-simple-slur repo for simple submitting snakemake on SLURM
-
Tessa Pierce blog for example templates
Workflow Overview
Snakemake pipelines promote experimental reproducibility. For this project, you should have the following inputs customized for your analysis:
-
A config.yaml that describes the run parameters and location of reference data.
-
A tab-delimited sample meta file file that describes the experiments to download from SRA and how to group them.
-
A unique output directory.
A detailed overview of the steps in the ATAC-seq data processing are found on the maxATAC wiki site .
This version of snakeATAC is geared towards use with maxATAC and TOBIAS for making TF binding predictions.
Installation
This pipeline uses Anaconda and Snakemake. Follow the Snakemake install instructions for the best experience. Below is a brief overview of how to install Snakemake.
Create environment
Create a
conda
environment and download
mamba
:
conda create -n snakeatac -c conda-forge -c bioconda mamba snakemake
Activate the
snakeatac
environment:
conda activate snakeatac
Clone the snakeATAC repository
In your favorite directory clone the snakeATAC repo:
git clone https://github.com/tacazares/snakeATAC.git
Set up run-specific parameters
If you are running this pipeline for your first time, you will need to install all the
conda
environments used and perform a dry-run to make sure that everything was installed right.
-
Adjust the config.yaml and the tab-delimited sample meta file for your specific experiment.
-
Change to the working directory for snakeATAC. By default, Snakemake will look for a file called
Snakefile
with the rules and run information. You can use a customeSnakefile
with the-s
flag followed by the path to the file.cd ./snakeATAC/
-
Next, use the
--conda-create-envs-only
flag to create the environments.snakemake --cores 14 --use-conda --conda-frontend mamba --conda-create-envs-only --configfile ./inputs/config.yaml
-
Test the workflow and scripts are correctly set up by performing a dry-run with the
--dry-run
flag.snakemake --cores 14 --use-conda --conda-frontend mamba --configfile ./inputs/config.yaml --dry-run
Test snakeATAC
The ./snakeATAC/inputs/GM12878_sample.tsv contains information for a test run to process GM12878 OMNI ATAC-seq data.
After install, you can run the full run using your favorite HPC system.
snakemake --cores 14 --use-conda --conda-frontend mamba --configfile ./inputs/config.yaml
Use Snakemake to submit jobs through SLURM
If you want to use Snakemake to submit jobs to slurm, you will need to follow the instruction described by
jdblischak/smk-simple-slur repo
. The directory and scripts are included in this repository, but you will need to adjust the
account
information. You can also adjust any defaults that you wish to use with your job submissions. NOTE: You will need to use
chmod +x status-sacct.sh
to make the script executable.
Example
.bat
file to drive the snakeATAC workflow
#!/bin/bash
#SBATCH -D ./outputs
#SBATCH -J dmnd_snake
#SBATCH -t 96:00:00
#SBATCH --ntasks=8
#SBATCH --mem=16gb
#SBATCH --account={YOUR_ACCOUNT}
#SBATCH --output ./outputs/snakeatac-%j.out
#SBATCH --error ./outputs/snakeatac-%j.err
# Load modules
module load python/3.7-2019.10
# Load the snakemake/mamba env
source activate mamba
# go to a particular directory
cd ./snakeATAC
# make things fail on errors
set -o nounset
set -o errexit
set -x
### run your commands here!
# Develop from the below links
# https://bluegenes.github.io/snakemake-via-slurm/
# https://github.com/jdblischak/smk-simple-slurm
snakemake -s /snakeATAC/Snakefile \
--use-conda \
--conda-frontend mamba \
--configfile /snakeATAC/inputs/config.yaml \
--profile simple/
Code Snippets
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | __author__ = "Julian de Ruiter" __copyright__ = "Copyright 2017, Julian de Ruiter" __email__ = "julianderuiter@gmail.com" __license__ = "MIT" from os import path import re from tempfile import TemporaryDirectory from snakemake.shell import shell log = snakemake.log_fmt_shell(stdout=True, stderr=True) def basename_without_ext(file_path): """Returns basename of file path, without the file extension.""" base = path.basename(file_path) # Remove file extension(s) (similar to the internal fastqc approach) base = re.sub("\\.gz$", "", base) base = re.sub("\\.bz2$", "", base) base = re.sub("\\.txt$", "", base) base = re.sub("\\.fastq$", "", base) base = re.sub("\\.fq$", "", base) base = re.sub("\\.sam$", "", base) base = re.sub("\\.bam$", "", base) return base # Run fastqc, since there can be race conditions if multiple jobs # use the same fastqc dir, we create a temp dir. with TemporaryDirectory() as tempdir: shell( "fastqc {snakemake.params} -t {snakemake.threads} " "--outdir {tempdir:q} {snakemake.input[0]:q}" " {log}" ) # Move outputs into proper position. output_base = basename_without_ext(snakemake.input[0]) html_path = path.join(tempdir, output_base + "_fastqc.html") zip_path = path.join(tempdir, output_base + "_fastqc.zip") if snakemake.output.html != html_path: shell("mv {html_path:q} {snakemake.output.html:q}") if snakemake.output.zip != zip_path: shell("mv {zip_path:q} {snakemake.output.zip:q}") |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | __author__ = "Johannes Köster, Derek Croote" __copyright__ = "Copyright 2020, Johannes Köster" __email__ = "johannes.koester@uni-due.de" __license__ = "MIT" import os import tempfile from snakemake.shell import shell log = snakemake.log_fmt_shell(stdout=True, stderr=True) extra = snakemake.params.get("extra", "") outdir = os.path.dirname(snakemake.output[0]) if outdir: outdir = f"--outdir {outdir}" compress = "" for output in snakemake.output: out_name, out_ext = os.path.splitext(output) if out_ext == ".gz": compress += f"pigz -p {snakemake.threads} {out_name}; " elif out_ext == ".bz2": compress += f"pbzip2 -p{snakemake.threads} {out_name}; " with tempfile.TemporaryDirectory() as tmp: shell( "(fasterq-dump --temp {tmp} --threads {snakemake.threads} " "{extra} {outdir} {snakemake.wildcards.accession}; " "{compress}" ") {log}" ) |
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 | myargs <- commandArgs(trailingOnly=TRUE) bamfile <- myargs[1] species <- myargs[2] print("loading packages (ATACseqQC, ggplot, etc)...") suppressPackageStartupMessages(library(ggplot2, quietly=TRUE)) suppressPackageStartupMessages(library(Rsamtools, quietly=TRUE)) suppressPackageStartupMessages(library(ATACseqQC, quietly=TRUE)) suppressPackageStartupMessages(library(ChIPpeakAnno, quietly=TRUE)) suppressPackageStartupMessages(library("GenomicAlignments", quietly=TRUE)) if (species == "mm") { suppressPackageStartupMessages(library(TxDb.Mmusculus.UCSC.mm10.knownGene, quietly=TRUE)) suppressPackageStartupMessages(library(BSgenome.Mmusculus.UCSC.mm10, quietly=TRUE)) txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene bsgenome <- BSgenome.Mmusculus.UCSC.mm10 genome <- Mmusculus print("species is 'mm' using mm10 for analysis") ### Note: Everything below is deprecated until I can figure out a way to port a ### static/local package with snakemake # Note: phastCons60way was manually curated from GenomicAlignments, built, and installed as an R package # score was obtained according to: https://support.bioconductor.org/p/96226/ # package was built and installed according to: https://www.bioconductor.org/packages/devel/bioc/vignettes/GenomicScores/inst/doc/GenomicScores.html # (section 5.1: Building an annotation package from a GScores object) #suppressWarnings(suppressPackageStartupMessages(library(GenomicScores, lib.loc="/users/dia6sx/snakeATAC/scripts/", quietly=TRUE))) #suppressWarnings(suppressPackageStartupMessages(library(phastCons60way.UCSC.mm10, lib.loc="/users/dia6sx/snakeATAC/scripts/", quietly=TRUE))) } else if (species == "hs") { suppressPackageStartupMessages(library(TxDb.Hsapiens.UCSC.hg38.knownGene, quietly=TRUE)) suppressPackageStartupMessages(library(BSgenome.Hsapiens.UCSC.hg38, quietly=TRUE)) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene bsgenome <- BSgenome.Hsapiens.UCSC.hg38 genome <- Hsapiens print("species is 'hs' using hg38 for analysis") } else { print(paste("params ERROR: ATACseqQC is not configured to use species =", species)) print("exiting...") quit(status=1) } doATACseqQC <- function(bamfile, txdb, bsgenome, genome) { # Fragment size distribution print(paste("generating output for ",strsplit(basename(bamfile),split='\\.')[[1]][1],"...",sep="")) print("calculating Fragment size distribution...") bamfile.labels <- gsub(".bam", "", basename(bamfile)) loc_to_save_figures <- paste(dirname(dirname(bamfile)),"/qc/ATACseqQC",sep="") if (file.exists(loc_to_save_figures)) { print("Warning: old figures will be overwritten") } else { dir.create(loc_to_save_figures) } png_file <- paste(loc_to_save_figures,"/",bamfile.labels,"_fragment_size_distribution.png",sep="") png(png_file) fragSizeDist(bamfile, bamfile.labels) dev.off() # Adjust the read start sites print("adjusting read start sites...") ## bamfile tags to be read in possibleTag <- list("integer"=c("AM", "AS", "CM", "CP", "FI", "H0", "H1", "H2", "HI", "IH", "MQ", "NH", "NM", "OP", "PQ", "SM", "TC", "UQ"), "character"=c("BC", "BQ", "BZ", "CB", "CC", "CO", "CQ", "CR", "CS", "CT", "CY", "E2", "FS", "LB", "MC", "MD", "MI", "OA", "OC", "OQ", "OX", "PG", "PT", "PU", "Q2", "QT", "QX", "R2", "RG", "RX", "SA", "TS", "U2")) bamTop100 <- scanBam(BamFile(bamfile, yieldSize = 100), param = ScanBamParam(tag=unlist(possibleTag)))[[1]]$tag tags <- names(bamTop100)[lengths(bamTop100)>0] ## files will be output into outPath ## shift the coordinates of 5'ends of alignments in the bam file outPath <- paste(dirname(dirname(bamfile)),"/alignments_shifted", sep="") seqinformation <- seqinfo(txdb) gal <- readBamFile(bamfile, tag=tags, asMates=TRUE, bigFile=TRUE) shiftedBamfile <- file.path(outPath, paste(bamfile.labels,"_shifted.bam",sep="")) # check if shifted Bam file exists from previous run if (file.exists(shiftedBamfile)) { print("Shifted Bamfile found.") print("Loading in...") gal <- readBamFile(shiftedBamfile, tag=tags, asMates=TRUE, bigFile=TRUE) ## This step is mostly for formating so splitBam can ## take in bamfile. Implementing shift of 0 bp on positive strand ## and 0 bp on negative strand because shifted Bamfile should ## already have these shifts gal1 <- shiftGAlignmentsList(gal, positive = 0L, negative = 0L) } else { # shifted bam file does not exist check if # old shifted alignments directory exists # if so remove and create new one if (file.exists(outPath)){ unlink(outPath,recursive=TRUE) } dir.create(outPath) print("*** creating shifted bam file ***") gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile) } # Promoter/Transcript body (PT) score print("calculating Promoter/Transcript body (PT) score...") txs <- transcripts(txdb) pt <- PTscore(gal1, txs) png_file <- paste(loc_to_save_figures,"/",bamfile.labels,"_ptscore.png",sep="") png(png_file) plot(pt$log2meanCoverage, pt$PT_score, xlab="log2 mean coverage", ylab="Promoter vs Transcript", main=paste(bamfile.labels,"PT score")) dev.off() # Nucleosome Free Regions (NFR) score print("calculating Nucleosome Free Regions (NFR) score") nfr <- NFRscore(gal1, txs) png_file <- paste(loc_to_save_figures,"/",bamfile.labels,"_nfrscore.png",sep="") png(png_file) plot(nfr$log2meanCoverage, nfr$NFR_score, xlab="log2 mean coverage", ylab="Nucleosome Free Regions score", main=paste(bamfile.labels,"\n","NFRscore for 200bp flanking TSSs",sep=""), xlim=c(-10, 0), ylim=c(-5, 5)) dev.off() # Transcription Start Site (TSS) Enrichment Score print("calculating Transcription Start Site (TSS) Enrichment score") tsse <- TSSEscore(gal1, txs) png_file <- paste(loc_to_save_figures,"/",bamfile.labels,"_tss_enrichment_score.png",sep="") png(png_file) plot(100*(-9:10-.5), tsse$values, type="b", xlab="distance to TSS", ylab="aggregate TSS score", main=paste(bamfile.labels,"\n","TSS Enrichment score",sep="")) dev.off() # Split reads, Heatmap and coverage curve for nucleosome positions print("splitting reads by fragment length...") genome <- genome outPath <- paste(dirname(dirname(bamfile)),"/alignments_split", sep="") TSS <- promoters(txs, upstream=0, downstream=1) TSS <- unique(TSS) ## estimate the library size for normalization librarySize <- estLibSize(bamfile) ## calculate the signals around TSSs. NTILE <- 101 dws <- ups <- 1010 splitBamfiles <- paste(outPath,"/",c("NucleosomeFree", "mononucleosome", "dinucleosome", "trinucleosome"),".bam",sep="") # check if split Bam files exists from previous run if (all(file.exists(splitBamfiles))) { print("*** split bam files found! ***") print("Loading in...") sigs <- enrichedFragments(bamfiles=splitBamfiles, index=splitBamfiles, TSS=TSS, librarySize=librarySize, TSS.filter=0.5, n.tile = NTILE, upstream = ups, downstream = dws) } else { # split bam files do not exist check if # old split alignments directory exists # if so remove and create new one if (file.exists(outPath)){ unlink(outPath,recursive=TRUE) } print("*** creating split bam files ***") dir.create(outPath) ## split the reads into NucleosomeFree, mononucleosome, ## dinucleosome and trinucleosome. ## and save the binned alignments into bam files. objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome, outPath = outPath) #objs <- splitBam(bamfile, tags=tags, outPath=outPath, # txs=txs, genome=genome, # conservation=phastCons60way.UCSC.mm10, # seqlev=paste0("chr", c(1:19, "X", "Y"))) sigs <- enrichedFragments(gal=objs[c("NucleosomeFree", "mononucleosome", "dinucleosome", "trinucleosome")], TSS=TSS, librarySize=librarySize, TSS.filter=0.5, n.tile = NTILE, upstream = ups, downstream = dws) } ## log2 transformed signals sigs.log2 <- lapply(sigs, function(.ele) log2(.ele+1)) ## plot heatmap png_file <- paste(loc_to_save_figures,"/",bamfile.labels,"_nucleosome_pos_heatmap.png",sep="") png(png_file) featureAlignedHeatmap(sigs.log2, reCenterPeaks(TSS, width=ups+dws), zeroAt=.5, n.tile=NTILE) dev.off() ## get signals normalized for nucleosome-free and nucleosome-bound regions. out <- featureAlignedDistribution(sigs, reCenterPeaks(TSS, width=ups+dws), zeroAt=.5, n.tile=NTILE, type="l", ylab="Averaged coverage") ## rescale the nucleosome-free and nucleosome signals to 0~1 range01 <- function(x){(x-min(x))/(max(x)-min(x))} out <- apply(out, 2, range01) png_file <- paste(loc_to_save_figures,"/",bamfile.labels,"_TSS_signal_distribution.png",sep="") png(png_file) matplot(out, type="l", xaxt="n", xlab="Position (bp)", ylab="Fraction of signal", main=paste(bamfile.labels,"\n","TSS signal distribution",sep="")) axis(1, at=seq(0, 100, by=10)+1, labels=c("-1K", seq(-800, 800, by=200), "1K"), las=2) abline(v=seq(0, 100, by=10)+1, lty=2, col="gray") dev.off() print("QC Finished.") print("Generated QC figures can be found in qc folder under ATACseQC") print(paste("*** removing temp files in",outPath,"***")) unlink(outPath,recursive=TRUE) outPath <- paste(dirname(dirname(bamfile)),"/alignments_shifted", sep="") print(paste("*** removing temp files in",outPath,"***")) unlink(outPath,recursive=TRUE) } doATACseqQC(bamfile, txdb, bsgenome, genome) |
7 8 9 10 11 12 13 14 15 16 | read_counts=$(gunzip -c ${1} | wc -l) # Sort and merge input peaks. Then intersect with list of tags # Bedtools docs for -u: Write original A entry (tag) once if any overlaps found in B (peaks). In other words, just report the fact at least one overlap was found in B reads_in_peaks=$(bedtools sort -i ${2} | bedtools merge -i - | bedtools intersect -u -a ${1} -b - | wc -l) echo ${reads_in_peaks} > ${3} echo ${read_counts} >> ${3} echo $(bc -l <<< "${reads_in_peaks}/${read_counts}") >> ${3} |
10 11 12 13 14 15 16 | mapped_reads=$(samtools view -c -F 260 ${1}) reads_factor=$(bc -l <<< "1/${mapped_reads}") rpm_factor=$(bc -l <<< "${reads_factor} * ${2}") bedtools genomecov -i ${3} -g ${4} -bg -scale ${rpm_factor} | LC_COLLATE=C sort -k1,1 -k2,2n > ${5} bedGraphToBigWig ${5} ${4} ${6} |
9 10 11 12 13 | bedtools bamtobed -i ${1} | \ awk 'BEGIN {OFS = "\t"} ; {if ($6 == "+") print $1, $2 + 4, $2 + 5, $4, $5, $6; else print $1, $3 - 5, $3 - 4, $4, $5, $6}' | \ bedtools slop -i - -g ${2} -b ${3} | \ bedtools intersect -a - -b ${4} -v | \ pigz > ${5} |
12 13 14 15 | shell: """ Rscript ./scripts/doATACseqQC.R {input} {params.species} > {log} """ |
25 26 27 28 | shell: """ bowtie2 {params.bowtie} -p {threads} -x {config[idx_bt2]} -1 {input[0]} -2 {input[1]} -S {output} 2> {log} """ |
46 47 48 49 50 | shell: """ samtools view -@ {threads} -b -u -q 30 {input} | \ samtools sort -@ {threads} -n -o {output} - """ |
69 70 71 72 73 74 75 76 | shell: """ # Add mate information to reads for deduplication samtools fixmate -@ {threads} -r -m {input} - 2> {log} | \ samtools sort -@ {threads} -o {output.fixmate_bam} - 2> {log} samtools index -@ {threads} {output.fixmate_bam} 2> {log} """ |
99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 | shell: """ # use markdup to remove PCR duplicates samtools markdup -@ {threads} -r -s {input} - 2> {log} | \ samtools sort -@ {threads} -o {output.dedup_bam} - 2> {log} samtools index -@ {threads} {output.dedup_bam} 2> {log} # remove unmapped reads with -F 4 samtools view -@ {threads} -b -f 3 -F 4 {output.dedup_bam} {params} | \ samtools sort -@ {threads} -o {output.final_bam} - 2> {log} samtools index -@ {threads} {output.final_bam} 2> {log} """ |
135 136 137 138 | shell: """ sh ./scripts/shift_reads.sh {input} {params.chrom_sizes} {params.slop} {params.blacklist} {output} """ |
163 164 165 166 | shell: """ sh ./scripts/generate_bigwig.sh {input.bam} {params.millions_factor} {input.tn5_bed} {params.chrom_sizes} {output.tn5_bedgraph} {output.tn5_bigwig} """ |
183 184 185 186 187 | shell: """ zcat {input.tn5_bed} | grep -w "+" | pigz > {output.tn5_bed_pos} zcat {input.tn5_bed} | grep -w "-" | pigz > {output.tn5_bed_neg} """ |
214 215 216 217 218 | shell: """ sh ./scripts/generate_bigwig.sh {input.bam} {params.millions_factor} {input.tn5_bed_pos} {params.chrom_sizes} {output.tn5_bedgraph_pos} {output.tn5_bigwig_pos} sh ./scripts/generate_bigwig.sh {input.bam} {params.millions_factor} {input.tn5_bed_neg} {params.chrom_sizes} {output.tn5_bedgraph_neg} {output.tn5_bigwig_neg} """ |
243 244 245 246 | shell: """ macs2 callpeak -t {input} --name {params.NAME} -g {params.species} --outdir {params.PEAK_DIR} --nomodel --shift -{params.shift_size} --extsize {params.ext_size} --keep-dup=all -q 0.{params.qvalue} """ |
266 267 268 269 | shell: """ samtools flagstat {input} > {output} 2> {log} """ |
289 290 291 292 | shell: """ samtools flagstat {input} > {output} 2> {log} """ |
312 313 314 315 | shell: """ sh ./scripts/frip.sh {input[0]} {input[1]} {output} """ |
337 338 339 340 | shell: """ bamPEFragmentSize -b {input} -p {threads} --binSize {params.bin_size} --blackListFileName {params.blacklist} --outRawFragmentLengths {output} """ |
358 359 360 361 | shell: """ samtools idxstats {input} | cut -f 1,3 > {output} """ |
380 381 382 383 | shell: """ maxatac average -i {input} --prefix {params.prefix} --output {params.outdir} """ |
402 403 404 405 | shell: """ maxatac normalize --signal {input} --prefix {params.prefix} --output {params.outdir} """ |
425 426 427 428 | shell: """ maxatac predict -tf {params.TF_model} -s {input} --prefix {params.prefix} --output {params.outdir} """ |
21 22 | wrapper: "0.77.0/bio/sra-tools/fasterq-dump" |
38 39 40 41 42 | shell: """ cat {input.R1} > {output.R1_OUT} cat {input.R2} > {output.R2_OUT} """ |
61 62 | wrapper: "0.77.0/bio/fastqc" |
86 87 88 89 | shell: """ trim_galore -q 30 -paired -j 4 -o {params.TRIM_DIR} {input.fq1} {input.fq2} 2> {log} """ |
108 109 | wrapper: "0.77.0/bio/fastqc" |
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 | shell: """ # Select Parameter Description # runThreadN: Number of threads to use # readFilesIn: Fastq files # outFileNamePrefix: Prefix of the outputDIR and the name to use # outSAMtype: Output the file as a BAM file that is sorted # outSAMunmapped: Output unmapped reads in the SAM file with special flag # outSAMattributes: Standard SAM attributes # alignIntronMax: Allow only 1 max intron. This is specific to ATAC-seq # STAR was designed for RNA transcripts so we want to ignore some parameters # alignMatesGapMax: Allow a maximum of 2000 bp gap. # alignEndsType: This aligns the full read and considers the whole read in score calculation. # outMultimapperOrder: Output multimapped reads in random order # outFilterMultimapNmax: Max 1000 multimapped reads # outSAMmultNmax: 1 # outFilterMismatchNoverLmax: .1 mismatch # outFilterMatchNmin: 20 # alignSJDBoverhangMin: 999 # alignEndsProtrude: 10 ConcordantPair # outFilterScoreMinOverLread: 0.66 # outFilterMatchNminOverLread: 0.66 # readFilesCommand: zcat echo "Align Files with STAR: In Progress" STAR --genomeDir {params.STAR_INDEX} \ --runThreadN {params.THREADS} \ --readFilesIn {input[0]} {input[1]} \ --outFileNamePrefix {params.Prefix} \ --outSAMtype BAM SortedByCoordinate \ --outSAMattributes Standard \ --alignIntronMax 1 \ --alignMatesGapMax 2000 \ --alignEndsType EndToEnd \ --outMultimapperOrder Random \ --outFilterMultimapNmax 999 \ --outSAMmultNmax 1 \ --outFilterMismatchNoverLmax 0.1 \ --outFilterMatchNmin 20 \ --alignSJDBoverhangMin 999 \ --alignEndsProtrude 10 ConcordantPair \ --outFilterScoreMinOverLread 0.66 \ --outFilterMatchNminOverLread 0.66 \ --readFilesCommand zcat """ |
84 85 86 87 88 89 | shell: """ samtools merge -@ {threads} {output} {input} samtools index -@ 4 {output} """ |
108 109 110 111 | shell: """ macs2 callpeak -t {input} --name {params.NAME} -g {params.species} --outdir {params.PEAK_DIR} --nomodel --shift -100 --extsize 200 --broad """ |
114 115 | shell: "TOBIAS ATACorrect --bam ${file} --genome Homo_sapiens.GRCh38.fa --peaks ${peaks} --blacklist hg38_maxatac_blacklist_V2.bed --outdir /TOBIAS/${base_filename} --cores 4" |
Support
- Future updates
Related Workflows





