Snakemake based analysis pipeline to identify m6As from eCLIP data

public 1yr ago 0 bookmarks

View Workflow

Help improve this workflow!

This workflow has been published but could be further improved with some additional meta data:

Keyword(s) in categories input, output, operation, topic

You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .

Overview of meCLIP

meCLIP is method to identify m6As residues at single-nucleotide resolution using eCLIP. It uses the Snakemake workflow management system to handle analysis of the resulting sequencing reads. The meCLIP analysis pipeline requires minimal input from the user once executed and automatically generates a list of identified m6As along with a report summarizing the analysis. The following steps outline installation of Snakemake and execution of the meCLIP workflow.

Requirements

If installed as recommended, the pipeline automatically handles the installation of required software. For reference, the following programs are used by meCLIP:

Snakemake (v7.24.2)
STAR (v2.7.10b)
SeqKit (v2.3.0)
BEDOPS (v2.4.41)
Perl (v5.32.1)
FastQC (v0.11.7)
UMI-tools (v1.1.4)
cutadapt (v4.2)
samtools (v1.16.1)
Java (OpenJDK v11.0.15)
bedtools (v2.30.0)
R (v4.2.2)
 "scales" and "ggplot2" packages
MultiQC (v1.14)

The meCLIP pipeline should work on all modern operating systems and has been specifically tested on Ubuntu (Bionic, Focal, Jammy), OS X (10.15), and Windows 10/11 using WSL (for a great guide on installing WSL in Windows, see [here][id2]).

The pipeline itself is not very resource intensive, although providing more cores / threads does speed up execution (for reference, running on a human sample with 11 threads takes ~6-8 hours). The biggest bottleneck is the genome indexing step, which requires at least 16Gb of memory (see Reference Genome section below if this is an issue). The aligner indices, genomes, and corresponding annotations can also be quite large (45Gb for human genome), so please ensure you have sufficient disk space to store the files before executing the pipeline.

Conda Installation

The recommended way to setup the meCLIP pipeline is via conda because it also enables any software dependencies to be easily installed.

First, download the latest version [here][id] (making sure to download the Python3 version) and then execute the following command:

bash Miniconda3-latest-Linux-x86_64.sh

Answer 'yes' to the question about whether conda shall be put into your PATH. You can check that the installation was successful by running:

conda list

For a successful installation, a list of installed packages appears.

Prepare Working Directory

The meCLIP pipeline creates a number of intermediate and temporary files as different underlying tools are executed. In order to keep these files organized and separated from other meCLIP instances, we recommend creating an independent working directory for each meCLIP experiment (where a single meCLIP experiment would consist of an 'IP' sample with a corresponding 'Input' sample). While there are several ways to accomplish this, the easiest is to simply clone the meCLIP files from GitHub into a new directory for each independent meCLIP experiment.

First, create a new directory in a reasonable place to use as your working directory.

mkdir experiment-name

Next, clone the meCLIP files from GitHub into that directory:

git clone https://github.com/ajlabuc/meCLIP.git

After the pipeline finishes running the meCLIP files themselves can be safely deleted, leaving an independent directory containing all the relevant files from the analysis of that particular experiment.

Create Conda Environment

The environment.yaml file that was downloaded from GitHub can be used to install all the software required by meCLIP into an isolated Conda environment. This ensures that the correct version of the software is utilized and any other dependencies are reconciled.

The default Conda solver is a bit slow and sometimes has issues with selecting the latest package releases. Therefore, we recommend to install Mamba as a drop-in replacement via:

conda install -c conda-forge mamba

Then, to create an environment with the required software:

mamba env create --name meCLIP --file workflow/envs/environment.yaml

Finally, to activate the meCLIP environment, execute:

conda activate meCLIP

Be sure to exit the environment once the analysis is complete.

Customize Configuration File

One of the few steps in the meCLIP analysis pipeline that actually requires opening a file is customizing the configuration. This is where you inform the pipeline where relevant files are on your system, namely the sequencing reads and reference genomes. A sample configuration file is included in the downloaded files and detailed below.

sample_name: meCLIP
resources:
 threads: 11
 ram: 29
reads:
 ip:
 ip_read1: reads/ip_read_1.fq
 ip_read2: reads/ip_read_2.fq
 input:
 input_read1: reads/input_read_1.fq
 input_read2: reads/input_read_2.fq
species:
 name: homo_sapiens
 assembly: GRCh38
 release: 109
adapters:
 dna: 
 randomer_size: 10
 rna: 
 name: X1A
 sequence: AUAUAGGNNNNNAGAUCGGAAGAGCGUCGUGUAG

sample_name: easily identifiable name of the experiment (this will be included in most of the file names and plot titles)
threads: defines the number of threads available to the pipeline (we recommend one less than the total number of usable CPU cores)
ram: defines the amount of RAM available to the pipeline (mainly used for genome indexing)
reads: specifies the locations / filenames of the paired sequencing read files for the IP and INPUT samples (relative to working directory)
species: describes the reference species to identify m6A residues in (see 'Reference Genome' below for details)
adapters: details the specific adapters used in the experiment (see 'Adapters' below for details)

Reference Genome

The pipeline will attempt to download the appropriate reference genome automatically based on the information provided in the species section of the configuration file. The script attempts to download the 'primary assembly' reference and reverts to the 'top level' reference if there is no primary assembly for that species. Automatic downloading is currently only supported for vertebrates listed in the Ensembl database (see [here][id3]).

The name should be all lowercase with underscores for spaces (see Ensembl link above for proper formatting)
The assembly should be the appropriate reference genome assembly for the indicated species (i.e., GRCh38 for humans)
The release should be the desired Ensembl release number (new releases occur every few months, see [here][id4] for info)

The pipeline will parse the Ensembl FTP site to obtain the reference genome FASTA file and associated GTF annotation. As previously mentioned, automatic downloading is currently only available for vertebrate species. To utilize other organisms (i.e., bacteria, fungi, plants, etc.), reference genomes and annotations can be manually provided using the 'manual-reference-genome.yaml' configuration file located in the 'config' directory.

resources:
 threads: 11
 ram: 29
genome:
 fasta: <url-to-fasta-file>
 gtf: <url-to-gtf-file>

resources: same as main configuration file, specify available memory and threads for genome indexing / annotation
genome: URL links (i.e., from FTP site) where the FASTA and GTF files can be downloaded (see sample config file for example)

Once the configuration file has been updated with the desired genome information, the files can be downloaded and annotated with the following command:

snakemake --snakefile workflow/rules/genome-download.smk --cores N

The script will download the defined reference files and index / annotate them so that they can be used in the main pipeline.

Adapters

The meCLIP library preparation consists of adding two separate adapters to the m6A containing fragment. The first is a (possibly indexed) RNA adapter that is ligated while the transcript is still on the immunoprecipitation beads, and the second is a ssDNA adapter ssDNA adapter that is ligated to the cDNA following reverse transcription. The ssDNA adapter contains a randomized UMI (either N5 or N10) to determine whether identical reads are unique transcripts or PCR duplicates of the same RNA fragment. Therefore, the following information should be provided in the adapter section of the configuration file:

The randomer should indicate the length of the random UMI sequence of the ssDNA adapter (either 5 or 10)
The name should be the given name of the RNA adapter used in the library prep (see meCLIP / eCLIP paper)
The sequence should be the complete sequence of the RNA adapter used in the library prep

The strategy outlined in the original meCLIP paper was based on the eCLIP protocol which recommend using two different RNA adapters to ensure proper balancing as on the Illumina HiSeq sequencer. Given load balancing is not as much of an issue on more recent instruments, we find it is sufficient to only use a single adapter which is what is assumed by the current version of this pipeline.

The ends of each adapter are complimentary to the Illumina TruSeq adapters and therefore the resulting reads generally have the following structure where 'Read 1' begins with the RNA adapter and 'Read 2' (corresponding to the sense strand) begins with the UMI followed by a sequence corresponding to the 5′ end of the original RNA fragment.

Read 1
[RNA Adapter (Reverse)] [Sequenced Fragment (Reverse)] [DNA Adapter]

Read 2
[DNA Adapter (Reverse)] [Sequenced Fragment] [RNA Adapter]

Execute the Analysis Pipeline

Once the location of the reads and genomes are saved into the configuration file, the analysis pipeline can then be executed simply by executing the following command within the working directory (where 'N' is the number of CPU cores available to the pipeline):

snakemake --use-conda --cores N

This will identify m6As in the IP and INPUT samples, cross-reference the two sets to remove any overlaps (to account for mutations not induced by the m6A antibody), and then annotate the resulting m6As and generate a report summarizing the analysis.

Final Remarks

The goal of the meCLIP analysis pipeline is to simplify the identification of m6As from sequencing reads by streamlining the workflow so that the vast majority of steps are automatically executed and relevant output files are automatically generated. We hope this pipeline will become a valuable tool for researchers interesting in identifying m6As.

Citation

If you use this pipeline to identify m6A sites, please cite the following manuscript:

“Identification of m6A residues at single-nucleotide resolution using eCLIP and an accessible custom analysis pipeline.”
Justin T. Roberts, Allison M. Porman, Aaron M. Johnson.
RNA. 2020 Dec 29;27(4):527-541. doi:10.1261/rna.078543.120

Code Snippets

suppressMessages(if (!require(scales)) 
  install.packages('scales', repos = "http://cran.us.r-project.org"))
suppressMessages(library("scales"))

## Read in distance measure file
dist <- read.delim (snakemake@input[["txt"]], header = T)
title <- snakemake@params[["sample_name"]]

## Select the longest isoforms
trx_len <- dist$utr5_size + dist$cds_size + dist$utr3_size
dist <- dist[order(dist$gene_name, trx_len),] # sort by gene name, then transcript length
dist <- dist[duplicated(dist$gene_name),] # select the longest isoform

## Rescale regions and determine scale
utr5.SF <- median(dist$utr5_size, na.rm = T)/median(dist$cds_size, na.rm = T)
utr3.SF <- median(dist$utr3_size, na.rm = T)/median(dist$cds_size, na.rm = T)

# Assign the regions to new dataframes
utr5.dist <- dist[dist$rel_location < 1, ]
utr3.dist <- dist[dist$rel_location >= 2, ]
cds.dist <- dist [dist$rel_location < 2 & dist$rel_location >= 1, ]

# Rescale 5'UTR and 3'UTR
utr5.dist$rel_location <- rescale(utr5.dist$rel_location, to = c(1-utr5.SF, 1), from = c(0,1))
utr3.dist$rel_location <- rescale(utr3.dist$rel_location, to = c(2, 2+utr3.SF), from = c(2,3))

# Combine the regions in a new dataframe and plot
all.regions <- c(utr5.dist$rel_location, cds.dist$rel_location, utr3.dist$rel_location)

## A smooth density plot
png(filename=snakemake@output[["png"]])
plot(density(all.regions), main=title, xaxt='n', xlab="")
abline (v = 1, lty  = 1, col = "red")
abline (v = 2, lty  = 1, col = "red")
axis(at=c(0.6,1.5,2.5), labels=c("5'UTR","CDS","3'UTR"), side=1)

R scales From line 1 of metaPlotR/visualize_metagenes.R

shell:
    """
    name="{params.name}"; \
    name="${{name^}}"; \
    ram="$(({params.ram}*1000000000))"; \
    curl -s --list-only {params.url_fasta}/ | if grep -q "primary_assembly"; then wget -P {output} {params.url_fasta}/${{name}}.{params.assembly_fasta}.dna.primary_assembly.fa.gz; else wget -P {output} {params.url_fasta}/${{name}}.{params.assembly_fasta}.dna.toplevel.fa.gz; fi; \
    wget -P {output} {params.url_gtf}/${{name}}.{params.assembly_gtf}.gtf.gz; \
    gzip -dr {output}; \
    echo "$(date +%b' '%d' '%H:%M:%S) Indexing genome (takes a while)..."; \
    STAR --runThreadN {threads_max} --limitGenomeGenerateRAM ${{ram}} --runMode genomeGenerate --genomeSAsparseD 2 --genomeFastaFiles {output}/*.fa --sjdbGTFfile {output}/*.gtf --genomeDir {output}/index > {log} && \
    echo "$(date +%b' '%d' '%H:%M:%S) Annotating genome (takes a while)..." && \
    seqkit split {output}/*.fa -i --by-id-prefix "" --id-regexp "([^\s]+)" -O {output}/chroms --quiet && \
    awk -F'"' -v OFS='"' '{{for(i=2; i<=NF; i+=2) gsub(";", "-", $i)}} 1' {output}/${{name}}.{params.assembly_gtf}.gtf | gtf2bed --attribute-key=gene_name - > {output}/${{name}}.{params.assembly_gtf}.geneNames.bed && \
    gtfToGenePred -genePredExt {output}/${{name}}.{params.assembly_gtf}.gtf {output}/${{name}}.{params.assembly_gtf}.genePred && \
    perl workflow/scripts/metaPlotR/make_annot_bed.pl --genomeDir {output}/chroms/ --genePred {output}/${{name}}.{params.assembly_gtf}.genePred > {output}/${{name}}.{params.assembly_gtf}.annotated.bed && \
    echo "$(date +%b' '%d' '%H:%M:%S) Sorting annotation..." && \
    sort -k1,1 -k2,2n {output}/${{name}}.{params.assembly_gtf}.annotated.bed > {output}/${{name}}.{params.assembly_gtf}.annotated.sorted.bed && \
    rm {output}/${{name}}.{params.assembly_gtf}.annotated.bed && \
    echo "$(date +%b' '%d' '%H:%M:%S) Calculating size of transcript regions (i.e. 5'UTR, CDS and 3'UTR)..." && \
    perl workflow/scripts/metaPlotR/size_of_cds_utrs.pl --annot {output}/${{name}}.{params.assembly_gtf}.annotated.sorted.bed > {output}/${{name}}.{params.assembly_gtf}.region_sizes.txt
    """

SnakeMake STAR seqkit GFFutils gtftogenepred From line 22 of workflow/Snakefile

shell:
    """
    mkdir -p {output} && \
    rna_seq="{params.rna_seq}"; \
    rna_seq=`echo -e ">seq\n${{rna_seq}}" | seqkit subseq -r 1:7 | seqkit seq -s -t RNA --rna2dna --quiet`; \
    rna_seq_rc=`echo -e ">seq\n${{rna_seq}}" | seqkit seq -p -r -s -t DNA --quiet`; \
    echo -e "TRUSEQ_I5\t{params.truseq_read2}\nTRUSEQ_I7\t{params.truseq_read1}\n{params.rna_name}\t${{rna_seq}}\n{params.rna_name}_RC\t${{rna_seq_rc}}" > config/adapterList.txt && \
    fastqc --quiet -t {threads} -a config/adapterList.txt -o {output} {input.read2} && \
    mv {output}/*.zip {output}/{params.sample_name}_pre_fastqc.zip
    """

SnakeMake FastQC seqkit subSeq From line 58 of workflow/Snakefile

shell:
    """
	randomer_size="{params.randomer_size}"; \
    randomer_size=`echo "{{"${{randomer_size}}"}}"`; \
    umi_tools extract --extract-method=regex --bc-pattern="^(?P<umi_1>.${{randomer_size}})" -I {input.read2} --read2-in={input.read1} --stdout={output.read2} --read2-out={output.read1} --log={log}
    """

SnakeMake umi_tools From line 83 of workflow/Snakefile

shell:
    """
    rna_seq="{params.rna_seq}"; \
    rna_seq_rc=`echo -e ">seq\n${{rna_seq}}" | seqkit seq -p -r -s -t RNA --rna2dna --quiet`; \
    randomer_size="{params.randomer_size}"; \
    randomer_size=`echo "{{"${{randomer_size}}"}}"`; \
    cutadapt {params.args} -j {threads} -a "ssDNA=${{rna_seq_rc}};min_overlap=5...N${{randomer_size}}{params.truseq_read1};min_overlap=15" -A ssRNA={params.rna_seq} -A TruSeq={params.truseq_read2} -o {output.read1} -p {output.read2} {input} > {log}
    """

SnakeMake Cutadapt seqkit From line 107 of workflow/Snakefile

shell:
    """
    mkdir -p {output} && \
    fastqc --quiet -t {threads} -a config/adapterList.txt -o {output} {input} && \
    mv {output}/*.zip {output}/{params.sample_name}_post_fastqc.zip
    """

SnakeMake FastQC From line 125 of workflow/Snakefile

shell:
    """
    STAR --runThreadN {threads} --runMode alignReads --genomeDir resources/index --readFilesIn {input.read1} {input.read2} --outFileNamePrefix {params.prefix} {params.args} > {output.bam} && \
    samtools index -@ {threads} {output.bam}
    """

SnakeMake SAMtools STAR From line 144 of workflow/Snakefile

shell:
    """
    umi_tools dedup -I {input.bam} --paired -S {output.bam} --log={log} && \
    samtools index -@ {threads} {output.bam} && \
    samtools view -hb -f 128 {output.bam} > {output.r2} && \
    samtools index -@ {threads} {output.r2}
    """

SnakeMake SAMtools umi_tools From line 160 of workflow/Snakefile

shell:
    "samtools mpileup -B -f resources/*.fa {input.bam} > {output.mpileup} 2> {log}"

SnakeMake SAMtools From line 176 of workflow/Snakefile

shell:
    """
    java -cp workflow/scripts/meclip/target/meCLIP.jar com.github.ajlabuc.meclip.MpileupParser {input.mpileup} {params.prefix} C T 0.025 0.5 2 0 && \
    seqkit subseq --quiet --bed results/mpileup/IP/{params.prefix}_motifList_positive.bed resources/*.fa > results/mpileup/IP/motifList_positive.fa && \
    seqkit subseq --quiet --bed results/mpileup/IP/{params.prefix}_motifList_negative.bed resources/*.fa > results/mpileup/IP/motifList_negative.fa && \
    java -cp workflow/scripts/meclip/target/meCLIP.jar com.github.ajlabuc.meclip.MotifFrequencyCalculator results/mpileup/IP/motifList_positive.fa results/mpileup/IP/{params.prefix}_MpileupParser_positive.xls [AG]AC && \
    java -cp workflow/scripts/meclip/target/meCLIP.jar com.github.ajlabuc.meclip.MotifFrequencyCalculator results/mpileup/IP/motifList_negative.fa results/mpileup/IP/{params.prefix}_MpileupParser_negative.xls GT[TC] && \
    cat <(echo -e "Chr \tM6A \tRef \tFreq \tMutationCount \tReadCount \tMotifStart \tMotifEnd \tMotif") results/mpileup/IP/{params.prefix}_MpileupParser_positive_motifFrequency.xls results/mpileup/IP/{params.prefix}_MpileupParser_negative_motifFrequency.xls > {output.xls} && \
    awk '{{if ($9!="No") print $0}}' {output.xls} > {output.xls}.temp && mv {output.xls}.temp {output.xls}
    """

SnakeMake seqkit subSeq From line 187 of workflow/Snakefile

shell:
    """
    mkdir -p {output} && \
    fastqc --quiet -t {threads} -a config/adapterList.txt -o {output} {input.read2} && \
    mv {output}/*.zip {output}/{params.sample_name}_pre_fastqc.zip
    """

SnakeMake FastQC From line 208 of workflow/Snakefile

shell:
    """
	randomer_size="{params.randomer_size}"; \
    randomer_size=`echo "{{"${{randomer_size}}"}}"`; \
    umi_tools extract --extract-method=regex --bc-pattern="^(?P<umi_1>.${{randomer_size}})" -I {input.read2} --read2-in={input.read1} --stdout={output.read2} --read2-out={output.read1} --log={log}
    """

SnakeMake umi_tools From line 229 of workflow/Snakefile

shell:
    """
    rna_seq="{params.rna_seq}"; \
    rna_seq_rc=`echo -e ">seq\n${{rna_seq}}" | seqkit seq -p -r -s -t RNA --rna2dna --quiet`; \
    randomer_size="{params.randomer_size}"; \
    randomer_size=`echo "{{"${{randomer_size}}"}}"`; \
    cutadapt {params.args} -j {threads} -a "ssDNA=${{rna_seq_rc}};min_overlap=5...N${{randomer_size}}{params.truseq_read1};min_overlap=15" -A ssRNA={params.rna_seq} -A TruSeq={params.truseq_read2} -o {output.read1} -p {output.read2} {input} > {log}
    """

SnakeMake Cutadapt seqkit From line 253 of workflow/Snakefile

shell:
    """
    mkdir -p {output} && \
    fastqc --quiet -t {threads} -a config/adapterList.txt -o {output} {input} && \
    mv {output}/*.zip {output}/{params.sample_name}_post_fastqc.zip
    """

SnakeMake FastQC From line 271 of workflow/Snakefile

shell:
    """
    STAR --runThreadN {threads} --runMode alignReads --genomeDir resources/index --readFilesIn {input.read1} {input.read2} --outFileNamePrefix {params.prefix} {params.args} > {output.bam} && \
    samtools index -@ {threads} {output.bam}
    """

SnakeMake SAMtools STAR From line 290 of workflow/Snakefile

shell:
    """
    umi_tools dedup -I {input.bam} --paired -S {output.bam} --log={log} && \
    samtools index -@ {threads} {output.bam} && \
    samtools view -hb -f 128 {output.bam} > {output.r2} && \
    samtools index -@ {threads} {output.r2} && \
    cp results/STAR/IP/*.Log.final.out results/logs && \
    cp results/STAR/INPUT/*.Log.final.out results/logs && \
    rm -R results/STAR/
    """

SnakeMake SAMtools umi_tools From line 306 of workflow/Snakefile

shell:
    "samtools mpileup -B -f resources/*.fa {input.bam} > {output.mpileup} 2> {log}"

SnakeMake SAMtools From line 325 of workflow/Snakefile

shell:
    """
    java -cp workflow/scripts/meclip/target/meCLIP.jar com.github.ajlabuc.meclip.MpileupParser {input.mpileup} {params.prefix} C T 0.025 0.5 2 0 && \
    seqkit subseq --quiet --bed results/mpileup/INPUT/{params.prefix}_motifList_positive.bed resources/*.fa > results/mpileup/INPUT/motifList_positive.fa && \
    seqkit subseq --quiet --bed results/mpileup/INPUT/{params.prefix}_motifList_negative.bed resources/*.fa > results/mpileup/INPUT/motifList_negative.fa && \
    java -cp workflow/scripts/meclip/target/meCLIP.jar com.github.ajlabuc.meclip.MotifFrequencyCalculator results/mpileup/INPUT/motifList_positive.fa results/mpileup/INPUT/{params.prefix}_MpileupParser_positive.xls [AG]AC && \
    java -cp workflow/scripts/meclip/target/meCLIP.jar com.github.ajlabuc.meclip.MotifFrequencyCalculator results/mpileup/INPUT/motifList_negative.fa results/mpileup/INPUT/{params.prefix}_MpileupParser_negative.xls GT[TC] && \
    cat <(echo -e "Chr \tM6A \tRef \tFreq \tMutationCount \tReadCount \tMotifStart \tMotifEnd \tMotif") results/mpileup/INPUT/{params.prefix}_MpileupParser_positive_motifFrequency.xls results/mpileup/INPUT/{params.prefix}_MpileupParser_negative_motifFrequency.xls > {output.xls} && \
    awk '{{if ($9!="No") print $0}}' {output.xls} > {output.xls}.temp && mv {output.xls}.temp {output.xls} && \
    rm -R results/mpileup/
    rm -R results/umitools/
    """

SnakeMake seqkit subSeq From line 336 of workflow/Snakefile

shell:
    "java -cp workflow/scripts/meclip/target/meCLIP.jar com.github.ajlabuc.meclip.InputParser {input.ip_sample} {input.input_sample} {output.xls}"

SnakeMake From line 356 of workflow/Snakefile

shell:
    """
    preCount=`wc -l < results/*_IP.MpileupParser_MotifFrequency.xls`; postCount=`wc -l < results/*.m6aList.xls`; finalCount=`echo "scale = 3; 100 - ((($postCount - 1) / ($preCount - 1)) * 100)" | bc`; \
    java -cp workflow/scripts/meclip/target/meCLIP.jar com.github.ajlabuc.meclip.ConfidenceCategorizer {input.xls} {params.name} {params.prefix}.m6aList ${{finalCount}} && \
    sort -k1,1 -k2,2n {params.prefix}.m6aList.bed > {output.bed} && \
    bedtools intersect -a {output.bed} -b resources/*.geneNames.bed -wb -s -sorted > {params.prefix}.m6aList.sorted.annotated.bed && \
    java -cp workflow/scripts/meclip/target/meCLIP.jar com.github.ajlabuc.meclip.BedAnnotator {params.prefix}.m6aList.sorted.annotated.bed {output.xls} && \
    rm {params.prefix}.m6aList.sorted.annotated.bed  && \
    rm results/*_MotifFrequency.xls
    """

SnakeMake BEDTools From line 369 of workflow/Snakefile

shell:
    """
    bedtools intersect -a {input.bed} -b resources/*.annotated.sorted.bed > results/metaPlotR/annot_m6a.sorted.bed -wo -s -sorted && \
    perl workflow/scripts/metaPlotR/rel_and_abs_dist_calc.pl --bed results/metaPlotR/annot_m6a.sorted.bed --regions resources/*.region_sizes.txt > {output.txt} 2> {log}
    """

SnakeMake BEDTools From line 388 of workflow/Snakefile

script:
    "scripts/metaPlotR/visualize_metagenes.R"

SnakeMake From line 402 of workflow/Snakefile

shell: 
    """
    multiqc results --config config/multiqc_config.yaml -n meCLIP_report -o results/report && \
    rm -R results/metaPlotR/
    """

SnakeMake MultiQC From line 412 of workflow/Snakefile

ShowHide 15 more snippets with no or duplicated tags.

Comments

Support

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Created: 1yr ago

Updated: 1yr ago

Maitainers: public

URL: https://github.com/ajlabuc/meCLIP

Name: meclip

Version: 1

Badge:

Insert copied code into your website to add a link to this workflow.

License: MIT License

Keywords:

umi_tools BEDTools Cutadapt FastQC GFFutils gtftogenepred MultiQC SAMtools seqkit Snakemake STAR subSeq scales

Future updates

Related Workflows

psychip_snakemake — Show Details View Workflow

ENCODE pipeline for histone marks developed for the psychENCODE project

public

psychip pipeline is an improved version of the ENCODE pipeline for histone marks developed for the psychENCODE project. The o...

raw sequence reads Alignment Sequence alignment report macs2 ucsc-bedclip bedGraphToBigWig BEDTools BWA Picard SAMtools Snakemake

Free

Near-real time tracking of SARS-CoV-2 in Connecticut

public

Repository containing scripts to perform near-real time tracking of SARS-CoV-2 in Connecticut using genomic data. This pipeli...

JSON nextclade Augur Biopython FOCUS Pandas Snakemake bs4 epiweeks geopy matplotlib numpy pycountry pycountry-convert uszipcode

Free

cellranger-snakemake-gke — Show Details View Workflow

snakemake workflow to run cellranger on a given bucket using gke.

public

A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...

macs2 ucsc-bedclip bedGraphToBigWig BEDTools BWA Picard SAMtools Snakemake

Free

ATLAS - Three commands to start analyzing your metagenome data

public

Metagenome-atlas is a easy-to-use metagenomic pipeline based on snakemake. It handles all steps from QC, Assembly, Binning, t...

raw sequence reads Genome assembly Annotation track checkm2 gunc prodigal snakemake-wrapper-utils MEGAHIT Atlas BBMap Biopython BioRuby Bwa-mem2 cd-hit CheckM DAS Diamond eggNOG-mapper v2 MetaBAT 2 Minimap2 MMseqs MultiQC Pandas Picard pyfastx SAMtools SemiBin Snakemake SPAdes SqueezeMeta TADpole VAMB CONCOCT ete3 gtdbtk h5py networkx numpy plotly psutil utils metagenomics

Free

175

rna-seq-star-deseq2 — Show Details View Workflow

RNA-seq workflow using STAR and DESeq2

public

This workflow performs a differential gene expression analysis with STAR and Deseq2. The usage of this workflow is described ...

Free

dna-seq-gatk-variant-calling — Show Details View Workflow

This Snakemake pipeline implements the GATK best-practices workflow

public

This Snakemake pipeline implements the GATK best-practices workflow for calling small germline variants. The usage of thi...

VCF raw sequence reads Variant calling genetic variants gatk rust-bio-tools snakemake-wrapper-utils tabix BCFtools BWA FastQC MultiQC Pandas Picard SAMtools Snakemake Trimmomatic Variant Effect Predictor (VEP) common matplotlib numpy seaborn DNA

Free