Ribosome Profiling Data Analysis with Snakemake: public_riboseq Workflow
This is the template for a new Snakemake workflow. Replace this text with a comprehensive description covering the purpose and domain. Insert your code into the respective folders, i.e. `scripts`, `rules`, and `envs`. Define the entry point of the workflow in the `Snakefile` and the main configuration in the `config.yaml` file.
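As a purely illustrative sketch of how these pieces typically fit together (the module and target file names below are assumptions, not taken from this repository), the entry point might look like:

```python
# Hypothetical Snakefile sketch -- module and target names are illustrative.
configfile: "config/config.yaml"      # main configuration lives under config/

include: "rules/trim.smk"             # rule modules live under rules/
include: "rules/align.smk"

rule all:                             # entry point: request the final outputs
    input:
        "results/deseq2/all_results.tsv"
```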
Usage
If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).
Step 1: Obtain a copy of this workflow
- Create a new GitHub repository using this workflow as a template.
- Clone the newly created repository to your local system, into the place where you want to perform the data analysis.
Step 2: Configure workflow
Configure the workflow according to your needs by editing the files in the `config/` folder. Adjust `config.yaml` to configure the workflow execution, and `samples.tsv` to specify your sample setup.
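For orientation, a minimal, purely hypothetical `samples.tsv` sketch is shown below. The column names `sample`, `condition`, and `type` (and the values `NPC`/`Neuron` and `rna`/`trap`) are inferred from the DESeq2 script in the Code Snippets section, which also parses the file as comma-separated; the accession names are made up:

```
sample,condition,type
SRRXXXXXX1,NPC,rna
SRRXXXXXX2,NPC,trap
SRRXXXXXX3,Neuron,rna
SRRXXXXXX4,Neuron,trap
```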
Step 3: Install Snakemake
Install Snakemake using conda:

```bash
conda create -c bioconda -c conda-forge -n snakemake snakemake
```

For installation details, see the instructions in the Snakemake documentation.
Step 4: Execute workflow
Activate the conda environment:

```bash
conda activate snakemake
```

Test your configuration by performing a dry-run via

```bash
snakemake --use-conda -n
```

Execute the workflow locally via

```bash
snakemake --use-conda --cores $N
```

using `$N` cores, or run it in a cluster environment via

```bash
snakemake --use-conda --cluster qsub --jobs 100
```
Step 5: Investigate results
After successful execution, you can create a self-contained interactive HTML report with all results via:

```bash
snakemake --report report.html
```

This report can, e.g., be forwarded to your collaborators. An example (using some trivial test data) can be seen here.
Code Snippets
From `scripts/kallisto_deseq2.R`:

```r
library("IHW")
library("DESeq2")
library("ggplot2")
library("RColorBrewer")
library("pheatmap")
library("tximport")
#library("biomaRt")

print(snakemake@output[[1]])

# Load the transcript-to-gene map and the sample sheet
tx2gene <- read.csv(snakemake@params[["transcripts_to_genes"]], header = FALSE, sep = ",")
head(tx2gene)
print("\n\n")
head(snakemake@params[["samples_file"]])
print("\n\n")
samples_table <- read.csv(snakemake@params[["samples_file"]], header = TRUE, sep = ",")
samples_table <- samples_table[!duplicated(samples_table$sample), ]
samples <- unique(samples_table$sample)

# Locate the kallisto abundance file of each library
files <- file.path("results/kallisto", samples, "abundance.tsv")
names(files) <- samples
files
print("\n\n")

# Import the expression estimates of each library
txi.kallisto.tsv <- tximport(files, type = "kallisto", tx2gene = tx2gene, dropInfReps = TRUE)
print(samples)
head(txi.kallisto.tsv$counts)
#colnames(txi.kallisto.tsv$counts) <- samples
dim(txi.kallisto.tsv$counts)

colDatak <- data.frame(row.names = colnames(txi.kallisto.tsv$counts),
                       condition = as.factor(samples_table$condition),
                       type = as.factor(samples_table$type))
colDatak

# Create the DESeq2 object and fit the model
ddsk <- DESeqDataSetFromTximport(txi = txi.kallisto.tsv, colData = colDatak,
                                 design = ~ type + condition + type:condition)
ddsk <- DESeq(ddsk)

# Sample-distance heatmap
print("Creating heatmap")
vsdk <- rlog(ddsk, blind = FALSE)
sampleDistsk <- dist(t(assay(vsdk)))
sampleDistMatrixk <- as.matrix(sampleDistsk)
rownames(sampleDistMatrixk) <- paste(vsdk$type, vsdk$condition, sep = "-")
#colnames(sampleDistMatrixk)
colors <- colorRampPalette(rev(brewer.pal(9, "Blues")))(255)
print("Heatmap")
png("Heatmap.png")
pheatmap(sampleDistMatrixk,
         clustering_distance_rows = sampleDistsk,
         clustering_distance_cols = sampleDistsk,
         col = colors)
dev.off()

# PCA plot
pcaDatak <- plotPCA(vsdk, intgroup = c("type", "condition"), returnData = TRUE)
percentVark <- round(100 * attr(pcaDatak, "percentVar"))
png("PCA.png")
ggplot(pcaDatak, aes(PC1, PC2, color = condition, shape = type)) +
  geom_point(size = 3) +
  xlab(paste0("PC1: ", percentVark[1], "% variance")) +
  ylab(paste0("PC2: ", percentVark[2], "% variance")) +
  coord_fixed()
dev.off()

print("Results")
# Pairwise comparisons on the combined type_condition factor
ddsk$group <- factor(paste0(ddsk$type, "_", ddsk$condition))
design(ddsk) <- ~ group
ddsk <- DESeq(ddsk)
print(ddsk$group)

res_trap_npc_neuron        <- results(ddsk, contrast = c("group", "trap_Neuron", "trap_NPC"), filterFun = ihw)
res_rna_npc_neuron         <- results(ddsk, contrast = c("group", "rna_Neuron", "rna_NPC"), filterFun = ihw)
res_trap_npc_rna_npc       <- results(ddsk, contrast = c("group", "trap_NPC", "rna_NPC"), filterFun = ihw)
res_trap_neuron_rna_neuron <- results(ddsk, contrast = c("group", "trap_Neuron", "rna_Neuron"), filterFun = ihw)

# Interaction model: LRT against the reduced (no-interaction) model
design(ddsk) <- ~ type + condition + type:condition
ddsk <- DESeq(ddsk, test = "LRT", reduced = ~ type + condition)
resk <- results(ddsk, filterFun = ihw)
res_neuron <- results(ddsk, name = "typetrap.conditionNPC", test = "Wald")
print(resultsNames(ddsk))

# Add gene annotation: strip version suffixes from the gene IDs
split_gene_id <- function(word) {
  strsplit(word, ".", fixed = TRUE)[[1]][1]
  #unlist(strsplit(my.string, ".", fixed = TRUE))[1]
}
gene_id <- apply(as.matrix(rownames(resk)), 1, split_gene_id)
rownames(resk) <- gene_id
resk$ensembl_id <- rownames(resk)
#martk <- biomaRt::useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
#genes.tablek <- biomaRt::getBM(attributes = c("ensembl_gene_id", "external_gene_name", "description", "gene_biotype"), mart = martk)
#resk <- merge(x = as.matrix(resk), y = genes.tablek, by.x = "ensembl_id", by.y = "ensembl_gene_id", all.x = T, all.y = F)

# Export all result tables plus raw and normalized counts as a single table
counts_normalizedk <- counts(ddsk, normalized = TRUE)
counts_rawk <- counts(ddsk)
Tabelao <- cbind(res_trap_npc_neuron, res_rna_npc_neuron, res_trap_npc_rna_npc,
                 res_trap_neuron_rna_neuron, resk, res_neuron,
                 counts_rawk, counts_normalizedk)
write.table(as.data.frame(Tabelao), file = snakemake@output[[1]], sep = "\t")
```
SRA download via the sra-tools fasterq-dump wrapper:

```
wrapper: "v0.75.0/bio/sra-tools/fasterq-dump"
```
Adapter and quality trimming with Trim Galore:

```
shell:
    "trim_galore {params.trim} --cores {threads} --output_dir {output} {input} 2> {log}"
```
Concatenating the trimmed paired-end reads per sample:

```
shell:
    """
    set +e
    FQ1=`echo {input} | awk '{{for(X=1;X<=NF;X++){{OUT=OUT $X"/*_1_val_1.fq.gz "}}}}END{{print OUT}}'`
    FQ2=`echo {input} | awk '{{for(X=1;X<=NF;X++){{OUT=OUT $X"/*_2_val_2.fq.gz "}}}}END{{print OUT}}'`
    echo $FQ1
    echo $FQ2
    cat $FQ1 > {output.fq1} 2> {log}
    cat $FQ2 > {output.fq2} 2>> {log}
    """
```
bowtie2 alignment, keeping the unaligned read pairs (`--un-conc-gz`):

```
shell:
    """
    out=`echo {output.unaligned_fq1} | sed 's/.1.fq.gz/.%.fq.gz/'`
    bowtie2 {params.bowtie2} -p {threads} --un-conc-gz $out -1 {input.fq1} -2 {input.fq2} -S {output.aligned} 2> {log}
    """
```
Transcript quantification with kallisto:

```
shell:
    "kallisto quant -t {threads} {params.extra} "
    "-g {params.gtf} "
    "-i {params.index} -o {output} {input} > {log} 2>&1"
```
Differential expression with the DESeq2 script shown above:

```
script:
    "./scripts/kallisto_deseq2.R"
```
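The R script receives its inputs through `snakemake@params` and `snakemake@output`. As a hypothetical sketch only (rule, config keys, and file names other than those visible in the script are illustrative assumptions, not copied from this repository), the invoking rule might look like:

```python
# Hypothetical sketch -- SAMPLES, config keys and output paths are assumptions.
rule deseq2:
    input:
        expand("results/kallisto/{sample}/abundance.tsv", sample=SAMPLES)
    output:
        "results/deseq2/all_results.tsv"                      # snakemake@output[[1]]
    params:
        transcripts_to_genes=config["transcripts_to_genes"],  # snakemake@params[["transcripts_to_genes"]]
        samples_file=config["samples"]                        # snakemake@params[["samples_file"]]
    conda:
        "envs/deseq2.yaml"
    script:
        "./scripts/kallisto_deseq2.R"
```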
The sra-tools/fasterq-dump wrapper referenced in the first snippet:

```python
__author__ = "Johannes Köster, Derek Croote"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

outdir = os.path.dirname(snakemake.output[0])
if outdir:
    outdir = "--outdir {}".format(outdir)

extra = snakemake.params.get("extra", "")

with tempfile.TemporaryDirectory() as tmp:
    shell(
        "fasterq-dump --temp {tmp} --threads {snakemake.threads} "
        "{extra} {outdir} {snakemake.wildcards.accession} {log}"
    )
```