RADSeq tool with Snakemake workflow integration for analysis of RAD sequencing data.
NodeRAD is a Snakemake workflow for the analysis of single-end reads from RAD sequencing without a reference genome. It detects loci and genomic variants using sequencing error and heterozygosity rates. For more information, please have a look at the associated bachelor thesis.
Note: Currently the workflow is limited to diploid species.
Authors
- Antonie Vietor (@AntonieV)
This workflow is part of a bachelor thesis. The topic and the underlying model were developed by:
- Johannes Köster (@johanneskoester), https://koesterlab.github.io
Usage
If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).
Step 1: Obtain a copy of this workflow
- Create a new GitHub repository using this workflow as a template.
- Clone the newly created repository to your local system, into the place where you want to perform the data analysis.
Step 2: Configure workflow
Configure the workflow according to your needs by editing the files in the config/ folder. Adjust config.yaml to configure the workflow execution, and samples.tsv to specify your sample setup.
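A rough sketch of this step (the backup file name and the choice of editor below are arbitrary; the authoritative keys and columns are those in the example files shipped in config/):
cp config/config.yaml config/config.yaml.orig   # keep a copy of the bundled example
nano config/config.yaml                         # set the paths for samples, fastq-data and eval-data
nano config/samples.tsv                         # list the samples to be analysed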
Step 3: Install Snakemake
conda install -n base -c conda-forge mamba
conda create -c bioconda -c conda-forge -n snakemake snakemake
For installation details, see the instructions in the Snakemake documentation.
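To check that the environment is usable before continuing (standard conda and Snakemake commands, not specific to this workflow):
conda activate snakemake
snakemake --version
mamba --version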
Step 4: Execute workflow
You can run the workflow with some examples through the script start.sh. To use your own data, change the paths in config/config.yaml for samples, fastq-data and eval-data (if there is data from a ddRAGE simulation).
Activate the conda environment:
conda activate snakemake
Test your configuration by performing a dry-run via
snakemake --use-conda --conda-frontend mamba -n
Execute the workflow locally via
snakemake --use-conda --conda-frontend mamba --cores $N
using $N cores, or run it in a cluster environment via
snakemake --use-conda --conda-frontend mamba --cluster qsub --jobs 100
or
snakemake --use-conda --conda-frontend mamba --drmaa --jobs 100
If you want to fix not only the software stack but also the underlying OS, use
snakemake --use-conda --conda-frontend mamba --use-singularity
in combination with any of the modes above. See the Snakemake documentation for further details.
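For example, a local run on four cores that also containerizes the per-rule software environments would combine the flags above as follows (the core count is an arbitrary choice):
snakemake --use-conda --conda-frontend mamba --use-singularity --cores 4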
Step 5: Investigate results
After successful execution, you can create a self-contained interactive HTML report with all results via:
snakemake --report report.html
This report can, e.g., be forwarded to your collaborators. An example (using some trivial test data) can be seen here.
Step 6: Commit changes
Whenever you change something, don't forget to commit the changes back to your GitHub copy of the repository:
git commit -a
git push
Step 7: Obtain updates from upstream
Whenever you want to synchronize your workflow copy with new developments from upstream, do the following.
- Once, register the upstream repository in your local copy:
git remote add -f upstream git@github.com:TharjaX/NodeRAD.git
or
git remote add -f upstream https://github.com/TharjaX/NodeRAD.git
if you have not set up SSH keys.
- Update the upstream version:
git fetch upstream
- Create a diff with the current version:
git diff HEAD upstream/master workflow > upstream-changes.diff
- Investigate the changes:
vim upstream-changes.diff
- Apply the modified diff via:
git apply upstream-changes.diff
- Carefully check whether you need to update the config files:
git diff HEAD upstream/master config
If so, do it manually, and only where necessary, since you would otherwise likely overwrite your settings and samples.
Step 8: Contribute back
In case you have also changed or added steps, please consider contributing them back to the original repository (a command sketch follows this list):
- Fork the original repo to a personal or lab account.
- Clone the fork to your local system, to a different place than where you ran your analysis.
- Copy the modified files from your analysis to the clone of your fork, e.g., cp -r workflow path/to/fork. Make sure not to accidentally copy config file contents or sample sheets. Instead, manually update the example config files if necessary.
- Commit and push your changes to your fork.
- Create a pull request against the original repository.
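A rough command sketch of these steps, assuming your fork lives under <your-user> on GitHub and your analysis copy sits at path/to/analysis (both are placeholders):
git clone git@github.com:<your-user>/NodeRAD.git noderad-fork
cd noderad-fork
git checkout -b my-changes                 # hypothetical branch name
cp -r path/to/analysis/workflow .          # copy only the workflow code, not configs or sample sheets
git add workflow
git commit -m "Describe your changes"
git push origin my-changes
# then open a pull request against the original repository on GitHub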
Testing
More test cases are in the subfolder .test. To run the workflow with one of the test data sets, adjust the paths for samples, fastq-data and eval-data in .test/config/config.yaml for the desired test dataset and run the script start_test.sh. Note that you may have to adjust the threshold values in the configuration file for the different data sets. Test cases are also automatically executed via continuous integration with GitHub Actions.
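A minimal sketch of a test run from the repository root (the editor is your choice; the paths are the ones named above):
nano .test/config/config.yaml    # point samples, fastq-data and eval-data at one of the test data sets
bash start_test.sh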
Code Snippets
script: "../scripts/sim_to_fasta.py"
script: "../scripts/vcf_to_fasta.py"
wrapper: "v1.25.0/bio/samtools/faidx"
shell: "makeblastdb -in {input.fasta_sim} -input_type fasta -dbtype nucl -parse_seqids"
shell: "blastn -query {input.res} -outfmt 6 -db {params.dbname} -out {output.blast}"
shell: "echo -ne \"{params.header}\" | cat - {input} > {output}"
script: "../scripts/plots_blast.R"
wrapper: "v1.25.0/bio/minimap2/index"
wrapper: "v1.25.0/bio/minimap2/aligner"
wrapper: "v1.25.0/bio/samtools/view"
script: "../scripts/noderad_main.py"
wrapper: "v1.25.0/bio/fastqc"
wrapper: "v1.25.0/bio/multiqc"
../scripts/noderad_main.py:

import sys
import gzip
import pysam
from graph_tool.all import *
import numpy as np

import graph_operations
import likelihood_operations

sys.stderr = open(snakemake.log[0], "w")

sample = snakemake.wildcards.get('sample')

# input
bam = snakemake.input.get("bam")
reads = snakemake.input.get("fastq")

# required output
vcf = open(snakemake.output.get("vcf", ""), "w")
vcf_header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t{}\n".format(sample)
vcf.write(vcf_header)

# optional output
graph_xml = snakemake.output.get("graph_xml", "")
output_figure = snakemake.output.get("graph_figure", "")
connected_components_xml = snakemake.output.get("connected_components_xml", "")
connected_components_figure = snakemake.output.get("connected_components_figure", "")
dir_subgraphs = snakemake.output.get("components_subgraphs", "")

# params for ploidy, threshold, noise and cluster-size
threshold = snakemake.params.get("threshold_max_edit_distance", "")
ploidy = snakemake.params.get("ploidy", "")  # a diploid chromosome set is determined for the prototype
noise_small = snakemake.params.get("treshold_seq_noise_small", "")
noise_large = snakemake.params.get("treshold_seq_noise_large", "")
cluster_size = snakemake.params.get("treshold_cluster_size", "")

# get reads format
format = ""  # format of reads
if reads.endswith((".fastq", ".fq", ".fastq.gz", ".fq.gz")):
    format = "fastq"
if reads.endswith((".fasta", ".fa", ".fasta.gz", ".fa.gz")):
    format = "fasta"

# default value of threshold for maximum distance value
if threshold == "":
    threshold = 23
else:
    threshold = int(threshold)

# default value of noise
if not noise_small:
    noise_small = 1
if not noise_large:
    noise_large = 1

# init graph
graph = Graph(directed=True)

# set graph properties
for (name, prop, prop_type) in graph_operations.set_properties():
    if name.startswith("g_"):
        vars()[name] = graph.new_graph_property(prop_type)
        graph.graph_properties[prop] = vars()[name]
    if name.startswith("v_"):
        vars()[name] = graph.new_vertex_property(prop_type)
        graph.vertex_properties[prop] = vars()[name]
    if name.startswith("e_"):
        vars()[name] = graph.new_edge_property(prop_type)
        graph.edge_properties[prop] = vars()[name]

# set ploidy, sequencing error and heterozygosity
graph.graph_properties["ploidy"] = ploidy
graph.graph_properties["ins-error-rates"] = snakemake.params.get("err_ins", "")
graph.graph_properties["del-error-rates"] = snakemake.params.get("err_del", "")
graph.graph_properties["subst-heterozygosity"] = snakemake.params.get("heterozyg_subst", "")
graph.graph_properties["ins-heterozygosity"] = snakemake.params.get("heterozyg_ins", "")
graph.graph_properties["del-heterozygosity"] = snakemake.params.get("heterozyg_del", "")

# create first empty node for graph
node = graph.add_vertex()
v_id[node] = "{idx}_{sample}".format(idx=0, sample=sample)
v_name[node] = ""
v_seq[node] = ""
v_q_qual[node] = ""

# add reads as vertices of the graph
if reads.endswith(".gz"):
    with gzip.open(reads, "rt") as _reads:
        graph = graph_operations.set_nodes(graph, _reads, format, sample)
else:
    with open(reads, "rU") as _reads:
        graph = graph_operations.set_nodes(graph, _reads, format, sample)

# add edges from all-vs-all alignment of reads (please see rule minimap2)
verbose = pysam.set_verbosity(0)  # https://github.com/pysam-developers/pysam/issues/939
bam = pysam.AlignmentFile(bam, "rb")
pysam.set_verbosity(verbose)

for read in bam.fetch(until_eof=True):
    graph = graph_operations.set_edges(graph, read, threshold)
bam.close()

graph.remove_vertex(0)

# write log files
sys.stderr.write(
    "graph construction summary for sample {}:"
    "\n nodes:\t{}\n edges:\t{}\n".format(sample, graph.num_vertices(), graph.num_edges()))

graph_operations.save_and_draw_graph(graph, xml_out=graph_xml, figure_out=output_figure)

# in case subgraph directory is expected as optional output, but some samples do not produce
# subgraphs. This way the empty directories for these samples preserved and are not removed by snakemake.
graph_operations.set_dir(dir_subgraphs)

# step 1: extract connected components
message = "CONNECTED COMPONENTS based on the graph construction from the edit distances (minimap2)"
connected_components = graph_operations.get_components(graph, message, snakemake.wildcards.get('sample'),
                                                       dir_subgraphs, connected_components_xml,
                                                       connected_components_figure, v_color="component-label")

graph = None
loc_nr = 0
for (comp, comp_nr) in connected_components:
    # step 2: likelihood of allele fractions
    alleles = likelihood_operations.get_candidate_alleles(comp, comp.vertices(), noise_small, noise_large, cluster_size)
    n = len(alleles)
    vafs_candidates = list(likelihood_operations.get_candidate_vafs(n, ploidy))
    nodes = list([comp.vertex_index[node] for node in comp.vertices()])
    read_allele_likelihoods = {}

    # calculate the likelihood over ALL reads
    vafs_likelihoods = [likelihood_operations.calc_vafs_likelihood(comp, vafs, nodes, alleles, read_allele_likelihoods)
                        for vafs in vafs_candidates]
    if not vafs_likelihoods:  # case empty list, e.g. if the treshold-seq-noise value is set too large
        continue
    max_likelihood_idx = np.argmax(vafs_likelihoods)

    # obtain ML solution for step 2
    max_likelihood_vafs = vafs_candidates[max_likelihood_idx]

    # write to log file
    sys.stderr.write(
        "\n\nStats for component {} in sample {} with {} alleles and ploidy = {}:\n".format(comp_nr, sample, n, ploidy))
    sys.stderr.write("\n\tMaximum vafs likelihood:\n")
    for vaf, allele in zip(max_likelihood_vafs, alleles):
        sys.stderr.write("\t\t{} for allele {}\n".format(vaf, allele))

    # step 3: likelihood of loci given alleles and fractions
    loci_candidates = list(likelihood_operations.get_candidate_loci(n, ploidy, max_likelihood_vafs))
    loci_likelihoods = {}

    loci_likelihoods = [
        likelihood_operations.calc_loci_likelihoods(comp, max_likelihood_vafs, alleles, loci, loci_likelihoods)
        for loci in loci_candidates
    ]
    max_likelihood_idx = np.argmax(loci_likelihoods)
    max_likelihood_loci = loci_candidates[max_likelihood_idx]

    # write to log file
    sys.stderr.write("\n\tMaximum loci likelihood calculations:\n")
    sys.stderr.write("\t\tloci_likelihoods:\n\t\t\t{}\n\t\tmax_likelihood_idx:\n\t\t\t{}"
                     "\n\t\tmax_likelihood_loci:\n\t\t\t{}\n".format(loci_likelihoods, max_likelihood_idx, max_likelihood_loci))

    # step 4: results output to VCF file
    loci_alleles = likelihood_operations.get_sorted_loci_alleles(alleles, max_likelihood_loci)
    gt_indices = likelihood_operations.get_gt_indices(alleles, max_likelihood_loci, loci_alleles)

    for gt_idx_locus in list(set(gt_indices)):
        vcf.write("{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.\tGT\t{gt}\n".format(
            chrom="LOC{}".format(loc_nr), pos="1", ref=loci_alleles[0],
            alt=', '.join(loci_alleles[1:]) if len(loci_alleles) > 1 else ".",
            gt=likelihood_operations.get_genotype(gt_idx_locus)))
        loc_nr += 1
../scripts/plots_blast.R:

library("tidyverse")
library("stringr")

log <- file(snakemake@log[[1]], open="wt")
sink(log)
sink(log, type="message")

blast <- snakemake@input[[1]]
blast_data <- read_tsv(blast, col_names = TRUE, trim_ws = TRUE)
names(blast_data) <- c('res_id','sim_id','identity', 'len_alignment', 'mismatches', 'gap_opens',
                       'q_start', 'q_end', 's_start', 's_end', 'evalue', 'bit_score')

blast_data <- blast_data %>%
  mutate(res_id=as.character(res_id)) %>%
  separate(res_id, c("sample", "individual", "locus"), sep = "\\|", extra="merge", convert=TRUE) #%>%

for(i in blast_data$locus) {
  if(nchar(strsplit(i, "-")[[1]][1])<5){
    blast_data$locus[blast_data$locus==i] <- str_replace(blast_data$locus[blast_data$locus==i], 'LOC', 'LOC0')
  }
}

plot <- ggplot(data = blast_data, aes(y=locus, x=identity)) +
  geom_bar(width = 1.0, position = "dodge", stat="identity", aes(fill = identity), colour="Black") +
  scale_fill_gradient(low="mediumpurple3",high="green3", name = "Identity") +
  ggtitle("Indentity [%] of loci identified by NodeRAD vs. simulated loci") +
  xlab("Identity [%]") + ylab("Locus") +
  theme_minimal() +
  theme(aspect.ratio = 2.5/1.5, plot.title = element_text(hjust = 0.5), legend.position = "right",
        legend.key.size = unit(0.8, "cm"), axis.text.y = element_text(hjust = 0))
plot
ggsave(snakemake@output[["ident"]], width = 7, height = 7)

blast_data$intervals <- cut(blast_data$identity, seq(0,100,by=1))
plot <- ggplot(blast_data, aes(intervals)) +
  geom_histogram(stat= "count", aes(fill = ..count..)) +
  xlab("Identity [%]") + ylab("Counts") +
  scale_fill_gradient(name = "Counts")+
  theme(aspect.ratio = 2.5/1.5, plot.title = element_text(hjust = 0.5), legend.position = "right") +
  ggtitle("Histogram of number of alleles identified by NodeRAD\nwith respect to their similarity to simulated data.")
plot
ggsave(snakemake@output[["ident_hist"]], width = 7, height = 7)

plot <- ggplot(data = blast_data, aes(x=locus, y=bit_score, group = individual)) +
  geom_line(color = "gray70") +
  geom_point(aes(color = bit_score), size =3) +
  geom_point(shape = 1,size = 3, colour = "black") +
  scale_color_gradient(low="red", high="green", name = "Bit score") +
  ggtitle("Bitscores of loci identified by NodeRAD vs. simulated loci") +
  xlab("Locus") + ylab("Bit score") +
  theme_minimal() +
  theme(axis.text.x = element_text(color = "black", size = 7, angle = 90, hjust = 0, face = "plain"),
        plot.title = element_text(hjust = 0.5), legend.position = "right",
        legend.key.size = unit(0.4, "cm"), axis.text.y = element_text(hjust = 0))
plot
ggsave(snakemake@output[["bit_scores"]], width = 7, height = 7)

plot <- ggplot(data = blast_data, aes(x=locus, y=evalue, group = individual)) +
  geom_line(color = "gray70") +
  geom_point(aes(color = evalue), size =3) +
  geom_point(shape = 1,size = 3, colour = "black") +
  scale_color_gradient(low="green", high="red", name = "E-value") +
  ggtitle("E-Values of loci identified by NodeRAD vs. simulated loci") +
  xlab("Locus") + ylab("E-Value") +
  theme_minimal() +
  theme(axis.text.x = element_text(color = "black", size = 7, angle = 90, hjust = 0, face = "plain"),
        plot.title = element_text(hjust = 0.5), legend.position = "right",
        legend.key.size = unit(0.4, "cm"), axis.text.y = element_text(hjust = 0))
plot
ggsave(snakemake@output[["evalues"]], width = 7, height = 7)
../scripts/sim_to_fasta.py:

import sys
import yaml
from itertools import chain
import re

sys.stderr = open(snakemake.log[0], "w")

sample = snakemake.wildcards.get('sample')

# input
sim_in = snakemake.input.get("sim_data_stats")
individual_name=snakemake.params.get("individual")
individual = snakemake.params.get("individual").replace("_", " ")
fasta = ""

# read yaml file
yaml_in = open(sim_in, 'r')
yaml_in_load = list(yaml.load_all(yaml_in, Loader=yaml.FullLoader))
sim_data_file = yaml_in_load[1]

# parse fasta data from yaml
for key_loc in sim_data_file.keys():
    # loci sequences
    seq = sim_data_file[key_loc]["p5 seq"]
    # loci mutations of given individual
    indiv_data = sim_data_file[key_loc]["individuals"][individual]
    mutations = list([indiv_data[item]["mutations"] for item in indiv_data])

    # modifying the sequence when mutations are present for given individual and locus
    index = 0
    if mutations:
        for mutation in mutations:
            if mutation:
                mut_list = [re.split("@", item)[1] for item in mutation]
                mut_tuples = [tuple(re.split(":", mut)) for mut in mut_list]

                for (pos, mut) in mut_tuples:
                    if '(' or ')' in pos:
                        idx = re.split("\(", re.split("\)", pos)[0])[-1]
                    idx = int(pos)
                    # nucleotide position indexing in ddRAGE starts at 0,
                    # but it should start at position 1 https://varnomen.hgvs.org/bg-material/numbering/
                    if ">" in mut:
                        mut_subs = re.split(">", mut)
                        if seq[idx] == mut_subs[0]:
                            seq = seq[:idx] + mut_subs[-1] + seq[idx + 1:]
                    if "+" in mut:
                        mut_subs = re.split('\+', mut)[-1]
                        seq = seq[:idx] + mut_subs + seq[idx:]
                    if "-" in mut:
                        mut_subs = re.split('\-', mut)[-1]
                        if all([seq[idx] == mut_subs[i] for i in range(len(mut_subs))]):
                            seq = seq[:idx] + seq[idx + len(mut_subs):]

            # parsing to fasta format
            fasta_id = ">{}|{}-{}|simulated".format(individual_name, key_loc.replace(" ", "_"), index)
            fasta += "{}\n{}\n".format(fasta_id, seq)
            index += 1
    else:
        # parsing to fasta format
        fasta_id = ">{}|{}|simulated".format(individual_name, key_loc.replace(" ", "_"))
        fasta += "{}\n{}\n".format(fasta_id, seq)

# write fasta file
fasta_file = open(snakemake.output.get("fasta_sim", ""), "w")
fasta_file.write(fasta)
../scripts/vcf_to_fasta.py:

import sys
import pandas as pd

sys.stderr = open(snakemake.log[0], "w")

sample = snakemake.wildcards.get('sample')

# input
vcf_in = pd.read_csv(snakemake.input.get("vcf_data"), sep="\t")[["#CHROM", "REF", "ALT"]]
individual = snakemake.params.get("individual").replace(" ", "_")
prefix = "{sample}|{indiv}|".format(sample=sample,indiv=individual)

fasta = ""
for i in range(len(vcf_in["#CHROM"])):
    name = vcf_in["#CHROM"][i]
    fasta += ">{prefix}{loc}-REF\n{ref}\n".format(prefix=prefix, loc=name, ref=vcf_in["REF"][i])
    variants = str(vcf_in["ALT"][i]).split(",")
    if not variants[0] == ".":
        idx = 1
        for var in variants:
            fasta += ">{prefix}{loc}-ALT_{idx}\n{alt}\n".format(prefix=prefix, loc=name, idx=idx, alt=var)
            idx += 1

# write fasta file
fasta_file = open(snakemake.output.get("vcf_fasta", ""), "w")
fasta_file.write(fasta)
wrapper v1.25.0/bio/fastqc:

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path
import re
from tempfile import TemporaryDirectory

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""
    base = path.basename(file_path)
    # Remove file extension(s) (similar to the internal fastqc approach)
    base = re.sub("\\.gz$", "", base)
    base = re.sub("\\.bz2$", "", base)
    base = re.sub("\\.txt$", "", base)
    base = re.sub("\\.fastq$", "", base)
    base = re.sub("\\.fq$", "", base)
    base = re.sub("\\.sam$", "", base)
    base = re.sub("\\.bam$", "", base)
    return base


# Run fastqc, since there can be race conditions if multiple jobs
# use the same fastqc dir, we create a temp dir.
with TemporaryDirectory() as tempdir:
    shell(
        "fastqc {snakemake.params} -t {snakemake.threads} "
        "--outdir {tempdir:q} {snakemake.input[0]:q}"
        " {log}"
    )

    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")

    if snakemake.output.html != html_path:
        shell("mv {html_path:q} {snakemake.output.html:q}")

    if snakemake.output.zip != zip_path:
        shell("mv {zip_path:q} {snakemake.output.zip:q}")
wrapper v1.25.0/bio/minimap2/aligner:

__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import infer_out_format
from snakemake_wrapper_utils.samtools import get_samtools_opts

samtools_opts = get_samtools_opts(snakemake, parse_output=False)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
sort = snakemake.params.get("sorting", "none")
sort_extra = snakemake.params.get("sort_extra", "")

out_ext = infer_out_format(snakemake.output[0])

pipe_cmd = ""
if out_ext != "PAF":
    # Add option for SAM output
    extra += " -a"

    # Determine which pipe command to use for converting to bam or sorting.
    if sort == "none":
        if out_ext != "SAM":
            # Simply convert to output format using samtools view.
            pipe_cmd = f"| samtools view -h {samtools_opts}"
    elif sort in ["coordinate", "queryname"]:
        # Add name flag if needed.
        if sort == "queryname":
            sort_extra += " -n"

        # Sort alignments.
        pipe_cmd = f"| samtools sort {sort_extra} {samtools_opts}"
    else:
        raise ValueError(f"Unexpected value for params.sort: {sort}")

shell(
    "(minimap2"
    " -t {snakemake.threads}"
    " {extra} "
    " {snakemake.input.target}"
    " {snakemake.input.query}"
    " {pipe_cmd}"
    " > {snakemake.output[0]}"
    ") {log}"
)
wrapper v1.25.0/bio/minimap2/index:

__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(minimap2 -t {snakemake.threads} {extra} "
    "-d {snakemake.output[0]} {snakemake.input.target}) {log}"
)
wrapper v1.25.0/bio/multiqc:

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
# Set this to False if multiqc should use the actual input directly
# instead of parsing the folders where the provided files are located
use_input_files_only = snakemake.params.get("use_input_files_only", False)

if not use_input_files_only:
    input_data = set(path.dirname(fp) for fp in snakemake.input)
else:
    input_data = set(snakemake.input)

output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "multiqc"
    " {extra}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_data}"
    " {log}"
)
wrapper v1.25.0/bio/samtools/faidx:

__author__ = "Michael Chambers"
__copyright__ = "Copyright 2019, Michael Chambers"
__email__ = "greenkidneybean@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts


samtools_opts = get_samtools_opts(
    snakemake, parse_threads=False, parse_write_index=False, parse_output_format=False
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


shell("samtools faidx {samtools_opts} {extra} {snakemake.input[0]} {log}")
wrapper v1.25.0/bio/samtools/view:

__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts


samtools_opts = get_samtools_opts(snakemake)
extra = snakemake.params.get("extra", "")
region = snakemake.params.get("region", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)


shell("samtools view {samtools_opts} {extra} {snakemake.input[0]} {region} {log}")