A tool for generating bacterial genomes from metagenomes with nanopore long read sequencing

public 1yr ago 0 bookmarks

View Workflow

Help improve this workflow!

This workflow has been published but could be further improved with some additional meta data:

Keyword(s) in categories input, output, operation, topic

You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .

Lathe

A tool for generating bacterial genomes from metagenomes with Nanopore long read sequencing

Installation

First, install miniconda3

Then install snakemake. This can be done with the following.

conda install snakemake
snakemake --version #please ensure this is >=5.4.3

Next, clone this github directory to some location where it can be stored permanently. Remember to keep it updated with git pull .

git clone https://github.com/elimoss/lathe.git

Instructions to enable cluster execution with SLURM can be found at https://github.com/bhattlab/slurm

Inputs

Alter config.yaml to provide the following:

sample_name : Name of sample and output directory
fast5_dirs_list : text file containing a list of absolute paths to run/fast5/* subfolders containing .fast5 files. A good way to generate this is with find -maxdepth 2 -mindepth 2 fast5_parent_dir > fodn.txt
flowcell : flowcell code, e.g. FLO-MIN106, passed to basecaller
kit : kit code, e.g. SQK-LSK109, passed to basecaller
genome_size : Estimated genome size, e.g. 50m, passed to Canu.
singularity : location (including on the internet) of a singularity image to be used for the workflow. Don't change this.
short_reads : location of short reads to be used for Pilon polishing, or empty quotes for long-read polishing.
use_grid : should Canu execute in distributed mode on a cluster?
grid_options : Extra options for execution on a cluster
canu_args : Extra options for Canu
skip_circularization : Should circularization be omitted from the workflow?

Lathe uses the Flye assembler by default. For Canu, please specify 'canu' for the assembler parameter in the config. For cluster Canu execution, please note: if set to True, you will need to install Canu, e.g. conda install -c conda-forge -c bioconda Canu=1.8 as well as provide any additional required parameters for your job scheduler in the config.yaml file. Please see the example config file. When executing on a cluster, Canu will appear to fail, as the first process does not produce an assembly and instead spawns subsequent jobs on the cluster. Don't worry, just re-run Lathe when the assembly completes.

To execute please run the following. Please note, you must substitute a parent directory containing all of your data and working directories for /labs/ .

snakemake --use-singularity --singularity-args '--bind /labs/ --bind /scratch/ --bind /scg/ ' -s /path/to/lathe/Snakefile \
--configfile path/to/modified_config.yaml --restart-times 0 --keep-going --latency-wait 30
# --profile scg #enable cluster support, highly recommended. See above.

Outputs

The outputs generated by this workflow will look like the following:

samplename/
├── 0.basecall
│ ├── samplename.fq
│ └── nanoplots
├── 1.assemble
│ ├── samplename_merged.fasta
│ ├── samplename_raw_assembly.fa
│ ├── samplename_raw_assembly.fa.amb
│ ├── samplename_raw_assembly.fa.ann
│ ├── samplename_raw_assembly.fa.bwt
│ ├── samplename_raw_assembly.fa.fai
│ ├── samplename_raw_assembly.fa.pac
│ ├── samplename_raw_assembly.fa.paf
│ ├── samplename_raw_assembly.fa.sa
│ ├── assemble_100m (if specified)
│ └── assemble_250m (if specified)
├── 2.polish
│ ├── samplename_polished.corrected.fasta
│ ├── samplename_polished.fasta
│ ├── samplename_polished.fasta.bam
│ ├── samplename_polished.fasta.bam.bai
│ ├── samplename_polished.fasta.fai
│ ├── samplename_polished.fasta.misassemblies.tsv
│ ├── medaka (if specified)
│ ├── pilon (if specified)
│ └── racon (if specified)
├── 3.circularization
│ ├── 1.candidate_genomes
│ ├── 2.circularization
│ ├── 3.circular_sequences #circularized genomes
│ ├── 4.samplename_circularized.corrected.fasta
│ ├── 4.samplename_circularized.fasta
│ ├── 4.samplename_circularized.fasta.bam
│ ├── 4.samplename_circularized.fasta.bam.bai
│ ├── 4.samplename_circularized.fasta.fai
│ └── 4.samplename_circularized.fasta.misassemblies.tsv
└── 5.final
 ├── samplename_final.fa
 └── samplename_final.fa.fai

Code Snippets

import sys
import os
import argparse
import pysam

fastaf = snakemake.input[0]
out = open(snakemake.output[0], 'w')

#knobs
smooth_gap_width = 150000
contig_edge_margin = 150000
min_smoothed_aln_len = 10000
min_aln_len = 5000

#align
print("Aligning...".format(t=str(snakemake.threads)))
os.system("nucmer -p {delta} -b 4000 -l 2000 --maxmatch {fa} {fa}".format( #-t {threads} back to mummer3, no more multithreading : (
	threads = str(snakemake.threads),
	delta = snakemake.params['delta'],
	fa = fastaf
))
os.system("show-coords -T {delta}.delta -L 2000 | sed '1,5d' > coords.tsv".format(
	delta = snakemake.params['delta']
)) #the 1,5d gets rid of the header and identity hit as well
print("Trimming genome...")

max_tiglen = 0

with open('coords.tsv') as coords:
	lines = coords.readlines()
	smoothed_lines = []
	if len(lines) > 0:
		aln_start_line = lines[0].strip().split("\t")
		prev_line = lines[0].strip().split("\t")

		#goal: identify corner-cutting parallel off-diagonals.
	#	print(prev_line)

		for l in lines[1:] + [lines[0]]: #just so we don't miss the last item
			s = l.strip().split("\t")
			#print(s)

			tigname = s[-1] #store the name of the tig.  this repeats nugatorily.

			if int(s[1]) > max_tiglen: #keep track of the largest alignment coordinate found, used as the tig length
				max_tiglen = int(s[1])

			if int(s[0]) > int(s[1]): #ignore inversions
	#				print('ignoring')
				continue

			if int(s[1]) - int(s[0]) < min_aln_len:
		#		print('ignoring')
				continue

			if abs(int(s[0]) - int(prev_line[1])) < smooth_gap_width and \
				abs(int(s[2]) - int(prev_line[3])) < smooth_gap_width:
			#	print("joining")
				pass
			else:
	#				print("terminating")
				newline = aln_start_line
				newline[1] = prev_line[1]
				newline[3] = prev_line[3]
				if int(newline[1]) - int(newline[0]) > min_smoothed_aln_len:
	#				print("storing")
					smoothed_lines.append(newline)
				aln_start_line = s
			prev_line = s


#print("output")
#for s in smoothed_lines:
#	print(s)

if len(smoothed_lines) > 0:
	#is it a corner cutting parallel off-diagonal?
	if int(smoothed_lines[0][0]) < contig_edge_margin and int(smoothed_lines[0][3]) > max_tiglen - contig_edge_margin:
		if int(smoothed_lines[-1][2]) < contig_edge_margin and int(smoothed_lines[-1][1]) > max_tiglen - contig_edge_margin:
			#out.write("It's overcircularized!")

			out.write(tigname + ":" + smoothed_lines[0][0] + '-' + smoothed_lines[-1][0] + "\n")

out.write("done\n")

Python pysam mummer2circos From line 3 of scripts/encircle.py

import fileinput
import os

prev = []
prev_is_terminal = False
verbose = True

margin = snakemake.params['margin']

with open(snakemake.output[0], 'w') as oot:
	for l in open(snakemake.input[0]):
		s = l.strip().split("\t")
		if prev == []:
			next
		if s == ['no circularizations']:
			oot.write("no circularizations\n")
			break

		target_tig = s[9]
		spanner = s[10]

		#test for terminal alignment
		r_start = min([int(i) for i in s[0:2]])
		r_end = max([int(i) for i in s[0:2]])
		q_start = min([int(i) for i in s[2:4]])
		q_end = max([int(i) for i in s[2:4]])

		r_len = int(s[7])
		q_len = int(s[8])

		if r_start < margin or r_end > (r_len - margin): #ref is terminal
			if q_start < margin or q_end > (q_len - margin): #query is terminal
				if prev_is_terminal and spanner == prev[10] and target_tig == prev[9]: #spanned event

					#trim when the center of the spanning contig aligns to both ends of the genome (consistent with overcircularization)
					if q_start < prev_q_end:
						trim_bases = prev_q_end - q_start
			#			oot.write('Need to trim ' + str(trim_bases) + " bases.\n")
						oot.write("{c}:1-{e}\n".format(c = target_tig, e = str(r_len-trim_bases)))

					#extend with spanning contig when it closes a gap
					else:
						insert_bases = [q_start, prev_q_end]
						insert_bases.sort()
						insert_string = spanner + ":" + '-'.join([str(i) for i in insert_bases])
						#print(insert_string)
						oot.write(target_tig + "\n")
						oot.write(insert_string + "\n")

					prev_is_terminal = False
					next

				prev_is_terminal=True
			else:
				prev_is_terminal = False
		else:
			prev_is_terminal = False

		#assign prevs
		prev = s
		prev_r_start = r_start
		prev_r_end   = r_end
		prev_q_start = q_start
		prev_q_end = q_end

		prev_r_len = r_len
		prev_q_len = q_len

	oot.write('done\n')

Python From line 3 of scripts/spancircle.py

shell:
	"ln -s {input} {output}"

SnakeMake From line 52 of master/Snakefile

	shell:
		"guppy_basecaller --cpu_threads_per_caller {threads} -i {params.in_dir} -s {params.out_dir} " +
		"--flowcell {fc} --kit {k}" .format(fc=config['flowcell'], k = config['kit'])

rule basecall_final:
	#Collate called bases into a single fastq file
	input: expand('{{sample}}/0.basecall/raw_calls/{foo}/sequencing_summary.txt', foo = [os.path.basename(f).split('.')[0] for f in fast5_files])
	output: '{sample}/0.basecall/{sample}.fq'
	shell:
		"find {sample}/0.basecall/raw_calls/*/*.fastq | xargs cat > {output}"

SnakeMake GUPPY From line 67 of master/Snakefile

	shell: "NanoPlot --fastq {input} -t {threads} --color green  -o " + "{s}/0.basecall/nanoplots".format(s = sample)

rule assemble_canu:
	#Run canu. This can be run either in distributed fashion on the cluster or in a single job. If run in distributed fashion (usedgrid is set to 'True')
	#then canu must be installed in the user's environment. In this case, the singualarity image is not used. Canu arguments are passed through from
	#the config within the 'canu_args' key. 'gridOptions' are also passed through from the config and instruct Canu on any additional parameters
	#needed to run on the cluster.
	input: rules.basecall_final.output
	output:
		'{sample}/1.assemble/assemble_{genome_size}/{sample}_{genome_size}.contigs.fasta',
		#'{sample}/1.assemble/assemble_{genome_size}/{sample}_{genome_size}.correctedReads.fasta.gz'
	threads: 1
	resources:
		mem=100,
		time=80
	singularity: singularity_image if config['usegrid'] != 'True' and config['usegrid'] != True else '' #switch between image and local environment for canu depending on whether cluster execution is required
	shell:
		"canu -p {sample}_{wildcards.genome_size} -d {sample}/1.assemble/assemble_{wildcards.genome_size}/ -nanopore-raw {input} " +
		config['canu_args'] +
		" stopOnReadQuality=false genomeSize={wildcards.genome_size} " +
		"useGrid={grid} gridOptions='{opts}'".format(
			grid = config['usegrid'],
			opts = config['grid_options']
		)

rule assemble_flye:
	input: rules.basecall_final.output
	output:
		"{sample}/1.assemble/assemble_{genome_size}/assembly.fasta"

SnakeMake NanoPlot CANU From line 87 of master/Snakefile

shell:
	"flye -t {threads} --nano-raw {input} --meta -o {wildcards.sample}/1.assemble/assemble_{wildcards.genome_size}/ -g {wildcards.genome_size}"

SnakeMake Flye From line 121 of master/Snakefile

shell:
	"""
    bedtools makewindows -g {input[1]} -w {params.window_width} | join - {input[1]} | tr ' ' '\t' | \
    cut -f1-4 | awk '{{if ($2 > {params.window_width} && $3 < $4 - {params.window_width} && $4 > {params.min_tig_size}) print $0}}' | \
    xargs -P 16 -l bash -c '
     htsbox samview {input[2]} $0:$1-$1 -p | \
     cut -f8,9 | awk "{{if (\$1 < $1 - ({params.window_width}/2) && \$2 > $1 + ({params.window_width}/2)) print \$0}}" | wc -l | \
     paste <(echo $0) <(echo $1) - ' | awk '{{if ($3 < 2) print $0}}
    ' > {output}
	"""

SnakeMake BEDTools From line 143 of master/Snakefile

shell:
	"""
	if [ -s {input[0]} ] #if the input is nonempty...
	then
		cat {input[0]} | grep -v ^# | sort -k1,1 -k2,2g | join - <(cat {input[1]} | sort -k1,1 -k2,2g) | \
		awk '{{
			if ($1 == prev_tig){{
			 print($1,prev_coord,$2)
			 }}
			else{{
				if (prev_len > 0){{
					print(prev_tig,prev_coord,prev_len)
				}}
				print($1,"1",$2)
				}}
			prev_tig = $1
			prev_coord = $2
			prev_len = $4
			}}
			END {{ print(prev_tig,prev_coord,prev_len)
		}}' | sed "s/\(.*\) \(.*\)\ \(.*\)/\\1:\\2-\\3/g" |  xargs samtools faidx {input[2]} \
		| cut -f1 -d ':' | awk '(/^>/ && s[$0]++){{$0=$0\"_\"s[$0]}}1;' > {output[0]}

		cut -f1 {input[0]} > {sample}/{wildcards.sequence}.tigs.toremove
		grep -vf {sample}/{wildcards.sequence}.tigs.toremove {input[1]} | cut -f1 | xargs samtools faidx {input[2]} >> {output[0]}
		rm {sample}/{wildcards.sequence}.tigs.toremove
	else
		cp {input[2]} {output}
	fi
	"""

SnakeMake SAMtools From line 165 of master/Snakefile

shell:
	"merge_wrapper.py {input} -ml 10000 -c 5 -hco 10; mv merged_out.fasta {output}"

SnakeMake From line 210 of master/Snakefile

shell:
	"ln -s ../../{input} {output}"

SnakeMake From line 216 of master/Snakefile

shell: "sort -k2,2gr {input[1]} | awk '{{if ($2 > {wildcards.contig_cutoff}) print $1}}' | xargs samtools faidx {input[0]} > {output}"

SnakeMake SAMtools From line 232 of master/Snakefile

shell:
	"cat {input} | cut -f1 -d '_' | fold -w 120 | awk '(/^>/ && s[$0]++){{$0=$0\"_\"s[$0]}}1;' > {output}"

SnakeMake From line 247 of master/Snakefile

shell:
	"""
	racon -m 8 -x -6 -g -8 -w 500 -t {threads} {input} > {output}
	"""

SnakeMake Racon From line 294 of master/Snakefile

shell:
	"""
	mkdir -p {output}
	cut -f1 {input[1]} | xargs -n 1 -I foo sh -c 'touch {output[0]}/foo; samtools faidx {input[0]} foo > {output[1]}/foo.fa'
	"""

SnakeMake SAMtools From line 307 of master/Snakefile

shell:
	"""
	medaka consensus {input[1]} {output} --model r941_flip213 --threads {threads} --regions {wildcards.range}
	"""

SnakeMake Consensus From line 325 of master/Snakefile

shell:
	"""
	medaka stitch {input[0]} {input[1]} {output}
	"""

SnakeMake From line 342 of master/Snakefile

shell:
	"""
	cat {params.indir} | cut -f1 -d ':' > {output}
	"""

SnakeMake From line 366 of master/Snakefile

shell:
	"bwa index {input[0]}; bwa mem -t {threads} {input} | samtools sort --threads {threads} > {output}"

SnakeMake SAMtools BWA From line 385 of master/Snakefile

shell:
	"""
	mkdir {output}
	bedtools makewindows -w 100000 -g {input[1]} | awk '{{print $1,\":\", $2+ 1, \"-\", $3}}'  | tr -d ' ' |
	xargs -n 1 -I foo touch {output}/foo
	"""

SnakeMake BEDTools From line 397 of master/Snakefile

	shell:
		"""
		# set env var $i to be the smallest read subset decimal (in increments of 0.1, with a couple very low values thrown in, too)
    	# sufficient to generate at least 40x coverage depth of the target sequence, or
		# 1 if 40x coverage cannot be achieved with the available read data

		for i in 0.01 0.05 $(seq 0.1 0.1 1);
		do
		   cov=$(samtools view {input[1]} -s $i -h {wildcards.range} | samtools depth - | cut -f3 | awk '{{sum+=$1}}END{{print sum/(NR+1)}}')
		   if [ $(echo $cov'>'{params.target_coverage}|bc) -eq 1 ]
		   then
		       break
		   fi
		done
		echo Using $i x of total coverage;

		samtools view -h -O BAM -s $i {input[1]} {wildcards.range} > {params.bam}
		samtools index {params.bam}
		samtools faidx {input[0]} $(echo {wildcards.range}| cut -f1 -d ':') | cut -f1 -d ':' > {params.fa}
		java -Xmx{params.java_mem}G -jar $(which pilon | sed 's/\/pilon//g')/../share/pilon*/pilon*.jar \
			--genome {params.fa} --targets {wildcards.range} \
			--unpaired {params.bam} --output {sample}_{wildcards.range} --outdir {params.subrun_folder} \
			--vcf --nostrays --mindepth 1
		bgzip {params.subrun_folder}/{sample}_{wildcards.range}.vcf
		tabix -fp vcf {params.subrun_folder}/{sample}_{wildcards.range}.vcf.gz
		"""

SnakeMake SAMtools tabix pilon From line 425 of master/Snakefile

shell:
	"""
	#workaround!  Snakemake was causing the vcf's to appear newer than the indices, which tabix didn't like
	touch {sample}/2.polish/pilon/sub_runs/*/*.vcf.gz.tbi

	#get properly sorted intervals
	ls {sample}/2.polish/pilon/ranges/ | tr ':-' '\t' | sort -k1,1 -k2,2g | awk '{{print $1,":",$2,"-",$3}}' | tr -d ' ' > sorted_ranges.tmp

	#get header
	(zcat {sample}/2.polish/pilon/sub_runs/*/*.vcf.gz | head -1000 | grep ^#

	#get corrections within each range (omitting DUP records, which bcftools can't understand)
	cat sorted_ranges.tmp | xargs -n 1 -I foo sh -c "
	tabix {sample}/2.polish/pilon/sub_runs/foo/{sample}_foo.vcf.gz foo | grep -v '0/0' | grep -v DUP" |

	#sort by position
	sort -k1,1 -k2,2g) |

	#compress and store
	bgzip > {output} || true

	#index
	tabix -p vcf {output}
	"""

SnakeMake BCFtools tabix From line 475 of master/Snakefile

shell:
	"""
	bcftools consensus -f {input} -o {output}
	"""

SnakeMake BCFtools Consensus From line 508 of master/Snakefile

shell:
	"""
	cut -f1 -d ':' {input} > {output}
	"""

SnakeMake From line 518 of master/Snakefile

shell:
	"""
	#mkdir {output}
	cat {input[1]} | awk '{{if ($2 > {params.min_size}) print $1}}' | xargs -n 1 -I foo sh -c "
		samtools faidx {input[0]} foo > {sample}/3.circularization/1.candidate_genomes/foo.fa
	"
	"""

SnakeMake SAMtools From line 535 of master/Snakefile

shell:
	"""
	(samtools idxstats {input[0]} | grep {wildcards.tig} | awk '{{if ($2 > 50000) print $1, ":", $2-50000, "-", $2; else print $1, ":", 1, "-", $2 }}' | tr -d ' ';
	 samtools idxstats {input[0]} | grep {wildcards.tig} | awk '{{if ($2 > 50000) print $1, ":", 1, "-", 50000; else print $1, ":", 1, "-", $2 }}' | tr -d ' ') |
	xargs -I foo sh -c 'samtools view -h {input[0]} foo | samtools fastq - || true' | paste - - - - | sort | uniq | tr '\t' '\n' | bgzip > {output}
	"""

SnakeMake SAMtools From line 552 of master/Snakefile

shell:
	"""
	flye -t {threads} --nano-raw {input} -o {params.directory} -g 1m
	"""

SnakeMake Flye From line 572 of master/Snakefile

shell:
	"""
	nucmer -b 5000 {input[0]} {input[1]} -p {params.directory}/{params.prefix} #-t {threads} #reverted nucmer back down to 3, no more multithreading :(

	delta-filter -q {params.directory}/{params.prefix}.delta > {params.directory}/{params.prefix}.filt.delta

	show-coords -Tq {params.directory}/{params.prefix}.filt.delta | cut -f8,9 | sed '1,3d' | sort | \
	uniq -c | tr -s ' ' '\\t' | cut -f2-99 | grep -v ^1 | cut -f2,3 > {params.directory}/potential_circularizations.tsv || true

	show-coords -Tql {params.directory}/{params.prefix}.filt.delta | grep -f {params.directory}/potential_circularizations.tsv | cat > {output} || true
	"""

SnakeMake mummer2circos From line 600 of master/Snakefile

script:
	"scripts/spancircle.py"

SnakeMake From line 619 of master/Snakefile

run:
	span_out = open(input[0], 'r').readlines()
	cmd = ''
	if span_out == ['done\n'] or span_out[0].strip() == 'no circularizations': #no circularization occurred
		print('Nothing to do')
	else:
		trim = span_out[0].strip()
		trim_cmd = 'samtools faidx ' + input[1] + ' ' + trim + " > " + params.outfa + "\n"
		cmd += trim_cmd

		if len(span_out) == 3:
			extend = span_out[1].strip()
			extend_cmd = 'samtools faidx ' + input[3] + ' ' + extend + " | grep -v '>'" + " >> " + params.outfa + "\n"

SnakeMake SAMtools From line 634 of master/Snakefile

script:
	"scripts/encircle.py"

SnakeMake From line 671 of master/Snakefile

run:
	span_out = open(input[0], 'r').readlines()
	cmd = ''
	if span_out == ['done\n']: #no circularization occurred
		print('No over-circularization')
	else:
		trim = span_out[0].strip()
		trim_cmd = 'samtools faidx ' + input[1] + ' ' + trim + " > " + params.outfa + "\n"
		cmd += trim_cmd

	open(output[0], 'w').write(cmd + '\n')

SnakeMake SAMtools From line 686 of master/Snakefile

shell:
	"""
	find {sample}/3.circularization/3.circular_sequences/sh/ -type f | xargs cat | bash
	ls {sample}/3.circularization/3.circular_sequences/ | grep .fa$ | cut -f1-2 -d '_' > circs.tmp || true
	(cat {input[1]} | grep -vf circs.tmp |
	cut -f1 | xargs samtools faidx {input[0]}; ls {sample}/3.circularization/3.circular_sequences/* | grep .fa$ | xargs cat) |
	tr -d '\\n' | sed 's/\\(>[contigscaffold_]*[0-9]*\\)/\\n\\1\\n/g' | fold -w 120 | cut -f1 -d ':' | grep -v '^$' > {output} || true
	rm circs.tmp
	"""

SnakeMake SAMtools From line 719 of master/Snakefile

shell: "cp {input} {output}"

SnakeMake From line 743 of master/Snakefile

shell:
	"minimap2 -t {threads} -x map-ont {input} > {output}"

SnakeMake Minimap2 From line 757 of master/Snakefile

shell:
	"minimap2 -t {threads} -ax map-ont {input} | samtools sort --threads {threads} > {output}"

SnakeMake SAMtools Minimap2 From line 772 of master/Snakefile

shell:
	"samtools index {input}"

SnakeMake SAMtools From line 782 of master/Snakefile

shell: "samtools faidx {input}"

SnakeMake SAMtools From line 790 of master/Snakefile

ShowHide 25 more snippets with no or duplicated tags.

Comments

Support

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Created: 1yr ago

Updated: 1yr ago

Maitainers: public

URL: https://github.com/elimoss/lathe

Name: lathe

Version: 1

Badge:

Insert copied code into your website to add a link to this workflow.

Other Versions:

License: MIT License

Keywords:

mummer2circos tabix BCFtools BEDTools BWA CANU Consensus Flye GUPPY Medaka Minimap2 NanoPlot pilon Racon SAMtools Snakemake pysam

Future updates

Related Workflows

psychip_snakemake — Show Details View Workflow

ENCODE pipeline for histone marks developed for the psychENCODE project

public

psychip pipeline is an improved version of the ENCODE pipeline for histone marks developed for the psychENCODE project. The o...

raw sequence reads Alignment Sequence alignment report macs2 ucsc-bedclip bedGraphToBigWig BEDTools BWA Picard SAMtools Snakemake

Free

Near-real time tracking of SARS-CoV-2 in Connecticut

public

Repository containing scripts to perform near-real time tracking of SARS-CoV-2 in Connecticut using genomic data. This pipeli...

JSON nextclade Augur Biopython FOCUS Pandas Snakemake bs4 epiweeks geopy matplotlib numpy pycountry pycountry-convert uszipcode

Free

cellranger-snakemake-gke — Show Details View Workflow

snakemake workflow to run cellranger on a given bucket using gke.

public

A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...

macs2 ucsc-bedclip bedGraphToBigWig BEDTools BWA Picard SAMtools Snakemake

Free

ATLAS - Three commands to start analyzing your metagenome data

public

Metagenome-atlas is a easy-to-use metagenomic pipeline based on snakemake. It handles all steps from QC, Assembly, Binning, t...

raw sequence reads Genome assembly Annotation track checkm2 gunc prodigal snakemake-wrapper-utils MEGAHIT Atlas BBMap Biopython BioRuby Bwa-mem2 cd-hit CheckM DAS Diamond eggNOG-mapper v2 MetaBAT 2 Minimap2 MMseqs MultiQC Pandas Picard pyfastx SAMtools SemiBin Snakemake SPAdes SqueezeMeta TADpole VAMB CONCOCT ete3 gtdbtk h5py networkx numpy plotly psutil utils metagenomics

Free

175

rna-seq-star-deseq2 — Show Details View Workflow

RNA-seq workflow using STAR and DESeq2

public

This workflow performs a differential gene expression analysis with STAR and Deseq2. The usage of this workflow is described ...

Free

dna-seq-gatk-variant-calling — Show Details View Workflow

This Snakemake pipeline implements the GATK best-practices workflow

public

This Snakemake pipeline implements the GATK best-practices workflow for calling small germline variants. The usage of thi...

VCF raw sequence reads Variant calling genetic variants gatk rust-bio-tools snakemake-wrapper-utils tabix BCFtools BWA FastQC MultiQC Pandas Picard SAMtools Snakemake Trimmomatic Variant Effect Predictor (VEP) common matplotlib numpy seaborn DNA

Free