A tool for generating bacterial genomes from metagenomes with nanopore long read sequencing

public 1yr ago Version: 2 0 bookmarks

View Workflow

Lathe

A tool for generating bacterial genomes from metagenomes with Nanopore long read sequencing

Installation

First, install miniconda3

Then install snakemake. This can be done with the following.

conda install snakemake
snakemake --version #please ensure this is >=5.4.3

Next, clone this github directory to some location where it can be stored permanently. Remember to keep it updated with git pull .

git clone https://github.com/elimoss/lathe.git

Instructions to enable cluster execution with SLURM can be found at https://github.com/bhattlab/slurm .

Typical installation time: 5-10 minutes.

Change as of 2021-02-03

Lathe has been adapted to run on multiple samples simultaneously, instead of one sample per snakemake command. The pipeline can now take in either .fast5 raw data from a nanopore run, or basecalled fastq files. The config file has been changed to reflect this. You now provide sample information and datasets in a tab-delimited file, and indicate this file in the config.yaml . Provide this file to the file_names_txt argument in the configfile.

file_names_txt is a 2 or 3 column tsv with the following columns
SAMPLE_NAME FAST5/FASTQ_READS SHORT_READS_1,SHORT_READS_2
sample name in the first column will be used to name ouptut
the second column can be a directory containing fast5 files (the output of a nanopore run)
 -OR- a single fastq file containing basecalled data 
Optionally, a short read sequencing dataset can be provided in the third column, 
 with pairs separated by a comma. If this option is selected, short read
 polishing will be turned on.

Inputs

Alter config.yaml to provide the following:

file_names_txt : Tab delimited file describing sample names and input datasets. See config.yaml for a description.
flowcell : flowcell code, e.g. FLO-MIN106, passed to basecaller
kit : kit code, e.g. SQK-LSK109, passed to basecaller
genome_size : Estimated genome size, e.g. 50m, passed to Canu.
singularity : location (including on the internet) of a singularity image to be used for the workflow. Don't change this.
use_grid : should Canu execute in distributed mode on a cluster?
grid_options : Extra options for execution on a cluster
canu_args : Extra options for Canu
skip_circularization : Should circularization be omitted from the workflow?

Lathe uses the Flye assembler by default. For Canu, please specify 'canu' for the assembler parameter in the config. For cluster Canu execution, please note: if set to True, you will need to install Canu, e.g. conda install -c conda-forge -c bioconda Canu=1.8 as well as provide any additional required parameters for your job scheduler in the config.yaml file. Please see the example config file. When executing on a cluster, Canu will appear to fail, as the first process does not produce an assembly and instead spawns subsequent jobs on the cluster. Don't worry, just re-run Lathe when the assembly completes.

To execute please run the following. Please note, you must substitute a parent directory containing all of your data and working directories for /labs/ .

snakemake --use-singularity --singularity-args '--bind /labs/,/scg/,/home/ ' -s /path/to/lathe/Snakefile \
--configfile path/to/modified_config.yaml --restart-times 0 --keep-going --latency-wait 30
# --profile scg #enable cluster support, highly recommended. See above.

Outputs

The outputs generated by this workflow will look like the following:

samplename/
├── 0.basecall
│ ├── samplename.fq
│ └── nanoplots
├── 1.assemble
│ ├── samplename_merged.fasta
│ ├── samplename_raw_assembly.fa
│ ├── samplename_raw_assembly.fa.amb
│ ├── samplename_raw_assembly.fa.ann
│ ├── samplename_raw_assembly.fa.bwt
│ ├── samplename_raw_assembly.fa.fai
│ ├── samplename_raw_assembly.fa.pac
│ ├── samplename_raw_assembly.fa.paf
│ ├── samplename_raw_assembly.fa.sa
│ ├── assemble_100m (if specified)
│ └── assemble_250m (if specified)
├── 2.polish
│ ├── samplename_polished.corrected.fasta
│ ├── samplename_polished.fasta
│ ├── samplename_polished.fasta.bam
│ ├── samplename_polished.fasta.bam.bai
│ ├── samplename_polished.fasta.fai
│ ├── samplename_polished.fasta.misassemblies.tsv
│ ├── medaka (if specified)
│ ├── pilon (if specified)
│ └── racon (if specified)
├── 3.circularization
│ ├── 1.candidate_genomes
│ ├── 2.circularization
│ ├── 3.circular_sequences #circularized genomes
│ ├── 4.samplename_circularized.corrected.fasta
│ ├── 4.samplename_circularized.fasta
│ ├── 4.samplename_circularized.fasta.bam
│ ├── 4.samplename_circularized.fasta.bam.bai
│ ├── 4.samplename_circularized.fasta.fai
│ └── 4.samplename_circularized.fasta.misassemblies.tsv
└── 5.final
 ├── samplename_final.fa
 └── samplename_final.fa.fai

Tutorial

The tutorial can be run using the provided config file and input data within the tutorial folder. This tutorial uses pre-basecalled long read data (to reduce total file sizes) and performs assembly with Flye and short read polishing. To reduce runtime, this tutorial skips basecalling, long read polishing, and circularization steps. With cluster execution enabled, this tutorial should be completed in under 6 hours. Successful completion will be indicated by the presence of a atcc_tutorial_final.fa file in the 5.final directory. To run the tutorial:

unzip the short read (tutorial/inputdata/atcc_100000_sr.fastq.gz) and long read (tutorial/atcc_tutorial/0.basecall/atcc_tutorial.fq.gz) data
edit the config file to provide the absolute path to the short read input data (atcc_100000_sr.fastq)
run Lathe using the command:

snakemake --use-singularity --singularity-args '--bind /yourrootdirectories/ ' -s /path/to/lathe/Snakefile \
--configfile path/to/config_nobasecalling.yaml --restart-times 0 --keep-going --latency-wait 30
# --profile clusterconfiguration #enable cluster support, highly recommended. See above.

Code Snippets

import sys
import os
import argparse
import pysam

fastaf = snakemake.input[0]
out = open(snakemake.output[0], 'w')

#knobs
smooth_gap_width = 150000
contig_edge_margin = 150000
min_smoothed_aln_len = 10000
min_aln_len = 5000

#align
print("Aligning...".format(t=str(snakemake.threads)))
os.system("nucmer -p {delta} -b 4000 -l 2000 --maxmatch {fa} {fa}".format( #-t {threads} back to mummer3, no more multithreading : (
	threads = str(snakemake.threads),
	delta = snakemake.params['delta'],
	fa = fastaf
))
os.system("show-coords -T {delta}.delta -L 2000 | sed '1,5d' > coords.tsv".format(
	delta = snakemake.params['delta']
)) #the 1,5d gets rid of the header and identity hit as well
print("Trimming genome...")

max_tiglen = 0

with open('coords.tsv') as coords:
	lines = coords.readlines()
	smoothed_lines = []
	if len(lines) > 0:
		aln_start_line = lines[0].strip().split("\t")
		prev_line = lines[0].strip().split("\t")

		#goal: identify corner-cutting parallel off-diagonals.
	#	print(prev_line)

		for l in lines[1:] + [lines[0]]: #just so we don't miss the last item
			s = l.strip().split("\t")
			#print(s)

			tigname = s[-1] #store the name of the tig.  this repeats nugatorily.

			if int(s[1]) > max_tiglen: #keep track of the largest alignment coordinate found, used as the tig length
				max_tiglen = int(s[1])

			if int(s[0]) > int(s[1]): #ignore inversions
	#				print('ignoring')
				continue

			if int(s[1]) - int(s[0]) < min_aln_len:
		#		print('ignoring')
				continue

			if abs(int(s[0]) - int(prev_line[1])) < smooth_gap_width and \
				abs(int(s[2]) - int(prev_line[3])) < smooth_gap_width:
			#	print("joining")
				pass
			else:
	#				print("terminating")
				newline = aln_start_line
				newline[1] = prev_line[1]
				newline[3] = prev_line[3]
				if int(newline[1]) - int(newline[0]) > min_smoothed_aln_len:
	#				print("storing")
					smoothed_lines.append(newline)
				aln_start_line = s
			prev_line = s


#print("output")
#for s in smoothed_lines:
#	print(s)

if len(smoothed_lines) > 0:
	#is it a corner cutting parallel off-diagonal?
	if int(smoothed_lines[0][0]) < contig_edge_margin and int(smoothed_lines[0][3]) > max_tiglen - contig_edge_margin:
		if int(smoothed_lines[-1][2]) < contig_edge_margin and int(smoothed_lines[-1][1]) > max_tiglen - contig_edge_margin:
			#out.write("It's overcircularized!")

			out.write(tigname + ":" + smoothed_lines[0][0] + '-' + smoothed_lines[-1][0] + "\n")

out.write("done\n")

Python pysam mummer2circos From line 3 of scripts/encircle.py

import fileinput
import os

prev = []
prev_is_terminal = False
verbose = True

margin = snakemake.params['margin']

with open(snakemake.output[0], 'w') as oot:
	for l in open(snakemake.input[0]):
		s = l.strip().split("\t")
		if prev == []:
			next
		if s == ['no circularizations']:
			oot.write("no circularizations\n")
			break

		target_tig = s[9]
		spanner = s[10]

		#test for terminal alignment
		r_start = min([int(i) for i in s[0:2]])
		r_end = max([int(i) for i in s[0:2]])
		q_start = min([int(i) for i in s[2:4]])
		q_end = max([int(i) for i in s[2:4]])

		r_len = int(s[7])
		q_len = int(s[8])

		if r_start < margin or r_end > (r_len - margin): #ref is terminal
			if q_start < margin or q_end > (q_len - margin): #query is terminal
				if prev_is_terminal and spanner == prev[10] and target_tig == prev[9]: #spanned event

					#trim when the center of the spanning contig aligns to both ends of the genome (consistent with overcircularization)
					if q_start < prev_q_end:
						trim_bases = prev_q_end - q_start
			#			oot.write('Need to trim ' + str(trim_bases) + " bases.\n")
						oot.write("{c}:1-{e}\n".format(c = target_tig, e = str(r_len-trim_bases)))

					#extend with spanning contig when it closes a gap
					else:
						insert_bases = [q_start, prev_q_end]
						insert_bases.sort()
						insert_string = spanner + ":" + '-'.join([str(i) for i in insert_bases])
						#print(insert_string)
						oot.write(target_tig + "\n")
						oot.write(insert_string + "\n")

					prev_is_terminal = False
					next

				prev_is_terminal=True
			else:
				prev_is_terminal = False
		else:
			prev_is_terminal = False

		#assign prevs
		prev = s
		prev_r_start = r_start
		prev_r_end   = r_end
		prev_q_start = q_start
		prev_q_end = q_end

		prev_r_len = r_len
		prev_q_len = q_len

	oot.write('done\n')

Python From line 3 of scripts/spancircle.py

shell:
    "ln -s {input} {output}"

SnakeMake From line 91 of master/Snakefile

    shell:
        "guppy_basecaller --cpu_threads_per_caller {threads} -i {params.in_dir} -s {params.out_dir} " +
        "--flowcell {fc} --kit {k}" .format(fc=config['flowcell'], k = config['kit'])

rule basecall_final:
    #Collate called bases into a single fastq file
    input: lambda wildcards: expand('{{sample}}/0.basecall/raw_calls/{foo}/sequencing_summary.txt', foo = [os.path.basename(f).split('.')[0] for f in fast5_files_dict[wildcards.sample]])
    output: '{sample}/0.basecall/{sample}.fq'
    params:
        sample="{sample}"

SnakeMake GUPPY From line 106 of master/Snakefile

shell:
    "find {params.sample}/0.basecall/raw_calls/*/*.fastq | xargs cat > {output}"

SnakeMake From line 116 of master/Snakefile

shell: "NanoPlot --fastq {input} -t {threads} --color green  -o  {wildcards.sample}/0.basecall/nanoplots"

SnakeMake NanoPlot From line 128 of master/Snakefile

    shell:
        "canu -p {sample}_{wildcards.genome_size} -d {sample}/1.assemble/assemble_{wildcards.genome_size}/ -nanopore-raw {input} " +
        config['canu_args'] +
        " genomeSize={wildcards.genome_size} " + #  stopOnReadQuality=false
        "useGrid={grid} gridOptions='{opts}'".format(
            grid = config['usegrid'],
            opts = config['grid_options']
        )

rule assemble_flye:
    input: lambda wildcards: fq_files_dict[wildcards.sample]
    output:
        "{sample}/1.assemble/assemble_{genome_size}/assembly.fasta"

SnakeMake CANU From line 145 of master/Snakefile

shell:
    "flye -t {threads} --meta --nano-raw {input} -o {wildcards.sample}/1.assemble/assemble_{wildcards.genome_size}/ -g {wildcards.genome_size}"

SnakeMake Flye From line 163 of master/Snakefile

shell: """
    bedtools makewindows -g {input[1]} -w {params.window_width} | join - {input[1]} | tr ' ' '\t' | \
    cut -f1-4 | awk '{{if ($2 > {params.window_width} && $3 < $4 - {params.window_width} && $4 > {params.min_tig_size}) print $0}}' | \
    xargs -P 16 -l bash -c '
     htsbox samview {input[2]} $0:$1-$1 -p | \
     cut -f8,9 | awk "{{if (\$1 < $1 - ({params.window_width}/2) && \$2 > $1 + ({params.window_width}/2)) print \$0}}" | wc -l | \
     paste <(echo $0) <(echo $1) - ' | awk '{{if ($3 < 2) print $0}}
    ' > {output}
    """

SnakeMake BEDTools From line 185 of master/Snakefile

shell: """
    if [ -s {input[0]} ] #if the input is nonempty...
    then
        cat {input[0]} | grep -v ^# | sort -k1,1 -k2,2g | join - <(cat {input[1]} | sort -k1,1 -k2,2g) | \
        awk '{{
            if ($1 == prev_tig){{
             print($1,prev_coord,$2)
             }}
            else{{
                if (prev_len > 0){{
                    print(prev_tig,prev_coord,prev_len)
                }}
                print($1,"1",$2)
                }}
            prev_tig = $1
            prev_coord = $2
            prev_len = $4
            }}
            END {{ print(prev_tig,prev_coord,prev_len)
        }}' | sed "s/\(.*\) \(.*\)\ \(.*\)/\\1:\\2-\\3/g" |  xargs samtools faidx {input[2]} \
        | cut -f1 -d ':' | awk '(/^>/ && s[$0]++){{$0=$0\"_\"s[$0]}}1;' > {output[0]}

        cut -f1 {input[0]} > {wildcards.sample}/{wildcards.sequence}.tigs.toremove

        # DM: temp fix to skip misassembly if formatting fails
        x=$(cat {input[1]} | wc -l)
        if [ $x -gt 1 ]
        then
            grep -vf {wildcards.sample}/{wildcards.sequence}.tigs.toremove {input[1]} | cut -f1 | xargs samtools faidx {input[2]} >> {output[0]}
            rm {wildcards.sample}/{wildcards.sequence}.tigs.toremove
        else
            cp {input[2]} {output}
            echo "Misassembly error: skipping misassembly removal"
        fi

    else
        cp {input[2]} {output}
    fi
    """

SnakeMake SAMtools From line 206 of master/Snakefile

shell:
    "merge_wrapper.py {input} -ml 10000 -c 5 -hco 10; mv merged_out.fasta {output}"

SnakeMake From line 260 of master/Snakefile

shell:
    "ln -s ../../{input} {output}"

SnakeMake From line 266 of master/Snakefile

shell: "sort -k2,2gr {input[1]} | awk '{{if ($2 > {params.min_contig_size}) print $1}}' | xargs samtools faidx {input[0]} > {output}"

SnakeMake SAMtools From line 285 of master/Snakefile

shell:
    "cat {input} | cut -f1 -d '_' | fold -w 120 | awk '(/^>/ && s[$0]++){{$0=$0\"_\"s[$0]}}1;' > {output}"

SnakeMake From line 300 of master/Snakefile

shell: """
    racon -m 8 -x -6 -g -8 -w 500 -t {threads} {input} > {output}
    """

SnakeMake Racon From line 350 of master/Snakefile

shell: """
    mkdir -p {output}
    cut -f1 {input[1]} | xargs -n 1 -I foo sh -c 'touch {output[0]}/foo; samtools faidx {input[0]} foo > {output[1]}/foo.fa'
    """

SnakeMake SAMtools From line 362 of master/Snakefile

shell: """
    medaka consensus {input[1]} {output} --model r941_flip213 --threads {threads} --regions {wildcards.range}
    """

SnakeMake Consensus From line 379 of master/Snakefile

shell: """
    medaka stitch {input[0]} {input[1]} {output}
    """

SnakeMake From line 395 of master/Snakefile

shell: """
    cat {params.indir} | cut -f1 -d ':' > {output}
    """

SnakeMake From line 418 of master/Snakefile

shell:
    "bwa index {input.ref}; bwa mem -t {threads} {input.ref} {input.reads} | \
        samtools sort --threads {threads} > {output}"

SnakeMake SAMtools BWA From line 434 of master/Snakefile

shell: """
    mkdir {output}
    bedtools makewindows -w 100000 -g {input[1]} | awk '{{print $1,\":\", $2+ 1, \"-\", $3}}'  | tr -d ' ' |
    xargs -n 1 -I foo touch {output}/foo
    """

SnakeMake BEDTools From line 447 of master/Snakefile

shell: """
    # set env var $i to be the smallest read subset decimal (in increments of 0.1, with a couple very low values thrown in, too)
    # sufficient to generate at least 40x coverage depth of the target sequence, or
    # 1 if 40x coverage cannot be achieved with the available read data

    for i in 0.01 0.05 $(seq 0.1 0.1 1);
    do
       cov=$(samtools view {input[1]} -s $i -h {params.myrange} | samtools depth - | cut -f3 | awk '{{sum+=$1}}END{{print sum/(NR+1)}}')
       if [ $(echo $cov'>'{params.target_coverage}|bc) -eq 1 ]
       then
           break
       fi
    done
    echo Using $i x of total coverage;

    samtools view -h -O BAM -s $i {input[1]} {params.myrange} > {params.bam}
    samtools index {params.bam}
    samtools faidx {input[0]} $(echo {params.myrange}| cut -f1 -d ':') | cut -f1 -d ':' > {params.fa}
    java -Xmx{params.java_mem}G -jar $(which pilon | sed 's/\/pilon//g')/../share/pilon*/pilon*.jar \
        --genome {params.fa} --targets {params.myrange} \
        --unpaired {params.bam} --output {params.sample}_{params.myrange} --outdir {params.subrun_folder} \
        --vcf --nostrays --mindepth 1
    bgzip {params.subrun_folder}/{params.sample}_{params.myrange}.vcf
    tabix -fp vcf {params.subrun_folder}/{params.sample}_{params.myrange}.vcf.gz
    """

SnakeMake SAMtools tabix pilon From line 476 of master/Snakefile

shell: """
    #workaround!  Snakemake was causing the vcf's to appear newer than the indices, which tabix didn't like
    touch {params.sample}/2.polish/pilon/sub_runs/*/*.vcf.gz.tbi

    #get properly sorted intervals
    ls {params.sample}/2.polish/pilon/ranges/ | tr ':-' '\t' | sort -k1,1 -k2,2g | awk '{{print $1,":",$2,"-",$3}}' | tr -d ' ' > sorted_ranges.tmp

    #get header
    (zcat {params.sample}/2.polish/pilon/sub_runs/*/*.vcf.gz | head -1000 | grep ^#

    #get corrections within each range (omitting DUP records, which bcftools can't understand)
    cat sorted_ranges.tmp | xargs -n 1 -I foo sh -c "
    tabix {params.sample}/2.polish/pilon/sub_runs/foo/{params.sample}_foo.vcf.gz foo | grep -v '0/0' | grep -v DUP" |

    #sort by position
    sort -k1,1 -k2,2g) |

    #compress and store
    bgzip > {output} || true

    #index
    tabix -p vcf {output}
    """

SnakeMake BCFtools tabix From line 527 of master/Snakefile

shell: """
    bcftools consensus -f {input} -o {output}
    """

SnakeMake BCFtools Consensus From line 559 of master/Snakefile

shell: """
    cut -f1 -d ':' {input} > {output}
    """

SnakeMake From line 568 of master/Snakefile

shell: """
    #mkdir {output}
    cat {input[1]} | awk '{{if ($2 > {params.min_size}) print $1}}' | xargs -n 1 -I foo sh -c "
        samtools faidx {input[0]} foo > {params.sample}/3.circularization/1.candidate_genomes/foo.fa
    "
    """

SnakeMake SAMtools From line 585 of master/Snakefile

shell: """
    (samtools idxstats {input[0]} | grep {params.tig} | awk '{{if ($2 > 50000) print $1, ":", $2-50000, "-", $2; else print $1, ":", 1, "-", $2 }}' | tr -d ' ';
     samtools idxstats {input[0]} | grep {params.tig} | awk '{{if ($2 > 50000) print $1, ":", 1, "-", 50000; else print $1, ":", 1, "-", $2 }}' | tr -d ' ') |
    xargs -I foo sh -c 'samtools view -h {input[0]} foo | samtools fastq - || true' | paste - - - - | sort | uniq | tr '\t' '\n' | bgzip > {output}
    """

SnakeMake SAMtools From line 603 of master/Snakefile

shell: """
    flye -t {threads} --nano-raw {input} -o {params.directory} -g 1m
    """

SnakeMake Flye From line 622 of master/Snakefile

shell: """
    nucmer -b 5000 {input[0]} {input[1]} -p {params.directory}/{params.prefix} #-t {threads} #reverted nucmer back down to 3, no more multithreading :(

    delta-filter -q {params.directory}/{params.prefix}.delta > {params.directory}/{params.prefix}.filt.delta

    show-coords -Tq {params.directory}/{params.prefix}.filt.delta | cut -f8,9 | sed '1,3d' | sort | \
    uniq -c | tr -s ' ' '\\t' | cut -f2-99 | grep -v ^1 | cut -f2,3 > {params.directory}/potential_circularizations.tsv || true

    show-coords -Tql {params.directory}/{params.prefix}.filt.delta | grep -f {params.directory}/potential_circularizations.tsv | cat > {output} || true
    """

SnakeMake mummer2circos From line 649 of master/Snakefile

script:
    "scripts/spancircle.py"

SnakeMake From line 667 of master/Snakefile

run:
    span_out = open(input[0], 'r').readlines()
    cmd = ''
    if span_out == ['done\n'] or span_out[0].strip() == 'no circularizations': #no circularization occurred
        print('Nothing to do')
    else:
        trim = span_out[0].strip()
        trim_cmd = 'samtools faidx ' + input[1] + ' ' + trim + " > " + params.outfa + "\n"
        cmd += trim_cmd

        if len(span_out) == 3:
            extend = span_out[1].strip()
            extend_cmd = 'samtools faidx ' + input[3] + ' ' + extend + " | grep -v '>'" + " >> " + params.outfa + "\n"

SnakeMake SAMtools From line 682 of master/Snakefile

script:
    "scripts/encircle.py"

SnakeMake From line 719 of master/Snakefile

run:
    span_out = open(input[0], 'r').readlines()
    cmd = ''
    if span_out == ['done\n']: #no circularization occurred
        print('No over-circularization')
    else:
        trim = span_out[0].strip()
        trim_cmd = 'samtools faidx ' + input[1] + ' ' + trim + " > " + params.outfa + "\n"
        cmd += trim_cmd

    open(output[0], 'w').write(cmd + '\n')

SnakeMake SAMtools From line 734 of master/Snakefile

shell: """
    mkdir -p {params.sample}/3.circularization/3.circular_sequences/sh/
    touch {params.sample}/3.circularization/3.circular_sequences/sh/dummy
    find {params.sample}/3.circularization/3.circular_sequences/sh/ -type f | xargs cat | bash
    ls {params.sample}/3.circularization/3.circular_sequences/ | grep .fa$ | cut -f1-2 -d '_' > circs.tmp || true
    (cat {input[1]} | grep -vf circs.tmp |
    cut -f1 | xargs samtools faidx {input[0]}; ls {params.sample}/3.circularization/3.circular_sequences/* | grep .fa$ | xargs cat) |
    tr -d '\\n' | sed 's/\\(>[contigscaffold_]*[0-9]*\\)/\\n\\1\\n/g' | fold -w 120 | cut -f1 -d ':' | grep -v '^$' > {output} || true
    rm circs.tmp
    """

SnakeMake SAMtools From line 769 of master/Snakefile

shell: "cp {input} {output}"

SnakeMake From line 794 of master/Snakefile

shell:
    "minimap2 -t {threads} -x map-ont {input} > {output}"

SnakeMake Minimap2 From line 808 of master/Snakefile

shell:
    "minimap2 -t {threads} -ax map-ont {input} | samtools sort --threads {threads} > {output}"

SnakeMake SAMtools Minimap2 From line 823 of master/Snakefile

shell:
    "samtools index {input}"

SnakeMake SAMtools From line 833 of master/Snakefile

shell: "samtools faidx {input}"

SnakeMake SAMtools From line 841 of master/Snakefile

ShowHide 26 more snippets with no or duplicated tags.

Comments

Support

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Created: 1yr ago

Updated: 1yr ago

Maitainers: public

URL: https://github.com/bhattlab/lathe

Name: lathe

Version: 2

Badge:

Insert copied code into your website to add a link to this workflow.

Other Versions:

License: MIT License

Future updates

Related Workflows

psychip_snakemake — Show Details View Workflow

ENCODE pipeline for histone marks developed for the psychENCODE project

public

psychip pipeline is an improved version of the ENCODE pipeline for histone marks developed for the psychENCODE project. The o...

raw sequence reads Alignment Sequence alignment report macs2 ucsc-bedclip bedGraphToBigWig BEDTools BWA Picard SAMtools Snakemake

Free

Near-real time tracking of SARS-CoV-2 in Connecticut

public

Repository containing scripts to perform near-real time tracking of SARS-CoV-2 in Connecticut using genomic data. This pipeli...

JSON nextclade Augur Biopython FOCUS Pandas Snakemake bs4 epiweeks geopy matplotlib numpy pycountry pycountry-convert uszipcode

Free

cellranger-snakemake-gke — Show Details View Workflow

snakemake workflow to run cellranger on a given bucket using gke.

public

A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...

macs2 ucsc-bedclip bedGraphToBigWig BEDTools BWA Picard SAMtools Snakemake

Free

ATLAS - Three commands to start analyzing your metagenome data

public

Metagenome-atlas is a easy-to-use metagenomic pipeline based on snakemake. It handles all steps from QC, Assembly, Binning, t...

raw sequence reads Genome assembly Annotation track checkm2 gunc prodigal snakemake-wrapper-utils MEGAHIT Atlas BBMap Biopython BioRuby Bwa-mem2 cd-hit CheckM DAS Diamond eggNOG-mapper v2 MetaBAT 2 Minimap2 MMseqs MultiQC Pandas Picard pyfastx SAMtools SemiBin Snakemake SPAdes SqueezeMeta TADpole VAMB CONCOCT ete3 gtdbtk h5py networkx numpy plotly psutil utils metagenomics

Free

175

rna-seq-star-deseq2 — Show Details View Workflow

RNA-seq workflow using STAR and DESeq2

public

This workflow performs a differential gene expression analysis with STAR and Deseq2. The usage of this workflow is described ...

Free

dna-seq-gatk-variant-calling — Show Details View Workflow

This Snakemake pipeline implements the GATK best-practices workflow

public

This Snakemake pipeline implements the GATK best-practices workflow for calling small germline variants. The usage of thi...

VCF raw sequence reads Variant calling genetic variants gatk rust-bio-tools snakemake-wrapper-utils tabix BCFtools BWA FastQC MultiQC Pandas Picard SAMtools Snakemake Trimmomatic Variant Effect Predictor (VEP) common matplotlib numpy seaborn DNA

Free