Phaeocystis Viral Elements Extraction Workflow for Polinton-like Virus Study

Phaeocystis viral elements

This repository contains the bioinformatic workflow used to extract viral elements from Phaeocystis genomes for the paper Roitman et al. (2023), "Infection cycle and phylogeny of a Polinton-like virus with a virophage lifestyle infecting Phaeocystis globosa".

The workflow is built with Snakemake. Dependencies are managed by conda (see --use-conda). To run it:

    snakemake --cores 10 --use-conda

Code Snippets

shell:
    "trim_galore -a ' {params.truseq} -a {params.nextera} -n 3' --cores {threads} --trim-n --paired --retain_unpaired --output_dir {params.outdir} --length {params.min_read} {input.r1} {input.r2}"

SnakeMake Trim_Galore From line 47 of workflow/Snakefile

shell:
    "trim_galore -a ' {params.truseq} -a {params.nextera} -n 3' --cores {threads} --trim-n --length {params.min_read} --output_dir {params.outdir} {input.reads}"

SnakeMake Trim_Galore From line 64 of workflow/Snakefile

shell:
    "seqkit faidx {input}"

SnakeMake seqkit From line 74 of workflow/Snakefile

shell:
    "megahit {params.reads} -f -o {output.out_dir} --k-list {params.k} --min-contig-len {params.min_contig_len} -t {threads} &> {log}"

SnakeMake MEGAHIT From line 150 of workflow/Snakefile

shell:
    "bowtie2-build {input} {wildcards.prefix}"

SnakeMake Bowtie 2 From line 170 of workflow/Snakefile

shell:
    "cutadapt -o /dev/null --info-file /dev/stdout --quiet -b {params.old_linker} {input} | python workflow/utilities/info2fastq.py {params.new_linker} | gzip > {output}"

SnakeMake Cutadapt From line 183 of workflow/Snakefile

shell:
    "nxtrim -1 {input.r1} -2 {input.r2} -O {params.prefix} --separate --rf"

SnakeMake From line 202 of workflow/Snakefile

shell:
    "trim_galore -a ' {params.truseq} -a {params.nextera} -n 3' --cores {threads} --trim-n --paired --retain_unpaired --output_dir {params.outdir} --length {params.min_read} {input.r1} {input.r2}"

SnakeMake Trim_Galore From line 225 of workflow/Snakefile

shell:
    "trim_galore -a ' {params.truseq} -a {params.nextera} -n 3' --cores {threads} --trim-n --output_dir {params.outdir} --length {params.min_read} {input}"

SnakeMake Trim_Galore From line 244 of workflow/Snakefile

shell:
    "seqkit seq -rp {input} | gzip > {output}"

SnakeMake seqkit From line 254 of workflow/Snakefile

shell:
    "echo '{params.config}' > {output}"

SnakeMake From line 312 of workflow/Snakefile

shell:
    "SOAPdenovo-fusion -D -s {input.config} -p {threads} -K {wildcards.K} -g {params.prefix} -c {input.contigs}"

SnakeMake From line 328 of workflow/Snakefile

shell:
    "SOAPdenovo-127mer map -s {input.config} -p {threads} -g {params.prefix}"

SnakeMake From line 343 of workflow/Snakefile

shell:
    "SOAPdenovo-127mer scaff -p {threads} -g {params.prefix}"

SnakeMake From line 358 of workflow/Snakefile

shell:
    "getorf -minsize {params.minsize} -filter -sformat pearson {input.fasta} | hmmsearch -E {params.e_value} -o /dev/null --tblout {output} {input.hmm} -"

SnakeMake hmmsearch (genouest) From line 372 of workflow/Snakefile

shell:
    "grep -hv '^#' {input.MCP} | cut -f1 -d' ' | sed 's/_[0-9]*$//' | sort -u | xargs seqkit faidx {input.fasta} | seqkit seq -m {params.min_len} -o {output}"

SnakeMake seqkit From line 386 of workflow/Snakefile
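
The grep/cut/sed/sort chain in the rule above reduces the hmmsearch --tblout target names to a unique list of contig names before they are passed to seqkit faidx. The Python sketch below mirrors only that name handling. It assumes the ORF IDs follow the getorf convention of appending _<number> to the source sequence ID; the function name contigs_with_hits and the example lines are hypothetical and not part of the workflow.

```python
import re

def contigs_with_hits(tblout_lines):
    """Return sorted unique contig IDs from hmmsearch --tblout lines."""
    contigs = set()
    for line in tblout_lines:
        # grep -v '^#': drop comment lines (blank lines are also skipped here,
        # which is an extra safety check not present in the shell command)
        if line.startswith("#") or not line.strip():
            continue
        orf_id = line.split(" ", 1)[0]                 # cut -f1 -d' '
        contigs.add(re.sub(r"_[0-9]*$", "", orf_id))   # sed 's/_[0-9]*$//'
    return sorted(contigs)                             # sort -u

# Made-up tblout lines for illustration only
example = [
    "# target name  accession  query name ...",
    "scaffold12_3   -  MCP  -  1e-30",
    "scaffold12_8   -  MCP  -  2e-12",
    "k141_55_1      -  MCP  -  4e-09",
]
print(contigs_with_hits(example))
```

Two ORFs on the same contig collapse to a single contig name, so each contig is extracted only once.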

shell:
    "dust {input} {params.cutoff} > {output}"

SnakeMake From line 398 of workflow/Snakefile

shell:
    "bowtie2 --no-unal --threads {threads} --{params.mode} -x {input.fasta} -U {input.reads} 2> {log} | samtools sort -o {output}"

SnakeMake SAMtools Bowtie 2 From line 418 of workflow/Snakefile

shell:
    "bowtie2 --threads {threads} --fr --{params.mode} -x {input.fasta} -1 {input.r1} -2 {input.r2} 2> {log} | awk '/^@/||!and($2,4)||!and($2,8)' | samtools sort -o {output}"

SnakeMake SAMtools Bowtie 2 From line 437 of workflow/Snakefile
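
The awk filter in the paired-end mapping rule above keeps SAM header lines plus every record in which the read or its mate is mapped. In the SAM FLAG field, bit 0x4 marks an unmapped read and bit 0x8 an unmapped mate, so only pairs with both bits set are dropped. A minimal Python sketch of the same test follows; the function name keep_sam_line and the example records are hypothetical.

```python
READ_UNMAPPED = 0x4   # SAM FLAG bit: this read is unmapped
MATE_UNMAPPED = 0x8   # SAM FLAG bit: the mate is unmapped

def keep_sam_line(line):
    """Mirror awk '/^@/||!and($2,4)||!and($2,8)' for one SAM line."""
    if line.startswith("@"):
        return True                      # header line
    flag = int(line.split("\t")[1])      # FLAG is the second column
    return not (flag & READ_UNMAPPED) or not (flag & MATE_UNMAPPED)

# Made-up records: only the name and FLAG columns matter here
records = [
    "@HD\tVN:1.6",
    "r1\t99\tctg1",    # 99  = 1+2+32+64: read and mate mapped
    "r2\t73\tctg1",    # 73  = 1+8+64:    read mapped, mate unmapped
    "r3\t133\tctg1",   # 133 = 1+4+128:   read unmapped, mate mapped
    "r4\t77\t*",       # 77  = 1+4+8+64:  both unmapped
]
for rec in records:
    print(rec.split("\t")[0], keep_sam_line(rec))
```

Only the last record, where both the read and its mate are unmapped, is filtered out.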

shell:
    "seqkit replace -p _pilon -o {output} {input}"

SnakeMake seqkit From line 447 of workflow/Snakefile

shell:
    "bowtie2 --no-unal --threads {threads} --{params.mode} -x {input.fasta} -U {input.reads} 2> {log} | samtools sort -o {output}"

SnakeMake SAMtools Bowtie 2 From line 467 of workflow/Snakefile

shell:
    "bowtie2 --threads {threads} --fr --{params.mode} -x {input.fasta} -1 {input.r1} -2 {input.r2} 2> {log} | awk '/^@/||!and($2,4)||!and($2,8)' | samtools sort -o {output}"

SnakeMake SAMtools Bowtie 2 From line 486 of workflow/Snakefile

shell:
    "samtools index {input}"

SnakeMake SAMtools From line 496 of workflow/Snakefile

shell:
    "samtools cat {input} | samtools fastq | gzip > {output}"

SnakeMake SAMtools From line 525 of workflow/Snakefile

shell:
    "pilon -Xmx{resources.mem_mb}M --genome {input.fasta} {params.bams} --outdir {output.outdir} --fix all"

SnakeMake pilon From line 567 of workflow/Snakefile

shell:
    "pilon -Xmx{resources.mem_mb}M --genome {input.fasta} {params.bams} --outdir {output.outdir} --fix all"

SnakeMake pilon From line 609 of workflow/Snakefile

shell:
    """
    docker run --user {params.user} --rm -v {params.basedir}:/app/mnt --workdir /app/mnt {params.container} \
        --threads {threads} --out {output.outdir} --min-overlap-length {params.min_overlap} --branch-limit {params.branch} {input.fasta} {input.reads}
    """

SnakeMake From line 630 of workflow/Snakefile

shell:
    "seqkit sort -l {input} | seqkit replace -p '_[0-9]+\\b' | seqkit rmdup | seqkit replace -sp [^ATGCatgc] -r N -o {output}"

SnakeMake seqkit From line 643 of workflow/Snakefile

shell:
    "seqkit seq -gM{params.lim} -o {output} {input}"

SnakeMake seqkit From line 655 of workflow/Snakefile

shell:
    "seqkit seq -gm{params.lim} -o {output} {input}"

SnakeMake seqkit From line 667 of workflow/Snakefile

shell:
    """
    echo "
        project = mira
        job = denovo,clustering,accurate
        parameters = --noclipping
        parameters = TEXT_SETTINGS -AS:epoq=no
        readgroup
        technology = text
        data = fna::{input}
    " > {output}
    """

SnakeMake From line 675 of workflow/Snakefile

shell:
    "mira -t {threads} {input.manifest} &> {log} && mv mira_assembly/* {params.outdir}/"

SnakeMake From line 703 of workflow/Snakefile

shell:
    "cat {input} > {output}"

SnakeMake From line 712 of workflow/Snakefile

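# Convert cutadapt --info-file records (read from stdin) into FASTQ (written to stdout),
# replacing each matched linker with the sequence given as the first argument.
# Matches the call "python workflow/utilities/info2fastq.py {params.new_linker}" in the Cutadapt rule.
from sys import stdin, argv

new_adapt = argv[1]
new_adapt_len = len(new_adapt)

for line in stdin:
    read_name, error, *rest = line.rstrip('\n').split('\t')
    if int(error) < 0:
        # No linker found (error count is -1): emit the read unchanged
        read_seq, read_qual, *other = rest
        print('@%s\n%s\n+\n%s' % (read_name, read_seq, read_qual))
    else:
        start, end, seq_left, seq_adapt, seq_right, adapter_name, qual_left, qual_adapt, qual_right, *other = rest
        # Length of the matched linker, capped at the length of the new linker
        adapt_len = adapt_len_top = len(seq_adapt)
        if adapt_len_top > new_adapt_len:
            adapt_len_top = new_adapt_len
        start_offset = len(seq_left)
        end_offset = len(seq_right)
        if start_offset == 0:
            # Match at the read start: keep the tail of the new linker
            seq_adapt = new_adapt[-adapt_len_top:]
            qual_adapt = qual_adapt[-adapt_len_top:]
        elif end_offset == 0:
            # Match at the read end: keep the head of the new linker
            seq_adapt = new_adapt[:adapt_len_top]
            qual_adapt = qual_adapt[:adapt_len_top]
        else:
            # Internal match: insert the full new linker and truncate the qualities,
            # or pad them with the last quality value, to the new length
            seq_adapt = new_adapt
            qual_adapt = qual_adapt[0:new_adapt_len] + qual_adapt[-1] * (new_adapt_len - adapt_len)
        print('@%s\n%s%s%s\n+\n%s%s%s' % (read_name, seq_left, seq_adapt, seq_right, qual_left, qual_adapt, qual_right))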
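
The three branches of the script above can be easier to follow on concrete values. The sketch below isolates the linker and quality replacement into a small function and runs it on made-up inputs. The function name replace_linker, the linker CCGGTT and all sequences are hypothetical and chosen only for illustration; this is a restatement of the branch logic under those assumptions, not a replacement for the script.

```python
def replace_linker(seq_left, seq_adapt, seq_right, qual_adapt, new_adapt):
    """Return (linker sequence, linker qualities) after swapping in new_adapt."""
    keep = min(len(seq_adapt), len(new_adapt))
    if not seq_left:
        # Linker matched at the read start: keep the tail of the new linker
        return new_adapt[-keep:], qual_adapt[-keep:]
    if not seq_right:
        # Linker matched at the read end: keep the head of the new linker
        return new_adapt[:keep], qual_adapt[:keep]
    # Internal match: full new linker; truncate the qualities or pad them
    # with the last quality character to the new length
    pad = qual_adapt[-1] * (len(new_adapt) - len(seq_adapt))
    return new_adapt, qual_adapt[:len(new_adapt)] + pad

NEW = "CCGGTT"  # hypothetical new linker

print(replace_linker("", "GTA", "ACGT", "IIH", NEW))             # partial match at read start
print(replace_linker("ACGT", "AAGG", "", "FFGG", NEW))           # partial match at read end
print(replace_linker("AC", "AAGG", "GT", "FFGA", NEW))           # internal, old linker shorter
print(replace_linker("AC", "AAAAAAAA", "GT", "ABCDEFGH", NEW))   # internal, old linker longer
```

In every case the returned sequence and quality string have the same length, which keeps the resulting FASTQ record valid.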