MToolBox pipeline written in Snakemake for better scalability

Public, 1yr ago, Version: Prototype.2

This is a work-in-progress update of MToolBox (PMID:25028726). Please find more in the official documentation.

Code Snippets

shell:
    """
    mkdir -p {params.outDir}
    fastqc -t {threads} -o {params.outDir} {input} &> {log}
    """

SnakeMake FastQC From line 99 of master/Snakefile

shell:
    """
    #module load gsnap
    gmap_build -D {params.gmap_db_dir} -d {params.gmap_db} -s none {input.mt_genome_fasta} &> {log}
    """

SnakeMake From line 119 of master/Snakefile

shell:
    """
    cat {input.mt_genome_fasta} {input.n_genome_fasta} > {output.mt_n_fasta}
    gmap_build -D {params.gmap_db_dir} -d {params.gmap_db} -s none {output.mt_n_fasta} &> {log}
    # rm {input.mt_genome_fasta}_{input.n_genome_fasta}.fasta
    """

SnakeMake From line 145 of master/Snakefile

shell:
    """
    mkdir -p {params.outDir}
    fastqc -t {threads} -o {params.outDir} {input} &> {log}
    """

SnakeMake FastQC From line 172 of master/Snakefile

run:
    #trimmomatic_adapters_path = get_trimmomatic_adapters_path()
    # '|' is used as the sed delimiter so the path slashes need no backslash escapes inside the Python string
    shell("export tap=$(which trimmomatic | sed 's|bin/trimmomatic|share/trimmomatic/adapters/TruSeq3-PE.fa|g'); trimmomatic PE {params.options} -threads {threads} {input.R1} {input.R2} {params.out1P} {params.out1U} {params.out2P} {params.out2U} ILLUMINACLIP:$tap:2:30:10 {params.processing_options} &> {log}")
    shell("zcat {params.out1U} {params.out2U} | gzip > {output.out1U} && rm {params.out1U} {params.out2U}")

SnakeMake Trimmomatic From line 208 of master/Snakefile
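
The `sed` substitution above derives the adapter file location from the path of the `trimmomatic` executable. A minimal Python sketch of the same idea follows, assuming a conda-style layout in which `<prefix>/bin/trimmomatic` sits next to `<prefix>/share/trimmomatic/adapters/`. The function names are hypothetical and are not taken from the MToolBox code base (the commented-out `get_trimmomatic_adapters_path()` in the snippet is not shown in the source, so this is not a reconstruction of it).

```python
import shutil
from pathlib import Path
from typing import Optional


def adapters_path_from_executable(exe_path: str,
                                  adapter_file: str = "TruSeq3-PE.fa") -> Path:
    """Map <prefix>/bin/trimmomatic to <prefix>/share/trimmomatic/adapters/<adapter_file>.

    Assumes a conda-style installation layout (hypothetical helper).
    """
    prefix = Path(exe_path).parent.parent  # drop the trailing "bin/trimmomatic"
    return prefix / "share" / "trimmomatic" / "adapters" / adapter_file


def find_trimmomatic_adapters(adapter_file: str = "TruSeq3-PE.fa") -> Optional[Path]:
    """Locate trimmomatic on PATH and return the derived adapter path, or None."""
    exe = shutil.which("trimmomatic")
    if exe is None:
        return None
    return adapters_path_from_executable(exe, adapter_file)
```

Resolving the path in Python would let a rule stop early with a clear message when `trimmomatic` is not on `PATH`, instead of passing an empty `$tap` to `ILLUMINACLIP`.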

run:
    if seq_type == "pe":
        print("PE mode")
        shell("gsnap -D {params.gmap_db_dir} -d {params.gmap_db} -o {params.uncompressed_output} -A sam --gunzip --nofails --pairmax-dna=500 --query-unk-mismatch=1 {params.RG_tag} -n 1 -Q -O -t {threads} {input[0]} {input[1]} &> {log} && gzip {params.uncompressed_output} &>> {log}")
    elif seq_type == "se":
        print("SE mode")
        shell("gsnap -D {params.gmap_db_dir} -d {params.gmap_db} -o {params.uncompressed_output} -A sam --gunzip --nofails --pairmax-dna=500 --query-unk-mismatch=1 {params.RG_tag} -n 1 -Q -O -t {threads} {input[0]} &> {log} && gzip {params.uncompressed_output} &>> {log}")
    elif seq_type == "both":
        print("PE + SE mode")
        shell("gsnap -D {params.gmap_db_dir} -d {params.gmap_db} -o {params.uncompressed_output} -A sam --gunzip --nofails --pairmax-dna=500 --query-unk-mismatch=1 {params.RG_tag} -n 1 -Q -O -t {threads} {input[0]} {input[1]} {input[2]} &> {log} && gzip {params.uncompressed_output} &>> {log}")

SnakeMake From line 234 of master/Snakefile
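
The three branches above differ only in how many read files are passed to `gsnap`. One way to avoid repeating the command is to build it once; the sketch below does this, assuming the same flags as in the snippet. `build_gsnap_command` and `EXPECTED_INPUTS` are hypothetical names, not part of the pipeline.

```python
import shlex
from typing import List, Sequence

# Number of read files each mode passes to gsnap in the rule above.
EXPECTED_INPUTS = {"pe": 2, "se": 1, "both": 3}


def build_gsnap_command(seq_type: str, gmap_db_dir: str, gmap_db: str,
                        output_sam: str, reads: Sequence[str],
                        threads: int = 1, rg_tag: str = "") -> List[str]:
    """Return the gsnap argument list for the given sequencing mode (illustrative helper)."""
    if seq_type not in EXPECTED_INPUTS:
        raise ValueError(f"unknown seq_type: {seq_type!r}")
    if len(reads) != EXPECTED_INPUTS[seq_type]:
        raise ValueError(
            f"{seq_type!r} mode expects {EXPECTED_INPUTS[seq_type]} read file(s), got {len(reads)}"
        )
    cmd = ["gsnap", "-D", gmap_db_dir, "-d", gmap_db, "-o", output_sam,
           "-A", "sam", "--gunzip", "--nofails",
           "--pairmax-dna=500", "--query-unk-mismatch=1"]
    cmd += shlex.split(rg_tag)  # read-group options, if any, as separate arguments
    cmd += ["-n", "1", "-Q", "-O", "-t", str(threads)]
    cmd += list(reads)
    return cmd
```

Inside a `run:` block the list could be joined with `shlex.join(cmd)` (Python 3.8+) and passed to `shell()`. An unknown `seq_type` then raises an error rather than silently running nothing.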

run:
    sam_to_fastq(samfile=input.outmt_sam, outmt1=output.outmt1,
                 outmt2=output.outmt2, outmt=output.outmt)

SnakeMake From line 257 of master/Snakefile

run:
    if os.path.isfile(input.outmt):
        shell("gsnap -D {params.gmap_db_dir} -d {params.gmap_db} -o {params.uncompressed_output} --gunzip -A sam --nofails --query-unk-mismatch=1 -O -t {threads} {input.outmt} &> {log.logS} && gzip {params.uncompressed_output} &>> {log.logS}")
    else:
        open(output.outS, 'a').close()

SnakeMake From line 283 of master/Snakefile

run:
    if os.path.isfile(input.outmt1):
        shell("gsnap -D {params.gmap_db_dir} -d {params.gmap_db} -o {params.uncompressed_output} --gunzip -A sam --nofails --query-unk-mismatch=1 -O -t {threads} {input.outmt1} {input.outmt2} &> {log.logP} && gzip {params.uncompressed_output} &>> {log.logP}")
    else:
        open(output.outP, 'a').close()

SnakeMake From line 314 of master/Snakefile
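
Both remapping rules above create an empty output file when their input is absent, so that downstream rules still find the declared output. A small sketch of that placeholder pattern with `pathlib` follows; `ensure_placeholder` is a hypothetical helper name, and creating the parent directory is an added assumption not present in the original snippet.

```python
from pathlib import Path


def ensure_placeholder(path: str) -> Path:
    """Create an empty file at `path` if it does not exist; leave existing content untouched."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)  # assumption: the output directory may be missing
    p.touch(exist_ok=True)  # like open(path, 'a').close(): never truncates
    return p
```

As with `open(path, 'a').close()`, an existing file keeps its content and only its modification time changes.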

run:
    filter_alignments(outmt=input.outmt,
                      outS=input.outS,
                      outP=input.outP,
                      OUT=output.sam,
                      ref_mt_fasta=params.ref_mt_fasta)

SnakeMake From line 332 of master/Snakefile

shell:
    """
    zcat {input.sam} | samtools view -b -o {output} - &> {log}
    """

SnakeMake SAMtools From line 348 of master/Snakefile

shell:
    """
    samtools sort -o {output.sorted_bam} -T {params.TMP} {input.bam} &> {log}
    # samtools sort -o {output.sorted_bam} -T ${{TMP}} {input.bam}
    """

SnakeMake SAMtools From line 364 of master/Snakefile

run:
    if params.mark_duplicates == True:
        shell("picard MarkDuplicates \
            INPUT={input.sorted_bam} \
            OUTPUT={output.sorted_bam_md} \
            METRICS_FILE={output.metrics_file} \
            ASSUME_SORTED=true \
            REMOVE_DUPLICATES=true \
            TMP_DIR={params.TMP}")
    else:
        shutil.copy2(input.sorted_bam, output.sorted_bam_md)
        with open(output.metrics_file, "w") as f:
            f.write("")

SnakeMake Picard From line 381 of master/Snakefile

shell:
    """
    samtools merge {output.merged_bam} {input} &> {log}
    samtools index {output.merged_bam} {output.merged_bam_index}
    """

SnakeMake SAMtools From line 407 of master/Snakefile

shell:
    """
    samtools faidx {input.mt_n_fasta} &> {log}
    """

SnakeMake SAMtools From line 421 of master/Snakefile

run:
    shell("picard CreateSequenceDictionary R={input.mt_n_fasta} O={output.genome_dict}")

SnakeMake Picard From line 434 of master/Snakefile

shell:
    """
    java -Xmx6G -jar {params.source_dir}/modules/GenomeAnalysisTK.jar \
        -R {input.mt_n_fasta} \
        -T LeftAlignIndels \
        -I {input.merged_bam} \
        -o {output.merged_bam_left_realigned} \
        --filter_reads_with_N_cigar
    """

SnakeMake gatk From line 450 of master/Snakefile

shell:
    """
    samtools mpileup -B -f {params.genome_fasta} -o {output.pileup} {input.merged_bam} &> {log}
    """

SnakeMake SAMtools From line 472 of master/Snakefile

run:
    mt_table_data = pileup2mt_table(pileup=input.pileup, ref_fasta=params.ref_mt_fasta)
    write_mt_table(mt_table_data=mt_table_data, mt_table_file=output.mt_table)

SnakeMake From line 488 of master/Snakefile

run:
    # function (and related ones) from mtVariantCaller
    # vcf_dict = mtvcf_main_analysis(sam_file = input.sam, mtable_file = input.mt_table, name2 = wildcards.sample)
    tmp_sam = os.path.split(input.merged_bam)[1].replace(".bam", ".sam")
    shell("samtools view {merged_bam} > {tmp_dir}/{tmp_sam}".format(merged_bam=input.merged_bam,
                                                                    tmp_dir=params.TMP,
                                                                    tmp_sam=tmp_sam))
    vcf_dict = mtvcf_main_analysis(sam_file="{tmp_dir}/{tmp_sam}".format(tmp_dir=params.TMP,
                                                                         tmp_sam=tmp_sam),
                                   mtable_file=input.mt_table, name2=wildcards.sample)
    # ref_genome_mt will be used in the VCF descriptive field
    # seq_name in the VCF data
    seq_name = get_seq_name(params.ref_mt_fasta)
    VCF_RECORDS = VCFoutput(vcf_dict, reference=wildcards.ref_genome_mt,
                            seq_name=seq_name, vcffile=output.single_vcf)
    bed_output(VCF_RECORDS, seq_name=seq_name, bedfile=output.single_bed)
    # fasta output
    #contigs = pileup2mt_table(pileup=input.pileup, fasta=params.ref_mt_fasta, mt_table=in.mt_table)
    mt_table_data = pileup2mt_table(pileup=input.pileup,
                                    ref_fasta=params.ref_mt_fasta)
    gapped_fasta = mt_table_handle2gapped_fasta(mt_table_data=mt_table_data)
    contigs = gapped_fasta2contigs(gapped_fasta=gapped_fasta)
    fasta_output(vcf_dict=vcf_dict, ref_mt=params.ref_mt_fasta,
                 fasta_out=output.single_fasta, contigs=contigs)

SnakeMake SAMtools From line 514 of master/Snakefile
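
The rule above derives the temporary SAM file name with `str.replace(".bam", ".sam")`, which would also rewrite a `.bam` that appears in the middle of a file name. A hedged alternative that changes only the final extension is sketched below; `tmp_sam_path` is a hypothetical helper, not a function from the pipeline.

```python
from pathlib import Path


def tmp_sam_path(merged_bam: str, tmp_dir: str) -> Path:
    """Return <tmp_dir>/<basename of merged_bam> with a trailing .bam swapped for .sam."""
    name = Path(merged_bam).name
    if name.endswith(".bam"):
        name = name[: -len(".bam")] + ".sam"  # touch only the final extension
    return Path(tmp_dir) / name
```

The resulting path could be passed both to the `samtools view` call and to `mtvcf_main_analysis`, so the name is computed in one place.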

run:
    shell("bcftools index {input.single_vcf}")

SnakeMake BCFtools From line 546 of master/Snakefile

run:
    shell("bcftools merge {input.single_vcf_list} -O v -o {output.merged_vcf}")

SnakeMake BCFtools From line 561 of master/Snakefile

Created: 1yr ago

Updated: 1yr ago

Maintainers: public

URL: https://github.com/mitoNGS/MToolBox_snakemake

Name: mtoolbox_snakemake

Version: Prototype.2

License: None

Keywords:

VCF raw sequence reads Genetic mapping Alignment gatk BCFtools FastQC Picard SAMtools Snakemake Trimmomatic Genomics
