mgi-ncov19 Snakemake COVID-19 Analysis Pipeline with Classification and De Novo Assembly

This Snakemake pipeline performs classification of COVID-19 virus (SARS-CoV-2) reads, de novo assembly, coverage assessment and variant calling.

The pipeline is based on https://github.com/BGI-IORI/nCoV_Meta (preprint: https://doi.org/10.1101/2020.03.16.993584).

Differences between mgi-ncov19-snakemake and nCoV_Meta:

Low-complexity reads are removed with fastp (BGI: PRINSEQ).
Kraken 2 is used for read classification (BGI: Kraken 1).
SOAPnuke v2 is used (BGI: SOAPnuke v1); switching to SOAPnuke v1 would be better.
The alignment and variant calling steps are not yet finished.

updated: 2020-04-14

Usage:

0. Install Conda and Snakemake

1. Clone workflow

git clone git@github.com:huyue87/mgi-ncov19-snakemake

2. Execute workflow

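# 2.1 link the input files (paired-end raw reads) into the input directory
cd mgi-ncov19-snakemake/input
# replace /path/to with the absolute path of the directory holding your reads
ln -s /path/to/Sample_{1,2}.fq.gz .
# 2.2 run de novo assembly and generate SAM files
cd ..
snakemake --use-conda -n   # dry run: list the jobs without executing them
snakemake --use-conda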
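
Before starting a run it can help to confirm that every sample in the input directory has both mates. The following is a minimal sketch, not part of the pipeline. It assumes the `<sample>_1.fq.gz` / `<sample>_2.fq.gz` naming implied by the `Sample_{1,2}.fq.gz` example above, and `find_paired_samples` is a hypothetical helper name. The Snakefile itself may expect a different pattern.

```python
from pathlib import Path


def find_paired_samples(input_dir):
    """Split samples into those with both mates and those missing one.

    Assumes files are named <sample>_1.fq.gz and <sample>_2.fq.gz
    (an assumption based on the usage example, not on the Snakefile).
    """
    input_dir = Path(input_dir)
    suffix = "_1.fq.gz"
    paired, unpaired = [], []
    for read1 in sorted(input_dir.glob("*" + suffix)):
        sample = read1.name[: -len(suffix)]
        read2 = input_dir / f"{sample}_2.fq.gz"
        # exists() follows symlinks, so broken links count as missing
        if read1.exists() and read2.exists():
            paired.append(sample)
        else:
            unpaired.append(sample)
    return paired, unpaired
```

Calling `find_paired_samples("input")` from the repository root returns two lists: sample names that are complete, and sample names whose second mate is missing or whose symlink is broken.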

Code Snippets

shell:
    """
    kraken2 \
        --db {params.db} \
        --threads {threads} \
        --output {output.kraken} \
        --report {output.kreport} \
        --classified-out {params.classified} \
        --unclassified-out {params.unclassified} \
        --paired \
        {input.read1} {input.read2} \
        2> {log.stderr}
    # compress the FASTQ files written by kraken2; append to the log
    # so the kraken2 messages above are not overwritten
    pigz \
        --processes {threads} \
        --verbose \
        --force \
        {params.fq_to_compress} \
        2>> {log.stderr}
    """

SnakeMake kraken2 From line 37 of master/Snakefile
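
The `--report` file written by the rule above can be inspected to see what share of the reads was assigned to SARS-CoV-2. The sketch below is illustrative only and assumes the standard six-column, tab-separated Kraken 2 report (percentage, clade reads, taxon reads, rank code, NCBI taxonomy ID, indented name), i.e. without the optional minimizer columns. The helper names and the toy report are hypothetical, and the counts in it are made up.

```python
import io

SARS_COV_2_TAXID = 2697049  # NCBI taxonomy ID of SARS-CoV-2


def read_kraken2_report(handle):
    """Parse a standard six-column Kraken 2 report into a list of dicts."""
    rows = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rows.append({
            "percent": float(fields[0]),
            "clade_reads": int(fields[1]),
            "taxon_reads": int(fields[2]),
            "rank": fields[3].strip(),
            "taxid": int(fields[4]),
            "name": fields[5].strip(),  # names are indented by tree depth
        })
    return rows


def clade_percent(rows, taxid):
    """Percentage of reads in the clade rooted at taxid (0.0 if absent)."""
    for row in rows:
        if row["taxid"] == taxid:
            return row["percent"]
    return 0.0


# Toy report with invented numbers, for illustration only.
EXAMPLE_REPORT = (
    " 40.00\t400\t400\tU\t0\tunclassified\n"
    " 60.00\t600\t10\tR\t1\troot\n"
    " 55.00\t550\t550\tS\t2697049\t"
    "      Severe acute respiratory syndrome coronavirus 2\n"
)

rows = read_kraken2_report(io.StringIO(EXAMPLE_REPORT))
sars_cov_2_percent = clade_percent(rows, SARS_COV_2_TAXID)
```

For a real run, pass an open handle on the file given as `{output.kreport}` instead of the toy string.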

shell:
    """
    fastp \
    -q 20 -u 20 -n 1 -l 50 \
    -i {input.read1} \
    -I {input.read2}\
    -o {output.read1}\
    -O {output.read2}\
    -j {output.json}\
    -h {output.html}\
    --adapter_sequence AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA \
    --adapter_sequence_r2 AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG \
    --detect_adapter_for_pe \
    --disable_trim_poly_g \
    --thread {threads}\
    --low_complexity_filter \
    --complexity_threshold 7 \
    > {log.stdout} \
    2> {log.stderr}
    """

SnakeMake fastp From line 76 of master/Snakefile
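
The `--low_complexity_filter` option used above drops reads whose complexity falls below `--complexity_threshold` (7 here, read as a percentage). fastp defines complexity as the percentage of bases that differ from the next base. The sketch below illustrates that definition and is not fastp's code. Dividing by the number of adjacent base pairs is an assumption, and both function names are hypothetical.

```python
def complexity_percent(seq):
    """Percentage of bases that differ from the base that follows them."""
    if len(seq) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    # Assumption: normalise by the number of adjacent pairs (len - 1).
    return 100.0 * changes / (len(seq) - 1)


def passes_complexity_filter(seq, threshold=7):
    """True if the read would be kept at the given threshold (percent)."""
    return complexity_percent(seq) >= threshold


homopolymer = "A" * 50          # no base changes at all
mixed = "ACGT" * 10             # every base differs from the next
homopolymer_kept = passes_complexity_filter(homopolymer)
mixed_kept = passes_complexity_filter(mixed)
```

A threshold of 7 is far more permissive than fastp's default of 30, so only reads that are close to homopolymers are removed.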

shell:
    """
    SOAPnuke filter \
    -f AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA \
    -r AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG \
    -l 20 -q 0.2 -n 0.02 -5 0 -Q 2 -G 2 \
    --fq1 {input.read1} \
    --fq2 {input.read2}\
    --cleanFq1 {params.clean1} \
    --cleanFq2 {params.clean2} \
    --outDir {params.outdir} \
    -T {threads} \
    > {log.stdout} \
    2> {log.stderr} 
    """

SnakeMake SOAPnuke From line 119 of master/Snakefile

shell:
    """
    spades.py \
    -1 {input.read1} \
    -2 {input.read2} \
    -o {params.outdir} \
    -t {threads} \
    > {log.stdout} \
    2> {log.stderr}
    """

SnakeMake SPAdes From line 153 of master/Snakefile

shell:
    """
    bwa aln -t 4 \
    {params.db} {input.read1} > {output.bwa1}\
    2>{log.stderr1}

    bwa aln -t 4 \
    {params.db} {input.read2} > {output.bwa2}\
    2>{log.stderr2}
    """

SnakeMake BWA From line 180 of master/Snakefile

shell:
    """
    bwa sampe {params.db} \
    {input.bwa1} {input.bwa2} {input.read1} {input.read2}\
    >{output.sam} \
    2>{log.stderr}
    """

SnakeMake BWA From line 207 of master/Snakefile

shell:
    """
    perl scripts/BWA_sam_Filter_identity_cvg.pl \
    -i {input} \
    -o {output}\
    -m 0.95 \
    -s 0.90 \
    > {log.stdout}\
    2> {log.stderr}
    """

SnakeMake From line 226 of master/Snakefile
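
The Perl script's name suggests that it filters SAM records by alignment identity and read coverage, but its logic is not shown here. The sketch below is one common way to compute both values from the CIGAR string and the `NM` tag, and it should not be read as the script's implementation. Mapping `-m 0.95` to identity and `-s 0.90` to coverage is an assumption, and the function names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_stats(cigar, edit_distance):
    """Return (identity, coverage) for one alignment.

    identity = (alignment columns - NM) / alignment columns
    coverage = aligned read bases / full read length (clips included)
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    read_len = sum(n for n, op in ops if op in "MIS=XH")
    aligned_read = sum(n for n, op in ops if op in "MI=X")
    columns = sum(n for n, op in ops if op in "MID=X")
    identity = (columns - edit_distance) / columns if columns else 0.0
    coverage = aligned_read / read_len if read_len else 0.0
    return identity, coverage


def keep_record(sam_line, min_identity=0.95, min_coverage=0.90):
    """Decide whether one SAM alignment line passes both thresholds."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4 or cigar == "*":
        return False  # unmapped read
    nm = next((int(t[5:]) for t in fields[11:] if t.startswith("NM:i:")), None)
    if nm is None:
        return False  # cannot compute identity without the NM tag
    identity, coverage = alignment_stats(cigar, nm)
    return identity >= min_identity and coverage >= min_coverage


def _toy_record(cigar, nm):
    """Build a made-up 100-base SAM record for illustration."""
    return "\t".join(["r1", "0", "ref", "1", "60", cigar, "*", "0", "0",
                      "A" * 100, "I" * 100, f"NM:i:{nm}"])


kept = keep_record(_toy_record("95M5S", 2))      # identity 93/95, coverage 0.95
dropped = keep_record(_toy_record("80M20S", 0))  # coverage 0.80 is below 0.90
```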

shell:
    """
    samtools view -bt {params.db}.fai {input} > {output.L178}

    samtools sort -n {output.L178} | samtools fixmate - {output.L179}

    samtools flagstat {output.L179} > {output.L180}

    samtools sort {output.L179} -o {output.L181} --reference {params.db}

    samtools index {output.L181} 

    java -jar bin/picard.jar \
    MarkDuplicates AS=TRUE \
    VALIDATION_STRINGENCY=LENIENT \
    MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000 \
    REMOVE_DUPLICATES=TRUE INPUT={output.L181} \
    OUTPUT={output.L183a} \
    METRICS_FILE={output.L183b} \
    > {log.stdout} \
    2> {log.stderr}
    """