MetaGT: A pipeline for de novo assembly of metatranscriptomes with the aid of metagenomic data
Assembly and quantification of a metatranscriptome using metagenome data.
MetaGT is a bioinformatics analysis pipeline for improving and quantifying a metatranscriptome assembly using metagenome data. The pipeline supports Illumina sequencing data as well as complete metagenome and metatranscriptome assemblies. It aligns the metatranscriptome assembly to the metagenome assembly and then extracts the CDSs that are covered by transcripts.
The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It comes with Docker containers, which makes installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies.
Pipeline Summary
Optionally, if raw reads are used:
- Sequencing quality control (FastQC)
- Assembly of the metagenome or metatranscriptome (metaSPAdes, rnaSPAdes)
By default, the pipeline currently performs the following:
- Annotation of the metagenome (Prokka)
- Alignment of the metatranscriptome to the metagenome (minimap2)
- Annotation of unaligned transcripts (TransDecoder)
- Clustering of covered CDSs and CDSs from unaligned transcripts (MMseqs2)
- Quantification of transcript abundances (kallisto)
Code Snippets
"""
echo $workflow.manifest.version > v_pipeline.txt
echo $workflow.nextflow.version > v_nextflow.txt
abricate --version > v_abricate.txt
echo \$(mmseqs 2>&1) > v_varscan.txt
scrape_software_versions.py &> software_versions_mqc.yaml
"""

"""
samtools index $bam
extract_covered_cds.py --threads $task.cpus --gff $gff --bam $bam --genome $genome --output ${prefix}_covered_cds
extract_unused.py ${prefix}_covered_cds.used_contigs.list $transcriptome unaligned.transcripts.fasta
"""

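The idea of a CDS being "covered by transcripts" can be illustrated with a small interval calculation. The following is only a sketch of the concept, not the logic of `extract_covered_cds.py`, whose internals are not shown here. The function names and the `min_fraction` threshold are hypothetical, and coordinates are assumed to be half-open `[start, end)` intervals on a single contig (GFF coordinates are 1-based and inclusive, so they would need converting first).

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one contig


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(cds: Interval, alignments: Iterable[Interval]) -> float:
    """Return the fraction of the CDS that lies under at least one alignment."""
    cds_start, cds_end = cds
    length = cds_end - cds_start
    if length <= 0:
        raise ValueError("CDS interval must have positive length")
    covered = 0
    for start, end in merge_intervals(alignments):
        overlap = min(end, cds_end) - max(start, cds_start)
        if overlap > 0:
            covered += overlap
    return covered / length


def is_covered(cds: Interval, alignments: Iterable[Interval],
               min_fraction: float = 0.5) -> bool:
    """Hypothetical rule: call a CDS covered above an assumed threshold."""
    return covered_fraction(cds, alignments) >= min_fraction
```

Merging the alignment intervals first avoids counting bases twice when several transcripts overlap the same part of a CDS.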
"""
kallisto index -i index $fasta
"""

"""
kallisto quant -i $index $input_reads -t $task.cpus -o ./
cp ./abundance.tsv abudance.tsv
"""

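For downstream inspection, the `abundance.tsv` table written by `kallisto quant` (copied to `abudance.tsv` in the snippet above) can be read with a few lines of Python. This is a minimal sketch that assumes the standard kallisto columns `target_id`, `length`, `eff_length`, `est_counts` and `tpm`. The function names and the `min_tpm` cutoff are illustrative and not part of the pipeline.

```python
import csv
from typing import Dict, Iterable, List


def read_tpm(lines: Iterable[str]) -> Dict[str, float]:
    """Map each target_id to its TPM value from kallisto's abundance table."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["target_id"]: float(row["tpm"]) for row in reader}


def expressed_targets(tpm: Dict[str, float], min_tpm: float = 1.0) -> List[str]:
    """Return target ids at or above an assumed TPM cutoff, highest TPM first."""
    kept = [name for name, value in tpm.items() if value >= min_tpm]
    return sorted(kept, key=lambda name: tpm[name], reverse=True)
```

A typical call would pass an open file handle, for example `read_tpm(open("abundance.tsv"))`.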
"""
minimap2 -t $task.cpus -aY --MD $genome $transcriptome > ${prefix}.align.sam
samtools sort ${prefix}.align.sam -o ${prefix}.align.sorted.bam
change_name.py $transcriptome ${meta_t.id}.all_transcripts.fasta
"""

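As a rough check on the alignment step, the transcripts that minimap2 could not place on the metagenome can be listed directly from the SAM output. The sketch below relies only on the SAM format itself (header lines start with `@`, column 1 is the query name, column 2 is the bitwise flag, and bit `0x4` marks an unmapped record). It is not how the pipeline defines unaligned transcripts, which it derives from the `used_contigs.list` file, and the function name is hypothetical.

```python
from typing import Iterable, Set

UNMAPPED_FLAG = 0x4  # SAM flag bit: segment unmapped


def unmapped_query_names(sam_lines: Iterable[str]) -> Set[str]:
    """Return query names that have no mapped record in the SAM stream."""
    seen: Set[str] = set()
    mapped: Set[str] = set()
    for line in sam_lines:
        # Skip blank lines and header lines such as @HD, @SQ, @PG.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seen.add(name)
        if not flag & UNMAPPED_FLAG:
            mapped.add(name)
    return seen - mapped
```

Tracking `seen` and `mapped` separately keeps a transcript out of the result as soon as any one of its records (primary or supplementary) is mapped.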
"""
cat $cov_transcripts $cds_from_unaligned > all.fasta
mmseqs easy-linclust all.fasta res tmp --min-seq-id ${params.cluster_idy} --cluster-mode 0 --seq-id-mode 2 --threads $task.cpus --cov-mode 1
mv res_rep_seq.fasta ${prefix}.rep_seq.fasta
"""

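Besides `res_rep_seq.fasta`, `mmseqs easy-linclust` writes a `res_cluster.tsv` file with two tab-separated columns: the cluster representative and a member sequence. The following sketch, which is not part of the pipeline, groups that table into a dictionary so that cluster sizes can be inspected. The function name is an assumption made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def read_clusters(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group member ids under their representative id (representative first column)."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        representative, member = line.split("\t")[:2]
        clusters[representative].append(member)
    return dict(clusters)
```

For example, `read_clusters(open("res_cluster.tsv"))` would return one entry per representative sequence, each listing the CDSs collapsed into it (the representative itself included).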
"""
[ ! -f ${prefix}.fasta ] && ln -s $fasta ${prefix}.fasta
prokka ${prefix}.fasta --outdir ./ --force --prefix ${prefix} --metagenome --cpus $task.cpus
"""

"""
TransDecoder.LongOrfs -t $fasta --output_dir ./
mv ./longest_orfs.cds ${prefix}_cds_from_all_transcripts.fasta
"""

"""
fastqc $options.args --threads $task.cpus `parse_yaml.py $reads` -o ./
fastqc --version | sed -e "s/FastQC v//g" > ${software}.version.txt
"""

"""
[ ! -f ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz
[ ! -f ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz
fastqc $options.args --threads $task.cpus ${prefix}_1.fastq.gz ${prefix}_2.fastq.gz
fastqc --version | sed -e "s/FastQC v//g" > ${software}.version.txt
"""

"""
$command \\
    $options.args \\
    --threads $task.cpus \\
    $custom_hmms \\
    $input_reads \\
    -o ./
mv spades.log ${prefix}.spades.log

if [ -f scaffolds.fasta ]; then
    mv scaffolds.fasta ${prefix}.scaffolds.fa
fi
if [ -f contigs.fasta ]; then
    mv contigs.fasta ${prefix}.contigs.fa
fi
if [ -f transcripts.fasta ]; then
    mv transcripts.fasta ${prefix}.transcripts.fa
fi
if [ -f assembly_graph_with_scaffolds.gfa ]; then
    mv assembly_graph_with_scaffolds.gfa ${prefix}.assembly.gfa
fi
if [ -f gene_clusters.fasta ]; then
    mv gene_clusters.fasta ${prefix}.gene_clusters.fa
fi

echo \$(spades.py --version 2>&1) | sed 's/^.*SPAdes genome assembler v//; s/ .*\$//' > ${software}.version.txt
"""
Support
- Future updates