A fully reproducible and state-of-the-art ancient DNA analysis pipeline
Introduction
nf-core/eager is a scalable and reproducible best-practice bioinformatics pipeline for processing genomic NGS data, with a focus on ancient DNA (aDNA). It is suited to the (palaeo)genomic analysis of humans, animals, plants, microbes and even microbiomes.
The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a very portable manner. The pipeline pre-processes raw data from FASTQ inputs, or accepts preprocessed BAM inputs. It can align reads and performs extensive general NGS and aDNA-specific quality control on the results. It comes with Docker and Singularity containers, as well as a Conda environment, making installation trivial and results highly reproducible.
Quick Start
- Install nextflow (>=20.07.1)
- Install any of Docker, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (please only use Conda as a last resort; see docs)
- Download the pipeline and test it on a minimal dataset with a single command:
nextflow run nf-core/eager -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute>
Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
- Start running your own analysis!
nextflow run nf-core/eager -profile <docker/singularity/podman/conda/institute> --input '*_R{1,2}.fastq.gz' --fasta '<your_reference>.fasta'
- Once your run has completed successfully, clean up the intermediate files.
nextflow clean -f -k
See usage docs for all of the available options when running the pipeline.
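The --input pattern in the command above is quoted, presumably so that Nextflow rather than the shell resolves it. As a rough sketch of which files such a pattern selects, the following uses bash's own brace and glob expansion as a stand-in. The file names are hypothetical, and the pairing of R1/R2 files that the pipeline performs on the matches is not reproduced here.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file names, created only for this illustration.
workdir=$(mktemp -d)
touch "$workdir"/sampleA_R1.fastq.gz "$workdir"/sampleA_R2.fastq.gz \
      "$workdir"/sampleB_R1.fastq.gz "$workdir"/sampleB_R3.fastq.gz \
      "$workdir"/notes.txt

# Unquoted, bash first expands the braces into two globs
# (*_R1.fastq.gz and *_R2.fastq.gz) and then matches each against the files.
matched=$(cd "$workdir" && printf '%s\n' *_R{1,2}.fastq.gz)

# Lists the R1 and R2 files only; the _R3 file and notes.txt are not matched.
echo "$matched"

rm -rf "$workdir"
```

Files whose names do not follow the _R1/_R2 convention are silently skipped by such a pattern, so it is worth checking the match list before starting a long run.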
N.B. You can see an overview of the run in the MultiQC report located at ./results/MultiQC/multiqc_report.html
Modifications to the default pipeline are easily made using various options as described in the documentation.
Pipeline Summary
Default Steps
By default the pipeline currently performs the following:
- Create reference genome indices for mapping (bwa, samtools, and picard)
- Sequencing quality control (FastQC)
- Sequencing adapter removal, paired-end data merging (AdapterRemoval)
- Read mapping to reference (bwa aln, bwa mem, CircularMapper, or bowtie2)
- Post-mapping processing, statistics and conversion to BAM (samtools)
- Ancient DNA C-to-T damage pattern visualisation (DamageProfiler)
- PCR duplicate removal (DeDup or MarkDuplicates)
- Post-mapping statistics and BAM quality control (Qualimap)
- Library Complexity Estimation (preseq)
- Overall pipeline statistics summaries (MultiQC)
Additional Steps
Additional functionality contained by the pipeline currently includes:
Input
- Automatic merging of complex sequencing setups (e.g. multiple lanes, sequencing configurations, library types)
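For the multi-lane case, the lane-merging snippets further down concatenate gzipped FASTQ files with cat. A minimal sketch of why that works: concatenated gzip members decompress as a single stream. The file names and reads below are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Two hypothetical lanes of the same library, one read each.
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$workdir/lib1_L001_R1.fq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > "$workdir/lib1_L002_R1.fq.gz"

# Plain concatenation of the compressed files, no decompression needed.
cat "$workdir/lib1_L001_R1.fq.gz" "$workdir/lib1_L002_R1.fq.gz" \
  > "$workdir/lib1_R1_lanemerged.fq.gz"

# Decompressing the merged file yields the records of both lanes in order.
merged_lines=$(gzip -dc "$workdir/lib1_R1_lanemerged.fq.gz" | wc -l | tr -d ' ')
first_header=$(gzip -dc "$workdir/lib1_R1_lanemerged.fq.gz" | head -n 1)
echo "lines in merged file: $merged_lines"

rm -rf "$workdir"
```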
Preprocessing
- Illumina two-coloured sequencer poly-G tail removal (fastp)
- Post-AdapterRemoval trimming of FASTQ files prior to mapping (fastp)
- Automatic conversion of unmapped reads to FASTQ (samtools)
- Host DNA (mapped reads) stripping from input FASTQ files (for sensitive samples)
aDNA Damage manipulation
- Damage removal/clipping for UDG+/UDG-half treatment protocols (BamUtil)
- Damaged reads extraction and assessment (PMDTools)
- Nuclear DNA contamination estimation of human samples (angsd)
Genotyping
- Creation of VCF genotyping files (GATK UnifiedGenotyper, GATK HaplotypeCaller and FreeBayes)
- Creation of EIGENSTRAT genotyping files (pileupCaller)
- Creation of Genotype Likelihood files (angsd)
- Consensus sequence FASTA creation (VCF2Genome)
- SNP Table generation (MultiVCFAnalyzer)
Biological Information
- Mitochondrial to Nuclear read ratio calculation (MtNucRatioCalculator)
- Statistical sex determination of human individuals (Sex.DetERRmine)
Metagenomic Screening
- Low sequence complexity filtering (BBduk)
- Taxonomic binner with alignment (MALT)
- Taxonomic binner without alignment (Kraken2)
- aDNA characteristic screening of taxonomically binned data from MALT (MaltExtract)
Functionality Overview
A graphical overview of suggested routes through the pipeline depending on context can be seen below.
Documentation
The nf-core/eager pipeline comes with documentation about the pipeline: usage and output .
-
Pipeline configuration
-
- This includes tutorials, FAQs, and troubleshooting instructions
Credits
This pipeline was mostly written by Alexander Peltzer (apeltzer) and James A. Fellows Yates, with contributions from Stephen Clayton, Thiseas C. Lamnidis, Maxime Borry, Zandra Fagernäs, Aida Andrades Valtueña, Maxime Garcia and the nf-core community.
We thank the following people for their extensive assistance in the development of this pipeline:
Authors (alphabetical)
Additional Contributors (alphabetical)
Those who have provided conceptual guidance, suggestions, bug reports etc.
-
Arielle Munters
If you've contributed and you're missing in here, please let us know and we will add you in of course!
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines .
For further information or help, don't hesitate to get in touch on the Slack #eager channel (you can join with this invite).
Citations
If you use nf-core/eager for your analysis, please cite the eager publication as follows:
Fellows Yates JA, Lamnidis TC, Borry M, Andrades Valtueña A, Fagernäs Z, Clayton S, Garcia MU, Neukamm J, Peltzer A. 2021. Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager. PeerJ 9:e10947. DOI: 10.7717/peerj.10947.
You can cite the eager zenodo record for a specific version using the following doi: 10.5281/zenodo.3698082
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
In addition, references of tools and data used in this pipeline are as follows:
-
EAGER v1, CircularMapper, DeDup Peltzer, A., Jäger, G., Herbig, A., Seitz, A., Kniep, C., Krause, J., & Nieselt, K. (2016). EAGER: efficient ancient genome reconstruction. Genome Biology, 17(1), 1–14. https://doi.org/10.1186/s13059-016-0918-z . Download: https://github.com/apeltzer/EAGER-GUI and https://github.com/apeltzer/EAGER-CLI
-
FastQC Download: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
-
AdapterRemoval v2 Schubert, M., Lindgreen, S., & Orlando, L. (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 9, 88. https://doi.org/10.1186/s13104-016-1900-2 . Download: https://github.com/MikkelSchubert/adapterremoval
-
bwa Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics , 25(14), 1754–1760. https://doi.org/10.1093/bioinformatics/btp324 . Download: http://bio-bwa.sourceforge.net/bwa.shtml
-
SAMtools Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics , 25(16), 2078–2079. https://doi.org/10.1093/bioinformatics/btp352 . Download: http://www.htslib.org/
-
DamageProfiler Neukamm, J., Peltzer, A., & Nieselt, K. (2020). DamageProfiler: Fast damage pattern calculation for ancient DNA. In Bioinformatics (btab190). https://doi.org/10.1093/bioinformatics/btab190 . Download: https://github.com/Integrative-Transcriptomics/DamageProfiler
-
QualiMap Okonechnikov, K., Conesa, A., & García-Alcalde, F. (2016). Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics , 32(2), 292–294. https://doi.org/10.1093/bioinformatics/btv566 . Download: http://qualimap.bioinfo.cipf.es/
-
preseq Daley, T., & Smith, A. D. (2013). Predicting the molecular complexity of sequencing libraries. Nature Methods, 10(4), 325–327. https://doi.org/10.1038/nmeth.2375 . Download: http://smithlabresearch.org/software/preseq/
-
PMDTools Skoglund, P., Northoff, B. H., Shunkov, M. V., Derevianko, A. P., Pääbo, S., Krause, J., & Jakobsson, M. (2014). Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proceedings of the National Academy of Sciences of the United States of America, 111(6), 2229–2234. https://doi.org/10.1073/pnas.1318934111 . Download: https://github.com/pontussk/PMDtools
-
MultiQC Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. https://doi.org/10.1093/bioinformatics/btw354 . Download: https://multiqc.info/
-
BamUtils Jun, G., Wing, M. K., Abecasis, G. R., & Kang, H. M. (2015). An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Research, 25(6), 918–925. https://doi.org/10.1101/gr.176552.114 . Download: https://genome.sph.umich.edu/wiki/BamUtil
-
FastP Chen, S., Zhou, Y., Chen, Y., & Gu, J. (2018). fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics , 34(17), i884–i890. https://doi.org/10.1093/bioinformatics/bty560 . Download: https://github.com/OpenGene/fastp
-
GATK 3.5 DePristo, M. A., Banks, E., Poplin, R., Garimella, K. V., Maguire, J. R., Hartl, C., … Daly, M. J. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43(5), 491–498. https://doi.org/10.1038/ng.806 . Download: https://console.cloud.google.com/storage/browser/gatk
-
GATK 4.X - no citation available yet. Download: https://github.com/broadinstitute/gatk/releases
-
VCF2Genome - Alexander Herbig and Alex Peltzer (unpublished). Download: https://github.com/apeltzer/VCF2Genome
-
MultiVCFAnalyzer Bos, K.I. et al., 2014. Pre-Columbian mycobacterial genomes reveal seals as a source of New World human tuberculosis. Nature, 514(7523), pp.494–497. Available at: http://dx.doi.org/10.1038/nature13591 . Download: https://github.com/alexherbig/MultiVCFAnalyzer
-
MTNucRatioCalculator Alex Peltzer (Unpublished). Download: https://github.com/apeltzer/MTNucRatioCalculator
-
Sex.DetERRmine.py Lamnidis, T.C. et al., 2018. Ancient Fennoscandian genomes reveal origin and spread of Siberian ancestry in Europe. Nature communications, 9(1), p.5018. Available at: http://dx.doi.org/10.1038/s41467-018-07483-5 . Download: https://github.com/TCLamnidis/Sex.DetERRmine.git
-
ANGSD Korneliussen, T.S., Albrechtsen, A. & Nielsen, R., 2014. ANGSD: Analysis of Next Generation Sequencing Data. BMC bioinformatics, 15, p.356. Available at: http://dx.doi.org/10.1186/s12859-014-0356-4 . Download: https://github.com/ANGSD/angsd
-
bedtools Quinlan, A.R. & Hall, I.M., 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics , 26(6), pp.841–842. Available at: http://dx.doi.org/10.1093/bioinformatics/btq033 . Download: https://github.com/arq5x/bedtools2/releases
-
MALT . Download: https://software-ab.informatik.uni-tuebingen.de/download/malt/welcome.html
-
Vågene, Å.J. et al., 2018. Salmonella enterica genomes from victims of a major sixteenth-century epidemic in Mexico. Nature ecology & evolution, 2(3), pp.520–528. Available at: http://dx.doi.org/10.1038/s41559-017-0446-6 .
-
Herbig, A. et al., 2016. MALT: Fast alignment and analysis of metagenomic DNA sequence data applied to the Tyrolean Iceman. bioRxiv, p.050559. Available at: http://biorxiv.org/content/early/2016/04/27/050559 .
-
MaltExtract Huebler, R. et al., 2019. HOPS: Automated detection and authentication of pathogen DNA in archaeological remains. bioRxiv, p.534198. Available at: https://www.biorxiv.org/content/10.1101/534198v1?rss=1 . Download: https://github.com/rhuebler/MaltExtract
-
Kraken2 Wood, D et al., 2019. Improved metagenomic analysis with Kraken 2. Genome Biology volume 20, Article number: 257. Available at: https://doi.org/10.1186/s13059-019-1891-0 . Download: https://ccb.jhu.edu/software/kraken2/
-
endorS.py Aida Andrades Valtueña (Unpublished). Download: https://github.com/aidaanva/endorS.py
-
Bowtie2 Langmead, B. and Salzberg, S. L. 2012 Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), p. 357–359. doi: 10.1038/nmeth.1923 .
-
sequenceTools Stephan Schiffels (Unpublished). Download: https://github.com/stschiff/sequenceTools
-
EigenstratDatabaseTools Thiseas C. Lamnidis (Unpublished). Download: https://github.com/TCLamnidis/EigenStratDatabaseTools.git
-
mapDamage2 Jónsson, H., et al 2013. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics , 29(13), 1682–1684. https://doi.org/10.1093/bioinformatics/btt193
-
BBduk Brian Bushnell (Unpublished). Download: https://sourceforge.net/projects/bbmap/
Data References
This repository uses test data from the following studies:
-
Fellows Yates, J. A. et al. (2017) ‘Central European Woolly Mammoth Population Dynamics: Insights from Late Pleistocene Mitochondrial Genomes’, Scientific reports, 7(1), p. 17714. doi: 10.1038/s41598-017-17723-1 .
-
Gamba, C. et al. (2014) ‘Genome flux and stasis in a five millennium transect of European prehistory’, Nature communications, 5, p. 5257. doi: 10.1038/ncomms6257 .
-
Star, B. et al. (2017) ‘Ancient DNA reveals the Arctic origin of Viking Age cod from Haithabu, Germany’, Proceedings of the National Academy of Sciences of the United States of America, 114(34), pp. 9152–9157. doi: 10.1073/pnas.1710186114 .
-
de Barros Damgaard, P. et al. (2018). '137 ancient human genomes from across the Eurasian steppes.', Nature, 557(7705), 369–374. doi: 10.1038/s41586-018-0094-2
Code Snippets
"""
pigz -f -d -p ${task.cpus} $zipped_fasta
"""

"""
bwa index $fasta
mkdir BWAIndex && mv ${fasta}* BWAIndex
"""

"""
bowtie2-build --threads ${task.cpus} $fasta $fasta
mkdir BT2Index && mv ${fasta}* BT2Index
"""

"""
samtools faidx $fasta
"""

"""
picard -Xmx${task.memory.toMega()}M CreateSequenceDictionary R=$fasta O="${fasta.baseName}.dict"
"""

"""
samtools fastq -t ${bam} | pigz -p ${task.cpus} > ${base}.converted.fastq.gz
"""

"""
samtools index ${bam} ${size}
"""
"""
fastqc -t ${task.cpus} -q $r1 $r2
rename 's/_fastqc\\.zip\$/_raw_fastqc.zip/' *_fastqc.zip
rename 's/_fastqc\\.html\$/_raw_fastqc.html/' *_fastqc.html
"""

"""
fastqc -t ${task.cpus} -q $r1
rename 's/_fastqc\\.zip\$/_raw_fastqc.zip/' *_fastqc.zip
rename 's/_fastqc\\.html\$/_raw_fastqc.html/' *_fastqc.html
"""
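The two rename calls above add a _raw suffix to the FastQC outputs, presumably to keep them apart from the FastQC reports produced later in the pipeline. The sketch below shows the same renaming with plain bash parameter expansion and mv, as a stand-in for the Perl-style rename command; the file names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
# Hypothetical FastQC outputs for one library.
touch "$workdir/sample_R1_fastqc.zip" "$workdir/sample_R1_fastqc.html"

(
  cd "$workdir"
  for f in *_fastqc.zip *_fastqc.html; do
    ext=${f##*.}              # "zip" or "html"
    stem=${f%_fastqc.*}       # everything before "_fastqc.<ext>"
    mv "$f" "${stem}_raw_fastqc.${ext}"
  done
)

renamed=$(ls "$workdir")
echo "$renamed"
```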
"""
fastp --in1 ${r1} --out1 "${r1.baseName}.pG.fq.gz" -A -g --poly_g_min_len "${params.complexity_filter_poly_g_min}" -Q -L -w ${task.cpus} --json "${r1.baseName}"_L${lane}_fastp.json
"""

"""
fastp --in1 ${r1} --in2 ${r2} --out1 "${r1.baseName}.pG.fq.gz" --out2 "${r2.baseName}.pG.fq.gz" -A -g --poly_g_min_len "${params.complexity_filter_poly_g_min}" -Q -L -w ${task.cpus} --json "${libraryid}"_L${lane}_polyg_fastp.json
"""
"""
mkdir -p output
AdapterRemoval --file1 ${r1} --file2 ${r2} --basename ${base}.pe --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} --collapse ${preserve5p} --trimns --trimqualities ${adapters_to_remove} --minlength ${params.clip_readlength} --minquality ${params.clip_min_read_quality} --minadapteroverlap ${params.min_adap_overlap}
cat *.collapsed.gz *.collapsed.truncated.gz *.singleton.truncated.gz *.pair1.truncated.gz *.pair2.truncated.gz > output/${base}.pe.combined.tmp.fq.gz
mv *.settings output/

## Add R_ and L_ for unmerged reads for DeDup compatibility
AdapterRemovalFixPrefix -Xmx${task.memory.toGiga()}g output/${base}.pe.combined.tmp.fq.gz | pigz -p ${task.cpus - 1} > output/${base}.pe.combined.fq.gz
"""

"""
mkdir -p output
AdapterRemoval --file1 ${r1} --file2 ${r2} --basename ${base}.pe --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} --collapse ${preserve5p} --trimns --trimqualities ${adapters_to_remove} --minlength ${params.clip_readlength} --minquality ${params.clip_min_read_quality} --minadapteroverlap ${params.min_adap_overlap}
cat *.collapsed.gz *.singleton.truncated.gz *.pair1.truncated.gz *.pair2.truncated.gz > output/${base}.pe.combined.tmp.fq.gz
mv *.settings output/

## Add R_ and L_ for unmerged reads for DeDup compatibility
AdapterRemovalFixPrefix -Xmx${task.memory.toGiga()}g output/${base}.pe.combined.tmp.fq.gz | pigz -p ${task.cpus - 1} > output/${base}.pe.combined.fq.gz
"""

"""
mkdir -p output
AdapterRemoval --file1 ${r1} --file2 ${r2} --basename ${base}.pe --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} --collapse ${preserve5p} --trimns --trimqualities ${adapters_to_remove} --minlength ${params.clip_readlength} --minquality ${params.clip_min_read_quality} --minadapteroverlap ${params.min_adap_overlap}
cat *.collapsed.gz *.collapsed.truncated.gz > output/${base}.pe.combined.tmp.fq.gz

## Add R_ and L_ for unmerged reads for DeDup compatibility
AdapterRemovalFixPrefix -Xmx${task.memory.toGiga()}g output/${base}.pe.combined.tmp.fq.gz | pigz -p ${task.cpus - 1} > output/${base}.pe.combined.fq.gz
mv *.settings output/
"""

"""
mkdir -p output
AdapterRemoval --file1 ${r1} --file2 ${r2} --basename ${base}.pe --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} --collapse ${preserve5p} --trimns --trimqualities ${adapters_to_remove} --minlength ${params.clip_readlength} --minquality ${params.clip_min_read_quality} --minadapteroverlap ${params.min_adap_overlap}
cat *.collapsed.gz > output/${base}.pe.combined.tmp.fq.gz

## Add R_ and L_ for unmerged reads for DeDup compatibility
AdapterRemovalFixPrefix -Xmx${task.memory.toGiga()}g output/${base}.pe.combined.tmp.fq.gz | pigz -p ${task.cpus - 1} > output/${base}.pe.combined.fq.gz
mv *.settings output/
"""

"""
mkdir -p output
AdapterRemoval --file1 ${r1} --file2 ${r2} --basename ${base}.pe --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} --collapse ${preserve5p} --adapter1 "" --adapter2 ""
cat *.collapsed.gz *.pair1.truncated.gz *.pair2.truncated.gz > output/${base}.pe.combined.tmp.fq.gz

## Add R_ and L_ for unmerged reads for DeDup compatibility
AdapterRemovalFixPrefix -Xmx${task.memory.toGiga()}g output/${base}.pe.combined.tmp.fq.gz | pigz -p ${task.cpus - 1} > output/${base}.pe.combined.fq.gz
mv *.settings output/
"""

"""
mkdir -p output
AdapterRemoval --file1 ${r1} --file2 ${r2} --basename ${base}.pe --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} --collapse ${preserve5p} --adapter1 "" --adapter2 ""
cat *.collapsed.gz > output/${base}.pe.combined.tmp.fq.gz

## Add R_ and L_ for unmerged reads for DeDup compatibility
AdapterRemovalFixPrefix -Xmx${task.memory.toGiga()}g output/${base}.pe.combined.tmp.fq.gz | pigz -p ${task.cpus - 1} > output/${base}.pe.combined.fq.gz
mv *.settings output/
"""

"""
mkdir -p output
AdapterRemoval --file1 ${r1} --file2 ${r2} --basename ${base}.pe --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} ${preserve5p} --trimns --trimqualities ${adapters_to_remove} --minlength ${params.clip_readlength} --minquality ${params.clip_min_read_quality} --minadapteroverlap ${params.min_adap_overlap}
mv ${base}.pe.pair*.truncated.gz *.settings output/
"""

"""
mkdir -p output
AdapterRemoval --file1 ${r1} --basename ${base}.se --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} ${preserve5p} --trimns --trimqualities ${adapters_to_remove} --minlength ${params.clip_readlength} --minquality ${params.clip_min_read_quality} --minadapteroverlap ${params.min_adap_overlap}
mv *.settings *.se.truncated.gz output/
"""

"""
mkdir -p output
AdapterRemoval --file1 ${r1} --basename ${base}.se --gzip --threads ${task.cpus} --qualitymax ${params.qualitymax} ${preserve5p} --adapter1 "" --adapter2 ""
mv *.settings *.se.truncated.gz output/
"""
"""
fastp --in1 ${r1} --trim_front1 ${params.post_ar_trim_front} --trim_tail1 ${params.post_ar_trim_tail} -A -G -Q -L -w ${task.cpus} --out1 "${libraryid}"_L"${lane}"_R1_postartrimmed.fq.gz
"""

"""
fastp --in1 ${r1} --in2 ${r2} --trim_front1 ${params.post_ar_trim_front} --trim_tail1 ${params.post_ar_trim_tail} --trim_front2 ${params.post_ar_trim_front2} --trim_tail2 ${params.post_ar_trim_tail2} -A -G -Q -L -w ${task.cpus} --out1 "${libraryid}"_L"${lane}"_R1_postartrimmed.fq.gz --out2 "${libraryid}"_L"${lane}"_R2_postartrimmed.fq.gz
"""

"""
cat ${r1} > "${libraryid}"_R1_lanemerged.fq.gz
cat ${r2} > "${libraryid}"_R2_lanemerged.fq.gz
"""

"""
cat ${r1} > "${libraryid}"_R1_lanemerged.fq.gz
"""

"""
cat ${r1} > "${libraryid}"_R1_lanemerged.fq.gz
cat ${r2} > "${libraryid}"_R2_lanemerged.fq.gz
"""

"""
cat ${r1} > "${libraryid}"_R1_lanemerged.fq.gz
"""

"""
fastqc -t ${task.cpus} -q ${r1} ${r2}
"""

"""
fastqc -t ${task.cpus} -q ${r1}
"""
"""
bwa aln -t ${task.cpus} $fasta ${r1} -n ${params.bwaalnn} -l ${params.bwaalnl} -k ${params.bwaalnk} -o ${params.bwaalno} -f ${libraryid}.r1.sai
bwa aln -t ${task.cpus} $fasta ${r2} -n ${params.bwaalnn} -l ${params.bwaalnl} -k ${params.bwaalnk} -o ${params.bwaalno} -f ${libraryid}.r2.sai
bwa sampe -r "@RG\\tID:ILLUMINA-${libraryid}\\tSM:${samplename}\\tPL:illumina\\tPU:ILLUMINA-${libraryid}-${seqtype}" $fasta ${libraryid}.r1.sai ${libraryid}.r2.sai ${r1} ${r2} | samtools sort -@ ${task.cpus - 1} -O bam - > ${libraryid}_"${seqtype}".mapped.bam
samtools index "${libraryid}"_"${seqtype}".mapped.bam ${size}
"""

"""
bwa aln -t ${task.cpus} ${fasta} ${r1} -n ${params.bwaalnn} -l ${params.bwaalnl} -k ${params.bwaalnk} -o ${params.bwaalno} -f ${libraryid}.sai
bwa samse -r "@RG\\tID:ILLUMINA-${libraryid}\\tSM:${samplename}\\tPL:illumina\\tPU:ILLUMINA-${libraryid}-${seqtype}" $fasta ${libraryid}.sai $r1 | samtools sort -@ ${task.cpus - 1} -O bam - > "${libraryid}"_"${seqtype}".mapped.bam
samtools index "${libraryid}"_"${seqtype}".mapped.bam ${size}
"""

"""
bwa mem -t ${split_cpus} $fasta $r1 $r2 -R "@RG\\tID:ILLUMINA-${libraryid}\\tSM:${samplename}\\tPL:illumina\\tPU:ILLUMINA-${libraryid}-${seqtype}" | samtools sort -@ ${split_cpus} -O bam - > "${libraryid}"_"${seqtype}".mapped.bam
samtools index ${size} -@ ${task.cpus} "${libraryid}"_"${seqtype}".mapped.bam
"""

"""
bwa mem -t ${split_cpus} $fasta $r1 -R "@RG\\tID:ILLUMINA-${libraryid}\\tSM:${samplename}\\tPL:illumina\\tPU:ILLUMINA-${libraryid}-${seqtype}" | samtools sort -@ ${split_cpus} -O bam - > "${libraryid}"_"${seqtype}".mapped.bam
samtools index -@ ${task.cpus} "${libraryid}"_"${seqtype}".mapped.bam ${size}
"""
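Each bwa call above passes a read group header built from the library ID, sample name and sequencing type, with the fields separated by an escaped \t that bwa turns into a tab in the SAM header. The sketch below assembles the same string in bash and renders the tabs with printf; the three values are hypothetical placeholders, not names used by the pipeline.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values standing in for the pipeline's variables.
libraryid="LIB001"
samplename="SAMPLE01"
seqtype="PE"

# Inside double quotes bash keeps \t as two literal characters,
# which is the form handed to bwa on the command line.
rg="@RG\tID:ILLUMINA-${libraryid}\tSM:${samplename}\tPL:illumina\tPU:ILLUMINA-${libraryid}-${seqtype}"

# Render the escapes to see the header line as it would appear in the SAM file.
rg_line=$(printf '%b' "$rg")
field_count=$(printf '%s\n' "$rg_line" | awk -F'\t' '{ print NF }')
sample_field=$(printf '%s\n' "$rg_line" | cut -f3)

printf '%s\n' "$rg_line"
echo "fields: $field_count, sample tag: $sample_field"
```

Because the ID and PU tags both embed the library ID, reads from different libraries of one sample stay distinguishable after merging while sharing the same SM tag.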
"""
circulargenerator -Xmx${task.memory.toGiga()}g -e ${params.circularextension} -i $fasta -s ${params.circulartarget}
bwa index $prefix
"""

"""
bwa aln -t ${task.cpus} $elongated_root $r1 -n ${params.bwaalnn} -l ${params.bwaalnl} -k ${params.bwaalnk} -f ${libraryid}.r1.sai
bwa aln -t ${task.cpus} $elongated_root $r2 -n ${params.bwaalnn} -l ${params.bwaalnl} -k ${params.bwaalnk} -f ${libraryid}.r2.sai
bwa sampe -r "@RG\\tID:ILLUMINA-${libraryid}\\tSM:${samplename}\\tPL:illumina\\tPU:ILLUMINA-${libraryid}-${seqtype}" $elongated_root ${libraryid}.r1.sai ${libraryid}.r2.sai $r1 $r2 > tmp.out
realignsamfile -Xmx${task.memory.toGiga()}g -e ${params.circularextension} -i tmp.out -r $fasta $filter
samtools sort -@ ${task.cpus} -O bam tmp_realigned.bam > ${libraryid}_"${seqtype}".mapped.bam
samtools index "${libraryid}"_"${seqtype}".mapped.bam ${size}
"""

"""
bwa aln -t ${task.cpus} $elongated_root $r1 -n ${params.bwaalnn} -l ${params.bwaalnl} -k ${params.bwaalnk} -f ${libraryid}.sai
bwa samse -r "@RG\\tID:ILLUMINA-${libraryid}\\tSM:${samplename}\\tPL:illumina\\tPU:ILLUMINA-${libraryid}-${seqtype}" $elongated_root ${libraryid}.sai $r1 > tmp.out
realignsamfile -Xmx${task.memory.toGiga()}g -e ${params.circularextension} -i tmp.out -r $fasta $filter
samtools sort -@ ${task.cpus} -O bam tmp_realigned.bam > "${libraryid}"_"${seqtype}".mapped.bam
samtools index "${libraryid}"_"${seqtype}".mapped.bam ${size}
"""

"""
bowtie2 -x ${fasta} -1 ${r1} -2 ${r2} -p ${split_cpus} ${sensitivity} ${bt2n} ${bt2l} ${trim5} ${trim3} --maxins ${params.bt2_maxins} --rg-id ILLUMINA-${libraryid} --rg SM:${samplename} --rg PL:illumina --rg PU:ILLUMINA-${libraryid}-${seqtype} 2> "${libraryid}"_bt2.log | samtools sort -@ ${split_cpus} -O bam > "${libraryid}"_"${seqtype}".mapped.bam
samtools index "${libraryid}"_"${seqtype}".mapped.bam ${size}
"""

"""
bowtie2 -x ${fasta} -U ${r1} -p ${split_cpus} ${sensitivity} ${bt2n} ${bt2l} ${trim5} ${trim3} --rg-id ILLUMINA-${libraryid} --rg SM:${samplename} --rg PL:illumina --rg PU:ILLUMINA-${libraryid}-${seqtype} 2> "${libraryid}"_bt2.log | samtools sort -@ ${split_cpus} -O bam > "${libraryid}"_"${seqtype}".mapped.bam
samtools index "${libraryid}"_"${seqtype}".mapped.bam ${size}
"""

"""
samtools index $bam
extract_map_reads.py $bam ${r1} -m ${params.hostremoval_mode} $merged -of $out_fwd -t ${task.cpus}
"""

"""
samtools index $bam
extract_map_reads.py $bam ${r1} -rev ${r2} -m ${params.hostremoval_mode} $merged -of $out_fwd -or $out_rev -t ${task.cpus}
"""
"""
samtools merge ${libraryid}_seqtypemerged.bam ${bam}
samtools index ${libraryid}_seqtypemerged.bam ${size}
"""

"""
samtools flagstat $bam > ${libraryid}_flagstat.stats
"""

"""
samtools view -h ${bam} -@ ${task.cpus} -q ${params.bam_mapping_quality_threshold} -b > ${libraryid}.filtered.bam
samtools index ${libraryid}.filtered.bam ${size}
"""

"""
samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > ${libraryid}.filtered.bam
samtools index ${libraryid}.filtered.bam ${size}
"""

"""
samtools view -h ${bam} -@ ${task.cpus} -f4 -b > ${libraryid}.unmapped.bam
samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > ${libraryid}.filtered.bam
samtools index ${libraryid}.filtered.bam ${size}
"""

"""
samtools view -h ${bam} -@ ${task.cpus} -f4 -b > ${libraryid}.unmapped.bam
samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > ${libraryid}.filtered.bam
samtools index ${libraryid}.filtered.bam ${size}

## FASTQ
samtools fastq -tN ${libraryid}.unmapped.bam | pigz -p ${task.cpus - 1} > ${libraryid}.unmapped.fastq.gz
rm ${libraryid}.unmapped.bam
"""

"""
samtools view -h ${bam} -@ ${task.cpus} -f4 -b > ${libraryid}.unmapped.bam
samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > ${libraryid}.filtered.bam
samtools index ${libraryid}.filtered.bam ${size}

## FASTQ
samtools fastq -tN ${libraryid}.unmapped.bam | pigz -p ${task.cpus - 1} > ${libraryid}.unmapped.fastq.gz
"""

"""
samtools view -h ${bam} -@ ${task.cpus} -q ${params.bam_mapping_quality_threshold} -b > tmp_mapped.bam
filter_bam_fragment_length.py -a -l ${params.bam_filter_minreadlength} -o ${libraryid} tmp_mapped.bam
samtools index ${libraryid}.filtered.bam ${size}
"""

"""
samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > tmp_mapped.bam
filter_bam_fragment_length.py -a -l ${params.bam_filter_minreadlength} -o ${libraryid} tmp_mapped.bam
samtools index ${libraryid}.filtered.bam ${size}
"""

"""
samtools view -h ${bam} -@ ${task.cpus} -f4 -b > ${libraryid}.unmapped.bam
samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > tmp_mapped.bam
filter_bam_fragment_length.py -a -l ${params.bam_filter_minreadlength} -o ${libraryid} tmp_mapped.bam
samtools index ${libraryid}.filtered.bam ${size}
"""

"""
samtools view -h ${bam} -@ ${task.cpus} -f4 -b > ${libraryid}.unmapped.bam
samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > tmp_mapped.bam
filter_bam_fragment_length.py -a -l ${params.bam_filter_minreadlength} -o ${libraryid} tmp_mapped.bam
samtools index ${libraryid}.filtered.bam ${size}

## FASTQ
samtools fastq -tN ${libraryid}.unmapped.bam | pigz -p ${task.cpus - 1} > ${libraryid}.unmapped.fastq.gz
rm ${libraryid}.unmapped.bam
"""

"""
samtools view -h ${bam} -@ ${task.cpus} -f4 -b > ${libraryid}.unmapped.bam
samtools view -h ${bam} -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -b > tmp_mapped.bam
filter_bam_fragment_length.py -a -l ${params.bam_filter_minreadlength} -o ${libraryid} tmp_mapped.bam
samtools index ${libraryid}.filtered.bam ${size}

## FASTQ
samtools fastq -tN ${libraryid}.unmapped.bam | pigz -p ${task.cpus} > ${libraryid}.unmapped.fastq.gz
"""
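The samtools view calls above split reads on SAM flag bit 4 (read unmapped): -F4 drops reads with the bit set, -f4 keeps only those reads, and -q discards alignments below a mapping quality threshold. The following sketch reproduces that selection with awk on a few made-up SAM-like records (name, flag, reference, position, MAPQ). It is an illustration of the filter logic only, not a replacement for samtools, and the threshold value is an assumption.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical records: QNAME, FLAG, RNAME, POS, MAPQ (tab separated).
sam_body=$(printf '%s\t%s\t%s\t%s\t%s\n' \
  read1 0  chr1 100 37 \
  read2 4  '*'  0   0 \
  read3 16 chr1 250 12 \
  read4 0  chr1 300 25)

min_mapq=20   # assumed threshold for the illustration

# Equivalent of "-F4 -q 20": flag bit 4 unset and MAPQ at or above the threshold.
filtered=$(printf '%s\n' "$sam_body" |
  awk -F'\t' -v q="$min_mapq" 'int($2 / 4) % 2 == 0 && $5 >= q { print $1 }')

# Equivalent of "-f4": flag bit 4 set, regardless of MAPQ.
unmapped=$(printf '%s\n' "$sam_body" |
  awk -F'\t' 'int($2 / 4) % 2 == 1 { print $1 }')

echo "kept:     $(echo "$filtered" | tr '\n' ' ')"
echo "unmapped: $unmapped"
```

Here read3 is mapped but falls below the quality threshold, so it ends up in neither output, which mirrors what happens to low-quality alignments in the snippets above.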
"""
samtools flagstat $bam > ${libraryid}_postfilterflagstat.stats
"""

"""
endorS.py -o json -n ${libraryid} ${stats} ${poststats}
"""

"""
endorS.py -o json -n ${libraryid} ${stats}
"""

"""
mv ${bam} ${libraryid}.bam
dedup -Xmx${task.memory.toGiga()}g -i ${libraryid}.bam $treat_merged -o . -u
mv *.log dedup.log
samtools sort -@ ${task.cpus} "${libraryid}"_rmdup.bam -o "${libraryid}"_rmdup.bam
samtools index "${libraryid}"_rmdup.bam ${size}
"""

"""
dedup -Xmx${task.memory.toGiga()}g -i ${libraryid}.bam $treat_merged -o . -u
mv *.log dedup.log
samtools sort -@ ${task.cpus} "${libraryid}"_rmdup.bam -o "${libraryid}"_rmdup.bam
samtools index "${libraryid}"_rmdup.bam ${size}
"""

"""
mv ${bam} ${libraryid}.bam
picard -Xmx${task.memory.toMega()}M MarkDuplicates INPUT=${libraryid}.bam OUTPUT=${libraryid}_rmdup.bam REMOVE_DUPLICATES=TRUE AS=TRUE METRICS_FILE="${libraryid}_rmdup.metrics" VALIDATION_STRINGENCY=SILENT
samtools index ${libraryid}_rmdup.bam ${size}
"""

"""
picard -Xmx${task.memory.toMega()}M MarkDuplicates INPUT=${libraryid}.bam OUTPUT=${libraryid}_rmdup.bam REMOVE_DUPLICATES=TRUE AS=TRUE METRICS_FILE="${libraryid}_rmdup.metrics" VALIDATION_STRINGENCY=SILENT
samtools index ${libraryid}_rmdup.bam ${size}
"""

"""
samtools merge ${samplename}_udg${udg}_libmerged_rmdup.bam ${bam}
samtools index ${samplename}_udg${udg}_libmerged_rmdup.bam ${size}
"""
2022 2023 2024 | """ preseq c_curve -s ${params.preseq_step_size} -o ${input.baseName}.preseq -H ${input} """ |
2026 2027 2028 | """ preseq c_curve -s ${params.preseq_step_size} -o ${input.baseName}.preseq -B ${input} ${pe_mode} """ |
2030 2031 2032 | """ preseq c_curve -s ${params.preseq_step_size} -o ${input.baseName}.preseq -B ${input} ${pe_mode} """ |
2034 2035 2036 | """ preseq lc_extrap -s ${params.preseq_step_size} -o ${input.baseName}.preseq -H ${input} -n ${params.preseq_bootstrap} -e ${params.preseq_maxextrap} -cval ${params.preseq_cval} -x ${params.preseq_terms} """ |
2038 2039 2040 | """ preseq lc_extrap -s ${params.preseq_step_size} -o ${input.baseName}.preseq -B ${input} ${pe_mode} -n ${params.preseq_bootstrap} -e ${params.preseq_maxextrap} -cval ${params.preseq_cval} -x ${params.preseq_terms} """ |
2042 2043 2044 | """ preseq lc_extrap -s ${params.preseq_step_size} -o ${input.baseName}.preseq -B ${input} ${pe_mode} -n ${params.preseq_bootstrap} -e ${params.preseq_maxextrap} -cval ${params.preseq_cval} -x ${params.preseq_terms} """ |
2066 2067 2068 2069 2070 2071 2072 2073 | """ ## Create genome file from bam header samtools view -H ${bam} | grep '@SQ' | sed 's#@SQ\tSN:\\|LN:##g' > genome.txt ## Run bedtools bedtools coverage -nonamecheck -g genome.txt -sorted -a ${anno_file} -b ${bam} | pigz -p ${task.cpus - 1} > "${bam.baseName}".breadth.gz bedtools coverage -nonamecheck -g genome.txt -sorted -a ${anno_file} -b ${bam} -mean | pigz -p ${task.cpus - 1} > "${bam.baseName}".depth.gz """ |
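The genome-file step above strips only the `@SQ`, `SN:` and `LN:` markers from the BAM header, so any additional tags on an `@SQ` line (for example `M5:`) would pass through as extra columns. A minimal sketch of a more defensive variant follows, assuming an `awk`-based parser is acceptable; the mock header and the `make_genome_file` function name are hypothetical and not part of the pipeline.

```shell
# Mock SAM header standing in for `samtools view -H sample.bam` (hypothetical data).
header="$(printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\tM5:abc\n@SQ\tSN:chrM\tLN:16569\n')"

# Print "<name><TAB><length>" for every @SQ line, ignoring all other tags.
make_genome_file() {
    awk -F '\t' '
        $1 == "@SQ" {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len = substr($i, 4)
            }
            print name "\t" len
        }'
}

genome_txt="$(printf '%s\n' "$header" | make_genome_file)"
printf '%s\n' "$genome_txt"
```

In a real run the mock header would be replaced by the `samtools view -H` output and the result redirected to `genome.txt`.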
"""
damageprofiler -Xmx${task.memory.toGiga()}g -i $bam -r $fasta -l ${params.damageprofiler_length} -t ${params.damageprofiler_threshold} -o . -yaxis_damageplot ${params.damageprofiler_yaxis}
"""

"""
mapDamage -i ${bam} -r ${fasta} --rescale --rescale-out="${base}_rescaled.bam" --seq-length=${params.rescale_seqlength} ${rescale_length_5p} ${rescale_length_3p} ${singlestranded}
samtools index ${base}_rescaled.bam ${size}
"""

"""
bedtools maskfasta -fi ${fasta} -bed ${bedfile} -fo ${fasta.baseName}_masked.fa
"""

"""
#Run Filtering step
samtools calmd ${bam} ${fasta} | pmdtools --threshold ${params.pmdtools_threshold} ${treatment} --header | samtools view -Sb - > "${libraryid}".pmd.bam

#Run Calc Range step

## To allow early shut off of pipe: https://github.com/nextflow-io/nextflow/issues/1564
trap 'if [[ \$? == 141 ]]; then echo "Shutting samtools early due to -n parameter" && samtools index ${libraryid}.pmd.bam ${size}; exit 0; fi' EXIT
samtools calmd ${bam} ${fasta} | pmdtools --deamination ${platypus} --range ${params.pmdtools_range} ${treatment} -n ${params.pmdtools_max_reads} > "${libraryid}".cpg.range."${params.pmdtools_range}".txt
samtools index ${libraryid}.pmd.bam ${size}
"""

"""
bam trimBam $bam tmp.bam -L ${left_clipping} -R ${right_clipping} ${softclip}
samtools sort -@ ${task.cpus} tmp.bam -o ${libraryid}.trimmed.bam
samtools index ${libraryid}.trimmed.bam ${size}
"""

"""
samtools merge ${samplename}_libmerged_add.bam ${bam}
samtools index ${samplename}_libmerged_add.bam ${size}
"""

"""
qualimap bamqc -bam $bam -nt ${task.cpus} -outdir . -outformat "HTML" ${snpcap} --java-mem-size=${task.memory.toGiga()}G
"""

"""
picard -Xmx${task.memory.toGiga()}g AddOrReplaceReadGroups I=${bam} O=${samplename}_rg.bam RGID=1 RGLB="${samplename}_rg" RGPL=illumina RGPU=4410 RGSM="${samplename}_rg" VALIDATION_STRINGENCY=LENIENT
samtools index ${samplename}_rg.bam ${size}
"""
"""
samtools index -b ${bam}
gatk3 -Xmx${task.memory.toGiga()}g -T RealignerTargetCreator -R ${fasta} -I ${bam} -nt ${task.cpus} -o ${samplename}.intervals ${defaultbasequalities}
gatk3 -Xmx${task.memory.toGiga()}g -T IndelRealigner -R ${fasta} -I ${bam} -targetIntervals ${samplename}.intervals -o ${samplename}.realign.bam ${defaultbasequalities}
gatk3 -Xmx${task.memory.toGiga()}g -T UnifiedGenotyper -R ${fasta} -I ${samplename}.realign.bam -o ${samplename}.unifiedgenotyper.vcf -nt ${task.cpus} --genotype_likelihoods_model ${params.gatk_ug_genotype_model} -stand_call_conf ${params.gatk_call_conf} --sample_ploidy ${params.gatk_ploidy} -dcov ${params.gatk_downsample} --output_mode ${params.gatk_ug_out_mode} ${defaultbasequalities}

$keep_realign

bgzip -@ ${task.cpus} ${samplename}.unifiedgenotyper.vcf
"""

"""
samtools index ${bam}
gatk3 -Xmx${task.memory.toGiga()}g -T RealignerTargetCreator -R ${fasta} -I ${bam} -nt ${task.cpus} -o ${samplename}.intervals ${defaultbasequalities}
gatk3 -Xmx${task.memory.toGiga()}g -T IndelRealigner -R ${fasta} -I ${bam} -targetIntervals ${samplename}.intervals -o ${samplename}.realign.bam ${defaultbasequalities}
gatk3 -Xmx${task.memory.toGiga()}g -T UnifiedGenotyper -R ${fasta} -I ${samplename}.realign.bam -o ${samplename}.unifiedgenotyper.vcf -nt ${task.cpus} --dbsnp ${params.gatk_dbsnp} --genotype_likelihoods_model ${params.gatk_ug_genotype_model} -stand_call_conf ${params.gatk_call_conf} --sample_ploidy ${params.gatk_ploidy} -dcov ${params.gatk_downsample} --output_mode ${params.gatk_ug_out_mode} ${defaultbasequalities}

$keep_realign

bgzip -@ ${task.cpus} ${samplename}.unifiedgenotyper.vcf
"""

"""
gatk HaplotypeCaller --java-options "-Xmx${task.memory.toGiga()}G" -R ${fasta} -I ${bam} -O ${samplename}.haplotypecaller.vcf -stand-call-conf ${params.gatk_call_conf} --sample-ploidy ${params.gatk_ploidy} --output-mode ${params.gatk_hc_out_mode} --emit-ref-confidence ${params.gatk_hc_emitrefconf}
bgzip -@ ${task.cpus} ${samplename}.haplotypecaller.vcf
"""

"""
gatk HaplotypeCaller --java-options "-Xmx${task.memory.toGiga()}G" -R ${fasta} -I ${bam} -O ${samplename}.haplotypecaller.vcf --dbsnp ${params.gatk_dbsnp} -stand-call-conf ${params.gatk_call_conf} --sample-ploidy ${params.gatk_ploidy} --output-mode ${params.gatk_hc_out_mode} --emit-ref-confidence ${params.gatk_hc_emitrefconf}
bgzip -@ ${task.cpus} ${samplename}.haplotypecaller.vcf
"""
"""
freebayes -f ${fasta} -p ${params.freebayes_p} -C ${params.freebayes_C} ${skip_coverage} ${bam} > ${samplename}.freebayes.vcf
bgzip -@ ${task.cpus} ${samplename}.freebayes.vcf
"""

"""
samtools mpileup -B --ignore-RG -q ${map_q} -Q ${base_q} ${use_bed} -f ${fasta} ${bam_list} | pileupCaller ${caller} ${ssmode} ${transitions_mode} --sampleNames ${sample_names} ${use_snp} -e pileupcaller.${strandedness}
"""

"""
eigenstrat_snp_coverage -i pileupcaller.${strandedness} >${strandedness}_eigenstrat_coverage.txt -j ${strandedness}_eigenstrat_coverage_mqc.json
"""

"""
eigenstrat_snp_coverage -i pileupcaller.${strandedness} >${strandedness}_eigenstrat_coverage.txt
parse_snp_cov.py ${strandedness}_eigenstrat_coverage.txt
"""

"""
echo ${bam} > bam.filelist
mkdir angsd
angsd -bam bam.filelist -nThreads ${task.cpus} -GL ${angsd_glmodel} -doGlF ${angsd_glformat} ${angsd_majorminor} ${angsd_fasta} -out ${samplename}.angsd
"""

"""
bcftools stats *.vcf.gz -F ${fasta} > ${samplename}.vcf.stats
"""

"""
pigz -d -f -p ${task.cpus} ${vcf}
vcf2genome -Xmx${task.memory.toGiga()}g -draft ${out} -draftname "${fasta_head}" -in ${vcf.baseName} -minc ${params.vcf2genome_minc} -minfreq ${params.vcf2genome_minfreq} -minq ${params.vcf2genome_minq} -ref ${fasta} -refMod ${out}_refmod.fasta -uncertain ${out}_uncertainty.fasta
pigz -f -p ${task.cpus} ${out}*
bgzip -@ ${task.cpus} *.vcf
"""

"""
pigz -d -f -p ${task.cpus} ${vcf}
multivcfanalyzer -Xmx${task.memory.toGiga()}g ${params.snp_eff_results} ${fasta} ${params.reference_gff_annotations} . ${write_freqs} ${params.min_genotype_quality} ${params.min_base_coverage} ${params.min_allele_freq_hom} ${params.min_allele_freq_het} ${params.reference_gff_exclude} *.vcf
pigz -p ${task.cpus} *.tsv *.txt snpAlignment.fasta snpAlignmentIncludingRefGenome.fasta fullAlignment.fasta
bgzip -@ ${task.cpus} *.vcf
"""

"""
mtnucratio -Xmx${task.memory.toGiga()}g ${bam} "${params.mtnucratio_header}"
"""

"""
mv ${bam} ${bam.baseName}_${strandedness}strand.bam
"""

"""
ls *.bam >> bamlist.txt
samtools depth -aa -q30 -Q30 $filter -f bamlist.txt | sexdeterrmine -f bamlist.txt > SexDet.txt
"""
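The `ls *.bam >> bamlist.txt` step above appends to the list file. In a fresh Nextflow work directory that makes no difference, but if the same directory were reused the entries would be duplicated. A minimal sketch of an idempotent alternative follows; the temporary directory and the placeholder file names are hypothetical and only stand in for real BAM files.

```shell
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Empty placeholder files standing in for real BAMs (hypothetical names).
touch libB.bam libA.bam

# Glob expansion is sorted, and `>` truncates the file,
# so running the command twice does not duplicate entries.
printf '%s\n' *.bam > bamlist.txt
printf '%s\n' *.bam > bamlist.txt

n_entries="$(wc -l < bamlist.txt | tr -d ' ')"
first_entry="$(head -n 1 bamlist.txt)"
printf '%s entries, first is %s\n' "$n_entries" "$first_entry"
```

Whether this matters depends on how the task is executed. It is offered as an illustration of the `>` versus `>>` distinction, not as a change the pipeline requires.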
"""
samtools index ${input}
angsd -i ${input} -r ${params.contamination_chrom_name}:5000000-154900000 -doCounts 1 -iCounts 1 -minMapQ 30 -minQ 30 -out ${libraryid}.doCounts
contamination -a ${libraryid}.doCounts.icnts.gz -h ${projectDir}/assets/angsd_resources/HapMapChrX.gz 2> ${libraryid}.X.contamination.out
"""

"""
print_x_contamination.py ${Contam.join(' ')}
"""

"""
bbduk.sh -Xmx${task.memory.toGiga()}g in=${fastq} threads=${task.cpus} entropymask=f entropy=${params.metagenomic_complexity_entropy} out=${fastq}_lowcomplexityremoved.fq.gz 2> ${fastq}_bbduk.stats
"""

"""
malt-run \
-J-Xmx${task.memory.toGiga()}g \
-t ${task.cpus} \
-v \
-o . \
-d ${db} \
${sam_out} \
-id ${params.percent_identity} \
-m ${params.malt_mode} \
-at ${params.malt_alignment_mode} \
-top ${params.malt_top_percent} \
${min_supp} \
-mq ${params.malt_max_queries} \
--memoryMode ${params.malt_memory_mode} \
-i ${fastqs.join(' ')} |&tee malt.log
"""

"""
MaltExtract \
-Xmx${task.memory.toGiga()}g \
-t ${taxon_list} \
-i ${rma6.join(' ')} \
-o results/ \
-r ${ncbifiles} \
-p ${task.cpus} \
-f ${params.maltextract_filter} \
-a ${params.maltextract_toppercent} \
--minPI ${params.maltextract_percentidentity} \
${destack} \
${downsam} \
${dupremo} \
${matches} \
${megsum} \
${topaln} \
${ss}

postprocessing.AMPS.r -r results/ -m ${params.maltextract_filter} -t ${task.cpus} -n ${taxon_list} -j
"""

"""
tar xvzf $ckdb
mkdir -p $dbname
mv *.k2d $dbname || echo "nothing to do"
"""

"""
kraken2 --db ${krakendb} --threads ${task.cpus} --output $out --report-minimizer-data --report $kreport $fastq
cut -f1-3,6-8 $kreport > $kreport_old
"""

"""
kraken_parse.py -c ${params.metagenomic_min_support_reads} -or $read_out -ok $kmer_out $kraken_r
"""

"""
merge_kraken_res.py -or $read_out -ok $kmer_out
"""

"""
markdown_to_html.py $output_docs -o results_description.html
"""
"""
echo $workflow.manifest.version &> v_pipeline.txt
echo $workflow.nextflow.version &> v_nextflow.txt
fastqc -t ${task.cpus} --version &> v_fastqc.txt 2>&1 || true
AdapterRemoval --version &> v_adapterremoval.txt 2>&1 || true
fastp --version &> v_fastp.txt 2>&1 || true
bwa &> v_bwa.txt 2>&1 || true
circulargenerator -Xmx${task.memory.toGiga()}g --help | head -n 1 &> v_circulargenerator.txt 2>&1 || true
samtools --version &> v_samtools.txt 2>&1 || true
dedup -Xmx${task.memory.toGiga()}g -v &> v_dedup.txt 2>&1 || true
## bioconda recipe of picard is incorrectly set up and extra warning made with stderr, this ugly command ensures only version exported
( exec 7>&1; picard -Xmx${task.memory.toMega()}M MarkDuplicates --version 2>&1 >&7 | grep -v '/' >&2 ) 2> v_markduplicates.txt || true
qualimap --version --java-mem-size=${task.memory.toGiga()}G &> v_qualimap.txt 2>&1 || true
preseq &> v_preseq.txt 2>&1 || true
gatk --java-options "-Xmx${task.memory.toGiga()}G" --version 2>&1 | grep '(GATK)' > v_gatk.txt 2>&1 || true
gatk3 -Xmx${task.memory.toGiga()}g --version 2>&1 | head -n 1 > v_gatk3.txt 2>&1 || true
freebayes --version &> v_freebayes.txt 2>&1 || true
bedtools --version &> v_bedtools.txt 2>&1 || true
damageprofiler -Xmx${task.memory.toGiga()}g --version &> v_damageprofiler.txt 2>&1 || true
bam --version &> v_bamutil.txt 2>&1 || true
pmdtools --version &> v_pmdtools.txt 2>&1 || true
angsd -h |& head -n 1 | cut -d ' ' -f3-4 &> v_angsd.txt 2>&1 || true
multivcfanalyzer -Xmx${task.memory.toGiga()}g --help | head -n 1 &> v_multivcfanalyzer.txt 2>&1 || true
malt-run -J-Xmx${task.memory.toGiga()}g --help |& tail -n 3 | head -n 1 | cut -f 2 -d'(' | cut -f 1 -d ',' &> v_malt.txt 2>&1 || true
MaltExtract -Xmx${task.memory.toGiga()}g --help | head -n 2 | tail -n 1 &> v_maltextract.txt 2>&1 || true
multiqc --version &> v_multiqc.txt 2>&1 || true
vcf2genome -Xmx${task.memory.toGiga()}g -h |& head -n 1 &> v_vcf2genome.txt || true
mtnucratio -Xmx${task.memory.toGiga()}g --help &> v_mtnucratiocalculator.txt || true
sexdeterrmine --version &> v_sexdeterrmine.txt || true
kraken2 --version | head -n 1 &> v_kraken.txt || true
endorS.py --version &> v_endorSpy.txt || true
pileupCaller --version &> v_sequencetools.txt 2>&1 || true
bowtie2 --version | grep -a 'bowtie2-.* -fdebug' > v_bowtie2.txt || true
eigenstrat_snp_coverage --version | cut -d ' ' -f2 >v_eigenstrat_snp_coverage.txt || true
mapDamage --version > v_mapdamage.txt || true
bbversion.sh > v_bbduk.txt || true
bcftools --version | grep 'bcftools' | cut -d ' ' -f 2 > v_bcftools.txt || true

scrape_software_versions.py &> software_versions_mqc.yaml
"""
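The version-capture commands above all follow one pattern: run the tool, send both output streams to a file, and append `|| true` so that a missing or misbehaving tool does not abort the task. In bash, `&>` already redirects both stdout and stderr, so the trailing `2>&1` is redundant but harmless. A minimal sketch of the same idea as a reusable helper follows; the `capture_version` function name and the `echo` stand-in are hypothetical and not part of the pipeline.

```shell
# Run a command, keep only the first line of its combined output, never fail.
capture_version() {
    out_file="$1"
    shift
    { "$@" 2>&1 || true; } | head -n 1 > "$out_file"
}

workdir="$(mktemp -d)"

# `echo` stands in for a real tool here.
capture_version "$workdir/v_demo.txt" echo "demo 1.2.3"

# A missing tool still produces an output file instead of aborting the script.
capture_version "$workdir/v_missing.txt" this_tool_does_not_exist_xyz --version

demo_version="$(cat "$workdir/v_demo.txt")"
printf '%s\n' "$demo_version"
```

Tools that print their version on a line other than the first would still need their own `head`, `tail` or `cut` handling, as in the commands above.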
"""
multiqc -f $rtitle $rfilename $multiqc_config $custom_config_file .
"""
Support
- https://nf-co.re/eager
- https://doi.org/10.7717/peerj.10947
- https://doi.org/10.1186/s13059-016-0918-z
- https://doi.org/10.1186/s13104-016-1900-2
- https://doi.org/10.1093/bioinformatics/btp324
- https://doi.org/10.1093/bioinformatics/btp352
- https://doi.org/10.1093/bioinformatics/btab190
- https://doi.org/10.1093/bioinformatics/btv566
- https://doi.org/10.1038/nmeth.2375
- https://doi.org/10.1073/pnas.1318934111
- https://doi.org/10.1093/bioinformatics/btw354
- https://doi.org/10.1101/gr.176552.114
- https://doi.org/10.1093/bioinformatics/bty560
- https://doi.org/10.1038/ng.806
- https://doi.org/10.1186/s13059-019-1891-0
- https://doi.org/10.1093/bioinformatics/btt193
- https://doi.org/10.1038/s41598-017-17723-1
- https://doi.org/10.1038/ncomms6257
- https://doi.org/10.1073/pnas.1710186114
- https://doi.org/10.1038/s41586-018-0094-2





