Harmonization of AMR predictor tool outputs
Description
hAMRonization is a project that aims to harmonize the output file formats of antimicrobial resistance (AMR) detection tools. This workflow is a proof-of-concept test case for the hAMRonization parsers.
Specifically, it runs a set of AMR gene detection tools against a set of contigs/reads and uses hAMRonization to collate the results in a single unified report.
The following tools are currently included:
- abricate
- AMRFinderPlus
- ariba
- Groot
- RGI (for complete and draft genomes)
- RGI BWT (for metagenomes)
- staramr
- resfams
- Resfinder
- sraX
- DeepARG (requires singularity)
- CSSTAR
- AMRplusplus
- SRST2
- KmerResistance
Excluded tools:
- mykrobe (needs variant specification to be parseable)
- pointfinder (needs variant specification to be parseable)
- SEAR, ARG-ANNOT (no longer downloadable)
- RAST/PATRIC (not easily runnable on CLI)
- Single-organism or single-resistance tools (e.g. Kleborate, LREfinder, SCCmec Finder, U-CARE, ARGO)
- shortBRED, ARGS-OAP (rely on usearch, which isn't open-source)
Installation
First clone this repository:
git clone https://github.com/pha4ge/hAMRonization_workflow
This pipeline depends on snakemake, conda, build-essentials, git, zlib-dev, and unzip. If you have conda installed, please run:
conda env create -n hamronization_workflow --file envs/hamronization_workflow.yaml
and
conda activate hamronization_workflow
All further dependencies will be installed via conda on execution.
If you want to run DeepARG, you need a working singularity installation on your system and must pass --use-singularity --singularity-args "-B $PWD:/data" when running snakemake (otherwise, comment out the DeepARG input to the cleanup rule in the Snakefile).
Running
To execute the pipeline, navigate to the cloned repository, then edit the config (config/config.yaml) and input details (config/isolate_list.txt) for your purposes.
Execute the following, substituting the value for --jobs as needed:
snakemake --configfile config/config.yaml --use-conda --conda-frontend mamba --jobs 2 --use-singularity --singularity-args "-B $PWD:/data"
Testing
To test the pipeline, follow the installation instructions above and run it on the test data set:
snakemake --configfile config/test_config.yaml --use-conda --conda-frontend mamba --jobs 1 --use-singularity --singularity-args "-B $PWD:/data"
Docker
Alternatively, the workflow can be run using docker. Given the collective quirks of the bundled tools, this will probably be easier for most users.
Unfortunately, because deeparg is only really runnable as a container and snakemake uses singularity, the docker version has to be run in privileged mode, i.e. docker run --privileged.
If you are unable to run docker in privileged mode, you can comment out the deeparg target in the main Snakefile (expand("results/{sample}/deeparg/output.mapping.ARG", sample=samples.index),).
First get the docker container:
docker pull finlaymaguire/hamronization:1.0.1
You can execute it in a couple of ways, but the easiest is to mount the folder containing your reads and run the container interactively:
docker run -it --privileged -v $HOST_FOLDER_CONTAINING_ISOLATES:/data finlaymaguire/hamronization:1.0.1 /bin/bash
If your isolate data is in ~/isolates, the command to run this container interactively and get a bash terminal would be:
docker run -it --privileged -v ~/isolates:/data finlaymaguire/hamronization:1.0.1 /bin/bash
Then point your sample_table.tsv to that folder. Entries for this example would be:
| species | biosample | assembly | read1 | read2 |
|---|---|---|---|---|
| Mycobacterium tuberculosis | SAMN02599008 | /data/SAMN02599008/GCF_000662585.1.fna | /data/SAMN02599008/SRR1180160_R1.fastq.gz | /data/SAMN02599008/SRR1180160_R2.fastq.gz |
| Mycobacterium tuberculosis | SAMN02599009 | /data/SAMN02599009/GCF_000662586.1.fna | /data/SAMN02599009/SRR1180161_R1.fastq.gz | /data/SAMN02599009/SRR1180161_R2.fastq.gz |
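Because the paths in the table must resolve inside the container (under /data in this example), it can be worth checking them before starting a long run. The following is only a sketch, not part of the workflow: the function name check_sample_table is hypothetical, and it assumes the table is a tab-separated file with a single header line and the five columns of the example above (species, biosample, assembly, read1, read2).

```shell
# check_sample_table TABLE  (hypothetical helper, not shipped with the workflow)
# Prints one line per missing file; prints nothing if every path exists.
# Assumes one header line and five tab-separated columns:
# species, biosample, assembly, read1, read2.
check_sample_table() {
    tab=$(printf '\t')
    # Skip the header, then split each row on tabs only, so that species
    # names containing spaces stay in one field. The "|| [ -n ... ]" part
    # keeps a final row that lacks a trailing newline.
    tail -n +2 "$1" |
    while IFS="$tab" read -r species biosample assembly read1 read2 || [ -n "$species" ]; do
        for path in "$assembly" "$read1" "$read2"; do
            [ -e "$path" ] || echo "missing for $biosample: $path"
        done
    done
}
```

Run inside the container as, for example, check_sample_table path/to/sample_table.tsv. Note that empty fields are not handled: tabs count as whitespace for field splitting, so consecutive tabs collapse and later columns would shift.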
Then set up your config.yaml to use this sample_table.tsv and execute the pipeline from bash in the container, first activating the top-level environment:
conda activate hamronization_workflow
Then run the workflow:
snakemake --configfile config/config.yaml --use-conda --cores 6 --use-singularity --singularity-args "-B $PWD:/data"
WARNING
You will have to copy your results folder out of the container (e.g. cp -r results /data for the example mounted volume) if you wish to use the results elsewhere.
Note: kma/kmerresistance fails without explanation in the container (possibly zlib related, although adding the zlib headers didn't solve this). It is commented out for now.
Initial Run
Run Data
The following datasets are currently used for result file generation:
| Organism | Biosample | Assembly | Run |
|---|---|---|---|
| Salmonella enterica | SAMN13012778 | GCA_009009245.1 | SRR10258315 |
| Salmonella enterica | SAMN13064234 | GCA_009239915.1 | SRR10313698 |
| Salmonella enterica | SAMN10872197 | GCA_007657735.1 | SRR8528923 |
| Salmonella enterica | SAMN13064249 | GCA_009239785.1 | SRR10313716 |
| Salmonella enterica | SAMN07255713 | GCA_009439415.1 | SRR5921214 |
| Salmonella enterica | SAMN03098832 | GCA_006629605.1 | SRR1616829 |
| Klebsiella pneumoniae | SAMN02927805 | GCA_004302785.1 | SRR1561295 |
| Salmonella enterica | SAMEA6058467 | GCA_009625195.1 | ERR3581801 |
| E. coli | SAMN05980528 | GCA_004268245.1 | SRR4897319 |
| Mycobacterium tuberculosis | SAMN02599008 | GCA_000662585.1 | SRR1182980 SRR1180160 |
| Mycobacterium tuberculosis | SAMN02599179 | GCA_000665745.1 | SRR1172848 SRR1172873 |
| Mycobacterium tuberculosis | SAMN02599095 | GCA_000706105.1 | SRR1173728 SRR1173217 |
| Mycobacterium tuberculosis | SAMN02599061 | GCA_000663625.1 | SRR1175151 SRR1172938 |
| Mycobacterium tuberculosis | SAMN02598983 | GCA_000654735.1 | SRR1174279 SRR1173257 |
Links to data and corresponding metadata need to be stored in a tab separated sample sheet with the following columns:
species biosample assembly reads read1 read2
Results
The results generated on the aforementioned datasets can be retrieved here.
Contact
Please consult the PHA4GE project website for questions.
For technical questions, please feel free to consult:
- Finlay Maguire <finlaymaguire (at) gmail.com>
- Simon H. Tausch <Simon.Tausch (at) bfr.bund.de>
Code Snippets
shell:
    """
    abricate --threads {threads} --nopath --db {params.dbname} --minid {params.minid} --mincov {params.mincov} {input.contigs} > {output.report} 2> {log}
    abricate --version | perl -p -e 's/abricate (.+)/--analysis_software_version $1/' > {output.metadata}
    abricate --list | grep {params.dbname} | perl -p -e 's/.+?\t.+?\t.+?\t(.+)/--reference_database_version $1/' >> {output.metadata}
    """
shell:
    """
    hamronize abricate $(paste - - < {input.metadata}) {input.report} > {output}
    """
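The two abricate snippets above show a pattern that recurs throughout the rules: the tool rule writes one hamronize flag per line into a metadata file, and the hamronize rule splices those lines into its command line with $(paste - - < {input.metadata}). The sketch below reproduces that mechanism outside snakemake. The file contents and version strings are made up for illustration and are not real tool output.

```shell
# Hypothetical metadata file mimicking what the tool rule writes:
# one command-line flag per line (the version values are invented).
metadata=$(mktemp)
printf '%s\n' \
    '--analysis_software_version 1.0.1' \
    '--reference_database_version 2021-Mar-27' > "$metadata"

# `paste - -` reads standard input once per `-` for each output line,
# so the two input lines are joined into one tab-separated line.
flags=$(paste - - < "$metadata")

# Left unquoted on purpose: the shell splits the result on the tab and
# the spaces, so every flag and every value becomes its own argument,
# just as in the unquoted $(paste ...) substitution in the rules.
set -- $flags
echo "argument count: $#"    # prints: argument count: 4
printf '%s\n' "$@"

rm -f "$metadata"
```

One consequence of this design is that a version string containing whitespace would be split into several arguments. The number of `-` operands should match the number of lines in the metadata file: the groot rule, for instance, writes a single line and uses paste - accordingly.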
shell:
    "amrfinder_update -d {params.db_dir} 2> {log}"
shell:
    """
    amrfinder -n {input.contigs} -o {output.report} -d {params.db}/latest >{log} 2>&1
    rm -rf {params.output_tmp_dir}
    amrfinder --version | perl -p -e 's/(.+)/--analysis_software_version $1/' > {output.metadata}
    cat {input.dbversion} | perl -p -e 's/(.+)/--reference_database_version $1/' >> {output.metadata}
    """
shell:
    """
    hamronize amrfinderplus --input_file_name {input.contigs} $(paste - - < {input.metadata}) {input.report} > {output}
    """
shell:
    """
    mkdir -p {params.db_dir}
    wget -O {output.megares_db} http://megares.meglab.org/download/megares_v2.00/megares_full_database_v2.00.fasta
    wget -O {output.megares_annot} http://megares.meglab.org/download/megares_v2.00/megares_full_annotations_v2.00.csv
    cd {params.db_dir}
    bwa index megares_full_database_v2.00.fasta
    """
shell:
    """
    cd {params.bin_dir}
    git clone https://github.com/cdeanj/snpfinder
    cd snpfinder
    git checkout {params.snpfinder_version}
    make
    cd ..
    git clone https://github.com/cdeanj/rarefactionanalyzer
    cd rarefactionanalyzer
    git checkout {params.rarefaction_analyzer_version}
    make
    cd ..
    git clone https://github.com/cdeanj/resistomeanalyzer
    cd resistomeanalyzer
    git checkout {params.resistome_analyzer_version}
    make
    """
shell:
    """
    mkdir -p {params.output_prefix_tmp}
    trimmomatic PE {input.read1} {input.read2} {params.output_prefix_tmp}/{wildcards.sample}_r1_pe_trimmed.fq {params.output_prefix_tmp}/{wildcards.sample}_r1_se_trimmed.fq {params.output_prefix_tmp}/{wildcards.sample}_r2_pe_trimmed.fq {params.output_prefix_tmp}/{wildcards.sample}_r2_se_trimmed.fq SLIDINGWINDOW:4:15 LEADING:3 TRAILING:3 MINLEN:36 >{log} 2>&1
    bwa mem {input.megares_db} {params.output_prefix_tmp}/{wildcards.sample}_r1_pe_trimmed.fq {params.output_prefix_tmp}/{wildcards.sample}_r2_pe_trimmed.fq 2>> {log} | samtools sort -n -O sam > {params.output_prefix_tmp}/{wildcards.sample}.sam 2>>{log}
    {input.resistome_tool} -ref_fp {input.megares_db} -annot_fp {input.megares_annot} -sam_fp {params.output_prefix_tmp}/{wildcards.sample}.sam -gene_fp {output.amr_gene} -group_fp {output.amr_group} -class_fp {output.amr_class} -mech_fp {output.amr_mech} -t 80 >>{log} 2>&1
    {input.rarefaction_tool} -ref_fp {input.megares_db} -annot_fp {input.megares_annot} -sam_fp {params.output_prefix_tmp}/{wildcards.sample}.sam -gene_fp {output.amr_gene}_rare -group_fp {output.amr_group}_rare -class_fp {output.amr_class}_rare -mech_fp {output.amr_mech}_rare -min 5 -max 100 -skip 5 -samples 1 -t 80 >>{log} 2>&1
    {input.snp_tool} -amr_fp {input.megares_db} -sampe {params.output_prefix_tmp}/{wildcards.sample}.sam -out_fp {output.amr_snps} >>{log} 2>&1
    #rm -rf {params.output_prefix_tmp}
    echo "--analysis_software_version {params.resistome_analyzer_version}" > {output.metadata}
    echo "--reference_database_version v2.00" >> {output.metadata}
    """
shell:
    """
    hamronize amrplusplus $(paste - - < {input.metadata}) --input_file_name {input.read1} {input.amr_gene} > {output}
    """
shell:
    """
    ariba getref card {params.db_dir}/ariba_card > {log}
    ariba prepareref -f {params.db_dir}/ariba_card.fa -m {params.db_dir}/ariba_card.tsv {output.db} >> {log}
    date +"{params.dateformat}" > {output.dbversion}
    """
shell:
    """
    mkdir -p {params.tmp_dir}
    ariba run --noclean --force --tmp_dir {params.tmp_dir} --threads {threads} {input.ref_db} {input.read1} {input.read2} {params.output_folder} > {log} 2>&1
    rm -rf {params.tmp_dir}
    ariba version | grep "ARIBA version" | perl -p -e 's/ARIBA version: (.+)/--analysis_software_version $1/' > {output.metadata}
    cat {input.dbversion} | perl -p -e 's/(.+)/--reference_database_version $1/' >> {output.metadata}
    """
shell:
    """
    hamronize ariba --input_file_name {input.read1} --reference_database_id CARD $(paste - - < {input.metadata}) {input.report} > {output}
    """
shell:
    """
    cd {params.bin_dir}
    git clone https://github.com/chrisgulvik/c-SSTAR
    """
shell:
    """
    wget -O {output.dbfile} {params.db_source}
    date +"{params.dateformat}" > {output.dbversion}
    """
shell:
    """
    {input.csstar} -g {input.contigs} -d {input.resgannot_db} --outdir {params.outdir} > {output.report} 2>{log}
    grep "c-SSTAR version" {params.logfile} | perl -p -e 's/.+c-SSTAR version: (.+)/--analysis_software_version $1/' > {output.metadata}
    cat {input.dbversion} | perl -p -e 's/(.+)/--reference_database_version $1/' >> {output.metadata}
    """
shell:
    """
    hamronize csstar --input_file_name {input.contigs} --reference_database_id ResGANNOT $(paste - - < {input.metadata}) {input.report} > {output}
    """
shell:
    """
    python /deeparg/deepARG.py --align --type nucl --reads --input /data/results/{wildcards.sample}/deeparg/reads.fasta --output /data/results/{wildcards.sample}/deeparg/output > {log} 2>&1
    rm /data/results/{wildcards.sample}/deeparg/reads.fasta
    echo "--analysis_software_version {params.version}" > {output.metadata}
    echo "--reference_database_version {params.version}" >> {output.metadata}
    """
shell:
    "zcat {input.read1} {input.read2} > {output.fasta_reads}"
shell:
    """
    hamronize deeparg --input_file_name {input.read1} $(paste - - < {input.metadata}) {input.report} > {output}
    """
shell:
    """
    rm -rf {params.db_dir}/groot_clustered
    groot get -d {params.db_source} -o {params.db_dir}/groot_clustered
    groot index -p {threads} -m {params.db_dir}/groot_clustered/{params.db_source}.90 -i {output.db} -w {params.read_length} --log {log}
    """
shell:
    """
    zcat {input.read1} {input.read2} | seqkit seq --min-len {params.min_read_length} --max-len {params.max_read_length} | groot align -g {params.graph_dir} -p {threads} -i {input.db_index} --log {log} | groot report --log {log} > {output.report}
    groot version | perl -p -e 's/(.+)/--analysis_software_version $1/' > {output.metadata}
    """
shell:
    """
    hamronize groot --input_file_name {input.read1} $(paste - < {input.metadata}) --reference_database_id {params.db_source} --reference_database_version $(paste - < {params.db_dir}/groot_clustered/card.90/timestamp.txt) {input.report} > {output}
    """
shell:
    """
    # proper database is downloaded like this but is 20G and downloads
    # from the DTU FTP very slowly, so not going to support this feature
    # for now and just use a single type klebsiella genome for now
    #pushd {params.db_dir}
    #git clone https://bitbucket.org/genomicepidemiology/kmerfinder_db.git
    #cd kmerfinder_db
    #export KmerFinder_DB=$(pwd)
    #bash INSTALL.sh $KmerFinder_DB bacteria latest
    curl https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/240/185/GCF_000240185.1_ASM24018v2/GCF_000240185.1_ASM24018v2_genomic.fna.gz | gunzip > {params.db_dir}/klebsiella_type_genome.fasta
    kma index -i {params.db_dir}/klebsiella_type_genome.fasta -o {params.species_db} -Sparse ATG
    curl https://bitbucket.org/genomicepidemiology/resfinder_db/get/{params.db_version}.zip --output {params.db_dir}.zip
    mkdir -p {params.db_dir}/resfinder
    unzip -j -d {params.db_dir}/resfinder {params.db_dir}.zip
    cat {params.db_dir}/resfinder/*.fsa > {params.db_dir}/resfinder.fsa
    kma index -i {params.db_dir}/resfinder.fsa -o {params.kma_resfinder_db}
    """
shell:
    """
    zcat {input.read1} {input.read2} > {params.output_folder}/temp_all_reads.fq
    kmerresistance -i {params.output_folder}/temp_all_reads.fq -t_db {params.kma_resfinder_db} -s_db {params.species_db} -o {params.output_folder}/results > {log} 2>&1
    rm {params.output_folder}/temp_all_reads.fq
    kmerresistance -v 2>&1 | perl -p -e 's/KmerResistance-(.+)/--analysis_software_version $1/' > {output.metadata}
    echo "{params.db_version}" | perl -p -e 's/(.+)/--reference_database_version $1/' >> {output.metadata}
    """
shell:
    """
    hamronize kmerresistance --input_file_name {input.read1} $(paste - - < {input.metadata}) {input.report} > {output}
    """
shell:
    """
    git clone https://bitbucket.org/genomicepidemiology/pointfinder_db {output.pointfinder_db}
    """
shell:
    """
    cd {params.binary_dir}
    git clone https://bitbucket.org/genomicepidemiology/pointfinder.git --recursive
    # tidy up shebang
    sed -i "s|env python3$|env python|" pointfinder/PointFinder.py
    chmod +x pointfinder/PointFinder.py
    """
shell:
    """
    python {input.pointfinder_script} -i {input.contigs} -p {input.pointfinder_db} -s {params.species} -m blastn -m_p $(which blastn) -o results/{wildcards.sample}/pointfinder > {log} 2>&1
    cp {params.output_dir}/*_blastn_results.tsv {output.report}
    rm -rf {params.output_tmp_dir}
    """
shell:
    """
    curl http://dantaslab.wustl.edu/resfams/Resfams-full.hmm.gz | gunzip > {output.resfams_hmms}
    date +"{params.dateformat}" > {output.dbversion}
    """
shell:
    """
    prodigal -p meta -i {input.contigs} -a {params.output_prefix}/protein_seqs.faa > {log} 2>&1
    hmmsearch --cpu {threads} --tblout {output.report} {input.resfams_hmms} {params.output_prefix}/protein_seqs.faa >>{log} 2>&1
    hmmsearch -h | grep "# HMMER " | perl -p -e 's/# HMMER (.+) \(.+/--analysis_software_version hmmsearch_v$1/' >> {output.metadata}
    cat {input.dbversion} | perl -p -e 's/(.+)/--reference_database_version $1/' >> {output.metadata}
    """
shell:
    """
    hamronize resfams --input_file_name {input.contigs} $(paste - - < {input.metadata}) {input.report} > {output}
    """
shell:
    """
    curl https://bitbucket.org/genomicepidemiology/resfinder_db/get/{params.db_version}.zip --output {params.db_dir}.zip
    unzip -j -d {output.resfinder_db} {params.db_dir}.zip
    """
shell:
    """
    mkdir -p {params.outdir}
    resfinder.py -p {input.resfinder_db} -i {input.contigs} -o {params.outdir} > {log} 2>&1
    rm -rf {params.output_tmp_dir}
    grep "resfinder=" {params.conda_env} | perl -p -e 's/ - resfinder=(.+)/--analysis_software_version $1/' > {output.metadata}
    echo "{params.db_version}" | perl -p -e 's/(.+)/--reference_database_version $1/' >> {output.metadata}
    """
shell:
    """
    hamronize resfinder $(paste - - < {input.metadata}) {input.report} > {output}
    """
shell:
    """
    mkdir -p {params.db_dir}
    curl https://card.mcmaster.ca/latest/data --output {params.db_dir}/card.tar.bz2
    tar -C {params.db_dir} -xvf {params.db_dir}/card.tar.bz2
    """
shell:
    """
    rgi card_annotation --input {input.card_db_bwt} > {log} 2>&1
    rgi load --card_json {input.card_db_bwt} --card_annotation card_database_v*.fasta >> {log} 2>&1
    rm card_database_v*.fasta
    rgi bwt --read_one {input.read1} --read_two {input.read2} --output_file {params.output_prefix} --aligner bwa --threads {threads} >>{log} 2>&1
    echo "--analysis_software_version $(rgi main --version)" > {output.metadata}
    echo "--reference_database_version $(rgi database --version)" >> {output.metadata}
    """
shell:
    """
    hamronize rgi $(paste - - < {input.metadata}) --input_file_name {input.read1} {input.report} > {output}
    """
shell:
    """
    mkdir -p {params.db_dir}
    curl https://card.mcmaster.ca/latest/data --output {params.db_dir}/card.tar.bz2
    tar -C {params.db_dir} -xvf {params.db_dir}/card.tar.bz2
    """
shell:
    """
    rgi load --card_json {input.card_db} > {log} 2>&1
    rgi main --input_sequence {input.contigs} --output_file {params.output_prefix} --clean --num_threads {threads} >>{log} 2>&1
    echo "--analysis_software_version $(rgi main --version)" > {output.metadata}
    echo "--reference_database_version $(rgi database --version)" >> {output.metadata}
    """
shell:
    """
    hamronize rgi $(paste - - < {input.metadata}) --input_file_name {input.contigs} {input.report} > {output}
    """
shell:
    """
    sraX -i {input.genome_dir} -t 4 -db {params.dbtype} -o {params.outdir} > {log} 2>&1
    mv {params.result_output_dir}/* {params.outdir}
    sraX --version | grep version | perl -p -e 's/.+version: (.+)/--analysis_software_version $1/' > {output.metadata}
    date +"{params.dateformat}" | perl -p -e 's/(.+)/--reference_database_version $1/' >> {output.metadata}
    """
shell:
    """
    hamronize srax --input_file_name {input.contigs} $(paste - - < {input.metadata}) --reference_database_id srax_{params.dbtype}_amr_db {input.report} > {output}
    """
shell:
    """
    curl {params.db_source} --output {output.db_file}
    date +"{params.dateformat}" > {output.dbversion}
    """
shell:
    """
    srst2 --threads {threads} --gene_db {params.gene_db} --forward {params.for_suffix} --reverse {params.rev_suffix} --input_pe {input.read1} {input.read2} --min_depth {params.min_depth} --output {params.output_prefix} > {log} 2>&1
    srst2 --version 2>&1 | perl -p -e 's/srst2 (.+)/--analysis_software_version $1/' > {output.metadata}
    cat {input.dbversion} | perl -p -e 's/(.+)/--reference_database_version $1/' >> {output.metadata}
    """
shell:
    """
    hamronize srst2 --input_file_name {input.read1} $(paste - - - < {input.metadata}) {input.report} > {output}
    """
shell:
    """
    rm -r {params.output_folder};
    staramr search -o {params.output_folder} --nproc {threads} {input.contigs} >{log} 2>&1
    staramr --version | perl -p -e 's/staramr (.+)/--analysis_software_version $1/' > {output.metadata}
    grep "resfinder_db_commit" {params.settings} | perl -p -e 's/.+= (.+)/--reference_database_version $1/' >> {output.metadata}
    """
shell:
    """
    hamronize staramr $(paste - - < {input.metadata}) {input.report} > {output}
    """
shell:
    """
    hamronize summarize -t interactive -o {output} {input}
    """
shell:
    """
    hamronize summarize -o {output} -t tsv {input}
    """





