This is the Nextstrain build for MPXV (mpox virus). Output from this build is visible at nextstrain.org/monkeypox. The lineages within the recent mpox outbreaks in humans are defined in a separate lineage-designation repository.
Usage
Provision input data
Input sequences and metadata can be retrieved from data.nextstrain.org.

Note that these data are generously shared by many labs around the world. If you analyze these data and plan to publish, please contact those labs first.
Within the analysis pipeline, these data are fetched from data.nextstrain.org and written to `data/` with:

```
nextstrain build --docker . data/sequences.fasta data/metadata.tsv
```
Run analysis pipeline
Run the pipeline to produce the "overview" tree for `/monkeypox/mpxv` with:

```
nextstrain build --docker --cpus 1 . --configfile config/config_mpxv.yaml
```
Run the pipeline to produce the "outbreak" tree for `/monkeypox/hmpxv1` with:

```
nextstrain build --docker --cpus 1 . --configfile config/config_hmpxv1.yaml
```
Adjust `--cpus` to the number of cores available on your machine to speed up alignment and tree building.
Deploying
Run the Python script `scripts/deploy.py` to deploy the staging build to production. This also automatically creates a dated build in which each node has a unique (random) ID, so individual nodes can be targeted in shared links and narratives:

```
python scripts/deploy.py --build-names hmpxv1 mpxv
```
If a dated build already exists, it is not overwritten by default; to overwrite, pass `-f`. To deploy a locally built build to staging, use the `--staging` flag. To skip deploying a dated build to production, add the `--no-dated` flag.
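The node-ID idea can be illustrated with a short sketch. This is not the actual `scripts/deploy.py` implementation, only a hypothetical illustration of tagging every node of an Auspice JSON with a random ID so a link or narrative can point at one exact node:

```python
import json
import uuid

def add_random_node_ids(node):
    """Recursively give every node a unique random ID (illustrative only)."""
    # A unique attribute per node lets shared links/narratives target this exact node
    node.setdefault("node_attrs", {})["node_id"] = {"value": uuid.uuid4().hex[:8]}
    for child in node.get("children", []):
        add_random_node_ids(child)

with open("auspice/monkeypox_mpxv.json") as fh:  # hypothetical path
    dataset = json.load(fh)
add_random_node_ids(dataset["tree"])
```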
Visualize results
View results with:

```
nextstrain view auspice/
```
Configuration
Configuration takes place in `config/config.yml` by default. The analysis pipeline is contained in `workflow/snakemake_rule/core.smk`. It can be read top to bottom: each rule specifies its file inputs and outputs and pulls its parameters from `config`. There is little indirection, and each rule can be reasoned about on its own.
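For illustration, a minimal rule in this style might look like the following. This is a hedged sketch modeled on the `augur mask` step shown under Code Snippets below; the rule name, file paths, and config keys are hypothetical, not taken from `core.smk`:

```python
rule example_mask:
    # Inputs and outputs are declared explicitly on the rule...
    input:
        sequences="results/aligned.fasta",   # hypothetical path
        mask="config/mask.bed",              # hypothetical path
    output:
        sequences="results/masked.fasta",    # hypothetical path
    params:
        # ...and parameters are pulled straight from the workflow config
        from_start=config["mask"]["from_beginning"],  # hypothetical config key
        from_end=config["mask"]["from_end"],          # hypothetical config key
    shell:
        """
        augur mask \
            --sequences {input.sequences} \
            --mask {input.mask} \
            --mask-from-beginning {params.from_start} \
            --mask-from-end {params.from_end} \
            --output {output.sequences}
        """
```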
Data use
We gratefully acknowledge the authors, originating and submitting laboratories of the genetic sequences and metadata for sharing their work. Please note that although data generators have generously shared data in an open fashion, that does not mean there should be free license to publish on this data. Data generators should be cited where possible and collaborations should be sought in some circumstances. Please try to avoid scooping someone else's work. Reach out if uncertain.
Installation
Follow the standard installation instructions for Nextstrain's suite of software tools.

If you don't use the `nextstrain` CLI but a custom conda environment, make sure that you have `tsv-utils` and `seqkit` installed, e.g. using:

```
conda install -c bioconda tsv-utils seqkit
```
Nextstrain build vs Snakemake
The above commands use the Nextstrain CLI and `nextstrain build` along with Docker to run using Nextalign v2. Alternatively, if you install Nextalign/Nextclade v2 locally, you can run the pipeline with:

```
snakemake -j 1 -p --configfile config/config_mpxv.yaml
snakemake -j 1 -p --configfile config/config_hmpxv1.yaml
```
Update colors to include new countries
Update `colors_hmpxv1.tsv` to group countries by region, based on the countries present in that build's `metadata.tsv`:

```
python3 scripts/update_colours.py --colors config/colors_hmpxv1.tsv \
    --metadata results/hmpxv1/metadata.tsv --output config/colors_hmpxv1.tsv
```

and similarly update `colors_mpxv.tsv`:

```
python3 scripts/update_colours.py --colors config/colors_mpxv.tsv \
    --metadata results/mpxv/metadata.tsv --output config/colors_mpxv.tsv
```
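For reference, a colors file is a three-column, tab-separated mapping from a metadata column to a value and a hex color (the format written by `scripts/assign-colors.py` below). A hypothetical excerpt — the actual entries in `config/colors_hmpxv1.tsv` will differ:

```
country	Nigeria	#571EA2
country	Portugal	#4334BF
region	Europe	#3F63CF
```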
Update example data
Example data is used by CI. It can also serve as a small subset of real-world data.

Example data should be updated every time the metadata schema changes or a new clade/lineage emerges. To update, run:

```
nextstrain build --docker . update_example_data -F
```
Code Snippets
`scripts/assign-colors.py`:

```python
import argparse
import pandas as pd

# Forced colours MUST NOT appear in the ordering TSV
forced_colors = {}

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Assign colors based on ordering",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument('--ordering', type=str, required=True, help="input ordering file")
    parser.add_argument('--color-schemes', type=str, required=True, help="input color schemes file")
    parser.add_argument('--metadata', type=str, help="if provided, restrict colors to only those found in metadata")
    parser.add_argument('--output', type=str, required=True, help="output colors tsv")
    args = parser.parse_args()

    # Collect the ordered trait values for each trait name from the ordering TSV
    assignment = {}
    with open(args.ordering) as f:
        for line in f.readlines():
            array = line.lstrip().rstrip().split("\t")
            if len(array) == 2:
                name, trait = array[0], array[1]
                if name not in assignment:
                    assignment[name] = [trait]
                else:
                    assignment[name].append(trait)

    # if metadata supplied, go through and
    # 1. remove assignments that don't exist in metadata
    # 2. remove assignments that have 'focal' set to 'False' in metadata
    if args.metadata:
        metadata = pd.read_csv(args.metadata, delimiter='\t')
        for name, trait in assignment.items():
            # Items not to exclude if not (yet) present in metadata to solve bootstrapping issue
            if name in metadata and name not in ['clade_membership', 'outbreak', 'lineage']:
                subset_present = [x for x in assignment[name] if x in metadata[name].unique()]
                assignment[name] = subset_present
            if name in metadata and 'focal' in metadata:
                focal_list = metadata.loc[metadata['focal'] == True, name].unique()
                subset_focal = [x for x in assignment[name] if x in focal_list]
                assignment[name] = subset_focal

    # Line N of the color-schemes file holds a palette of N colors
    schemes = {}
    counter = 0
    with open(args.color_schemes) as f:
        for line in f.readlines():
            counter += 1
            array = line.lstrip().rstrip().split("\t")
            schemes[counter] = array

    with open(args.output, 'w') as f:
        for trait_name, trait_array in assignment.items():
            if len(trait_array) == 0:
                print(f"No traits found for {trait_name}")
                continue
            if len(schemes) < len(trait_array):
                print(f"WARNING: insufficient colours available for trait {trait_name} - reusing colours!")
                remain = len(trait_array)
                color_array = []
                while remain > 0:
                    if remain > len(schemes):
                        color_array = [*color_array, *schemes[len(schemes)]]
                        remain -= len(schemes)
                    else:
                        color_array = [*color_array, *schemes[remain]]
                        remain = 0
            else:
                color_array = schemes[len(trait_array)]
            extra_trait_values = list(forced_colors.get(trait_name, {}).keys())
            extra_color_values = list(forced_colors.get(trait_name, {}).values())
            zipped = list(zip(trait_array + extra_trait_values, color_array + extra_color_values))
            for trait_value, color in zipped:
                f.write(trait_name + "\t" + trait_value + "\t" + color + "\n")
            f.write("\n")
```
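To illustrate the input formats this script parses (all values below are made up): the ordering file is a two-column TSV of trait name and trait value,

```
region	Africa
region	Europe
country	Nigeria
country	Portugal
```

and line N of the color-schemes file carries a palette of N tab-separated hex colors:

```
#4C90C0
#4C90C0	#E1A03A
#4C90C0	#79B77D	#E1A03A
```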
`scripts/clades_renaming.py`:

```python
import json, argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="split clade membership into clade, outbreak, and lineage attributes",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("--input-node-data", type=str, required=True, help="input data")
    parser.add_argument("--output-node-data", type=str, metavar="JSON", required=True, help="output Auspice JSON")
    args = parser.parse_args()

    with open(args.input_node_data) as fh:
        data = json.load(fh)

    new_node_data = {}
    for name, node in data["nodes"].items():
        old_clade_name = node["clade_membership"]
        outbreak_name = ""
        lineage_name = ""
        # if it starts with "clade" -> it's a clade
        # if it starts with "outbreak" -> it's an outbreak, need to look up the clade
        # if it starts with "lineage" -> it's clade IIb, outbreak hMPXV-1
        if old_clade_name.startswith("clade"):
            clade_name = old_clade_name.split()[1]
        # Need to set up a clade dictionary for when we have other outbreaks
        # elif old_clade_name.startswith('outbreak'):
        #     outbreak_name = old_clade_name.split()[1]
        #     clade_name = clade[outbreak_name]
        elif old_clade_name.startswith("outgroup"):
            clade_name = "outgroup"
        else:
            clade_name = "IIb"
            outbreak_name = "hMPXV-1"
            lineage_name = old_clade_name

        new_node_data[name] = {
            "clade_membership": clade_name,
            "outbreak": outbreak_name,
            "lineage": lineage_name,
        }
        if "clade_annotation" in node:
            new_node_data[name]["clade_annotation"] = node["clade_annotation"]
            if node["clade_annotation"] == "A":
                new_node_data[name]["clade_annotation"] = "hMPXV-1 A"

    data["nodes"] = new_node_data
    with open(args.output_node_data, "w") as fh:
        json.dump(data, fh)
```
`scripts/construct-recency-from-submission-date.py`:

```python
import argparse
from datetime import datetime
from augur.io import read_metadata
import json

## Script originally from
## https://github.com/nextstrain/ncov/blob/master/scripts/construct-recency-from-submission-date.py

def get_recency(date_str, ref_date):
    date_submitted = datetime.strptime(date_str, '%Y-%m-%d').toordinal()
    ref_day = ref_date.toordinal()
    delta_days = ref_day - date_submitted
    if delta_days <= 0:
        return 'New'
    elif delta_days < 3:
        return '1-2 days ago'
    elif delta_days < 8:
        return '3-7 days ago'
    elif delta_days < 15:
        return 'One week ago'
    elif delta_days < 31:
        return 'One month ago'
    elif delta_days < 121:
        return '1-3 months ago'
    elif delta_days < 365:
        return '3-12 months ago'
    elif delta_days < 365*4:
        return '1-3 years ago'
    elif delta_days < 365*16:
        return '3-15 years ago'
    else:
        # anything 16+ years old (the original `elif delta_days >= 31` is always
        # true at this point, so a plain else is equivalent)
        return 'Older than 15 years'


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Assign each sequence a field that specifies when it was added",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument('--metadata', type=str, required=True, help="metadata file")
    parser.add_argument('--metadata-id-columns', nargs="+", help="names of possible metadata columns containing identifier information, ordered by priority. Only one ID column will be inferred.")
    parser.add_argument('--output', type=str, required=True, help="output json")
    args = parser.parse_args()

    meta = read_metadata(args.metadata, id_columns=args.metadata_id_columns).to_dict(orient="index")

    node_data = {'nodes': {}}
    ref_date = datetime.now()
    for strain, d in meta.items():
        if 'date_submitted' in d and d['date_submitted'] and d['date_submitted'] != "undefined":
            node_data['nodes'][strain] = {'recency': get_recency(d['date_submitted'], ref_date)}

    with open(args.output, 'wt') as fh:
        json.dump(node_data, fh)
```
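A quick illustration of the binning, using `get_recency` from the script above with made-up dates:

```python
from datetime import datetime

ref = datetime(2024, 6, 15)  # hypothetical reference date
print(get_recency('2024-06-14', ref))  # delta 1 day   -> '1-2 days ago'
print(get_recency('2024-06-10', ref))  # delta 5 days  -> '3-7 days ago'
print(get_recency('2024-05-20', ref))  # delta 26 days -> 'One month ago'
print(get_recency('2020-06-15', ref))  # ~4 years      -> '3-15 years ago'
```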
`scripts/fix_tree.py`:

```python
from collections import defaultdict
import argparse
from treetime import TreeAnc
from Bio import Phylo

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="fix tree artefacts: collapse immediate reversions and merge clades sharing mutations",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument('--alignment', type=str, required=True, help="input sequences")
    parser.add_argument('--input-tree', type=str, required=True, help="input nwk")
    parser.add_argument('--root', type=str, required=False, help="root node")
    parser.add_argument('--output', type=str, required=True, help="output nwk")
    args = parser.parse_args()

    T = Phylo.read(args.input_tree, 'newick')
    if args.root:
        T.root_with_outgroup(args.root)
    else:
        T.root_at_midpoint()

    tt = TreeAnc(tree=T, aln=args.alignment, gtr='JC69')
    tt.optimize_tree(prune_short=True)

    # make list of mutations that are phylogenetically informative (not gaps or N)
    for n in T.find_clades():
        n.relevant_mutations = set()
        for mut in n.mutations:
            if (mut[0] in 'ACGT') and (mut[2] in 'ACGT'):
                n.relevant_mutations.add(mut)

    print(f"### Checking for immediate reversions\n")
    reversions = list()
    for clade in T.find_clades():
        for child in clade.clades:
            if child.is_terminal():
                continue
            for grandchild in child.clades:
                if grandchild.is_terminal():
                    continue
                # Check if one of the grandchild's mutations reverts one of the child's
                for mut_child in child.relevant_mutations:
                    for mut_grandchild in grandchild.relevant_mutations:
                        if mut_child[1] == mut_grandchild[1] and mut_child[2] == mut_grandchild[0]:
                            reversions.append({
                                "parent": clade,
                                "child": child,
                                "grandchild": grandchild,
                                "mut_child": mut_child,
                                "mut_grandchild": mut_grandchild
                            })
                            print(f"Below {clade}: {mut_child} in {child.name} reverted in {grandchild.name}")

    for reversion in reversions:
        # Remove reversion from grandchild
        reversion["grandchild"].relevant_mutations.remove(reversion["mut_grandchild"])
        # Remove grandchild from child
        reversion["child"].clades.remove(reversion["grandchild"])
        # If there are mutations, add grandchild as child of parent
        if reversion["grandchild"].relevant_mutations != reversion["parent"].relevant_mutations:
            reversion["parent"].clades.append(reversion["grandchild"])
        else:
            # Otherwise add grandchild clades to parent
            reversion["parent"].clades.extend(reversion["grandchild"].clades)

    # find mutations that occur multiple times on branches leading to children of a node.
    # use these mutations to group clades to merge later.
    max_iter = 5
    for ii in range(max_iter):
        print(f"###\nIteration: {ii+1}\n")
        nodes_to_merge = defaultdict(list)
        for n in T.get_nonterminals():
            shared_mutations = defaultdict(list)
            for c in n:
                for mut in c.relevant_mutations:
                    shared_mutations[mut].append(c)
            for mut in shared_mutations:
                if len(shared_mutations[mut]) > 1:
                    nodes_to_merge[(n, tuple(shared_mutations[mut]))].append(mut)
        if len(nodes_to_merge) == 0:
            print("No more shared mutations -- breaking out of loop.")
            break

        already_touched = set()
        for (parent, children), mutations in sorted(nodes_to_merge.items(), key=lambda x: len(x[1]), reverse=True):
            if any([c in already_touched for c in children]):
                continue
            print("####\nmerging clades:\n\t",
                  '\n\t'.join([f"{c.name} with mutations {c.relevant_mutations} and {c.count_terminals()} tips" for c in children]))
            print("shared mutations:", mutations)
            print("\n")
            parent.clades = [c for c in parent if c not in children]
            new_clade = Phylo.BaseTree.Clade(branch_length=tt.one_mutation * len(mutations))
            new_clade.relevant_mutations = set(mutations)
            for c in children:
                left_over_mutations = c.relevant_mutations.difference(mutations)
                if len(left_over_mutations):
                    c.relevant_mutations = left_over_mutations
                    c.branch_length = tt.one_mutation * len(c.relevant_mutations)
                    new_clade.clades.append(c)
                else:
                    new_clade.clades.extend(c.clades)
                already_touched.add(c)
            parent.clades.append(new_clade)

    Phylo.write(T, args.output, 'newick')
```
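TreeTime represents each mutation as a tuple `(ancestral_state, position, derived_state)`, and the reversion check above compares these fields directly. A standalone illustration (not part of the script):

```python
# A child branch gains A->G at position 123; its grandchild carries G->A at
# the same position, i.e. an immediate reversion.
mut_child = ('A', 123, 'G')
mut_grandchild = ('G', 123, 'A')

is_reversion = (mut_child[1] == mut_grandchild[1]        # same position
                and mut_child[2] == mut_grandchild[0])   # derived state reverted
print(is_reversion)  # True
```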
`scripts/mutation_context.py`:

```python
import json, argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="calculate mutation context json",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument('--tree', type=str, required=True, help="tree file")
    parser.add_argument('--mutations', type=str, required=True, help="mutations")
    parser.add_argument('--output', type=str, metavar="JSON", required=True, help="output Auspice JSON")
    args = parser.parse_args()

    with open(args.mutations) as fh:
        data = json.load(fh)['nodes']

    node_data = {}
    for name, node in data.items():
        GA_count = 0
        CT_count = 0
        total_muts = 0
        # Count G->A and C->T changes among all nucleotide substitutions
        for mut in node["muts"]:
            a, pos, d = mut[0], int(mut[1:-1]), mut[-1]
            if a in 'ACGT' and d in 'ACGT':
                total_muts += 1
                if a + d == 'GA':
                    GA_count += 1
                elif a + d == 'CT':
                    CT_count += 1
        GA_CT_count = GA_count + CT_count
        if total_muts:
            node_data[name] = {"GA_CT_fraction": GA_CT_count / total_muts}
        else:
            node_data[name] = {"GA_CT_fraction": None}

        # Of the G->A and C->T changes, count those in the dinucleotide context
        # characteristic of APOBEC3 editing (G->A followed by A; C->T preceded by T)
        dinuc_count = 0
        if GA_CT_count:
            for mut in node["muts"]:
                a, pos, d = mut[0], int(mut[1:-1]), mut[-1]
                if a in 'ACGT' and d in 'ACGT':
                    if a + d == 'GA' and node['sequence'][pos] == 'A':
                        dinuc_count += 1
                    elif a + d == 'CT' and node['sequence'][pos-2] == 'T':
                        dinuc_count += 1
            node_data[name]["dinuc_context_fraction"] = dinuc_count / GA_CT_count
        else:
            node_data[name]["dinuc_context_fraction"] = None

    with open(args.output, 'w') as fh:
        json.dump({"nodes": node_data}, fh)
```
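As a worked example with made-up inputs: a node with mutations `G100A`, `C200T`, and `A300G` has two of three substitutions of the G→A/C→T type, so its `GA_CT_fraction` is 2/3.

```python
muts = ["G100A", "C200T", "A300G"]  # hypothetical node mutations
ga_ct = sum(1 for m in muts if m[0] + m[-1] in ("GA", "CT"))
print(ga_ct / len(muts))  # 0.666... -> the node's "GA_CT_fraction"
```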
`scripts/remove_timeinfo.py`:

```python
import argparse
import json

def sample_date(node):
    """
    Returns the sample date in numeric form.

    In the future, we could examine the 'raw_date' attr here to decide whether
    to ignore some sequences, as 'numdate' is the inferred (timetree) date,
    which can hide uncertainty in the actual sampling date.
    """
    if "raw_date" not in node:
        # internal node or tip with no date info
        return
    return node['numdate']


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="remove time info",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument('--input-node-data', type=str, required=True, help="input data")
    parser.add_argument('--output-node-data', type=str, metavar="JSON", required=True, help="output Auspice JSON")
    args = parser.parse_args()

    with open(args.input_node_data) as fh:
        data = json.load(fh)

    # Keep only divergence-related attributes (plus the sample date, if known)
    new_node_data = {}
    for name, node in data["nodes"].items():
        try:
            new_node_data[name] = {
                "mutation_length": node["mutation_length"],
                "branch_length": node["branch_length"]
            }
            sdate = sample_date(node)
            if sdate:
                new_node_data[name]["sample_date"] = sdate
        except KeyError:
            # internal node or tip with no date info
            pass

    data["nodes"] = new_node_data
    with open(args.output_node_data, 'w') as fh:
        json.dump(data, fh)
```
`scripts/reverse_reversed_sequences.py`:

```python
import pandas as pd
import argparse
from Bio import SeqIO

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Reverse-complement reverse-complemented sequences",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument('--metadata', type=str, required=True, help="input metadata")
    parser.add_argument('--sequences', type=str, required=True, help="input sequences")
    parser.add_argument('--output', type=str, required=True, help="output sequences")
    args = parser.parse_args()

    metadata = pd.read_csv(args.metadata, sep='\t')

    # Read the FASTA file and reverse-complement any sequence whose
    # metadata 'reverse' column is True
    with open(args.sequences, 'r') as f_in, open(args.output, 'w') as f_out:
        for seq in SeqIO.parse(f_in, 'fasta'):
            try:
                if metadata.loc[metadata['strain'] == seq.id, 'reverse'].values[0] == True:
                    seq.seq = seq.seq.reverse_complement()
                    print("Reverse-complementing sequence:", seq.id)
            except (IndexError, KeyError):
                # strain missing from metadata, or no 'reverse' column
                print("No reverse complement for:", seq.id)
            # Write the (possibly reverse-complemented) sequence to file
            SeqIO.write(seq, f_out, 'fasta')
```
`scripts/set_final_strain_name.py`:

```python
import pandas as pd
import json, argparse
from augur.io import read_metadata

def replace_name_recursive(node, lookup):
    if node["name"] in lookup:
        node["name"] = lookup[node["name"]]
    if "children" in node:
        for child in node["children"]:
            replace_name_recursive(child, lookup)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Swaps out the strain names in the Auspice JSON with the final strain name",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument('--input-auspice-json', type=str, required=True, help="input auspice_json")
    parser.add_argument('--metadata', type=str, required=True, help="input data")
    parser.add_argument('--metadata-id-columns', nargs="+", help="names of possible metadata columns containing identifier information, ordered by priority. Only one ID column will be inferred.")
    parser.add_argument('--display-strain-name', type=str, required=True, help="field to use as strain name in auspice")
    parser.add_argument('--output', type=str, metavar="JSON", required=True, help="output Auspice JSON")
    args = parser.parse_args()

    metadata = read_metadata(args.metadata, id_columns=args.metadata_id_columns)

    # Map each strain ID to its display name, falling back to the column name
    # when the display name is missing
    name_lookup = {}
    for ri, row in metadata.iterrows():
        strain_id = row.name
        name_lookup[strain_id] = args.display_strain_name if pd.isna(row[args.display_strain_name]) else row[args.display_strain_name]

    with open(args.input_auspice_json, 'r') as fh:
        data = json.load(fh)

    replace_name_recursive(data['tree'], name_lookup)

    with open(args.output, 'w') as fh:
        json.dump(data, fh)
```
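For example, applying `replace_name_recursive` from the script above to a toy Auspice subtree (made-up names):

```python
tree = {"name": "node_0", "children": [{"name": "MPXV/1"}, {"name": "MPXV/2"}]}
replace_name_recursive(tree, {"MPXV/1": "hMPXV/Portugal/PT0001/2022"})
print(tree["children"][0]["name"])  # hMPXV/Portugal/PT0001/2022
print(tree["children"][1]["name"])  # MPXV/2 (unchanged: not in the lookup)
```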
```
shell:
    """
    cp {input.auspice_json} {output.auspice_json}
    cp {input.root_sequence} {output.root_sequence_json}
    """
```
```
shell:
    "rm -rfv {params}"
```
```
shell:
    "rm -rfv {params}"
```
```
shell:
    """
    augur filter \
        --metadata {input.metadata} \
        --metadata-id-columns {params.strain_id} \
        --sequences {input.sequences} \
        --include-where strain=MK783032 strain=MK783030 \
        --group-by clade lineage \
        --subsample-max-sequences 50 \
        --subsample-seed 0 \
        --output-metadata {output.metadata} \
        --output-sequences {output.sequences}
    """
```
```
shell:
    """
    augur filter \
        --sequences {input.sequences} \
        --metadata {input.metadata} \
        --metadata-id-columns {params.strain_id} \
        --exclude {input.exclude} \
        --output-sequences {output.sequences} \
        --output-metadata {output.metadata} \
        --min-date {params.min_date} \
        --min-length {params.min_length} \
        --exclude-where QC_rare_mutations=bad \
        --output-log {output.log}
    """
```
```
shell:
    """
    augur filter \
        --sequences {input.sequences} \
        --metadata {input.metadata} \
        --metadata-id-columns {params.strain_id} \
        --output-strains {output.strains} \
        {params.group_by} \
        {params.sequences_per_group} \
        {params.exclude} \
        {params.other_filters} \
        --output-log {output.log}
    """
```
```
shell:
    """
    augur filter \
        --metadata-id-columns {params.strain_id} \
        --sequences {input.sequences} \
        --metadata {input.metadata} \
        --exclude-all \
        --include {input.strains} {input.include} \
        --output-sequences {output.sequences} \
        --output-metadata {output.metadata}
    """
```
```
shell:
    """
    python3 scripts/reverse_reversed_sequences.py \
        --metadata {input.metadata} \
        --sequences {input.sequences} \
        --output {output}
    """
```
```
shell:
    """
    nextalign run \
        --jobs {threads} \
        --reference {input.reference} \
        --genemap {input.genemap} \
        --max-indel {params.max_indel} \
        --seed-spacing {params.seed_spacing} \
        --retry-reverse-complement \
        --output-fasta - \
        --output-insertions {output.insertions} \
        {input.sequences} | seqkit seq -i > {output.alignment}
    """
```
```
shell:
    """
    augur mask \
        --sequences {input.sequences} \
        --mask {input.mask} \
        --mask-from-beginning {params.from_start} \
        --mask-from-end {params.from_end} \
        --output {output}
    """
```
```
shell:
    """
    augur tree \
        --alignment {input.alignment} \
        --exclude-sites {input.tree_mask} \
        --tree-builder-args="-redo" \
        --output {output.tree} \
        --nthreads {threads}
    """
```
```
shell:
    """
    python3 scripts/fix_tree.py \
        --alignment {input.alignment} \
        --input-tree {input.tree} \
        {params.root} \
        --output {output.tree}
    """
```
```
shell:
    """
    augur refine \
        --tree {input.tree} \
        --alignment {input.alignment} \
        --metadata {input.metadata} \
        --metadata-id-columns {params.strain_id} \
        --output-tree {output.tree} \
        --timetree \
        --root {params.root} \
        --precision 3 \
        --keep-polytomies \
        {params.clock_rate} \
        {params.clock_std_dev} \
        --output-node-data {output.node_data} \
        --coalescent {params.coalescent} \
        --date-inference {params.date_inference} \
        --date-confidence \
        --clock-filter-iqd {params.clock_filter_iqd}
    """
```
```
shell:
    """
    augur ancestral \
        --tree {input.tree} \
        --alignment {input.alignment} \
        --output-node-data {output.node_data} \
        --inference {params.inference}
    """
```
```
shell:
    """
    augur translate \
        --tree {input.tree} \
        --ancestral-sequences {input.node_data} \
        --reference-sequence {input.genemap} \
        --output {output.node_data}
    """
```
```
shell:
    """
    augur traits \
        --tree {input.tree} \
        --metadata {input.metadata} \
        --metadata-id-columns {params.strain_id} \
        --output {output.node_data} \
        --columns {params.columns} \
        --confidence \
        --sampling-bias-correction {params.sampling_bias_correction}
    """
```
```
shell:
    """
    augur clades \
        --tree {input.tree} \
        --mutations {input.nuc_muts} {input.aa_muts} \
        --clades {input.clades} \
        --output-node-data {output.node_data} 2>&1 | tee {log}
    """
```
```
shell:
    """
    python scripts/clades_renaming.py \
        --input-node-data {input} \
        --output-node-data {output.node_data}
    """
```
```
shell:
    """
    python3 scripts/mutation_context.py \
        --tree {input.tree} \
        --mutations {input.node_data} \
        --output {output.node_data}
    """
```
```
shell:
    """
    python3 scripts/remove_timeinfo.py --input-node-data {input} --output-node-data {output}
    """
```
```
shell:
    """
    python3 scripts/construct-recency-from-submission-date.py \
        --metadata {input.metadata} \
        --metadata-id-columns {params.strain_id} \
        --output {output} 2>&1
    """
```
```
shell:
    """
    python3 scripts/assign-colors.py \
        --ordering {input.ordering} \
        --color-schemes {input.color_schemes} \
        --output {output.colors} \
        --metadata {input.metadata} 2>&1
    """
```
```
shell:
    """
    augur export v2 \
        --tree {input.tree} \
        --metadata {input.metadata} \
        --metadata-id-columns {params.strain_id} \
        --node-data {input.branch_lengths} {input.nt_muts} {input.aa_muts} {input.mutation_context} {input.clades} {input.recency} \
        --colors {input.colors} \
        --lat-longs {input.lat_longs} \
        --description {input.description} \
        --auspice-config {input.auspice_config} \
        --include-root-sequence \
        --output {output.auspice_json}
    """
```
```
shell:
    """
    python3 scripts/set_final_strain_name.py \
        --metadata {input.metadata} \
        --metadata-id-columns {params.strain_id} \
        --input-auspice-json {input.auspice_json} \
        --display-strain-name {params.display_strain_field} \
        --output {output.auspice_json}
    cp {input.root_sequence} {output.root_sequence}
    """
```
```
shell:
    """
    curl https://mpox-lapis.genspectrum.org/v1/sample/fasta --output {output.sequences}
    """
```
```
shell:
    """
    curl https://mpox-lapis.genspectrum.org/v1/sample/details?dataFormat=csv | \
        tr -d "\r" | sed -E 's/("([^"]*)")?,/\\2\\t/g' > {output.metadata}
    """
```
```
shell:
    """
    nextstrain remote upload {params.deploy_url} {input}
    """
```
```
shell:
    """
    ./bin/notify-on-deploy {params.deploy_url} {params.slack_ts}
    """
```
```
shell:
    """
    curl -fsSL --compressed {params.sequences_url:q} --output {output.sequences}
    curl -fsSL --compressed {params.metadata_url:q} --output {output.metadata}
    """
```
```
shell:
    """
    gzip --decompress --keep {input.metadata}
    xz --decompress --keep {input.sequences}
    """
```