This repository contains the analysis workflow for Debelius et al. (doi: 10.1101/2021.03.23.436606).
Installation
The analysis was done in a qiime2-2020.11 environment with RESCRIPt and q2-sidle installed.

To build this environment, install qiime2-2020.11 according to your system requirements. Then, follow the q2-sidle installation instructions. You will also need snakemake >= 5.3:

```bash
conda install --channel bioconda snakemake=5.3
```

Finally, you will need to clone and download the repository in whatever way works best for you.
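As a sketch, the full setup might look like the following; the environment file name follows the QIIME 2 2020.11 release convention (Linux shown) and the repository URL is a placeholder, so verify both against the QIIME 2 release page and this repository:

```bash
# Sketch: create the qiime2-2020.11 environment. Confirm the exact .yml
# file name for your system on the QIIME 2 release page.
wget https://data.qiime2.org/distro/core/qiime2-2020.11-py36-linux-conda.yml
conda env create -n qiime2-2020.11 --file qiime2-2020.11-py36-linux-conda.yml
conda activate qiime2-2020.11

# RESCRIPt and q2-sidle go into the same environment; follow the q2-sidle
# installation instructions for the exact commands.

# Snakemake drives the workflow
conda install --channel bioconda snakemake=5.3

# Clone the repository (substitute the actual URL)
git clone <repository-url>
```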
Input Files
Simulation
To run the simulation, you will need to download the 100nt OTU table (ID 45113, CRC32: 7066cf83) and metadata from Qiita Study 850. The OTU table should be imported into QIIME 2 using the command:

```bash
qiime tools import \
    --type 'FeatureTable[Frequency]' \
    --input-format BIOMV210Format \
    --input-path 45113_otu_table.biom \
    --output-path start-table.qza
```
The table should be placed in the folder specified in the config file (`input_dir` under `simulation`); by default, this folder is `ipynb/data/inputs/simulation`.
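If your inputs live elsewhere, the config entry can be updated; a hypothetical sketch follows, where the file name `config.yaml` and any keys beyond `input_dir` are assumptions about the config layout:

```bash
# Hypothetical: point the simulation input directory somewhere else.
# Only `input_dir` under `simulation` is documented above.
cat >> config.yaml <<'EOF'
simulation:
  input_dir: ipynb/data/inputs/simulation
EOF
```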
The metadata should also be downloaded from the Qiita study (found under the sample information tab), renamed to `start-metadata.txt`, and moved to the simulation input folder, `ipynb/data/inputs/simulation`.
Real Data
Real data can be downloaded from ENA study PRJEB37382, accessions ERR4704801-ERR4704848. These should be placed in the location specified in the config file; by default, they are expected in the `ipynb/data/inputs/real/seqs` directory. Metadata is inferred from the filenames.
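As a rough download sketch (the FTP layout below is the standard ENA convention; confirm the exact file names, and whether runs ship as paired `_1`/`_2` files, on the study page):

```bash
# Sketch: pull runs ERR4704801-ERR4704848 from the ENA FTP mirror into
# the default input directory. ENA groups runs by the accession's last
# digit (e.g. ERR4704801 lives under ERR470/001/).
mkdir -p ipynb/data/inputs/real/seqs
for i in $(seq 4704801 4704848); do
    run="ERR${i}"
    sub="00${run: -1}"
    wget -P ipynb/data/inputs/real/seqs \
        "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR470/${sub}/${run}/${run}.fastq.gz"
done
```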
Database Files
The analysis is currently based on the Silva 128 QIIME release, available here. The files should be imported into QIIME 2 and saved to the `ipynb/data/reference/silva-ori/` folder. The SEPP reference tree can be downloaded from the QIIME 2 resources page. The formatted Optivag and Greengenes databases are already included in the repository.
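For reference, importing the Silva 128 files typically looks like the sketch below; the input paths follow the Silva 128 QIIME release archive layout and the output names are placeholders, so adjust both to your download:

```bash
# Sketch: import the Silva 128 sequences and taxonomy into QIIME 2.
# Input paths assume the Silva 128 QIIME release archive layout.
qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path rep_set/rep_set_all/99/99_otus.fasta \
    --output-path ipynb/data/reference/silva-ori/silva-128-seqs.qza

qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path taxonomy/taxonomy_all/99/taxonomy_7_levels.txt \
    --output-path ipynb/data/reference/silva-ori/silva-128-taxonomy.qza
```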
Running the workflow
You can run the full analysis through Snakemake, based on the current repository setup:

```bash
snakemake
```
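A dry run is a cheap way to confirm the inputs and config are in place before committing to the full analysis:

```bash
# Preview the scheduled jobs without running anything
snakemake -n

# Run the workflow on several cores
snakemake --cores 4
```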
Workflow steps
Simulation
- **Reference Simulation.** The Greengenes database is used to filter the reference OTU table to exclude sequences with more than 5 degenerate nucleotides, and then a reference dataset is simulated from the original data by averaging abundance across multiple individuals of the same age group.
  - `filter_degenerete_reference`
  - `filter_ref_table`
  - `simulate_reference_samples`
- **Regional Simulation.** Regions of the 16S gene are extracted and combined with the full-length reference table to build a regional "ASV" table and representative sequence set. These are the starting points for our three test methods.
  - `extract_regions`
  - `generate_asvs_and_derep_tables`
- **Database Preparation.** The database is prepared by filtering the reference to remove sequences with a large number of degenerate nucleotides and sequences which do not have class-level annotations. The filtered sequences are then used to generate the per-region kmer databases.
  - `filter_silva_reference`
  - `extract_sidle_regions`
  - `prepare_extracted_region`
- **Reference Reconstruction.** The simulated reference table, the Greengenes 13_8 taxonomy, and the Greengenes 13_8 tree are copied from their respective locations.
  - `copy_sim_reference`
- **OTU Clustering.** The table is generated by merging the ASV representative sequences and feature tables from all the individual regions and then clustering them closed-reference at 99% identity against the Silva 128 reference database.
  - `concatenate_single_asv_counts`
  - `concatenate_single_rep_seqs`
  - `cluster_otus`
  - `get_otu_tree`
  - `get_otu_taxonomy`
- **ASV Denoising.** The table is generated by merging all ASV counts. Taxonomy comes from naive Bayesian classification of all sequences; sequences which don't have a phylum-level annotation are excluded. The tree is built by fragment insertion of the merged sequences.
  - `concatenate_single_asv_counts`
  - `concatenate_single_rep_seqs`
  - `classify_taxonomy`
  - `get_asv_rep_seq`
  - `get_asv_table`
  - `build_insertion_tree`
- **Sidle.** For Sidle, the regional sequences are aligned to the regional kmers. The alignments, regional database maps, and feature counts are used to reconstruct the table. Taxonomy is reconstructed using the database map to identify unresolved sequences. To build the phylogenetic tree, we generate consensus fragments and then insert them into the Silva 128 backbone.
  - `align_sidle_regions`
  - `reconstruct_table_single`
  - `reconstruct_taxonomy`
  - `reconstruct_fragments_single`
  - `reconstructed_fragment_insertion`
- **Alpha Diversity.** Within-sample (alpha) diversity is calculated from multiple rarefactions. We look at observed features, Shannon diversity, Pielou's evenness, and Faith's phylogenetic diversity. The results are summarized using the `Table1-CompareAlpha.ipynb` notebook; this generates Table 1.
  - `rarefy_iteration`
  - `alpha_iter`
  - `alpha_table`
- **Beta Diversity.** Between-sample (beta) diversity is also calculated from rarefied tables. We look at three feature-based metrics (unweighted UniFrac, weighted UniFrac, and Bray-Curtis) as well as a collapsed metric (genus-level Bray-Curtis). The results are summarized in the `ipynb/Table2-CompareBeta.ipynb` notebook; this generates Table 2.
  - `rarefy_iteration`
  - `genus_braycurtis`
  - `beta_phylogenetic`
  - `braycurtis`
  - `adonis`
  - `beta_table`
- **Taxonomic Comparison.** We compared the taxonomy at class level using the merged, unnormalized tables through the `ipynb/Figure1-Taxonomy.ipynb` notebook. This generates Figure 2 and Table S3.
  - `compare_taxonomy`
Real Data Analysis
- **Data Preparation.** We import the sequences into QIIME 2, trim the primers, and then denoise the sequences using DADA2.
  - `build_manifest`
  - `import_seqs`
  - `trim_data_v13`
  - `denoise_v13`
  - `trim_data_v34`
  - `denoise_v34`
  - `trim_posthoc`
- **Database Preparation.** For benchmarking, the Greengenes database needs to be prepared (`filter_greengenes_reference`); otherwise, the Optivag database is assumed to already be tidied. Then, we extract the regions and prepare them for Sidle alignment.
  - `filter_greengenes_reference`
  - `extract_sidle_regions`
  - `prepare_extracted_region`
- **Reference Reconstruction.** The V34 region alone was used as the reference. We used the paired-end table, classified the taxonomy using a naive Bayesian classifier for the full-length sequences, and then filtered the table to remove anything missing a phylum-level or kingdom-level designation.
  - `classify_regional_taxonomy`
  - `get_real_reference_data`
  - `get_real_representative_seqs`
- **OTU Clustering.** The table is generated by merging the ASV representative sequences and feature tables from all the individual regions and then clustering them closed-reference at 99% identity.
  - `concatenate_single_asv_counts`
  - `concatenate_single_rep_seqs`
  - `cluster_otus`
  - `get_otu_taxonomy`
- **ASV Denoising.** The table is generated by merging all ASV counts. Taxonomy comes from naive Bayesian classification of all sequences; sequences which don't have a phylum-level annotation are excluded. The tree is built by fragment insertion of the merged sequences.
  - `concatenate_single_asv_counts`
  - `concatenate_single_rep_seqs`
  - `classify_taxonomy`
  - `get_asv_rep_seq`
  - `get_asv_table`
- **Sidle.** For Sidle, the regional sequences are aligned to the regional kmers. The alignments, regional database maps, and feature counts are used to reconstruct the table. Taxonomy is reconstructed using the database map to identify unresolved sequences.
  - `align_sidle_regions`
  - `reconstruct_table_single`
  - `reconstruct_taxonomy`
- **Data Synthesis.** The taxonomic annotation was checked and the tables evaluated using the `ipynb/Figure2-RealData.ipynb` notebook.
  - `real_data_figure`
Benchmarking
The Snakefile contains a section dedicated to preparing the Sidle reads, and it will output a table. This generates a series of benchmarking files in the `benchmark` folder. The `Matlab` folder contains the modified .m scripts to run with SMURF.
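If you only want the files for a particular stage, Snakemake can stop at a named rule; a sketch using one of the Sidle rules listed above (the `--until` flag is available in recent Snakemake releases):

```bash
# Sketch: run everything needed up to (and including) the Sidle
# alignment step; benchmark files accumulate as each rule completes.
snakemake --until align_sidle_regions --cores 4
```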
Code Snippets
```r
suppressMessages(library("argparser"))
suppressMessages(library("vegan"))

# Sets up an argparser to make this tractable
parser <- arg_parser(
    description="Basic Adonis Functionality for age and set"
)
parser <- add_argument(parser, '--distance-matrix',
                       help='the distance matrix')
parser <- add_argument(parser, '--metadata',
                       help='Path to the metadata file')
parser <- add_argument(parser, '--output',
                       help='The location for the summary file')
args <- parse_args(parser)

# Reads in the distance matrix as a dataframe and sets the rownames as
# the first column
df <- read.csv2(args$distance_matrix, sep='\t', row.names='X')
map <- read.csv2(args$metadata, sep='\t', row.names='sample.id')
map <- map[row.names(df),]

dm <- dist(df[row.names(df), row.names(df)])

age <- adonis2(dm ~ age, data=map)

res <- data.frame(
    age = c(age$R2[1], age$`Pr(>F)`[1], age$F[1]),
    row.names = c('R2', 'p_999', 'F')
)

write.table(res, file=args$output, quote=FALSE, sep='\t', col.names=NA)
```
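This script is invoked by the `adonis` rule on an exported distance matrix (see the final snippet below); standalone, a call would look roughly like this, with hypothetical file paths:

```bash
# Sketch: run the Adonis summary on an exported distance matrix
# (paths are placeholders).
Rscript bin/adonis_single.R \
    -d exported-dm/distance-matrix.tsv \
    -m metadata.tsv \
    -o adonis-summary.tsv
```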
```python
description = """
This script will build a set of synthetic samples based on resampling of
the American Gut data. We'll do this by reading in the biom table and then
randomly selecting a subset of unambiguous samples. The samples will then
be split into two approximately equal sized groups and used to generate a
set of normalized, simulated samples.
"""

import argparse
import os

import biom
import numpy as np
import pandas as pd
from qiime2 import Artifact


def subsample_data(metadata, table, age_split_fun, id_prefixes, colors=None,
                   group_size=10, n_samples=20, seed=1234):
    """
    Generates a set of synthetic sequences based on the American Gut using
    separation between people in their 20s and their 60s

    Parameters
    ----------
    metadata : DataFrame
        A map of the samples with the age encoded in `age_cat` and the
        index as the sample number
    table : biom.Table
        The samples used for simulation
    age_split_fun : function
        Maps an age to an age-category label
    group_size : int
        The number of samples to be combined to make the simulated sample
        (this is used to adjust for uneven group sizes in the two age
        groups)
    n_samples : int
        The number of samples to simulate for each age group

    Returns
    -------
    biom.Table
        All of the synthesized samples
    """
    # Sets a seed for consistency
    np.random.seed(seed)

    # Gets the age categories
    metadata['age_cat'] = metadata['age'].apply(age_split_fun)
    metadata.dropna(subset=['age_cat'], inplace=True)

    table.filter(metadata.index, axis='sample', inplace=True)
    table.add_metadata(metadata[['age_cat']].to_dict(orient='index'),
                       axis='sample')
    print(id_prefixes)
    tables = [
        table.copy().filter(lambda v, id_, md: md['age_cat'] == prefix,
                            axis='sample', inplace=False)
        for prefix in id_prefixes
    ]
    synthetic = [
        tidy_names(pd.concat(axis=1, objs=[
            build_subsample(table, group_size, prefix)
            for i in range(n_samples)]), prefix)
        for prefix, table in zip(*(id_prefixes, tables))
    ]
    print(synthetic[0].head())
    all_samples = pd.concat(axis=1, sort=False, objs=synthetic)
    all_samples.fillna(0, inplace=True)

    all_biom = biom.Table(data=all_samples.values,
                          sample_ids=all_samples.columns,
                          observation_ids=all_samples.index)
    metadata = pd.DataFrame.from_dict(orient='index', data={
        id_: {'set': id_.split('.')[1],
              'age': id_.split('.')[2]}
        for id_ in all_biom.ids(axis='sample')
    })
    metadata.index.set_names('sample-id', inplace=True)
    metadata['composite'] = \
        metadata.apply(lambda x: "{age}-{set}".format(**x), axis=1)
    if colors is not None:
        metadata['color'] = metadata['composite'].replace(colors)

    return all_biom, metadata


def tidy_names(table, prefix):
    table.rename(inplace=True, columns={
        int(i): f"sample.1.{prefix}.{i}" for i in table.columns
    })
    return table


def build_subsample(table, sample_size, group):
    """
    Builds a subsampled table from a biom object
    """
    sub_t = table.copy().subsample(sample_size, axis='sample', by_id=True)
    sub_t.filter(
        lambda v, id_, md: (v > 0).sum() > np.round(0.4 * sample_size),
        axis='observation', inplace=True)
    single = sub_t.copy().collapse(lambda id_, md: md['age_cat'],
                                   axis='sample')
    single.norm(axis='sample', inplace=True)
    single.filter(lambda v, id_, md: (v > 2e-5).all(),
                  axis='observation', inplace=True)
    single.norm(axis='sample', inplace=True)
    return pd.Series(np.round(single.data(group) * 5e5).round(0),
                     index=single.ids(axis='observation'))


parser = argparse.ArgumentParser(description=description)
parser.add_argument(
    '--keep-map',
    help=("The map of the samples to be used for simulation. This must "
          "contain a sample identifier labeled `sample_name` and the age "
          "labeled age_cat"),
)
parser.add_argument(
    '--sample-table',
    help=("A biom table containing sample counts. The sample list should "
          "match those present in `--keep-map`"),
)
parser.add_argument(
    '--simulated-table',
    help=('Filepath to the qiime artifact containing all the samples'),
)
parser.add_argument(
    '--simulated-metadata',
    help=('Filepath for the metadata describing the simulated samples'),
)
parser.add_argument(
    '--samples-into-simulation',
    help=("The number of samples to combine into each simulated sample"),
    default=10,
    type=int,
)
parser.add_argument(
    '--group-size',
    help=("The number of samples in each group to simulate for the set."),
    default=20,
    type=int,
)
parser.add_argument(
    '--seed',
    help=('a random seed'),
    default=1234,
    type=int,
)


if __name__ == '__main__':
    args = parser.parse_args()

    # Loads the grouping metadata
    keep_map = pd.read_csv(args.keep_map, sep='\t', dtype=str)
    keep_map.set_index('sample_name', inplace=True)
    keep_map['age'] = keep_map['age'].astype(float)

    # Loads the table
    sample_table = Artifact.load(args.sample_table).view(biom.Table)

    def cat_age(x):
        if pd.isnull(x):
            return x
        if x <= 3:
            return 'infant'
        elif (3 < x) & (x <= 12):
            return 'child'
        elif (20 < x) & (x <= 60):
            return 'adult'
        else:
            return np.nan

    colors = {'infant-1': '#1f78b4', 'infant-2': '#a6cee3',
              'adult-1': '#e31a1c', 'adult-2': '#fb9a99'}

    all_biom, metadata = subsample_data(
        metadata=keep_map.copy(),
        table=sample_table,
        age_split_fun=cat_age,
        id_prefixes=['infant', 'adult'],
        group_size=args.samples_into_simulation,
        n_samples=args.group_size,
        seed=args.seed,
    )

    all_table = Artifact.import_data('FeatureTable[Frequency]', all_biom)
    all_table.save(args.simulated_table)
    metadata.to_csv(args.simulated_metadata, sep='\t')
```
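The flags mirror the corresponding Snakemake rule below; a standalone call with hypothetical paths:

```bash
# Sketch: simulate the reference table and metadata (paths are
# placeholders; flags match the script's argparse definitions).
python bin/simulate_samples.py \
    --keep-map ipynb/data/inputs/simulation/start-metadata.txt \
    --sample-table start-table.qza \
    --group-size 20 \
    --simulated-table simulated-table.qza \
    --simulated-metadata simulated-metadata.tsv
```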
```python
description = """
Simulates an ASV table and dereplicated sequences for OTU clustering and
meta analysis exploration
"""

import argparse
import hashlib

import biom
import numpy as np
import pandas as pd
import regex
import skbio
from skbio import DNA

from qiime2 import Artifact


def build_regional_table(samples, seqs, read_length):
    """
    Amplifies a regional table from the original table

    Parameters
    ----------
    samples : DataFrame
        A biom-style dataframe where the index is sequences (corresponding
        to the sequences in seqs)
    """
    # Filters the sequences and table down to those present in the sequences
    degen_count = seqs.astype(str).apply(
        lambda x: len(regex.findall('[RYSWKMBDHVN]', str(x)))
    )
    len_check = seqs.astype(str).apply(
        lambda x: (len(x) >= np.absolute(read_length)))
    print(len_check.sum())
    degen_keep = degen_count.index[(degen_count <= 10) & len_check].values
    var_ids = [id_ for id_ in samples.index if (id_ in degen_keep)]
    samples = samples.loc[var_ids]
    table_seqs = seqs[var_ids].copy()

    # Expands the degenerate sequences
    exp_ = {id_: np.array([str(s) for s in seq_.expand_degenerates()])
            for id_, seq_ in table_seqs.iteritems()}

    # Generates the table by expanding the sequences, where degenerates are
    # randomly sampled, and then combining the sequences into a table
    forward_reads = dict([])
    reverse_reads = dict([])
    joined_reads = dict([])
    for sample in samples.columns:
        expanded = pd.Series(np.hstack([
            np.random.choice(exp_[id_], size=count, replace=True)
            for id_, count
            in samples.loc[samples[sample] > 0, sample].iteritems()])
        ).apply(lambda x: x[:read_length])
        joined_reads[sample] = expanded.value_counts()

    joined_table = pd.DataFrame.from_dict(orient='index',
                                          data=joined_reads).fillna(0)
    joined_table = joined_table.loc[joined_table.sum(axis=1) > 10]
    joined_reads = pd.Series({
        hash_seq(seq): DNA(seq) for seq in joined_table.columns
    })
    joined_table.rename(columns={seq: hash_seq(seq)
                                 for seq in joined_table.columns},
                        inplace=True)

    jointable = Artifact.import_data('FeatureTable[Frequency]',
                                     joined_table, pd.DataFrame)
    joinseqs = Artifact.import_data('FeatureData[Sequence]',
                                    joined_reads, pd.Series)

    return jointable, joinseqs


def hash_seq(x):
    """
    A lazy function to hash things
    """
    return hashlib.md5(x.encode()).hexdigest()


parser = argparse.ArgumentParser(description=description)
parser.add_argument(
    '--sequences',
    help=('The extracted regional sequences'),
)
parser.add_argument(
    '--sample-table',
    help=("A table of the samples to be simulated."),
)
parser.add_argument(
    '--read-length',
    help=("length of the final simulated sequences. We treat these as "
          "paired end sequences"),
    type=int,
)
parser.add_argument(
    '--table',
    help=('Path to save the joined-sequence table')
)
parser.add_argument(
    '--reads',
    help=('Path to save the representative sequences for the joined table'),
)
parser.add_argument(
    '--random',
    help=("A seed for random number generation"),
    default=None,
)


if __name__ == '__main__':
    args = parser.parse_args()

    # Gets the regional sequences
    seqs = Artifact.load(args.sequences).view(pd.Series)

    # Gets the table
    table = Artifact.load(args.sample_table).view(
        pd.DataFrame).fillna(0).astype(int).T

    # Simulates the samples
    if args.random is not None:
        np.random.seed(int(args.random))

    # fwd_table, rev_table, join_table, fwd_seqs, rev_seqs, join_seqs = \
    join_table, join_seqs = \
        build_regional_table(table, seqs, args.read_length)

    # fwd_table.save(args.fwd_table)
    # fwd_seqs.save(args.fwd_reads)
    # rev_table.save(args.rev_table)
    # rev_seqs.save(args.rev_reads)
    join_table.save(args.table)
    join_seqs.save(args.reads)
```
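Again mirroring the corresponding rule below, a standalone invocation with hypothetical paths:

```bash
# Sketch: simulate regional reads at a fixed length (paths are
# placeholders; flags match the script's argparse definitions).
python bin/simulate_tables_exact.py \
    --sequences extracted-region-seqs.qza \
    --sample-table simulated-table.qza \
    --read-length 250 \
    --random 1234 \
    --table joined-table.qza \
    --reads joined-rep-seqs.qza
```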
```
shell:
    """
    qiime sidle filter-degenerate-sequences \
        --i-sequences {input} \
        --p-n-workers 2 \
        --p-max-degen 5 \
        --o-filtered-sequences {output}
    """
```
```
shell:
    """
    qiime feature-table filter-features \
        --i-table {input.table} \
        --m-metadata-file {input.sequences} \
        --o-filtered-table {output}
    """
```
```
shell:
    """
    python bin/simulate_samples.py \
        --keep-map {input.metadata} \
        --sample-table {input.table} \
        --group-size {params.group_size} \
        --simulated-table {output.table} \
        --simulated-metadata {output.metadata}
    """
```
```
shell:
    """
    qiime feature-classifier extract-reads \
        --i-sequences {input} \
        --p-f-primer {params.fwd} \
        --p-r-primer {params.rev} \
        --p-min-length 50 \
        --p-max-length 700 \
        --p-n-jobs 5 \
        --o-reads {output}
    """
```
```
shell:
    """
    python bin/simulate_tables_exact.py \
        --sequences {input.seqs} \
        --sample-table {input.table} \
        --read-length {params.length} \
        --random {params.seed} \
        --table {output.joined_table} \
        --reads {output.joined_repset}
    """
```
```
run:
    import os
    import numpy as np
    import pandas as pd

    # `input` is Snakemake's list of input filepaths (the original had
    # `inputs[0]`, which is undefined in a run block)
    samples = input
    region = params[0]
    direction = params[1]

    def get_sample_descriptor(fp):
        name = fp.split('.')[0]
        abs_fp = os.path.abspath(fp)
        position, study_info, sample_info = name.split("__")
        [pool_id, rep, region, pcr, dir_] = sample_info.split('_')
        dir_label = {'R1': 'forward', 'R2': 'reverse'}[dir_]
        summary = pd.Series(
            data={'%s-absolute-filepath' % dir_label: abs_fp,
                  'sample-id': '{}_{}'.format(pool_id, rep),
                  'pool': pool_id,
                  'replicate': rep,
                  'region': region,
                  'pcr_type': pcr},
            name='_'.join(sample_info.split('_')[:-1]))
        return summary

    manifest = pd.DataFrame([get_sample_descriptor(fp)
                             for fp in samples if ('R1.' in fp)])
    rev_summary = pd.DataFrame([get_sample_descriptor(fp)
                                for fp in samples if ('R2.' in fp)])

    if direction:
        manifest['reverse-absolute-filepath'] = \
            rev_summary['reverse-absolute-filepath']
    else:
        manifest.rename(
            columns={'forward-absolute-filepath': 'absolute-filepath'},
            inplace=True)

    manifest.set_index('sample-id', inplace=True)
    manifest.to_csv(str(output), sep='\t')
```
```
shell:
    """
    qiime tools import \
        --type {params.type} \
        --input-path {input} \
        --input-format {params.format} \
        --output-path {output}
    """
```
```
shell:
    """
    qiime cutadapt trim-single \
        --i-demultiplexed-sequences {input} \
        --p-front {params.f_primer} \
        --p-no-discard-untrimmed \
        --o-trimmed-sequences {output}
    """
```
```
shell:
    """
    qiime dada2 denoise-single \
        --i-demultiplexed-seqs {input} \
        --p-trunc-len 280 \
        --p-n-threads 4 \
        --o-table {output.table} \
        --o-representative-sequences {output.seqs} \
        --o-denoising-stats {output.stats}
    """
```
```
shell:
    """
    qiime cutadapt trim-paired \
        --i-demultiplexed-sequences {input} \
        --p-front-f {params.f_primer} \
        --p-front-r {params.r_primer} \
        --p-no-discard-untrimmed \
        --o-trimmed-sequences {output}
    """
```
```
shell:
    """
    qiime dada2 denoise-paired \
        --i-demultiplexed-seqs {input} \
        --p-trunc-len-f 280 \
        --p-trunc-len-r 280 \
        --p-n-threads 4 \
        --o-table {output.table} \
        --o-representative-sequences {output.seqs} \
        --o-denoising-stats {output.stats}
    """
```
```
shell:
    """
    qiime sidle trim-dada2-posthoc \
        --i-table {input.table} \
        --i-representative-sequences {input.seqs} \
        --p-trim-length {params.length} \
        --p-hashed-feature-ids \
        --o-trimmed-table {output.table} \
        --o-trimmed-representative-sequences {output.seqs}
    """
```
```
shell:
    """
    qiime cutadapt trim-paired \
        --i-demultiplexed-sequences {input} \
        --p-front-f {params.fwd} \
        --p-front-r {params.rev} \
        --p-discard-untrimmed \
        --o-trimmed-sequences {output}
    """
```
```
shell:
    """
    qiime vsearch join-pairs \
        --i-demultiplexed-seqs {input} \
        --o-joined-sequences {output}
    """
```
```
shell:
    """
    qiime quality-filter q-score \
        --i-demux {input} \
        --o-filtered-sequences {output.seqs} \
        --o-filter-stats {output.stats}
    """
```
```
shell:
    """
    qiime deblur denoise-16S \
        --i-demultiplexed-seqs {input} \
        --p-trim-length {params} \
        --o-representative-sequences {output.seqs} \
        --o-table {output.table} \
        --p-sample-stats \
        --o-stats {output.stats}
    """
```
```
shell:
    """
    qiime sidle filter-degenerate-sequences \
        --i-sequences {input.seqs} \
        --p-max-degen {params.degen} \
        --p-n-workers 4 \
        --o-filtered-sequences {params.tmp}

    qiime taxa filter-seqs \
        --i-sequences {params.tmp} \
        --i-taxonomy {input.taxa} \
        --p-include {params.include} \
        --p-exclude {params.exclude} \
        --p-mode "contains" \
        --o-filtered-sequences {output.seqs}

    rm {params.tmp}
    """
```
```
shell:
    """
    qiime feature-classifier extract-reads \
        --i-sequences {input} \
        --p-f-primer {params.fwd} \
        --p-r-primer {params.rev} \
        --o-reads {output}
    """
```
```
shell:
    """
    qiime sidle prepare-extracted-region \
        --i-sequences {input} \
        --p-fwd-primer {params.fwd} \
        --p-rev-primer {params.rev} \
        --p-region {params.region} \
        --p-trim-length {params.trim} \
        --p-n-workers 2 \
        --o-collapsed-kmers {output.kmers} \
        --o-kmer-map {output.map_}
    """
```
```
shell:
    """
    cp {input.table} {output.table}
    cp {input.taxa} {output.taxa}
    cp {input.tree} {output.tree}
    """
```
```
shell:
    """
    qiime feature-classifier classify-sklearn \
        --i-reads {input.seqs} \
        --i-classifier {input.classifier} \
        --o-classification {output}
    """
```
```
shell:
    """
    qiime taxa filter-table \
        --i-table {input.table} \
        --i-taxonomy {input.taxonomy} \
        --p-include ';c__' \
        --p-mode contains \
        --o-filtered-table {output}
    """
```
```
shell:
    """
    qiime feature-table filter-seqs \
        --i-table {input.table} \
        --i-data {input.seqs} \
        --o-filtered-data {output}
    """
```
```
run:
    from qiime2 import Artifact
    import qiime2.plugins.feature_table.actions as q2_table

    tables = [Artifact.load(fp) for fp in input]
    merged = q2_table.merge(tables, overlap_method='sum').merged_table
    merged.save(str(output))
```
```
run:
    from qiime2 import Artifact
    import qiime2.plugins.feature_table.actions as q2_table

    tables = [Artifact.load(fp) for fp in input]
    merged = q2_table.merge_seqs(tables).merged_data
    merged.save(str(output))
```
```
shell:
    """
    qiime vsearch cluster-features-closed-reference \
        --i-sequences {input.seqs} \
        --i-table {input.table} \
        --i-reference-sequences {input.ref} \
        --p-perc-identity {params.perc_identity} \
        --p-threads {params.threads} \
        --o-clustered-table {output.table} \
        --o-clustered-sequences {output.centroids} \
        --o-unmatched-sequences {output.discard}
    """
```
```
shell:
    """
    cp {input} {output}
    """
```
```
shell:
    """
    cp {input} {output}
    """
```
```
shell:
    """
    qiime feature-classifier classify-sklearn \
        --i-reads {input.seqs} \
        --i-classifier {input.classifier} \
        --o-classification {output}
    """
```
```
shell:
    """
    qiime taxa filter-table \
        --i-table {input.table} \
        --i-taxonomy {input.taxonomy} \
        --p-include ";D_1__" \
        --p-mode contains \
        --o-filtered-table {output}
    """
```
```
shell:
    """
    qiime feature-table filter-seqs \
        --i-table {input.table} \
        --i-data {input.seqs} \
        --o-filtered-data {output}
    """
```
```
shell:
    """
    qiime fragment-insertion sepp \
        --i-representative-sequences {input.seqs} \
        --i-reference-database {input.reference} \
        --o-tree {output.tree} \
        --p-threads {params.threads} \
        --o-placements {output.placements}
    """
```
```
shell:
    """
    qiime sidle align-regional-kmers \
        --i-kmers {input.kmers} \
        --i-rep-seq {input.rep_seq} \
        --p-region {params.region} \
        --p-max-mismatch {params.mismatch} \
        --p-n-workers 2 \
        --p-chunk-size 75 \
        --o-regional-alignment {output.alignment} \
        --verbose
    """
```
```
run:
    from qiime2 import Artifact
    import qiime2.plugins.sidle.actions as q2_sidle

    num_regions = len(params[0])
    regions = params[0]
    nworkers = params[1]

    kmers = [Artifact.load(str(fp_)) for fp_ in input[:num_regions]]
    align = [Artifact.load(str(fp_))
             for fp_ in input[num_regions:2*num_regions]]
    tables = [Artifact.load(str(fp_)) for fp_ in input[2*num_regions:]]

    table, summary, db_map = q2_sidle.reconstruct_counts(
        kmer_map=kmers,
        regional_alignment=align,
        regional_table=tables,
        region=regions,
        region_normalize='average',
        n_workers=3,
        # n_workers=nworkers,
        # verbose=True,
    )
    table.save(str(output[0]))
    summary.save(str(output[1]))
    db_map.save(str(output[2]))
```
```
shell:
    """
    qiime sidle reconstruct-taxonomy \
        --i-reconstruction-map {input.db_map} \
        --i-taxonomy {input.taxa} \
        --p-database {params.database} \
        --p-define-missing merge \
        --o-reconstructed-taxonomy {output}
    """
```
```
run:
    from qiime2 import Artifact
    import qiime2.plugins.sidle.actions as q2_sidle

    summary = Artifact.load(str(input[0]))
    db_map = Artifact.load(str(input[1]))
    aligned_seqs = Artifact.load(str(input[2]))
    kmer_maps = [Artifact.load(str(fp)) for fp in input[3:]]
    regions = params[0]

    representative_fragments = q2_sidle.reconstruct_fragment_rep_seqs(
        reconstruction_map=db_map,
        reconstruction_summary=summary,
        aligned_sequences=aligned_seqs,
        kmer_map=kmer_maps,
        region=regions,
    ).representative_fragments
    representative_fragments.save(str(output))
```
```
shell:
    """
    qiime fragment-insertion sepp \
        --i-representative-sequences {input.fragments} \
        --i-reference-database {input.reference} \
        --p-threads 6 \
        --o-tree {output.tree} \
        --o-placements {output.placements}
    """
```
```
shell:
    """
    qiime feature-table rarefy \
        --i-table {input.table} \
        --p-sampling-depth {params.depth} \
        --o-rarefied-table {output.table}
    """
```
```
run:
    from qiime2 import Artifact
    import qiime2.plugins.diversity.actions as q2_diversity

    table = Artifact.load(str(input[0]))
    tree = Artifact.load(str(input[1]))
    metric = params[0]

    if metric == 'faith_pd':
        alpha = q2_diversity.alpha_phylogenetic(
            table=table,
            phylogeny=tree,
            metric=metric,
        ).alpha_diversity
    else:
        alpha = q2_diversity.alpha(
            table=table,
            metric=metric,
        ).alpha_diversity

    alpha.save(output[0])
```
```
shell:
    """
    qiime diversity beta-phylogenetic \
        --i-table {input.table} \
        --i-phylogeny {input.tree} \
        --p-metric {params.metric} \
        --o-distance-matrix {output}
    """
```
```
shell:
    """
    qiime taxa collapse \
        --i-table {input.table} \
        --i-taxonomy {input.taxonomy} \
        --p-level 6 \
        --o-collapsed-table {output.table}

    qiime diversity beta \
        --i-table {output.table} \
        --p-metric braycurtis \
        --o-distance-matrix {output.dm}
    """
```
```
shell:
    """
    qiime diversity beta \
        --i-table {input.table} \
        --p-metric braycurtis \
        --o-distance-matrix {output}
    """
```
```
shell:
    """
    qiime tools export \
        --input-path {input.dm} \
        --output-path {params.dir_}

    Rscript bin/adonis_single.R \
        -d {params.dir_}/distance-matrix.tsv \
        -m {input.meta} \
        -o {output}

    rm -r {params.dir_}
    """
```