nomis_pipeline
About
- Repository containing workflows for IMP3 downstream analyses
- Related project(s): NOMIS
Setup
Conda
# install miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod u+x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh # follow the instructions
Getting the repository including sub-modules
git clone --recurse-submodules ssh://git@git-r3lab-server.uni.lu:8022/susheel.busi/nomis_pipeline.git
Create the main snakemake environment
# create venv
conda env create -f requirements.yml -n "snakemake"
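Before launching anything, it can help to confirm that the environment created above resolves correctly; a minimal sketch using the environment name from the command above:
```bash
# activate the environment created above and confirm snakemake is on PATH
conda activate snakemake
snakemake --version
```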
Dependencies
Successful completion of the workflow requires tools created by others.
Notes:
- Dependencies are included as submodules where possible (a fetch command is sketched below)
- However, installation issues may persist
- If so, check the respective repositories listed
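If the repository was cloned without `--recurse-submodules`, the submodule dependencies can still be fetched afterwards with standard git commands:
```bash
# fetch/update all submodule dependencies in an already-cloned repository
git submodule update --init --recursive
```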
How to run
The workflow can be launched using one of the following options:
./config/sbatch.sh
(or)
CORES=48 snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --conda-prefix ${CONDA_PREFIX}/pipeline --cores $CORES -rpn
(or)
Note: For running on esb-compute-01 or litcrit, adjust CORES as needed to prevent MANTIS from spawning too many workers, and launch as below:
CORES=24 snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --conda-prefix ${CONDA_PREFIX}/pipeline --cores $CORES -rpn
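Note that `-rpn` includes Snakemake's `-n` (dry run), `-p` (print shell commands) and `-r` (print the reason for each rule), so the calls above only preview the jobs. A minimal sketch of an actual run, using the same call from this README with the dry-run flag removed:
```bash
# same call as above, with the trailing -n removed so the jobs actually execute
CORES=48 snakemake -s workflow/Snakefile --configfile config/config.yaml \
    --use-conda --conda-prefix ${CONDA_PREFIX}/pipeline --cores $CORES -rp
```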
Configs
All config files are stored in the folder config/:
Workflows
- imp workflow: sets up the folders required for running IMP3 on each sample
- viruses workflow: runs VIBRANT and vCONTACT2 on assemblies, including CheckV on vibrant_output
- eukaryotes workflow: runs EUKulele on assemblies
- bins workflow: collects all bins together for taxonomy analyses
- taxonomy workflow: runs GTDBtk and CheckM on the bins
- functions workflow: runs METABOLIC, MAGICCAVE and FUNCS analyses
- mantis workflow: runs MANTIS on the bins explicitly
- euk_bin workflow: performs coassembly specifically for eukaryotes (EukRep) and runs binning with CONCOCT
- coassembly_binning: performs coassembly for all samples and subsequent binning
- misc workflow: runs gRodon and antismash; PopCOGent, and potentially anvi'o coassembly/binning, are still to be implemented
Relevant parameters that have to be changed are listed for each workflow and config file. Parameters defining system-relevant settings (e.g. the number of threads used by certain tools) are not listed but should also be changed if required.
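A quick, hypothetical way to locate such system-level settings before editing; the key names used in the grep pattern are only examples:
```bash
# list the available config files and spot thread/memory settings to adjust
ls config/
grep -nE "threads|memory" config/config.yaml
```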
STEPS
The workflow is set up in multiple steps. Prior to running, change the following:
- config: config/
  - config.yaml: change steps (a sketch is shown after this list)
    - Options:
      - imp
      - viruses
      - eukaryotes
      - bins
      - taxonomy
      - functions
      - mantis
      - euk_bin
      - coassembly_binning
      - misc
IMPORTANT NOTE: only the imp step should be run first, followed by launching IMP3 outside of this pipeline. The other STEPS can then be run.
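A minimal sketch of selecting steps and previewing them; the exact YAML layout of config/config.yaml is an assumption, only the key name steps and the step names come from this README:
```bash
# Hypothetical excerpt of config/config.yaml (layout assumed):
#   steps: ["imp"]                      # first pass: only prepare the IMP3 folders
#   steps: ["viruses", "eukaryotes"]    # later passes, after IMP3 has been run outside the pipeline
#
# dry-run the selected step(s) with the same call used elsewhere in this README
CORES=24 snakemake -s workflow/Snakefile --configfile config/config.yaml \
    --use-conda --conda-prefix ${CONDA_PREFIX}/pipeline --cores $CORES -rpn
```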
Launching IMP3
Per-sample IMP3 can be launched as follows:
chmod -R 775 ${SAMPLE} # adding permissions
cd ${SAMPLE}
sbatch ./launchIMP.sh # on IRIS
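When several samples need to be prepared, the same three commands can be wrapped in a loop; a minimal sketch, where the sample IDs are placeholders:
```bash
# hypothetical sample IDs; replace with the actual per-sample directories
for SAMPLE in sample_A sample_B; do
    chmod -R 775 "${SAMPLE}"                     # adding permissions
    (cd "${SAMPLE}" && sbatch ./launchIMP.sh)    # submit IMP3 on IRIS
done
```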
imp workflow
Download raw data required for the analysis.
- config: config/
  - config.yaml: change work_dir
  - sbatch.sh: change SMK_ENV; if not using slurm to submit jobs, remove --cluster-config and --cluster from the snakemake CMD
  - slurm.yaml (only relevant if using slurm for job submission)
- workflow: workflow/
Prior to running the imp workflow, make the following adjustments (a quick check is sketched after this list):
- IMP_config.yaml: workflow/notes/IMP_config.yaml, change Metagenomics
- run_threads: workflow/notes/runIMP.sh, change threads
- launch_threads: workflow/notes/runIMP.sh, change -n8
IMPORTANT Note: The imp workflow above should be run first, followed by launching IMP3 outside of this pipeline; the subsequent STEPS can then be run.
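Before launching the imp step it can help to confirm the files named above exist and to locate the values to edit; a small, hypothetical check (only the file paths come from this README):
```bash
# confirm the IMP3 note files are present
ls workflow/notes/IMP_config.yaml workflow/notes/runIMP.sh
# locate the thread and -n8 settings that need to be adjusted
grep -n -e "threads" -e "n8" workflow/notes/runIMP.sh
```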
Main workflow
Main analysis workflow: given SR FASTQ files, run all the steps to generate the required output. This includes:
- setting up folders for IMP
- viral and eukaryotic annotations
- functional analyses and
- taxonomic analyses (optional)
The workflow is run per sample and might require a couple of days depending on the sample, the configuration used and the available computational resources. Note that the workflow will create additional output files not necessarily required to re-create the figures shown in the manuscript.
- config: per sample
  - config/<sample>/config.yaml: change all path parameters (not all databases are required, see above)
  - config/<sample>/sbatch.yaml: change SMK_ENV; if not using slurm to submit jobs, remove --cluster-config and --cluster from the snakemake CMD
  - config/<sample>/slurm.yaml (only relevant if using slurm for job submission)
- workflow: workflow/
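A minimal sketch of a per-sample launch without slurm, assuming the per-sample config above is passed to the same Snakefile used elsewhere in this README:
```bash
# per-sample dry run; drop -n to execute (replace <sample> with the sample ID)
CORES=48 snakemake -s workflow/Snakefile --configfile config/<sample>/config.yaml \
    --use-conda --conda-prefix ${CONDA_PREFIX}/pipeline --cores $CORES -rpn
```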
Report workflow (2021-05-26 15:54:59: NOT implemented)
This workflow creates various summary files, plots and an HTML report for a sample using the output of the main workflow.
Note: How the metaP peptide/protein reports were generated from raw metaP data is described in notes/gdb_metap.md.
- config: sample configs used for the main workflow
- workflow: workflow_report/
To execute this workflow for all samples:
./config/reports.sh "YourEnvName" "WhereToCreateCondEnvs"
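For example, with the environment name and conda prefix used earlier in this README (both values are just the ones from the setup section and the other snakemake calls; adjust as needed):
```bash
# run the report workflow for all samples using the env created in the setup section
./config/reports.sh "snakemake" "${CONDA_PREFIX}/pipeline"
```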
Figures workflow (2021-05-26 15:54:52: NOT implemented)
Re-create figures (and tables) used in the manuscript. This workflow should only be run after running the main workflow and the report workflow for all samples.
- config: config/fig.yaml
  - change work_dir
  - change paths for all samples in samples
- workflow: workflow_figures/
conda activate "YourEnvName"
snakemake -s workflow_figures/Snakefile --cores 1 --configfile config/fig.yaml --use-conda --conda-prefix "WhereToCreateCondEnvs" -rpn # dry-run
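As with the other calls in this README, the trailing -n makes the command above a dry run; the actual execution is the same command without it:
```bash
snakemake -s workflow_figures/Snakefile --cores 1 --configfile config/fig.yaml --use-conda --conda-prefix "WhereToCreateCondEnvs" -rp  # actual run (no -n)
```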
Notes
Notes for manual/additional analyses done using the generated data.
Code Snippets
shell: "ln -vs {input} {output}"

shell: "ln -vs {input} {output}"

shell: "for fname in {input} ; do echo $(basename -s \".contigs.fa\" \"${{fname}}\") ; done > {output}"

shell: "cat {input} > {output[0]}"

shell: "(date && cat {input.read1} > {output.or1} && cat {input.read2} > {output.or2} && date) &> >(tee {log})"

shell: "(date && clumpify.sh in={input.r1} in2={input.r2} out={output.odup1} out2={output.odup2} dupedist={config[clumpify][dupedist]} dedupe=t optical=t threads={threads} groups={config[clumpify][groups]} -Xmx{config[clumpify][memory]} && date) &> >(tee {log})"

shell:
    "(date && megahit -1 {input.sr1} -2 {input.sr2} --kmin-1pass -m 0.9 --k-list 27,37,47,57,67,77,87 --min-contig-len 1000 -t {threads} -o $(dirname {output})/tmp && "
    "cd $(dirname {output}) && "
    "rsync -avP tmp/ . && "
    "ln -sf final.contigs.fa $(basename {output}) && "
    "rm -rf tmp/ && "
    "date) &> >(tee {log})"

shell: "(date && coverm contig -1 {input.r1} -2 {input.r2} --reference {input.fa} --output-file {output} -t {threads} && date) &> {log}"

shell: "(date && tail -n +2 {input} > {output} && date) &> {log}"

shell: "(date && bwa index {input} -p {params.idx_prefix} && date) &> {log}"

shell:
    "(date && "
    "bwa mem -t {threads} {params.idx_prefix} {input.r1} {input.r2} | "
    "samtools view -@ {threads} -SbT {input.asm} | "
    "samtools sort -@ {threads} -m {params.chunk_size} -T {params.bam_prefix} -o {output} && "
    "date) &> {log}"

shell:
    "(date && "
    "jgi_summarize_bam_contig_depths --outputDepth {output.depth} --pairedContigs {output.paired} {input} && date) &> {log}"

shell:
    "(date && export PATH=$PATH:{config[maxbin2][perl]} && "
    "run_MaxBin.pl -thread {threads} -contig {input.fa} -out $(dirname {output})/coassembly -abund {input.cov} -min_contig_length {config[maxbin2][min_length]} && date) &> {log}"

shell: "(date && metabat2 -i {input.fa} -a {input.cov} -o $(dirname {output})/coassembly -t {threads} -m {config[metabat2][min_length]} -v --unbinned --cvExt && date) &> {log}"

shell:
    "(date && scripts/Fasta_to_Scaffolds2Bin.sh -i $(dirname {input.max}) -e fa > {output.maxscaf} && "
    "scripts/Fasta_to_Scaffolds2Bin.sh -i $(dirname {input.met}) -e fa > {output.metscaf} && date) &> {log}"

shell:
    "(date && export PATH=$PATH:{config[dastool][path]} && "
    "export PATH=$PATH:{config[dastool][src]} && "
    "DAS_Tool -i {input.max},{input.met} -c {input.fa} -o $(dirname {output.DIR}) --score_threshold {config[dastool][score]} --search_engine diamond -l maxbin2,metabat2 --write_bins 1 --write_bin_evals 1 --threads {threads} --db_directory {params.db} --create_plots 1 && "
    "touch {output.DUMMY} && date) &> {log}"

shell: "(date && ln -vs {input} {output} && date) &> >(tee {log})"

shell: "(date && EUKulele --sample_dir $(dirname {input}) -o {output[0]} -m mets && date) &> >(tee {log})"
run:
    tax=pd.read_csv(params.tax, header=0, sep="\t", index_col=0)
    cov=pd.read_csv(input.cov, header=None, sep="\t")
    cov.columns=['transcript_name', 'coverage']

    # keeping only rows that contain 'Eukary' in the 'full_classification' column
    euks=tax[tax['full_classification'].str.contains("Eukary", na=False)].drop(['counts'], axis=1)

    # keeping only rows that have at least 70% 'max_pid'
    filt_euks=euks.query("max_pid >=70")

    # merging taxonomy with coverage
    merged=filt_euks.merge(cov, how="left", on="transcript_name")
    filt_merged=merged[['full_classification','classification', 'coverage']]

    # Grouping same taxonomy and getting sum of coverage
    final=filt_merged.groupby(['full_classification','classification'], as_index=False)['coverage'].sum()

    # writing to file
    final.to_csv(output[0], index=None, sep="\t")

run:
    # Collecting all files in folder
    directory=os.path.dirname(input[0])
    os.chdir(directory)

    # verify the path using getcwd() and print the current directory
    cwd = os.getcwd()
    print("Current working directory is:", cwd)

    mylist=[f for f in glob.glob("*.txt")]

    # making individual dataframes for each file
    # (add arguments as necessary to the read_csv method)
    dataframes=[pd.read_csv(f, header=0, sep="\t", usecols=["full_classification", "coverage"]) for f in mylist]

    # Merging all files based on common column
    merged=reduce(lambda left,right: pd.merge(left,right,on='full_classification', how='outer'), dataframes)

    # Giving appropriate column names
    names=['full_classification']+mylist
    new_cols=list(map(lambda x: x.replace('_eukaryotes.txt',''),names))
    merged.columns=new_cols

    # checking if any values are "NA"; if so, replace them
    merged.isnull().values.any()
    merged.fillna('', inplace=True)

    # Removing rows with all zeroes (0 or 0.0)
    merged.set_index('full_classification', inplace=True)  # first make the first column the index
    edited=merged.loc[~(merged==0).all(axis=1)]

    # Writing file without zeroes
    edited.to_csv(output[0], sep='\t', index=True, header=True)

run:
    tax=pd.read_csv(input.tax, header=0, sep="\t", index_col=0)
    cov=pd.read_csv(input.cov, header=None, sep="\t")
    cov.columns=['transcript_name', 'coverage']

    # dropping the 'counts' column (no Eukaryota filtering in this rule)
    euks=tax.drop(['counts'], axis=1)

    # merging taxonomy with coverage
    merged=euks.merge(cov, how="left", on="transcript_name")
    filt_merged=merged[['full_classification','classification', 'coverage']]

    # Grouping same taxonomy and getting sum of coverage
    final=filt_merged.groupby(['full_classification','classification'], as_index=False)['coverage'].sum()

    # writing to file
    final.to_csv(output[0], index=None, sep="\t")

run:
    # Collecting all files in folder
    directory=os.path.dirname(input[0])
    os.chdir(directory)

    # verify the path using getcwd() and print the current directory
    cwd = os.getcwd()
    print("Current working directory is:", cwd)

    mylist=[f for f in glob.glob("*ALL.txt")]

    # making individual dataframes for each file
    # (add arguments as necessary to the read_csv method)
    dataframes=[pd.read_csv(f, sep="\t", usecols=['full_classification', 'coverage']) for f in mylist]

    # Merging all files based on common column
    merged=reduce(lambda left,right: pd.merge(left,right,on='full_classification', how='outer'), dataframes)

    # Giving appropriate column names
    names=['full_classification']+mylist
    new_cols=list(map(lambda x: x.replace('_eukulele_all.txt',''),names))
    merged.columns=new_cols

    # checking if any values are "NA"; if so, replace them
    merged.isnull().values.any()
    merged.fillna('', inplace=True)

    # Removing rows with all zeroes (0 or 0.0)
    merged.set_index('full_classification', inplace=True)  # first make the first column the index
    edited=merged.loc[~(merged==0).all(axis=1)]

    # Writing file without zeroes
shell: "(date && clumpify.sh in={input.r1} in2={input.r2} out={output.odup1} out2={output.odup2} dupedist={config[clumpify][dupedist]} dedupe=t optical=t threads={threads} groups={config[clumpify][groups]} -Xmx{config[clumpify][memory]} && date) &> >(tee {log})"

shell: "(date && kraken2 --threads {threads} --db {input.database} --use-names --confidence 0.5 --paired {input.dedup1} {input.dedup2} --gzip-compressed --output {output.summary} --report {output.rep} && date) &> >(tee {log})"

shell: "(date && awk '{{if ($3 ~ /unclassified/ || $3 ~ /Eukaryota/) print $2}}' {input} > {output} && date) &> >(tee {log})"

shell: "(date && seqtk subseq {input.dedup1} {input.ids} > {output.ex1} && seqtk subseq {input.dedup2} {input.ids} > {output.ex2} && date) &> >(tee {log})"

shell: "(date && cat {input.read1} > {output.or1} && cat {input.read2} > {output.or2} && date) &> >(tee {log})"

shell: "(date && clumpify.sh in={input.r1} in2={input.r2} out={output.odup1} out2={output.odup2} dupedist={config[clumpify][dupedist]} dedupe=t optical=t threads={threads} groups={config[clumpify][groups]} -Xmx{config[clumpify][memory]} && date) &> >(tee {log})"

shell:
    "(date && megahit -1 {input.sr1} -2 {input.sr2} --kmin-1pass -m 0.9 --k-list 27,37,47,57,67,77,87 --min-contig-len 1000 -t {threads} -o $(dirname {output})/tmp && "
    "cd $(dirname {output}) && "
    "rsync -avP tmp/ . && "
    "ln -sf final.contigs.fa $(basename {output}) && "
    "rm -rf tmp/ && "
    "date) &> >(tee {log})"

shell: "(date && EukRep -i {input} -o {output} --min 2000 -m strict && date)"

shell: "(date && TMPDIR={RESULTS_DIR} coverm make -o {output} -t {threads} -r {input.ref} -c {input.read1} {input.read2} && date) &> >(tee {log})"

shell:
    """
    cut_up_fasta.py {input.contigs} -c 10000 -o 0 --merge_last -b contigs_10K.bed > {output.contigs_cut}
    concoct_coverage_table.py contigs_10K.bed {input.bam}/*bam > {output.coverage}
    """

shell:
    """
    concoct --coverage_file {input.coverage} --composition_file {input.contigs_cut} -t {threads} -b {output}
    """

shell:
    """
    merge_cutup_clustering.py {input.clustering}/clustering_gt1000.csv > {input.clustering}/clustering_merged.csv
    extract_fasta_bins.py {input.contigs} {input.clustering}/clustering_merged.csv --output_path {output}
    """
shell:
    "(date && "
    "while read -r line; do ls $(dirname {input})/*.fa | grep -o \"$line\" ; done < {output.sample} > {output.tmpfile} && "
    "sed 's@^@/work/projects/nomis/metaG_JULY_2020/IMP3/@g' {output.tmpfile} | "
    "sed 's@$@/run1/Preprocessing/mg.r1.preprocessed.fq@g' | "
    "awk -F, '{{print $0=$1\",\"$1}}' | awk 'BEGIN{{FS=OFS=\",\"}} {{gsub(\"r1\", \"r2\", $2)}} 1' | "
    "sed $'1 i\\\\\\n# Read pairs:' {output.reads}"  # using forward-slashes to get `\\\n`

rule metabolic:
    input:
        fa=os.path.join(RESULTS_DIR, "bins/bin_collection.done"),
        reads=rules.prep_metabolic.output
    output:
        directory(os.path.join(RESULTS_DIR, "metabolic_output"))
    log:
        os.path.join(RESULTS_DIR, "logs/metabolic.log")
    conda:
        os.path.join(ENV_DIR, "metabolic.yaml")
    params:
        gtdbtk=config["metabolic"]["db"],
        metabolic=config["metabolic"]["directory"]
    threads:
        config["metabolic"]["threads"]
    message:
        "Running metabolic for all MAGs"
    shell:
        "(date && "
        "export GTDBTK_DATA_PATH={params.gtdbtk} && "
        "export PERL5LIB && export PERL_LOCAL_LIB_ROOT && export PERL_MB_OPT && export PERL_MM_OPT && "
        """env PERL5LIB="" PERL_LOCAL_LIB_ROOT="" PERL_MM_OPT="" PERL_MB_OPT="" cpanm Array::Split && """
        "perl {params.metabolic}/METABOLIC-C.pl -t {threads} -in-gn $(dirname {input.fa}) -r {input.reads} -o {output} && "
        "date) &> >(tee {log})"
run:
    bin=pd.read_csv(input[0], sep="\t")
    bin_edited=bin[['selected_by_DASTool', 'classification']]  # selecting columns

run:
    kegg=pd.read_csv(input[0], sep="\t", skiprows=1)
    kegg_edited=kegg[['Geneid', 'Chr']]
    kegg_edited.rename(columns = {'Geneid': 'KEGG', 'Chr': 'Contig'}, inplace=True)
    kegg_contigs=(kegg_edited.assign(Contig = kegg_edited['Contig'].str.split(';')).explode('Contig').reset_index(drop=True))
    kegg_contigs=kegg_contigs.reindex(['Contig','KEGG'], axis=1)
    kegg_contigs.to_csv(output[0], sep="\t", index=False)

run:
    kegg=pd.read_csv(input[0], sep="\t", header=0)
    cov=pd.read_csv(input[1], sep="\t", header=None)
    cov.rename(columns={0: 'Contig', 1: 'Coverage'}, inplace=True)
    length=pd.read_csv(input[2], sep="\t", header=None)
    length.rename(columns={0: 'Contig', 1: 'Length'}, inplace=True)
    tmp=pd.merge(kegg, cov, on='Contig')
    all_merged=pd.merge(tmp, length, on='Contig')
    all_merged.to_csv(output[0], sep="\t", index=False)

run:
    scaffold=pd.read_csv(input[0], sep="\t", header=None)
    scaffold.columns=['Contig', 'Bin']
    scaffold.to_csv(output[0], sep="\t", index=False)

run:
    opened=[]
    for ifile in input:
        df=pd.read_csv(ifile, index_col=None, sep="\t", header=0)
        df['Sample']=re.sub("_gtdbtk.txt", "", os.path.basename(ifile))
        df = df.reindex(['Sample','Bin','Taxa'], axis=1)
        opened.append(df)
    frame=pd.concat(opened, axis=0, ignore_index=True)
    frame.to_csv(output[0], sep="\t", index=False)

run:
    opened=[]
    for ifile in input:
        df=pd.read_csv(ifile, index_col=None, sep="\t", header=0)

run:
    opened=[]
    for ifile in input:
        df=pd.read_csv(ifile, index_col=None, sep="\t", header=0)
        opened.append(df)
    frame=pd.concat(opened, axis=0, ignore_index=True)
    frame.to_csv(output[0], sep="\t", index=False)

run:
    opened=[]
    for ifile in input:
        df=pd.read_csv(ifile, index_col=None, sep="\t", header=0)
script: "merge_funcs.R"

shell: "cat {params} | awk '{{print $1\",\"$2}}' | sed '@^user@d' | sed 's@.fasta.contigs@@g' | sed 's@.fasta_sub.contigs@@g' | awk '!visited[$0]++' > {output}"

shell: "(date && summarize-metabolism --input $(dirname {input.bins}) --output {output.sum} --metadata {input.meta} --summary {output.sum}/results/summarize_metabolism.csv --heatmap {output.sum}/results/summarize_metabolism.pdf --aggregate ON --plotting ON && summarize-metabolism --input $(dirname {input.bins}) --output {output.indiv} --metadata {input.meta} --summary {output.indiv}/results/individual_metabolism.csv --heatmap {output.indiv}/results/individual_metabolism.pdf --plotting ON && date) &> >(tee {log})"

shell: "script=$(realpath {params.script}) && cd {params.path} && ${{script}}"

shell:
    "(date && export PATH=$PATH:{params.path} && "
    "MagicLamp.py LithoGenie -bin_dir $(dirname {input.bins}) -bin_ext fa -out {output} -t {threads} --norm && date) &> >(tee {log})"

shell:
    "(date && ln -vs {input.in1} {output.fout1} && "
    "ln -vs {input.in2} {output.fout2} && date) &> >(tee {log})"

shell:
    "(date && cp -v {input.config} {output.tout1} && "
    "cp -v {input.launcher} {output.tout2} && "
    "cp -v {input.runfile} {output.tout3} && "
    "sed -i 's/\"\$sample\"/{wildcards.sample}/g' {output.tout1} && "
    "sed -i 's/\"\$sample\"/{wildcards.sample}/g' {output.tout2} && "
    "sed -i 's/\"\$sample\"/{wildcards.sample}/g' {output.tout3} && date) &> >(tee {log})"
shell: "ln -vs {input} {output}"

shell: "ln -vs {input} {output}"

shell: "(date && prokka --outdir $(dirname {output.FAA}) {input} --cpus {threads} --force && date) &> >(tee {log})"

shell: "for fname in {input.txt} ; do echo echo \"${{fname}}\"\" \"$(echo {input.FAA}) ; done > {output}"

run:
    with open(output[0], "w") as ofile:
        # default HMMs
        for hmm_name, hmm_path in config["mantis"]["default"].items():
            ofile.write("%s=%s\n" % (hmm_name, hmm_path))
        # custom HMMs
        for hmm_path in config["mantis"]["custom"]:
            ofile.write("custom_hmm=%s\n" % hmm_path)
        # weights
        for weights_name, weights_value in config["mantis"]["weights"].items():
            ofile.write("%s=%f\n" % (weights_name, weights_value))

shell: "(date && python {config[mantis][path]}/ run_mantis -t {input.FAA} --output_folder $(dirname {output}) --mantis_config {input.config} --hmmer_threads {params.cores} --cores {threads} --memory {config[mantis][single_mem]} --kegg_matrix && date) &> >(tee {log})"

shell: "(date && antismash --cpus {threads} --genefinding-tool prodigal --fullhmmer --pfam2go --asf --cb-knownclusters --clusterhmmer --cf-create-clusters {input} --output-dir $(dirname {output}) && date) &> >(tee {log})"

shell: """(date && sed -n '/##FASTA/q;p' {input} | awk '$3=="CDS"' | awk '{{print $9}}' | awk 'gsub(";.*","")' | awk 'gsub("ID=","")' > {output} && date) &> >(tee {log})"""
shell: "(date && export GTDBTK_DATA_PATH={params} && gtdbtk classify_wf --cpus {threads} -x fa --genome_dir $(dirname {input}) --out_dir {output} && date) &> >(tee {log})"

shell: "(date && checkm lineage_wf -r -t {threads} -x fa $(dirname {input}) {output} && date) &> >(tee {log})"

shell: "(date && python3 ./vibrant/VIBRANT/VIBRANT_run.py -t {threads} -i {input} -folder $(dirname $(dirname {output.viout1})) && date) &> >(tee {log})"

shell:
    "(date && python3 {config[convert_files][simplify]} {input} && "
    "export PATH='/scratch/users/sbusi/tools/miniconda3/envs/vcontact2/bin:$PATH' && "
    "vcontact2_gene2genome -p {output.tout1} -o {output.tout2} -s '{config[convert_files][type]}') &> >(tee {log})"

shell:
    "(date && export PATH=$PATH:'/scratch/users/sbusi/tools/miniconda3/envs/vcontact2/bin' && "
    "/scratch/users/sbusi/tools/miniconda3/envs/vcontact2/bin/vcontact2 --force-overwrite --raw-proteins {input.v1} --rel-mode 'Diamond' --proteins-fp {input.v2} --db 'ProkaryoticViralRefSeq94-Merged' --pcs-mode MCL --vcs-mode ClusterONE --c1-bin /home/users/sbusi/apps/vcontact2/cluster_one-1.0.jar --output-dir {output.cout5} && date) &> >(tee {log})"

shell: "(date && kraken2 --threads {threads} --db {config[kraken2][db]} --confidence 0.75 {input} --output {output.summary} --report {output.report} && date) &> >(tee {log})"

shell: "(date && kaiju -z {threads} -t {config[kaiju][db]}/{params.nodes} -f {config[kaiju][db]}/{params.fmi} -i {input.fasta} -o {output} && date) &> >(tee {log})"

shell: "(date && kaiju2table -e -t {config[kaiju][db]}/{params.nodes} -n {config[kaiju][db]}/{params.names} -r {config[kaiju][rank]} -o {output} {input.files} && date) &> >(tee {log})"

shell: "(date && checkv end_to_end -d {config[checkv][db]} {input} $(dirname {output}) -t {threads} && date) &> >(tee {log})"

shell: "(date && antismash --cpus {threads} --genefinding-tool none --genefinding-gff3 {input.GFF} --fullhmmer --pfam2go --asf --cb-knownclusters --clusterhmmer --cf-create-clusters {input.FA} --output-dir $(dirname {output}) && date) &> >(tee {log})"
Support
- Future updates