MetaGOflow: An EOSC-Life Project Workflow for Marine Genomic Observatories' Data Analysis Using the MGnify Pipeline
MetaGOflow: A workflow for marine Genomic Observatories' data analysis. An EOSC-Life project. The workflows developed in the framework of this project are based on pipeline-v5 of the MGnify resource.
Dependencies
To run metaGOflow you first need to make sure the following are set up on your computing environment:
- python3 [v 3.8+]
- Docker [v 19.+] or Singularity [v 3.7.+]/ Apptainer [v 1.+]
- cwltool [v 3.+]
- rdflib [v 6.+]
- rdflib-jsonld [v 0.6.2]
- ro-crate-py [v 0.7.0]
- pyyaml [v 6.0]
- Node.js [v 10.24.0+]
- Available storage: ~235 GB for the databases
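With these in place, a metaGOflow run comes down to handing cwltool the workflow definition together with a YAML job file describing the inputs. A minimal sketch, in which the file names (workflow.cwl, config.yml) and the top-level input names are illustrative placeholders rather than the repository's actual entry point:

# config.yml -- hypothetical job file for a paired-end run
forward_reads:
  class: File
  path: reads/sample_R1.fastq.gz   # placeholder paired-end reads
reverse_reads:
  class: File
  path: reads/sample_R2.fastq.gz
threads: 8

This would then be executed with, for example, cwltool --singularity workflow.cwl config.yml; omitting --singularity makes cwltool fall back to Docker.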
Code Snippets
The following excerpts are the CWL CommandLineTool wrappers used by the workflow.
baseCommand: [ emapper.py ]

inputs:
  fasta_file:
    format: edam:format_1929  # FASTA
    type: File?
    inputBinding:
      separate: true
      prefix: -i
    label: Input FASTA file containing query sequences
  db:
    type: [string?, File?]  # data/eggnog.db
    inputBinding:
      prefix: --database
    label: specify the target database for sequence searches (euk, bact, arch, host:port, local hmmpressed database)
  db_diamond:
    type: [string?, File?]  # data/eggnog_proteins.dmnd
    inputBinding:
      prefix: --dmnd_db
    label: Path to DIAMOND-compatible database
  data_dir:
    type: [string?, Directory?]  # data/
    inputBinding:
      prefix: --data_dir
    label: Directory to use for DATA_PATH
  mode:
    type: string?
    inputBinding:
      prefix: -m
    label: hmmer or diamond
  no_annot:
    type: boolean?
    inputBinding:
      prefix: --no_annot
    label: Skip functional annotation, reporting only hits
  no_file_comments:
    type: boolean?
    inputBinding:
      prefix: --no_file_comments
    label: No header lines nor stats are included in the output files
  cpu:
    type: int?
    inputBinding:
      prefix: --cpu
    default: 8
  annotate_hits_table:
    type: File?
    inputBinding:
      prefix: --annotate_hits_table
    label: Annotate TSV formatted table of query->hits
  dbmem:
    type: boolean?
    inputBinding:
      prefix: --dbmem
    label: Store the whole eggNOG sqlite DB into memory before retrieving the annotations. This requires ~44 GB of RAM memory available.
  output:
    type: string?
    inputBinding:
      prefix: -o
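The --no_annot / --annotate_hits_table pair supports eggNOG-mapper's two-phase usage: search first, then annotate the resulting hits table. A hypothetical job file for the annotation phase, with placeholder paths:

data_dir: /data/eggnog   # placeholder directory holding the eggNOG databases
annotate_hits_table:
  class: File
  path: sample.emapper.seed_orthologs   # hits table from the search phase
no_file_comments: true
dbmem: true              # loads the sqlite DB into RAM (~44 GB), per the label above
cpu: 8
output: sample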
baseCommand: [ split_to_chunks.py ]

inputs:
  seqs:
    # format: edam:format_1929  # collision with concatenate.cwl
    type: File
    inputBinding:
      prefix: -i
  chunk_size:
    type: int
    inputBinding:
      prefix: -s
  file_format:
    type: string?
    inputBinding:
      prefix: -f
baseCommand: [ run_FGS.sh ]
# arguments:
# ./FragGeneScan -s SRR1620013_MERGED_FASTQ.fasta -o fgs -w 0 -t illumina_10

inputs:
  input_fasta:
    format: 'edam:format_1929'
    type: File
    inputBinding:
      separate: true
      prefix: "-i"
  output:
    type: string
    inputBinding:
      separate: true
      prefix: "-o"
  seq_type:
    type: string
    inputBinding:
      separate: true
      prefix: "-s"
  train:
    type: string?
    inputBinding:
      separate: true
      prefix: "-t"
    default: "illumina_10"

# stdout: stdout.txt
# stderr: stderr.txt
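As a usage sketch, a hypothetical job file for this wrapper; the input file name echoes the commented example above, and the seq_type value is purely illustrative, since the wrapper forwards it to run_FGS.sh unchanged:

input_fasta:
  class: File
  path: SRR1620013_MERGED_FASTQ.fasta   # placeholder input FASTA
output: fgs                             # output name, as in the commented example
seq_type: "0"                           # illustrative; passed through as -s
train: illumina_10                      # the wrapper's default training model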
baseCommand: [ unite_protein_predictions.py ]

inputs:
  masking_file:
    type: File?
    inputBinding:
      prefix: "--mask"
  predicted_proteins_prodigal_out:
    type: File?
    inputBinding:
      prefix: "--prodigal-out"
  predicted_proteins_prodigal_ffn:
    type: File?
    inputBinding:
      prefix: "--prodigal-ffn"
  predicted_proteins_prodigal_faa:
    type: File?
    inputBinding:
      prefix: "--prodigal-faa"
  predicted_proteins_fgs_out:
    type: File
    inputBinding:
      prefix: "--fgs-out"
  predicted_proteins_fgs_ffn:
    type: File
    inputBinding:
      prefix: "--fgs-ffn"
  predicted_proteins_fgs_faa:
    type: File
    inputBinding:
      prefix: "--fgs-faa"
  basename:
    type: string
    inputBinding:
      prefix: "--name"
  genecaller_order:
    type: string?
    inputBinding:
      prefix: "--caller-priority"
baseCommand: [ fastp ]

arguments: [
  $(inputs.detect_adapter_for_pe),
  $(inputs.overrepresentation_analysis),
  $(inputs.merge),
  $(inputs.merged_out),
  $(inputs.cut_right),
  $(inputs.base_correction),
  $(inputs.overlap_len_require),
  $(inputs.force_polyg_tail_trimming),
  $(inputs.min_length_required),
  --thread=$(inputs.threads),
  --html, "fastp.html",
  --json, "fastp.json",
  -i, $(inputs.forward_reads),
  -I, $(inputs.reverse_reads),
  -o, $(inputs.forward_reads.nameroot).trimmed.fastq,
  -O, $(inputs.reverse_reads.nameroot).trimmed.fastq
]

inputs:
  detect_adapter_for_pe:
    type: boolean
    default: false
    inputBinding:
      valueFrom: ${ if (inputs.detect_adapter_for_pe == true){ return '--detect_adapter_for_pe'; } else { return ''; } }
  overrepresentation_analysis:
    type: boolean
    default: false
    inputBinding:
      valueFrom: ${ if (inputs.overrepresentation_analysis == true){ return '--overrepresentation_analysis'; } else { return ''; } }
  merge:
    type: boolean
    default: true
    inputBinding:
      valueFrom: ${ if (inputs.merge != false){ return '--merge'; } else { return ''; } }
  merged_out:
    type: boolean?
    default: true
    inputBinding:
      prefix: --merged_out
      valueFrom: ${ if (inputs.merge != false){ return inputs.forward_reads.nameroot.split(/_(.*)/s)[0] + '.merged.fastq'; } else { return ''; } }
  forward_reads:
    type: File
    format:
      - edam:format_1930  # FASTQ
      - edam:format_1929  # FASTA
  reverse_reads:
    type: File?
    format:
      - edam:format_1930  # FASTQ
      - edam:format_1929  # FASTA
  threads:
    type: int?
    default: 1
  qualified_phred_quality:
    type: int?
    default: 0
    inputBinding:
      valueFrom: ${ if (inputs.qualified_phred_quality > 0) { return '--qualified_quality_phred=' + inputs.qualified_phred_quality } else { return '' } }
  unqualified_percent_limit:
    type: int?
    default: 0
    inputBinding:
      valueFrom: ${ if (inputs.unqualified_percent_limit > 0) { return '--unqualified_percent_limit=' + inputs.unqualified_percent_limit } else { return '' } }
  min_length_required:
    type: int?
    default: 0
    inputBinding:
      valueFrom: ${ if (inputs.min_length_required > 0) { return '--length_required=' + inputs.min_length_required } else { return '' } }
  force_polyg_tail_trimming:
    type: boolean?
    default: false
    inputBinding:
      valueFrom: ${ if (inputs.force_polyg_tail_trimming != false){ return '--trim_poly_g'; } else { return ''; } }
  disable_trim_poly_g:
    type: boolean?
    default: false
    inputBinding:
      valueFrom: ${ if (inputs.disable_trim_poly_g == true){ return '--disable_trim_poly_g'; } else { return ''; } }
  base_correction:
    type: boolean?
    default: false
    inputBinding:
      valueFrom: ${ if (inputs.merge == true && inputs.base_correction == true){ return '--correction'; } else { return ''; } }
  overlap_len_require:
    type: int
    default: 0
    inputBinding:
      valueFrom: ${ if (inputs.merge == true){ return '--overlap_len_require=' + inputs.overlap_len_require; } else { return ''; } }
  cut_right:
    type: boolean
    default: true
    inputBinding:
      valueFrom: ${ if (inputs.cut_right == true){ return '--cut_right' } else { return '' } }

# overlap_diff_limit (default 5) and overlap_diff_limit_percent (default 20%).
# Please note that the reads should meet these three conditions simultaneously.
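A sketch of a matching job file that enables merging with base correction; the read paths are placeholders, and each value is turned into the corresponding fastp flag by the conditional bindings above:

forward_reads:
  class: File
  path: sample_R1.fastq.gz
reverse_reads:
  class: File
  path: sample_R2.fastq.gz
merge: true
base_correction: true        # becomes --correction, only because merge is true
overlap_len_require: 30      # becomes --overlap_len_require=30
min_length_required: 70      # becomes --length_required=70
qualified_phred_quality: 20  # becomes --qualified_quality_phred=20
threads: 4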
baseCommand: [ "go_summary_pipeline-1.0.py" ]

inputs:
  InterProScan_results:
    type: File
    format: edam:format_3475
    inputBinding:
      prefix: --input-file
  config:
    type: [string?, File?]
    inputBinding:
      prefix: --config
    default: "go_summary-config.json"
  output_name:
    type: string

arguments:
  - "--output-file"
  - $(inputs.output_name)
baseCommand: [ hmmscan_tab.py ]  # old was with sed

arguments:
  - valueFrom: $(inputs.input_table.nameroot).tsv
    prefix: -o
baseCommand: ["hmmsearch"]

inputs:
  omit_alignment:
    type: boolean?
    inputBinding:
      position: 1
      prefix: "--noali"
  gathering_bit_score:
    type: boolean?
    inputBinding:
      position: 4
      prefix: "--cut_ga"
  database:
    type: string
    doc: |
      "Database name or path, depending on how you're using it."
  database_directory:
    type: [string, Directory?]
    doc: |
      "Database path"
  seqfile:
    format: edam:format_1929  # FASTA
    type: File
    inputBinding:
      position: 6
      separate: true

arguments:
  - valueFrom: |
      ${
        if (inputs.database_directory && inputs.database_directory !== "") {
          var path = inputs.database_directory.path || inputs.database_directory;
          return path + "/" + inputs.database;
        } else {
          return inputs.database;
        }
      }
    position: 5
  - prefix: --domtblout
    valueFrom: $(inputs.seqfile.nameroot)_hmmsearch.tbl
    position: 2
  - prefix: --cpu
    valueFrom: '4'
  # hmmer is too verbose
  # discard all the std output and error
  - prefix: -o
    valueFrom: '/dev/null'
  - valueFrom: '> /dev/null'
    shellQuote: false
    position: 10
  - valueFrom: '2> /dev/null'
    shellQuote: false
    position: 11
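For illustration, a hypothetical job file for an HMM search; the database name and paths are placeholders, and the argument expression above joins database_directory and database into one path:

seqfile:
  class: File
  path: predicted_proteins.faa   # placeholder protein FASTA
database: Pfam-A.hmm             # placeholder HMM database name
database_directory: /data/pfam   # placeholder; string branch of the union type
gathering_bit_score: true        # adds --cut_ga
omit_alignment: true             # adds --noali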
baseCommand: [ interproscan.sh ]

inputs:
  inputFile:
    type: File
    format: edam:format_1929
    inputBinding:
      position: 8
      prefix: '--input'
    label: Input file path
    doc: >-
      Optional, path to fasta file that should be loaded on Master startup.
      Alternatively, in CONVERT mode, the InterProScan 5 XML file to convert.
  applications:
    type: string[]?
    inputBinding:
      position: 9
      itemSeparator: ','
      prefix: '--applications'
    label: Analysis
    doc: >-
      Optional, comma separated list of analyses. If this option is not set,
      ALL analyses will be run.
  databases:
    type: [string?, Directory]
  cpu:
    type: int
    default: 8
    inputBinding:
      position: 2
      prefix: '--cpu'
    label: Number of CPUs
    doc: >-
      Optional, number of CPUs to use. If not set, the number of CPUs
      available on the machine will be used.
  disableResidueAnnotation:
    type: boolean?
    inputBinding:
      position: 11
      prefix: '--disable-residue-annot'
    label: Disables residue annotation
    doc: 'Optional, excludes sites from the XML, JSON output.'

arguments:
  - position: 0
    valueFrom: '--disable-precalc'
  - position: 1
    valueFrom: '--goterms'
  - position: 2
    valueFrom: '--pathways'
  - position: 3
    prefix: '--tempdir'
    valueFrom: $(runtime.tmpdir)
  - position: 7
    valueFrom: 'TSV'
    prefix: '-f'
  - position: 8
    valueFrom: $(runtime.outdir)/$(inputs.inputFile.nameroot).IPS.tsv
    prefix: '-o'
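An illustrative job file that limits InterProScan to a few member databases; the input path is a placeholder and the application names are just examples of valid InterProScan 5 analyses, joined with ',' via the itemSeparator above:

inputFile:
  class: File
  path: predicted_proteins.faa
applications:
  - Pfam
  - TIGRFAM
  - PRINTS
cpu: 8
disableResidueAnnotation: true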
baseCommand: [ "run_quality_filtering.py" ]

inputs:
  seq_file:
    type: File
    # format: edam:format_1929  # FASTA
    inputBinding:
      position: 1
    label: 'Trimmed sequence file'
    doc: >
      Trimmed and FASTQ to FASTA converted sequences file.
  submitted_seq_count:
    type: int
    label: 'Number of submitted sequences'
    doc: >
      Number of originally submitted sequences as in the user submitted
      FASTQ file - single end FASTQ or pair end merged FASTQ file.
  # stats_file_name:
  #   type: string
  #   default: stats_summary
  #   label: 'Post QC stats output file name'
  #   doc: >
  #     Give a name for the file which will hold the stats after QC.
  min_length:
    type: int
    default: 100  # For assemblies we need to set this in the input YAML to 500
    label: 'Minimum read or contig length'
    doc: >
      Specify the minimum read or contig length for sequences to pass QC
      filtering.
  input_file_format: string

outputs:
  filtered_file:
    label: Filtered output file
    format: edam:format_1929  # FASTA
    type: File
    outputBinding:
      glob: $(inputs.seq_file.nameroot).fasta
  stats_summary_file:
    label: Stats summary output file
    type: File
    outputBinding:
      glob: $(inputs.seq_file.nameroot).qc_summary

arguments:
  - position: 2
    valueFrom: $(inputs.seq_file.nameroot).fasta
  - position: 3
    valueFrom: $(inputs.seq_file.nameroot).qc_summary
  - position: 4
    valueFrom: $(inputs.submitted_seq_count)
  - position: 5
    prefix: '--min_length'
    valueFrom: $(inputs.min_length)
  - position: 6
    prefix: '--extension'
    valueFrom: $(inputs.input_file_format)
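A hypothetical job file for this step; min_length is the knob the inline comment above says to raise to 500 when filtering assemblies:

seq_file:
  class: File
  path: sample.merged.fasta   # placeholder
submitted_seq_count: 1000000  # read count of the originally submitted FASTQ
min_length: 100               # use 500 for assembled contigs
input_file_format: fasta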
baseCommand: [ "MGRAST_base.py" ]

inputs:
  QCed_reads:
    type: File
    format: edam:format_1929  # FASTA
    inputBinding:
      prefix: -i
  length_sum:
    label: Prefix for the files associated with sequence length distribution
    type: string
    default: seq-length.out
  gc_sum:
    label: Prefix for the files associated with GC distribution
    type: string
    default: GC-distribution.out
  nucleotide_distribution:
    label: Prefix for the files associated with nucleotide distribution
    type: string
    default: nucleotide-distribution.out
  summary:
    label: File names for summary of sequences, e.g. number, min/max length etc.
    type: string
    default: summary.out
  max_seq:
    label: Maximum number of sequences to sub-sample
    type: int?
    default: 2000000
  out_dir_name:
    label: Specifies output subdirectory
    type: string
    default: qc-statistics
  sequence_count:
    label: Specifies the number of sequences in the input read file (FASTA formatted)
    type: int

outputs:
  output_dir:
    label: Contains all stats output files
    type: Directory
    outputBinding:
      glob: $(inputs.out_dir_name)
  summary_out:
    label: Contains the summary statistics for the input sequence file
    type: File
    format: iana:text/plain
    outputBinding:
      glob: $(inputs.out_dir_name)/$(inputs.summary)

arguments:
  - position: 1
    prefix: '-o'
    valueFrom: $(inputs.out_dir_name)/$(inputs.summary)
  - position: 2
    prefix: '-d'
    valueFrom: |
      ${
        var suffix = '.full';
        if (inputs.sequence_count > inputs.max_seq) {
          suffix = '.sub-set';
        }
        return "".concat(inputs.out_dir_name, '/', inputs.nucleotide_distribution, suffix);
      }
  - position: 3
    prefix: '-g'
    valueFrom: |
      ${
        var suffix = '.full';
        if (inputs.sequence_count > inputs.max_seq) {
          suffix = '.sub-set';
        }
        return "".concat(inputs.out_dir_name, '/', inputs.gc_sum, suffix);
      }
  - position: 4
    prefix: '-l'
    valueFrom: |
      ${
        var suffix = '.full';
        if (inputs.sequence_count > inputs.max_seq) {
          suffix = '.sub-set';
        }
        return "".concat(inputs.out_dir_name, '/', inputs.length_sum, suffix);
      }
  - position: 5
    valueFrom: ${ if (inputs.sequence_count > inputs.max_seq) { return '-m '.concat(inputs.max_seq)} else { return ''} }
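Whether the distribution files carry the .full or .sub-set suffix (and whether -m is appended) depends only on sequence_count versus max_seq, as in this illustrative job file:

QCed_reads:
  class: File
  path: sample.qc.fasta   # placeholder
sequence_count: 3500000   # above the 2000000 default, so stats use a sub-sample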
baseCommand: [ clean_motus_output.sh ]
baseCommand: [ motus ]

inputs:
  reads:
    type: File
    inputBinding:
      position: 1
      prefix: -s
    label: merged and QC reads in fastq
    # format: edam:format_1930  # FASTQ
  threads:
    type: int
    inputBinding:
      prefix: -t
    default: 4

arguments: [ profile, -c, -q ]
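With the profile, -c, -q arguments above, the effective command is motus profile -c -q -s <reads> -t <threads>. A minimal illustrative job file:

reads:
  class: File
  path: sample.merged.fastq   # placeholder merged, QC-ed reads
threads: 4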
baseCommand: [ "biom-convert.sh" ]

inputs:
  biom:
    type: File?
    format: edam:format_3746  # BIOM
    inputBinding:
      prefix: --input-fp
  table_type:
    type: string?  # biom-convert-table.yaml#table_type?
    inputBinding:
      prefix: --table-type  # --table-type= <- worked for cwlexec
      separate: true        # false <- worked for cwlexec
      valueFrom: $(inputs.table_type)  # $('"' + inputs.table_type + '"') <- worked for cwlexec
  json:
    type: boolean?
    label: Output as JSON-formatted table.
    inputBinding:
      prefix: --to-json
  hdf5:
    type: boolean?
    label: Output as HDF5-formatted table.
    inputBinding:
      prefix: --to-hdf5
  tsv:
    type: boolean?
    label: Output as TSV-formatted (classic) table.
    inputBinding:
      prefix: --to-tsv
  header_key:
    type: string?
    doc: |
      The observation metadata to include from the input BIOM table file when
      creating a tsv table file. By default no observation metadata will be
      included.
    inputBinding:
      prefix: --header-key

arguments:
  - valueFrom: |
      ${
        var ext = "";
        if (inputs.json) { ext = "_json.biom"; }
        if (inputs.hdf5) { ext = "_hdf5.biom"; }
        if (inputs.tsv) { ext = "_tsv.biom"; }
        var pre = inputs.biom.nameroot.split('.');
        pre.pop()
        return pre.join('.') + ext;
      }
    prefix: --output-fp
  - valueFrom: "--collapsed-observations"
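Exactly one of json/hdf5/tsv should be set, since the expression above derives the output suffix from whichever flag is on. An illustrative job file for a BIOM-to-TSV conversion (the input path is a placeholder):

biom:
  class: File
  path: sample.taxonomy.biom   # placeholder
tsv: true                      # output becomes <nameroot>_tsv.biom
header_key: taxonomy           # carry the taxonomy metadata into the TSV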
baseCommand: [ cmsearch-deoverlap.pl ]

inputs:
  - id: clan_information
    type: [string?, File?]
    inputBinding:
      position: 0
      prefix: '--clanin'
    label: clan information on the models provided
    doc: Not all models provided need to be a member of a clan
  - id: cmsearch_matches
    type: File
    format: edam:format_3475
    inputBinding:
      position: 1
      valueFrom: $(self.basename)
baseCommand: [ cmsearch ]

inputs:
  - id: covariance_model_database
    type: [string, File]
    inputBinding:
      position: 1
  - id: cpu
    type: int?
    inputBinding:
      position: 0
      prefix: '--cpu'
    label: Number of parallel CPU workers to use for multithreads
  - default: false
    id: cut_ga
    type: boolean?
    inputBinding:
      position: 0
      prefix: '--cut_ga'
    label: use CM's GA gathering cutoffs as reporting thresholds
  - id: omit_alignment_section
    type: boolean?
    inputBinding:
      position: 0
      prefix: '--noali'
    label: Omit the alignment section from the main output.
    doc: This can greatly reduce the output volume.
  - default: false
    id: only_hmm
    type: boolean?
    inputBinding:
      position: 0
      prefix: '--hmmonly'
    label: 'Only use the filter profile HMM for searches, do not use the CM'
    doc: |
      Only filter stages F1 through F3 will be executed, using strict P-value
      thresholds (0.02 for F1, 0.001 for F2 and 0.00001 for F3). Additionally
      a bias composition filter is used after the F1 stage (with P=0.02
      survival threshold). Any hit that survives all stages and has an HMM
      E-value or bit score above the reporting threshold will be output.
  - id: query_sequences
    type: File
    format: edam:format_1929  # FASTA
    inputBinding:
      position: 2
    # streamable: true
  - id: search_space_size
    type: int
    inputBinding:
      position: 0
      prefix: '-Z'
    label: search space size in *Mb* to <x> for E-value calculations

arguments:
  - position: 0
    prefix: '--tblout'
    valueFrom: |
      ${
        var name = "";
        if (typeof inputs.covariance_model_database === "string") {
          name = inputs.query_sequences.basename + "." + inputs.covariance_model_database.split("/").slice(-1)[0] + ".cmsearch_matches.tbl";
        } else {
          name = inputs.query_sequences.basename + "." + inputs.covariance_model_database.nameroot + ".cmsearch_matches.tbl";
        }
        return name;
      }
  - position: 0
    prefix: '-o'
    valueFrom: |
      ${
        var name = "";
        if (typeof inputs.covariance_model_database == "string") {
          name = inputs.query_sequences.basename + "." + inputs.covariance_model_database.split("/").slice(-1)[0] + ".cmsearch.out";
        } else {
          name = inputs.query_sequences.basename + "." + inputs.covariance_model_database.nameroot + ".cmsearch.out";
        }
        return name;
      }
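An illustrative job file for one covariance-model search; the paths are placeholders, and -Z is mandatory here because search_space_size is non-optional:

query_sequences:
  class: File
  path: sample.fasta
covariance_model_database: /data/rfam/ribosomal.cm   # string branch of the union type
search_space_size: 1000                              # -Z 1000, in Mb
cut_ga: true
omit_alignment_section: true
cpu: 4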
baseCommand: [ esl-index.sh ]

baseCommand: [ esl-sfetch ]

baseCommand: get_subunits_coords.py

baseCommand: get_subunits.py
baseCommand: ktImportText

arguments:
  - valueFrom: "krona.html"
    prefix: -o
baseCommand: [ 'mapseq2biom.pl' ]

inputs:
  otu_table:
    type: [string, File]
    doc: |
      the OTU table produced for the taxonomies found in the reference
      databases that was used with MAPseq
    inputBinding:
      prefix: --otuTable
  query:
    type: File
    label: the output from the MAPseq that assigns a taxonomy to a sequence
    format: iana:text/tab-separated-values
    inputBinding:
      prefix: --query
  label:
    type: string
    label: label to add to the top of the outfile OTU table
    inputBinding:
      prefix: --label
  taxid_flag:
    type: boolean?
    label: output NCBI taxids for all databases bar UNITE
    inputBinding:
      prefix: --taxid

arguments:
  - valueFrom: $(inputs.query.basename).tsv
    prefix: --outfile
  - valueFrom: $(inputs.query.basename).txt
    prefix: --krona
  - valueFrom: $(inputs.query.basename).notaxid.tsv
    prefix: --notaxidfile
baseCommand: mapseq

inputs:
  prefix: File
  sequences:
    type: File
    inputBinding:
      position: 1
    format: edam:format_1929  # FASTA
  database:
    type: File
    inputBinding:
      position: 2
    secondaryFiles: .mscluster
    format: edam:format_1929
  taxonomy:
    type: [string, File]
    inputBinding:
      position: 4
  threads:
    type: int?
    default: 8
    inputBinding:
      prefix: "-nthreads"
      position: 5

arguments: ['-tophits', '80', '-topotus', '40', '-outfmt', 'simple']
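An illustrative job file; all reference paths are placeholders. Note that database expects its .mscluster sidecar alongside (per secondaryFiles), and prefix is declared as a required File even though it has no command-line binding:

sequences:
  class: File
  path: sample_SSU.fasta
database:
  class: File
  path: refdb/SSU.fasta        # SSU.fasta.mscluster must sit next to it
taxonomy: refdb/SSU.taxonomy   # string branch of the union type
prefix:
  class: File
  path: refdb/SSU.otu          # placeholder; required but unbound input
threads: 8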
baseCommand: [ pull_ncrnas.sh ]

baseCommand: [ functional_stats.py ]

baseCommand: [ write_summaries.py ]
baseCommand: [ add_header ]

inputs:
  input_table:
    # format: [edam:format_3475, edam:format_2333]
    type: File
    inputBinding:
      prefix: -i
  header:
    type: string
    inputBinding:
      prefix: -h
baseCommand: [ count_lines.py ]

inputs:
  sequences:
    type: File
    inputBinding:
      prefix: -f
  number:
    type: int
    inputBinding:
      prefix: -n
baseCommand: [ bash ]

arguments:
  - valueFrom: |
      expr \$(cat $(inputs.input_file.path) | wc -l)
    prefix: -c
arguments:
  - valueFrom: $(inputs.fastq.nameroot).unclean
    prefix: '-o'

baseCommand: [ fastq_to_fasta.py ]
baseCommand: [ generate_checksum.py ]
baseCommand: [ pigz ]

arguments: ["-p", "8", "-c"]
arguments:
  - prefix: -n
    valueFrom: |
      ${
        if (inputs.size_limit) { return inputs.size_limit }
        if (inputs.type_fasta == 'n') { return 1980 }
        if (inputs.type_fasta == 'p') { return 1442 }
      }

baseCommand: [ split_fasta_by_size.sh ]
baseCommand: [ megahit ]

inputs:
  memory:
    type: float?
    label: Memory to run assembly. When 0 < -m < 1, fraction of all available memory of the machine is used, otherwise it specifies the memory in BYTE.
    default: 0.9
    inputBinding:
      position: 4
      prefix: "--memory"
  min-contig-len:
    type: int?
    default: 500
    inputBinding:
      position: 3
      prefix: "--min-contig-len"
  forward_reads:
    type:
      - File?
      - type: array
        items: File
    inputBinding:
      position: 1
      prefix: "-1"
  reverse_reads:
    type:
      - File?
      - type: array
        items: File
    inputBinding:
      position: 2
      prefix: "-2"
  threads:
    type: int
    default: 1
    inputBinding:
      position: 5
      prefix: "--num-cpu-threads"
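Lastly, an illustrative job file for the assembly step (read paths are placeholders); per the label above, a memory value between 0 and 1 is read as a fraction of the machine's RAM:

forward_reads:
  - class: File
    path: sample_R1.fastq.gz
reverse_reads:
  - class: File
    path: sample_R2.fastq.gz
memory: 0.9          # use up to 90% of available memory
min-contig-len: 500
threads: 8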
Metadata
- URL: https://data.emobon.embrc.eu/MetaGOflow/
- Name: a-workflow-for-marine-genomic-observatories-data-a
- Version: eosc-life-gos @ deb5427
- Copyright: Public Domain
- License: Boost Software License 1.0