A repository to conduct experiments with omnitig-related models for genome assembly.
This is a repository to conduct experiments with omnitig-related models in the context of genome assembly. The algorithms are implemented in Rust, and around that we built a snakemake toolchain to conduct experiments. To ensure reproducibility, we wrapped everything into a conda environment.
Usage
Required Software
- conda >= 4.8.3 (lower versions might be possible, but have not been tested)
Setup
First, set up the conda environment of this project.
conda env create -f environment.yml
Then, activate the environment.
source activate practical-omnitigs
Running Experiments
Make sure that you are in the right conda environment (it should be practical-omnitigs).
conda info
Subsequently, experiments can be run using snakemake.
snakemake --cores all <experiment>
Valid experiments are:

- selftest: Check if you have conda set up correctly. It prints the versions of snakemake, conda and wget. The versions of snakemake and conda should match the definition in /environment.yaml, and the version of wget should match the definition in /config/conda-selftest-env.yaml.
- test: Execute all integration tests of this project on a single small sample genome.
- test_all: Execute all integration tests of this project on all defined genomes (potentially very large).
The experiments are run inside a conda environment that is set up by snakemake. This ensures reproducibility of the results and automates the installation of required tools.
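As a quick sketch (the exact layout under config/ may differ), you can inspect the per-step conda environment definitions and preview which steps an experiment would execute via a snakemake dry run:
ls config/*.yaml
snakemake --cores all -n selftest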
Using the Implementation Directly
The Rust code written for this project includes a command line interface that can be used directly. For documentation on how to use it, please refer to the documentation of the cli crate.
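As a rough sketch, once the CLI binary has been built by the pipeline (see the cargo rules among the code snippets below), it can be invoked directly. The subcommand and flags mirror those used in the Snakemake rules; the file names here are placeholders:
data/target/release/cli compute-trivial-omnitigs --file-format hifiasm --input unitigs.gfa --output trivial-omnitigs.fa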
Troubleshooting
If you have problems with using this software package, take a look at our troubleshooting page. If that does not solve your issue, do not hesitate to file a bug report.
Technical Information
Directory Structure
- .github: GitHub workflows for continuous testing.
- .idea: Configuration for the IntelliJ IDEA integrated development environment.
- config: All config files related to the experiments, including conda environments and experiment declarations.
- data: Data used and produced by the experiments.
- external-software: Location to install external software required by the experiment pipeline.
- implementation: The algorithms that we are testing. Everything is written in Rust.
Implementation
The algorithms of this project are implemented in Rust.
We split the implementation into multiple library crates to increase the reusability of our code.
On top of that, the cli crate provides all implemented functionality via a command line interface.
Refer to its documentation for more information.
Except for cli, all crates are published on crates.io.
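To build the Rust code outside of the pipeline, the same cargo invocations used by the workflow can serve as a sketch (the target directory data/target matches the binary path used in the rules below; adjust it if your setup differs):
cargo fetch --manifest-path implementation/Cargo.toml
cargo build --release --offline --manifest-path implementation/Cargo.toml --target-dir data/target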
License
This project is licensed under the terms of the BSD 2-Clause license.
See LICENSE.md for more information.
How to Cite
If you use this code in your research project, please cite it as "Safe and Complete Genome Assembly in Practice, DOI: 10.5281/zenodo.4335367".
Code Snippets
The following snippets are the shell and run directives of the workflow's Snakemake rules.
shell: "echo 'No target specified'"

shell: """
mkdir -p '{params.hashdir}'
echo '{wildcards.report_name} {params.genome_name} {wildcards.report_file_name}' > '{params.name_file}'
python3 '{input.script}' '{params.hashdir}' '{params.name_file}' 'none' 'none' '{input.combined_eaxmax_plot}' '{output}' {params.script_column_arguments}
"""

shell: """
python3 '{input.script}' --source-reports '{params.source_reports_arg}' --source-report-names '{params.source_report_names_arg}' --output '{output.file}'
"""

shell: """
mkdir -p "$(dirname '{output}')"
python3 '{input.script}' '{params.input_quast_csvs}' '{output}'
"""

shell: "convert {input} {output}"

shell: """
tectonic '{input}'
"""

shell: "${{CONDA_PREFIX}}/bin/time -v '{input.binary}' {params.command} --output-as-wtdbg2-node-ids --file-format wtdbg2 --input '{input.nodes}' --input '{input.reads}' --input '{input.raw_reads}' --input '{input.dot}' --output '{output.file}' --latex '{output.latex}' 2>&1 | tee '{log.log}'"

shell: "${{CONDA_PREFIX}}/bin/time -v '{input.binary}' {params.command} --file-format dot --input '{input.dot}' --output '{output.file}' --latex '{output.latex}' 2>&1 | tee '{log.log}'"

shell: "ln -sr '{input.raw_assembly_from_assembler}' '{output.raw_assembly}'"

shell: "ln -sr '{input.raw_assembly_from_assembler}' '{output.raw_assembly}'"

shell: "ln -sr '{input}' '{output}'"

shell: "'{input.script}' --threads {threads} --input-contigs '{input.contigs}' --input-reads '{input.reads}' --output-contigs '{output.broken_contigs}'"

shell: "'{input.binary}' compute-trivial-omnitigs --non-scc --file-format hifiasm --input '{input.contigs}' --output '{output.trivial_omnitigs}' 2>&1 | tee '{log.log}'"

shell: """
read -r REFERENCE_LENGTH < '{input.reference_length}'
${{CONDA_PREFIX}}/bin/time -v '{input.binary}' -x {wildcards.wtdbg2_mode} -g $REFERENCE_LENGTH -i '{input.reads}' -t {threads} -fo '{params.output_prefix}' --dump-kbm '{output.kbm}' {params.fragment_correction_steps} 2>&1 | tee '{log.log}'
"""

shell: """
read -r REFERENCE_LENGTH < '{input.reference_length}'
${{CONDA_PREFIX}}/bin/time -v '{input.binary}' -x {wildcards.wtdbg2_mode} -g $REFERENCE_LENGTH -i '{input.reads}' -t {threads} -fo '{params.output_prefix}' --dump-kbm '{output.kbm}' --skip-fragment-assembly {params.fragment_correction_steps} 2>&1 | tee '{log.log}'
"""

shell: """
read -r REFERENCE_LENGTH < '{input.reference_length}'
read -r EDGE_COV < '{input.edge_cov}'
${{CONDA_PREFIX}}/bin/time -v '{input.binary}' -x {wildcards.wtdbg2_mode} -g $REFERENCE_LENGTH -e $EDGE_COV -i '{input.reads}' -t {threads} -fo '{params.output_prefix}' --load-nodes '{input.cached_nodes}' --load-clips '{input.cached_clips}' --load-kbm '{input.cached_kbm}' --inject-unitigs '{input.contigs}' {params.skip_fragment_assembly} {params.fragment_correction_steps} 2>&1 | tee '{log.log}'
"""

shell: """
read -r REFERENCE_LENGTH < '{input.reference_length}'
read -r EDGE_COV < '{input.edge_cov}'
${{CONDA_PREFIX}}/bin/time -v '{input.binary}' -x {wildcards.wtdbg2_mode} -g $REFERENCE_LENGTH -e $EDGE_COV -i '{input.reads}' -t {threads} -fo '{params.output_prefix}' --load-nodes '{input.cached_nodes}' --load-clips '{input.cached_clips}' --load-kbm '{input.cached_kbm}' --inject-fragment-unitigs '{input.fragment_contigs}' 2>&1 | tee '{log.log}'
"""

shell: """
cd '{params.working_directory}'
${{CONDA_PREFIX}}/bin/time -v gunzip -k wtdbg2.{wildcards.subfile}.gz 2>&1 | tee '{params.abslog}'
"""
shell: "${{CONDA_PREFIX}}/bin/time -v {input.binary} -t {threads} -i '{input.contigs}' -fo '{output.consensus}' 2>&1 | tee '{log.log}'"

shell: "${{CONDA_PREFIX}}/bin/time -v {input.binary} --input {input.contigs} --output {output.contigs} --normal-reads {input.normal_reads} --compute-threads {threads} 2>&1 | tee '{log.log}'"

shell: "ln -sr -T '{input.contigs}' '{output.contigs}'"

shell: """
grep 'Set --edge-cov to ' '{input}' | sed 's/.*Set --edge-cov to //g' > '{output}'
"""

shell: """
read -r REFERENCE_LENGTH < '{input.reference_length}'
${{CONDA_PREFIX}}/bin/time -v '{input.script}' -g $REFERENCE_LENGTH -t {threads} -o '{params.output_directory}' --{wildcards.flye_mode} '{input.reads}' 2>&1 | tee '{log.log}'
"""

shell: "ln -sr -T '{input.contigs}' '{output.contigs}'"

shell: "${{CONDA_PREFIX}}/bin/time -v '{input.binary}' --primary -t {threads} -o '{params.output_prefix}' '{input.reads}' 2>&1 | tee '{log}'"

run:
    with open(input.gfa, 'r') as input_file, open(input.alternate_gfa, 'r') as alternate_input_file, open(output.fa, 'w') as output_file:
        for line in itertools.chain(input_file, alternate_input_file):
            if line[0] != "S":
                continue
            columns = line.split("\t")
            print(f"Writing contig {columns[1]}...")
            output_file.write(">{}\n{}\n".format(columns[1], columns[2]))
        print(f"Wrote all contigs")

shell: "${{CONDA_PREFIX}}/bin/time -v '{input.rust_binary}' compute-trivial-omnitigs --file-format hifiasm --input '{input.unitigs}' --output '{output.contigs}' --latex '{output.latex}' --non-scc 2>&1 | tee '{log.log}'"

shell: "${{CONDA_PREFIX}}/bin/time -v '{input.rust_binary}' compute-omnitigs --file-format hifiasm --input '{input.unitigs}' --output '{output.contigs}' --latex '{output.latex}' --linear-reduction 2>&1 | tee '{log.log}'"

shell: """
RUST_BACKTRACE=full ${{CONDA_PREFIX}}/bin/time -v '{input.script}' '{input.reads}' '{params.output_prefix}' {threads} 2>&1 | tee '{log.log}'
ln -sr -T '{params.original_contigs}' '{output.contigs}'
"""

shell: """
RUST_BACKTRACE=full ${{CONDA_PREFIX}}/bin/time -v '{input.binary}' '{input.reads}' -k 35 -l 12 --density 0.002 --threads {threads} --prefix '{params.output_prefix}' 2>&1 | tee '{log.log}'
${{CONDA_PREFIX}}/bin/time -v '{input.simplify_script}' '{params.output_prefix}' 2>&1 | tee -a '{log.log}'
ln -sr -T '{params.original_contigs}' '{output.contigs}'
"""

shell: """
RUST_BACKTRACE=full ${{CONDA_PREFIX}}/bin/time -v '{input.binary}' '{input.reads}' -k 21 -l 14 --density 0.003 --threads {threads} --prefix '{params.output_prefix}' 2>&1 | tee '{log.log}'
${{CONDA_PREFIX}}/bin/time -v '{input.simplify_script}' '{params.output_prefix}' 2>&1 | tee -a '{log.log}'
ln -sr -T '{params.original_contigs}' '{output.contigs}'
"""

shell: """
mkdir -p '{params.output_dir}'
${{CONDA_PREFIX}}/bin/time -v '{input.binary}' -t {threads} -o '{params.output_dir}' --reads '{input.reads}' 2>&1 | tee '{log.log}'
ln -sr -T '{params.original_contigs}' '{output.contigs}'
"""

shell: """
read -r REFERENCE_LENGTH < '{input.reference_length}'
mkdir -p '{params.output_dir}'
${{CONDA_PREFIX}}/bin/time -v canu -assemble -p assembly -d '{params.output_dir}' genomeSize=$REFERENCE_LENGTH useGrid=false -pacbio-hifi '{input.reads}' 2>&1 | tee '{log.log}'
ln -sr -T '{params.original_contigs}' '{output.contigs}'
"""
shell: "${{CONDA_PREFIX}}/bin/time -v '{input.binary}' --threads {params.threads} '{input.reads}' '{output.reads}' 2>&1 | tee '{log}'"

shell: "${{CONDA_PREFIX}}/bin/time -v '{input.binary}' --threads {params.threads} '{input.reference}' '{output.reference}' 2>&1 | tee '{log}'"

shell: "ln -sr -T '{input.reads}' '{output.reads}'"

shell: """
'{input.binary}' -v -T{threads} -P'{params.tmp_dir}' -t1 -p -k{wildcards.fastk_k} '{input.reads}' 2>&1 | tee '{log}'
"""

shell: """
'{input.binary}' -h10000 '{input.hist}' 2>&1 | tee '{log.histogram}'
'{input.binary}' -k -h10000 '{input.hist}' 2>&1 | tee -a '{log.histogram}'
"""

shell: """
'{input.binary}' -v -T{threads} -P'{params.tmp_dir}' '{input.table}' '{output.table}' 2>&1 | tee '{log}'
"""

shell: """
ln -sr -T '{input.table}' '{output.table}'

for INPUT_FILE_NAME in $(ls '{params.input_dirname}/.{params.input_filename}.'* | xargs -n 1 basename); do
    INPUT_FILE='{params.input_dirname}'/"${{INPUT_FILE_NAME}}"
    OUTPUT_FILE_NAME=${{INPUT_FILE_NAME/{params.input_filename}/{params.output_filename}}}
    OUTPUT_FILE='{params.output_dirname}'/"${{OUTPUT_FILE_NAME}}"
    ln -sr -T "${{INPUT_FILE}}" "${{OUTPUT_FILE}}"
done
"""

shell: """
ln -sr -T '{input.profile}' '{output.profile}'

for INPUT_FILE_NAME in $(ls '{params.input_dirname}/.{params.input_filename}.'* | xargs -n 1 basename); do
    INPUT_FILE='{params.input_dirname}'/"${{INPUT_FILE_NAME}}"
    OUTPUT_FILE_NAME=${{INPUT_FILE_NAME/{params.input_filename}/{params.output_filename}}}
    OUTPUT_FILE='{params.output_dirname}'/"${{OUTPUT_FILE_NAME}}"
    ln -sr -T "${{INPUT_FILE}}" "${{OUTPUT_FILE}}"
done

for INPUT_FILE_NAME in $(ls '{params.input_dirname}/.{params.input_filename_pidx}.'* | xargs -n 1 basename); do
    INPUT_FILE='{params.input_dirname}'/"${{INPUT_FILE_NAME}}"
    OUTPUT_FILE_NAME=${{INPUT_FILE_NAME/{params.input_filename_pidx}/{params.output_filename_pidx}}}
    OUTPUT_FILE='{params.output_dirname}'/"${{OUTPUT_FILE_NAME}}"
    ln -sr -T "${{INPUT_FILE}}" "${{OUTPUT_FILE}}"
done
"""

shell: """
cd '{params.working_directory}'
'{params.input_binary}' -v -T{threads} -g{wildcards.himodel_min_valid}:{wildcards.himodel_max_valid} -e{wildcards.himodel_kmer_threshold} '{params.input_prefix}' 2>&1 | tee '{params.log}'
"""

shell: """
'{input.binary}' -v '{input.reference}' '{input.model}' -o'{params.output_prefix}' {params.sim_params} -p{params.ploidy_tree} -fh -r3541529 -U 2>&1 | tee '{log}'
ln -sr -T '{params.output_prefix}.fasta' '{output.reads}'
"""
shell: "${{CONDA_PREFIX}}/bin/time -v '{input.script}' '{input.reads}' '{output.reads}' {wildcards.read_downsampling_factor}"

shell: "${{CONDA_PREFIX}}/bin/time -v seqtk seq -AU '{input.reads}' > '{output.reads}'"

shell: "${{CONDA_PREFIX}}/bin/time -v '{input.script}' '{input.reference}' '{output.reference}'"

shell: "${{CONDA_PREFIX}}/bin/time -v '{input.script}' '{input.reference}' '{output.reference}'"

shell: "${{CONDA_PREFIX}}/bin/time -v '{input.script}' '{input.reference}' '{output.reference}'"

shell: "${{CONDA_PREFIX}}/bin/time -v '{input.script}' '{input.reference}' '{output.reference_length}'"

shell: "cp '{input.filtered}' '{output.linear}'; data/target/release/cli circularise-genome --input '{input.filtered}' 2>&1 --output '{output.circular}' | tee '{output.log}'"

shell: "${{CONDA_PREFIX}}/bin/time -v {input.script} {params.extra_arguments} -t {threads} --no-html --large -o '{output.directory}' {params.references} '{input.contigs}'"

run:
    result = {}
    for key, input_file_name in params.file_map.items():
        with open(input_file_name, 'r') as input_file:
            values = {}
            for line in input_file:
                if "Elapsed (wall clock) time (h:mm:ss or m:ss):" in line:
                    line = line.replace("Elapsed (wall clock) time (h:mm:ss or m:ss):", "").strip()
                    values["time"] = decode_time(line) + values.setdefault("time", 0)
                elif "Maximum resident set size" in line:
                    values["mem"] = max(int(line.split(':')[1].strip()), values.setdefault("mem", 0))

            assert "time" in values, f"No time found in {input_file_name}"
            assert "mem" in values, f"No mem found in {input_file_name}"
            result[key] = values

    sum_time = sum([values["time"] for values in result.values()])
    max_mem = max([values["mem"] for values in result.values()])
    result["total"] = {
        "time": sum_time,
        "mem": max_mem,
    }

    with open(output.file, 'w') as output_file:
        json.dump(result, output_file)

shell: """
cd '{input.contig_validator_dir}'
# The abundance-min here has nothing to do with the abundance_min from bcalm2
bash run.sh -suffixsave 0 -abundance-min 1 -kmer-size {wildcards.k} -r '../../{input.reference}' -a '../../{output.result}' -i '../../{input.reads}'
"""

shell: "${{CONDA_PREFIX}}/bin/time -v '{input.converter}' {input.fa} {output.gfa} {wildcards.k}"

shell: "Bandage image {input} {output} --width 1000 --height 1000"
shell: """
wget --progress=dot:mega -O '{output.file}' '{params.url}'
wget --progress=dot:mega -O '{output.checksum_file}' '{params.checksum_url}'

CHECKSUM=$(md5sum '{output.file}' | cut -f1 -d' ' | sed 's/[\]//g')
cat '{output.checksum_file}' | grep "$CHECKSUM"
"""

shell: """
wget --progress=dot:mega -O '{output.file}' '{params.url}'
wget --progress=dot:mega -O '{output.checksum_file}' '{params.checksum_url}'

CHECKSUM=$(md5sum '{output.file}' | cut -f1 -d' ' | sed 's/[\]//g')
echo $CHECKSUM
cat '{output.checksum_file}' | grep "$CHECKSUM"
"""

shell: """
wget --progress=dot:mega -O '{output.file}' '{params.url}'
"""

shell: """
wget --progress=dot:mega -O '{output.file}' '{params.url}'
"""

shell: """
bioawk -c fastx '{{ print ">" $name "\\n" $seq }}' '{input.file}' > '{output.file}'
"""

shell: "fastq-dump --stdout --fasta default '{input.file}' > '{output.file}'"

shell: "cd '{params.working_directory}'; gunzip -k {wildcards.file}.gz"

shell: "ln -sr -T '{input.file}' '{output.file}'"

shell: "cat {params.input_files} > '{output.file}'"

shell: "python3 '{input.script}' '{input.reads}' '{output.reads}' 2>&1 | tee '{log.log}'"
shell: "cargo fetch --manifest-path 'implementation/Cargo.toml' 2>&1 | tee '{log.log}'"

shell: "cargo test -j {threads} --target-dir '{params.rust_dir}' --manifest-path 'implementation/Cargo.toml' --offline 2>&1 | tee '{log.log}'"

shell: "cargo build -j {threads} --release --target-dir '{params.rust_dir}' --manifest-path 'implementation/Cargo.toml' --offline 2>&1 | tee '{log.log}'"

shell: """
mkdir -p '{params.external_software_scripts_dir}'
cd '{params.external_software_scripts_dir}'

rm -rf convertToGFA.py
wget https://raw.githubusercontent.com/GATB/bcalm/v2.2.3/scripts/convertToGFA.py
chmod u+x convertToGFA.py
"""

shell: """
mkdir -p '{params.external_software_dir}'
cd '{params.external_software_dir}'

rm -rf ContigValidator
git clone --recursive https://github.com/mayankpahadia1993/ContigValidator.git
cd ContigValidator/src
echo 'count_kmers: count_kmers_kmc' >> Makefile
sed -i 's\\count_kmers: count_kmers_kmc.cpp KMC/kmc_api/kmc_file.o\\count_kmers_kmc: count_kmers_kmc.cpp KMC/kmc_api/kmc_file.o\\g' Makefile
LIBRARY_PATH="../../sdsl-lite/lib" CPATH="../../sdsl-lite/include" make -j {threads}
"""

shell: """
mkdir -p '{params.external_software_dir}'
cd '{params.external_software_dir}'

rm -rf quast
git clone https://github.com/sebschmi/quast
cd quast
git checkout cf1461f48e937488928b094946bb591cd5b325a3
"""

shell: """
mkdir -p '{params.external_software_dir}'
cd '{params.external_software_dir}'

rm -rf sdsl-lite
git clone https://github.com/simongog/sdsl-lite.git
cd sdsl-lite
git checkout v2.1.1
HOME=`pwd` ./install.sh
"""

shell: """
mkdir -p '{params.external_software_dir}'
cd '{params.external_software_dir}'

rm -rf Ratatosk
git clone --recursive https://github.com/GuillaumeHolley/Ratatosk.git
cd Ratatosk
git checkout --recurse-submodules 74ca617afb20a7c24d73d20f2dcdf223db303496

mkdir build
cd build
cmake ..
make -j {threads}
"""

shell: """
mkdir -p '{params.external_software_dir}'
cd '{params.external_software_dir}'

rm -rf wtdbg2
git clone https://github.com/sebschmi/wtdbg2.git
cd wtdbg2
git checkout 78c3077b713aaee48b6c0835105ce6c666f6e796

sed -i 's:CFLAGS=:CFLAGS=-I${{CONDA_PREFIX}}/include -L${{CONDA_PREFIX}}/lib :g' Makefile
"""

shell: """
cd '{params.wtdbg2_dir}'
make CC=x86_64-conda-linux-gnu-gcc -j {threads}
"""

shell: """
mkdir -p '{params.external_software_dir}'
cd '{params.external_software_dir}'

wget -O sim-it.tar.gz https://github.com/ndierckx/Sim-it/archive/refs/tags/Sim-it1.2.tar.gz
rm -rf Sim-it-Sim-it1.2
rm -rf sim-it
tar -xf sim-it.tar.gz
mv Sim-it-Sim-it1.2/ sim-it/
mv sim-it/Sim-it1.2.pl sim-it/sim-it.pl
"""

shell: """
mkdir -p '{params.external_software_dir}'
cd '{params.external_software_dir}'

rm -rf Flye
git clone https://github.com/sebschmi/Flye
cd Flye
git checkout 38921327d6c5e57a59e71a7181995f2f0c04be75

mv bin/flye bin/flye.disabled # rename such that snakemake does not delete it
"""

shell: """
cd '{params.flye_directory}'

export CXX=x86_64-conda-linux-gnu-g++
export CC=x86_64-conda-linux-gnu-gcc
# export INCLUDES=-I/usr/include/ # Somehow this is not seen by minimap's Makefile, so we had to change it in our custom version of Flye
# The following also doesn't seem to work when building minimap, so again we had to modify minimap's Makefile
# export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:${{LD_LIBRARY_PATH:=''}} # Redirect library path to include conda libraries

# make # This does not create the python script anymore
/usr/bin/env python3 setup.py install

mv bin/flye.disabled bin/flye # was renamed such that snakemake does not delete it
"""

shell: """
mkdir -p '{params.external_software_dir}'
cd '{params.external_software_dir}'

rm -rf rust-mdbg
git clone https://github.com/sebschmi/rust-mdbg
cd rust-mdbg
git checkout 4ff0122a8c63210820ba0341fa7365d6ac216612

cargo fetch

# rename such that snakemake does not delete them
mv utils/magic_simplify utils/magic_simplify.original
mv utils/multik utils/multik.original
"""

shell: """
cd '{params.mdbg_directory}'
cargo --offline build --release -j {threads} --target-dir '{params.mdbg_target_directory}'

# were renamed such that snakemake does not delete them
cp utils/magic_simplify.original utils/magic_simplify
cp utils/multik.original utils/multik

# use built binaries instead of rerunning cargo
sed -i 's:cargo run --manifest-path .DIR/../Cargo.toml --release:'"'"'{params.rust_mdbg}'"'"':g' utils/multik
sed -i 's:cargo run --manifest-path .DIR/../Cargo.toml --release --bin to_basespace --:'"'"'{params.to_basespace}'"'"':g' utils/magic_simplify
"""
shell: """
mkdir -p '{params.external_software_dir}'
cd '{params.external_software_dir}'

rm -rf LJA
git clone https://github.com/AntonBankevich/LJA
cd LJA
git checkout 99f93262c50ff269ee28707f7c3bb77ea00eb576

#sed -i 's/find_package(OpenMP)//g' CMakeLists.txt
#sed -i "s:\${{OpenMP_CXX_FLAGS}}:-L${{CONDA_PREFIX}}/lib -lgomp :g" CMakeLists.txt
#sed -i "s:\${{OpenMP_C_FLAGS}}:-L${{CONDA_PREFIX}}/lib -lgomp :g" CMakeLists.txt
#sed -i "s:\${{OpenMP_EXE_LINKER_FLAGS}}:-L${{CONDA_PREFIX}}/lib -lgomp :g" CMakeLists.txt
"""

shell: """
cd '{params.lja_directory}'

export CXX=x86_64-conda-linux-gnu-g++
export CC=x86_64-conda-linux-gnu-gcc

cmake .
make -j {threads}
"""

shell: """
mkdir -p '{params.external_software_dir}'
cd '{params.external_software_dir}'

rm -rf hifiasm
git clone https://github.com/sebschmi/hifiasm
cd hifiasm
git checkout c914c80547d8cdcfef392291831d6b2fb3b011f5
"""

shell: """
cd '{params.hifiasm_directory}'

make CXX=x86_64-conda-linux-gnu-g++ CC=x86_64-conda-linux-gnu-gcc CXXFLAGS=-I${{CONDA_PREFIX}}/include -j {threads}
#make -j {threads}
"""

shell: """
mkdir -p '{params.external_software_dir}'
cd '{params.external_software_dir}'

rm -rf homopolymer-compress-rs
git clone https://github.com/sebschmi/homopolymer-compress-rs.git
cd homopolymer-compress-rs
git checkout d94145fb8fa2868876bccb46dd80c12d3b17c724

cargo fetch
"""

shell: """
cd '{params.homopolymer_compress_rs_dir}'
cargo build --offline --release -j {threads}
"""

shell: """
mkdir -p '{params.external_software_dir}'
cd '{params.external_software_dir}'

rm -rf wtdbg2-homopolymer-decompression
git clone https://github.com/sebschmi/wtdbg2-homopolymer-decompression.git
cd wtdbg2-homopolymer-decompression
git checkout 3bec6c0b751a70d53312b359171b9a576f67ebb6

cargo fetch
"""

shell: """
cd '{params.wtdbg2_homopolymer_decompression_dir}'
cargo build --offline --release -j {threads}
"""

shell: """
mkdir -p '{params.external_software_dir}'
cd '{params.external_software_dir}'

rm -rf HI.SIM
git clone https://github.com/sebschmi/HI.SIM.git
cd HI.SIM
git checkout 734c25c4df3775761ca8920a7d2d57dc44cac09c

sed -i 's:CFLAGS = :CFLAGS = -I${{CONDA_PREFIX}}/include -L${{CONDA_PREFIX}}/lib :g' Makefile
"""

shell: """
cd '{params.hisim_dir}'
make CC=x86_64-conda-linux-gnu-gcc -j {threads} all
"""

shell: """
mkdir -p '{params.external_software_dir}'
cd '{params.external_software_dir}'

rm -rf FASTK
git clone https://github.com/thegenemyers/FASTK.git
cd FASTK
git checkout 4604bfcdfd9251d05b27fbd5aef38187e9a9c9ad

sed -i 's:CFLAGS = :CFLAGS = -I${{CONDA_PREFIX}}/include -L${{CONDA_PREFIX}}/lib :g' Makefile
sed -i 's:CFLAGS = :CFLAGS = -I${{CONDA_PREFIX}}/include :g' HTSLIB/Makefile
sed -i 's:LDFLAGS = :LDFLAGS = -L${{CONDA_PREFIX}}/lib :g' HTSLIB/Makefile
"""

shell: """
cd '{params.fastk_dir}'

make CC=x86_64-conda-linux-gnu-gcc -j {threads} deflate.lib
make CC=x86_64-conda-linux-gnu-gcc -j {threads} libhts.a
make CC=x86_64-conda-linux-gnu-gcc -j {threads} all
"""

shell: """
mkdir -p data/reports
rsync --verbose --recursive --no-relative --include="*/" --include="report.pdf" --include="aggregated-report.pdf" --exclude="*" turso:'/proj/sebschmi/git/practical-omnitigs/data/reports/' data/reports
"""

shell: """
mkdir -p data/reports
rsync --verbose --recursive --no-relative --include="*/" --include="report.pdf" --include="aggregated-report.pdf" --exclude="*" tammi:'/abga/work/sebschmi/practical-omnitigs/reports/' data/reports
"""