A metatranscriptomic pipeline optimized for the study of microeukaryotes.
This README is a brief introduction to eukrhythmic. For more detailed information, please visit the readthedocs.
Metatranscriptomic Pipelining
Documentation for eukrhythmic, a metatranscriptomics analysis pipeline written as a Snakemake workflow that enables users to run:

- Trimming
- Quality statistics
- Assembly (using several available assemblers)
- Assembly analysis / quality comparisons
Initializing the pipeline
Initialize the pipeline by setting up a conda environment, so that all the required packages are loaded.

conda env create -n metatrans --file environment.yaml
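Once the environment has been created, activate it before invoking any pipeline commands (assuming the environment name metatrans used above):

```bash
conda activate metatrans
```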
Pipeline schematic
File naming
No file name should be entirely contained within another file name. For example, if one file is called "supercool_sample", another should not be called "supercool_sample_2". However, "supercool_sample_1" and "supercool_sample_2" are fine, because neither name is entirely contained within the other.
The metaT_sample file

The config.yaml file contains an entry for a file called metaT_sample. This file is the data input source: it lists the sample names you expect to include in your analysis, along with any other information about the samples that you would like to be used. It is essential if you would like to apply groupings and co-assemble several samples together, and in general it is required for the pipeline to work as intended.
Depending on the application, some columns of this file must be added or may not be necessary. For example, for repeated samples in the same location, the latitude and longitude may not be necessary, because geographic variation in the metatranscriptomic assembly will not be evaluated. Any data that are not included in the default steps in the provided script can be excluded.
As detailed in scripts/make_sampleinfo.ipynb, the minimum columns required to run the general (default) pipeline, without any comparative analysis between samples, are "SampleName", "SampleID", and "FastqFileNames". SampleName and SampleID can be identical if no such distinction exists for your samples. Strictly speaking, however, "SampleName" is a descriptive name for your samples, whereas "SampleID" is how the samples will be named throughout the pipeline, so it is preferable to minimize special characters and length in the "SampleID" column, whereas "SampleName" may be more verbose.
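For illustration, a minimal metaT_sample file might look like the sketch below. The sample names, IDs, and file names are hypothetical, columns are shown aligned for readability, and the exact formatting expected for each column (for example, how paired fastq files are listed) should be checked against scripts/make_sampleinfo.ipynb:

```
SampleName           SampleID   FastqFileNames                              AssemblyGroup
Station 1 surface    St1Surf    St1Surf_1.fastq.gz, St1Surf_2.fastq.gz      1
Station 1 deep       St1Deep    St1Deep_1.fastq.gz, St1Deep_2.fastq.gz      1
```

The AssemblyGroup column, discussed below, is optional and controls which samples are co-assembled.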
Autogeneration of the full metaT_sample file

Using the script scripts/autogenerate_metaT_sample.py, you can autogenerate a working sample file for whatever files are present in the directory specified in your configuration file as inputDIR. The script should be run from the base eukrhythmic directory. At minimum, it requires a name for the output metaT_sample file as a parameter; the file will be saved in the input directory and should then be specified in the configuration file as the metaT_sample entry. (TODO: in the future, autopopulate the config file with this entry and allow users to run the pipeline without running this script separately, using a default name for the metaT_sample file, as long as the inputDIR is specified.)
So within the base eukrhythmic directory, the following command may be run:
python scripts/autogenerate_metaT_sample.py testsampledata.txt
Optionally, additional positional parameters may be provided after the output file name. The second parameter is a file extension, which defaults to "fastq.gz". The third and fourth parameters are labels for forward and reverse reads (defaults "_1" and "_2", respectively), and the fifth parameter is an additional file suffix used to split the filename (e.g. 001; defaults to the file extension; particularly important for single-end reads).
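For example, the following hypothetical invocation sets all of the optional parameters explicitly (output file name, file extension, forward label, reverse label, and splitting suffix; the values shown are illustrative):

```bash
python scripts/autogenerate_metaT_sample.py testsampledata.txt fastq.gz _R1 _R2 001
```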
Autogeneration of "FastqFileNames" column with "SampleID" column
In /scripts/, there is a Python script called
make_sample_file.py
that will generate the
fastq
file names column, given your data input folder and an existing
metaT_sample
file that contains sample IDs. This is a good option if you only wish to run a subset of the files in your input data folder, and want to make sure the
fastq
file names are properly formatted.
If you use this script, all of the files with the fastq extension listed in your INPUTDIR that have a match to entries in your SampleID column will be included. Optionally, the script accepts up to two input arguments that specify how forward and reverse reads are labeled. By default, "_R", "_1", and "_2" are searched for.
Notes about manually creating metaT_sample

If you specify "FastqFileNames" manually, ensure that the files are named uniquely and that the entire unique choice of name is specified in this column. Filenames that match more than two fastq files in your input directory will raise an exception.
In the event that you want more control over how your samples are named, use scripts/make_sampleinfo.ipynb. The AssemblyGroup column may be omitted if you do not mind your samples being assigned assembly groups numbered 1 through n, where n is your number of samples; in this case, however, separategroups must be set to 0 in your config.yaml file. (In the future, this may be updated to be done automatically if the column is absent.)
Once you have generated the metaT_sample file containing the information about your samples, and have populated config.yaml with the relevant directories (including, importantly, outputDIR, which will be the location of your results), you are ready to run the pipeline.
Configuration file entries

Below is a listing of each supported entry in the configuration file (config.yaml in the base directory) and how to specify each flag when using the pipeline.

Flag in file | Meaning & how to specify |
---|---|
metaT_sample | The name of the sample file containing sample IDs to be used as unique identifiers in the pipeline, descriptive sample names, and input FASTQ file names. |
inputDIR | The directory where the input data are found. Currently, this should be specified with "/" separators but no trailing "/", and should begin with "/" only if you are specifying an absolute path from the root of your file system (not a relative path). |
checkqual | Boolean flag for whether to run quality checking with salmon, QUAST, BUSCO, etc. on assemblies. If 1, these quality checks are performed. |
spikefile | A path to a FASTA file containing the sequence of any spike-in that might affect reads. This will depend on the experimental setup. If the file is not valid (e.g., if this flag is set to 0), nothing is done. |
runbbmap | A Boolean flag specifying whether to use the spike file to drop spiked reads, according to what was done in your experiment. If 1, the spikefile is used; otherwise, this filtering is either not performed or not used downstream in the pipeline (depending on whether a spike file exists). |
kmers | A list of k-mer sizes to use, where applicable, in assembly. These should all be integer values (default: 20, 50, 110). The median k-mer value in this list is used when just one k-mer value is required. |
assemblers | The assemblers used to assemble the metatranscriptomes (which will later be merged). Each assembler in this list should have a matching Snakemake rule in the modules folder of the main pipeline directory (named identically), as well as a "clean" rule (explained below). |
jobname | A descriptive name used to name jobs on your high-performance computing system, so that you can track the progress of your workflow. |
adapter | Path to a FASTA file containing the adapter used during sequencing. Defaults to a static adapter file in the static directory. |
separategroups | A Boolean flag. If 1, the assembly groups specified in the metaT_sample file are used to co-assemble raw files. Otherwise, each raw file is assembled separately, regardless of what is specified in the "AssemblyGroup" column of the input file. |
outputDIR | The path to a directory where all program output will be stored. |
assembledDIR | The directory to move assembled files to, relative to the output directory. Defaults to "assembled"; not necessary to specify. |
renamedDIR | The directory to move "renamed" files to (files with the name of the assembler added to each FASTA header), relative to the output directory. Defaults to "assembled"; not necessary to specify. |
scratch | The location to move unnecessary intermediate files to after computation. |
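As a sketch, a config.yaml using these entries might look like the following. The values and paths are placeholders, and the exact key names and defaults should be checked against the config.yaml shipped with the repository (for example, the assembler names listed must match rules in the modules folder):

```yaml
metaT_sample: mysamples.txt        # the sample file described above
inputDIR: /path/to/raw_fastq       # no trailing "/"
outputDIR: /path/to/eukrhythmic_output
scratch: /path/to/scratch
checkqual: 1                       # run salmon / QUAST / BUSCO checks on assemblies
runbbmap: 0                        # set to 1 and point spikefile at a FASTA if reads were spiked
spikefile: 0
adapter: static/adapters.fa        # placeholder; defaults to the file in the static directory
kmers:
  - 20
  - 50
  - 110
assemblers:                        # names must match Snakemake rules in the modules folder
  - trinity
  - megahit
jobname: eukrhythmic_run
separategroups: 1                  # use the AssemblyGroup column to co-assemble samples
```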
How to use the pipeline

There are three main options for using the pipeline: you can run the pipeline directly using command-line arguments, you can set up a simple batch script with those command-line options and run it via SLURM, or you can treat the pipeline as a normal Snakemake workflow and edit its components manually by changing the configuration file.
Running the pipeline from the command line

To run the pipeline from the command line, you likely still want to make sure that the sample file is generated the way you intend. To explore what goes into this file, and the way it will be autogenerated when executing eukrhythmic from the command line, please read the section on file naming above.

Other than that, all you need to do to execute eukrhythmic from the command line is to alias the executable while in the eukrhythmic base directory after cloning from GitHub, like so:

alias eukrhythmic='./bin/eukrhythmic.sh'

And then invoke:

eukrhythmic <arguments>
To run eukrhythmic on the provided sample data, simply invoke eukrhythmic on a clean install of the program without arguments. You can also run the sample data by using the argument --use-sample, which will copy the relevant configuration entries. The exception is if you are running on an HPC system that uses the SLURM scheduler: in that case, invoke eukrhythmic with one argument, -l or --slurm, to leverage your computing resources, and submit it as a batch job (sbatch eukrhythmic --slurm) to avoid running any steps on the machine you are currently logged into. A list of additional arguments commonly required by users is given below:
Flag | config.yaml entry | Meaning |
---|---|---|
-n / --job-name | jobname | A descriptive/unique name to be applied to SLURM jobs. |
-s / --sample-file-name | metaT_sample | The listing of SampleIDs, SampleNames, FastqFiles, and assembly groups for each sample. |
-o / --out-dir | outputDIR | The output directory to write to. |
-i / --in-dir | inputDIR | The location of the sample FASTQ files (crucial unless running sample data). |
-q / --check-quality | checkqual | Takes no arguments. If specified, quality assessment is run on assemblies. |
-b / --run-bbmap | runbbmap; spikefile | If your data contain spiked reads, you can list this flag and specify a FASTA file containing the spike. |
-l / --slurm | - | If provided, run snakemake on SLURM. |
-g / --generate-file | - | If provided, create the sample file automatically from the files present in the input directory. |
--use-sample | - | If provided, override all other parameters and run the provided sample data. |
-c / --scratch-dir | scratch | Takes an argument: the directory to be used as scratch space. |
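Putting these together, a hypothetical invocation that generates the sample file from the input directory, runs quality checks on assemblies, and submits jobs via SLURM might look like the following (the job name and paths are placeholders):

```bash
eukrhythmic --job-name mytest \
            --in-dir /path/to/raw_fastq \
            --out-dir /path/to/eukrhythmic_output \
            --generate-file --check-quality --slurm
```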
Running the pipeline using a batch script

You can use the command-line arguments described above if you run the pipeline from submit/eukrhythmic.sh. Submitting this script, customized with whatever arguments you prefer, to a scheduler (from the home directory) will run the pipeline in full.

To use the pipeline, the most important thing to do is to specify the paths to your particular input and output directories. Personalizing these will allow the pipeline to pull the relevant files when computing the results of individual rules, so this step is crucial. You can edit the configuration file directly, or use the command-line options specified above. Specifically, you would change this line in eukrhythmic.sh:

eukrhythmic --use-sample --slurm

and add in-dir and out-dir flags like so:

eukrhythmic --use-sample --in-dir <name-of-your-input> --out-dir <name-of-your-output>
All input fastq files must be contained in the same directory, the inputDIR location, specified by the --in-dir flag. Only these metatranscriptomic data will be included in the analysis. These files do not, however, need to be located inside the eukrhythmic directory (and it is recommended that they are located elsewhere).
The next thing that needs to be done is to produce the sample file containing all of the relevant information needed to run the pipeline. You can create this yourself, generate it semi-automatically from a list of sample IDs, or generate it completely automatically. The fully automatic route can be done all in one step with the eukrhythmic bash script, by specifying:

eukrhythmic --use-sample --in-dir <name-of-your-input> --out-dir <name-of-your-output> -g

using the -g or --generate-file option, which runs the included autogeneration script.
If you are using the SLURM scheduler, you can run the pipeline by simply adding/keeping the --slurm flag in the eukrhythmic command after configuring options for your particular system, or by setting the rewritecluster configuration flag to 1 and specifying the options for all jobs in the required section of the cluster.yaml file.
Running the pipeline manually

To use the pipeline as a Snakemake workflow manually, the most important thing to do is to populate config.yaml with the paths to your particular input and output directories, and to generate the sample file (either manually or using a semi- or fully automatic procedure). Personalizing these will allow the pipeline to pull the relevant files when computing the results of individual rules, so this step is crucial. You should also edit any other important parts of the configuration file, as described in the configuration file entries section above.
Once the pieces are in place, and you have either activated an environment using environment.yaml or otherwise installed snakemake, you can run the pipeline using:

sbatch submit/snake_submit.sh
If you are using the SLURM scheduler, you can run the pipeline by simply executing the submit/snake_submit.sh file in the eukrhythmic directory after configuring options for your particular system, or by setting the rewritecluster configuration flag to 1 and specifying the options for all jobs in the required section of the cluster.yaml file. If you are not using a scheduler, or are logged into a computer with sufficient computational resources (e.g., a SLURM job run in interactive mode), you can execute Snakemake directly.
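For example, from the base eukrhythmic directory and with the conda environment active, a direct run might look like the following (the core count is illustrative and should match your machine):

```bash
snakemake --cores 16
```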
Running the pipeline with SLURM

In order to run the pipeline with SLURM, or any other similar scheduling platform on an HPC system, the file cluster.yaml in the base directory needs to be populated. Specifications for individual rules can be added or removed as needed; the default configuration is what must absolutely be specified for the pipeline to run properly. Make sure that you include the following:

- Your account name
- Any flags that you typically use when running commands on the system, in the __default__ -> slurm -> command string
- The partition of your system that you plan to use, as queue. By default, this might be compute or normal.
If you set your account name at the top of the cluster.yaml file, as well as setting the default partition just once, and you do not change the parameter rewritecluster to 0 in config.yaml, you can use the command-line interface or the provided submission file to circumvent filling out the rest of cluster.yaml. You can also do this by invoking python scripts/importworkspace.py once before running the pipeline, if you already have a valid config.yaml. If you do this, you won't need to change these values in the specifications for all of the individual rules, unless you have specific computational needs or usage requirements, in which case you should set rewritecluster to 0. If defaults are not specified at the beginning of the cluster.yaml file for the user, maximum memory usage, maximum number of cores, and maximum number of threads, eukrhythmic will not execute successfully and an error will be thrown.
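As a sketch, the defaults at the top of cluster.yaml might look something like the following. The key names and structure here are assumptions based on the description above and should be checked against the cluster.yaml shipped with the repository:

```yaml
# top-level defaults (key names are illustrative placeholders)
account: myproject            # your SLURM account name
user: myusername
maxmemory: 180                # maximum memory (GB) a job may request
maxcores: 30                  # maximum number of cores
maxthreads: 30                # maximum number of threads

__default__:
    queue: compute            # the partition to submit jobs to
    slurm:
        command: "sbatch --export=ALL"   # flags you typically use when submitting jobs
```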
Setting CPUs and memory requirements

As a general rule for memory-intensive assemblers, the memory available to the process (in GB) should be about ten times the number of cores/CPUs available to the job. For example, on a machine with 30 cores and 300 GB of memory available, you may want to configure your jobs to use 15 cores and 150 GB of memory, allowing two jobs to run concurrently on one node while keeping memory in proportion to the number of cores.
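As a sketch of how this rule of thumb might translate into a per-rule entry in cluster.yaml (the rule name and key names are placeholders; follow the structure used in the cluster.yaml shipped with the repository):

```yaml
trinity:
    cores: 15      # half of a 30-core node
    mem: 150       # GB; roughly ten times the core count
```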
Adding new assemblers

In order to add a new assembler to the pipeline, three things need to be included (a sketch of what this might look like follows the list):

- A rule for the assembler, named <assembler>, including a command that can be run using software in the specified conda environment.
- A rule to "clean" the assembly output, named <assembler>_clean, including moving the completed assemblies to the shared assembly folder, specified as assembledDIR, which is a subdirectory of the outputDIR, also specified in the configuration file. Intermediate files should also be moved to a scratch directory or deleted, based on user preferences and space requirements. Any other files needed by other tools or desired by the user should be moved to a subdirectory of the output directory. If they are specified as output files, snakemake will create the needed directories automatically; otherwise, the user will need to manually create directories that do not already exist (specifying them as output files is more extensible).
- A list entry for the assembler in the configuration file that matches the name of <assembler> in each snakemake rule.
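A minimal sketch of such a rule pair is shown below. The assembler name, file paths, wildcard, and command are hypothetical placeholders, and real modules in the repository define fuller inputs, params, and logging; the point is the naming pattern of an <assembler> rule plus an <assembler>_clean rule, with the same name listed under assemblers in the configuration file.

```python
# hypothetical module: modules/myassembler-snake
# OUTPUTDIR and ASSEMBLEDDIR stand in for the configured output and assembled directories

rule myassembler:
    input:
        left = os.path.join(OUTPUTDIR, "trimmed", "{assembly}_1.trimmed.fastq.gz"),
        right = os.path.join(OUTPUTDIR, "trimmed", "{assembly}_2.trimmed.fastq.gz")
    output:
        os.path.join(OUTPUTDIR, "myassembler", "{assembly}_myassembler.fasta")
    shell:
        """
        # placeholder command; replace with the real assembler invocation
        myassembler --left {input.left} --right {input.right} --out {output}
        """

rule myassembler_clean:
    input:
        os.path.join(OUTPUTDIR, "myassembler", "{assembly}_myassembler.fasta")
    output:
        os.path.join(ASSEMBLEDDIR, "{assembly}_myassembler.fasta")
    shell:
        """
        # move the finished assembly into the shared assembled directory and
        # tidy intermediate files into scratch (or delete them)
        cp {input} {output}
        """
```

The corresponding configuration change would then be adding myassembler to the assemblers list in config.yaml.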
Intermediate files and cleanup
A "scratch" directory, specified in the configuration file, is used to store intermediate files after execution of rules such as assembly, which produce many files which are not needed downstream in the pipeline. To override this behavior, specify the output directory and the scratch directory to be the same location.
After the pipeline has been run, simply enter
snakemake hardclean --cores 1
To safely remove the scratch directory, if you don't need the intermediate files generated in individual pipeline steps.
Code Snippets

The fragments below are the shell and run blocks from the workflow's Snakemake rules, grouped by the step they perform.

BUSCO assesses the completeness of each assembly against the eukaryote lineage dataset:

```
shell:
    """
    #busco --list-datasets
    #-l eukaryota_odb10
    #cd ./scripts/busco
    #python3 setup.py install --user
    #cd ../..
    #python ./scripts/busco/src/busco/run_BUSCO.py -i {input.outputassemblies} --auto-lineage-euk -o {output} -c 8 -m transcriptome --long
    busco -f -i {input.outputassemblies} -l ./input/eukaryota_odb10 -o {params.assembly}_busco -c 8 -m transcriptome --long
    mv run_{params.assembly}_busco {output}
    """
```

Three clustering rules share the same CD-HIT-EST command to collapse redundant contigs:

```
shell:
    '''
    cd-hit-est -d 0 -i {input.infiles} -o {output} -T {params.threads} -M {params.maxmemory} -G 0 -c {params.identityparam} -aS {params.mincoverageshorter} -aL {params.mincoveragelong} 2> {log.err} 1> {log.out}
    '''
```

The scratch directory is removed once it is no longer needed:

```
shell:
    '''
    # remove the scratch directory if it exists
    if [ -d {params.scratch} ]
    then
        rm -r {params.scratch}
    fi
    '''
```

Assembly FASTA headers are tagged with the assembler and assembly-group name before merging:

```
shell:
    '''
    sed 's/>/>{params.assemblername}_{params.assembly}_/' {input.assemblyfiles} > {output}
    '''
```

The renamed assemblies are then concatenated into a single merged file (one rule skips the step if the merged file already exists):

```
shell:
    '''
    if [ ! -f {output} ]; then
        cat {params.assemblyfiles} > {output}
    fi
    '''
```

```
shell:
    '''
    cat {params.assemblyfiles} > {output}
    '''
```

QUAST evaluates the assemblies, once labeled by assembler and once per assembly:

```
shell:
    '''
    quast {params.outputassemblies} -o {output} --threads 8 --labels {params.assemblers} 2> {log.err} 1> {log.out}
    '''
```

```
shell:
    '''
    quast {params.outputassemblies} -o {output} --threads 8 --labels {params.assembly} 2> {log.err} 1> {log.out}
    '''
```

The individual QUAST reports are combined into one table, with each column suffixed by the assembly it came from (a second, nearly identical run block does the same for the reports listed under os.path.join(OUTPUTDIR, "quast")):

```
run:
    quastfiles = os.listdir(params.quastdir)
    if len(quastfiles) > 0:
        outputassemblers = pd.DataFrame()
        for r in quastfiles:
            if os.path.isfile(os.path.join(params.quastdir, r, "report.tsv")):
                currentreport = pd.read_csv(os.path.join(params.quastdir, r, "report.tsv"), index_col = 0, sep = "\t")
                for c in range(0, len(currentreport.columns)):
                    currentreport = currentreport.rename(columns = {currentreport.columns[c]: (currentreport.columns[c] + "_" + str(r))})
                outputassemblers = pd.concat([outputassemblers, currentreport], axis = 1)
        outputassemblers.to_csv(path_or_buf = params.outputfile, sep = "\t")
```

Four salmon rules build an index from the assembly (given as {input.fastafile} in two rules and {input.assemblyfile} in the other two) and quantify the reads against it:

```
shell:
    """
    salmon index -t {input.fastafile} -i {params.indexname} -k {params.kval} 2> {log.err} 1> {log.out}
    salmon quant -i {params.indexname} -l {params.libtype} -1 {input.left} -2 {input.right} --validateMappings -o {params.outdir} 2>> {log.err} 1>> {log.out}
    """
```

The per-sample quant.sf outputs are relabeled and merged into a single count table:

```
shell:
    """
    for i in {input.input}; do awk -F, -v OFS=, 'NR==1{{split(FILENAME,a,"/quant.sf");$2= a[1] ""}}1' ${{i}} | awk '{{gsub(".*salmon_quant_assembly_","",$5)}}1' > {output.out}; done
    """
```

```
shell:
    """
    awk '
    {{samples[$1] = samples[$1] OFS $NF}}
    END {{
        print "Name", samples["Name"]
        delete samples["Name"]
        for (name in samples)
            print name, samples[name]
    }}
    ' {input.input} > {output.out}
    """
```

Counts are converted to absolute abundances using the internal genomic DNA spike-in standard:

```
run:
    final = pd.DataFrame()
    spike = pd.read_csv(input.copiestable, sep = ' ')
    counts = pd.read_csv(input.countfile, sep = ' ')
    for i in params.IDs:
        df = (spike.loc[spike['ID'] == (i)])
        # Spike calculation reference: https://www.protocols.io/view/internal-genomic-dna-standard-for-quantitative-met-jxdcpi6?step=12
        calc = (counts[i] * ((float((df['bbmap'])) * (params.spikecopies / (float((df['bbmap-total']))))) / (float((df['bbmap']))))) / float((df['VolumeFiltered']))
        colNames = calc.name
        rowNames = counts.iloc[:,0]
        calc.index = rowNames
        calc.column_name = colNames
        final = final.append(calc)
    final_transpose = final.T
    print(final_transpose)
    final_transpose.to_csv(output.out, sep = ' ', index = True)
```

TransDecoder predicts coding regions from the merged assembly; two rules start from {input.merged} and a third from {input.clustered}:

```
shell:
    """
    unset PERL5LIB
    mkdir -p {params.wd_path}
    cp {input.merged} {params.merged}.fasta
    (cd {params.wd_path} && TransDecoder.LongOrfs -t {params.filename}.fasta -m {params.size}) 2> {log.err} 1> {log.out}
    (cd {params.wd_path} && TransDecoder.Predict -t {params.filename}.fasta -O --no_refine_starts) 2>> {log.err} 1>> {log.out}
    rm {params.merged}.fasta
    """
```

The predicted coding sequences are extracted with bedtools (this block appears in three rules):

```
shell:
    """
    unset PERL5LIB
    bedtools getfasta -fi {input.merged} -bed {params.merged}.fasta.transdecoder.bed -fo {output.cds}
    """
```

Finally, the TransDecoder outputs are moved into place and working files are removed (this block appears in two rules):

```
shell:
    """
    mv {input.pep} {output.pep}
    mv {input.cds} {output.cds}
    mv {input.gff} {output.gff}
    mv {input.bed} {output.bed}
    rm -rf {params.merged}.fasta.transdecoder_dir*
    rm -rf pipeliner.*.cmds
    """
```