Integrative Pipeline for Splicing Analysis
Installation & Run
To use this program you must have a Python environment with the following programs and libraries installed:

- snakemake
- pysam
- pandas
- numpy

Such an environment may be created via conda:

- conda create -n ipsa
- conda activate ipsa
- conda install -c conda-forge -c bioconda snakemake pysam pandas
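The same environment can also be created and populated in a single step; numpy does not need to be listed explicitly because it is installed as a dependency of pandas:

conda create -n ipsa -c conda-forge -c bioconda snakemake pysam pandas
conda activate ipsa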
To install the pipeline:
git clone https://github.com/Leoberium/pyIPSA.git
To run (a test run if the input directory is empty):

- The Python environment with the required libraries must be active
- Make the directory with the Snakefile the current working directory
- Run the snakemake command

A test run produces an empty aggregated_junction_stats.tsv file in the output directory. Options for the snakemake command are described in the snakemake documentation.
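For example, a complete test run might look like this (a minimal sketch, assuming the repository was cloned into a pyIPSA directory; recent Snakemake releases also require a core count):

conda activate ipsa
cd pyIPSA
snakemake -n        # dry run: list the jobs that would be executed
snakemake --cores 1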
For Arcuda users
Just load the module with Python and install the libraries:

- module load ScriptLang/python/3.8.3
- pip3 install --user --upgrade snakemake pandas pysam
After that you can run the pipeline using the cluster engine:

snakemake --cluster qsub -j <number of jobs>
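For example, to let Snakemake keep up to 16 cluster jobs in flight (a sketch only: the exact qsub options, such as -cwd here, depend on the cluster's configuration):

snakemake --cluster "qsub -cwd" -j 16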
Working folders
Folders in the repository:

- config - the folder with the config file, where you set up your pipeline
- deprecated - the folder with old scripts not used in the workflow
- known_SJ - the folder with annotated splice junctions
- workflow - the folder with the working scripts of the pipeline
Additional directories created
- genomes - the folder which stores all downloaded genomes
- annotations - the folder which stores all downloaded annotations
Code Snippets
# download and decompress the reference genome
shell:
    """
    wget -O {output.genome}.gz {params.url}
    gunzip {output.genome}.gz
    """
# index the input BAM file (pysam is imported at the top of the Snakefile)
run:
    pysam.index(input.bam)
# count splice junctions in the BAM file against the known junctions
shell:
    "python3 -m workflow.scripts.count_junctions "
    "-i {input.bam} "
    "-k {input.known} "
    "-o {output.junctions} "
    "-l {output.library_stats} "
    "{params.primary} {params.unique} "
    "-t {params.threads}"
# gather library statistics from the J1 output directory into one table
shell:
    "python3 -m workflow.scripts.gather_library_stats "
    "{OUTPUT_DIR}/J1 "
    "-o {output.tsv}"
# aggregate junction counts, filtering by offset and intron length
shell:
    "python3 -m workflow.scripts.aggregate_junctions "
    "-i {input.junctions} "
    "-s {input.library_stats} "
    "-o {output.aggregated_junctions} "
    "--min_offset {params.min_offset} "
    "--min_intron_length {params.min_intron_length} "
    "--max_intron_length {params.max_intron_length}"
# annotate junctions against known splice junctions and the genome sequence
shell:
    "python3 -m workflow.scripts.annotate_junctions "
    "-i {input.aggregated_junctions} "
    "-k {input.known_sj} "
    "-f {input.genome} "
    "-o {output.annotated_junctions}"
# choose the strand of each junction and write junction statistics
shell:
    "python3 -m workflow.scripts.choose_strand "
    "-i {input.annotated_junctions} "
    "-r {input.ranked_list} "
    "-o {output.stranded_junctions} "
    "-s {output.junction_stats}"
# merge the per-replicate junction statistics into a single DataFrame
# (defaultdict, Path and pandas as pd are imported at the top of the Snakefile)
run:
    d = defaultdict(list)
    for replicate in input.junction_stats:
        p = Path(replicate)
        name = Path(p.stem).stem  # file name with its two extensions stripped
        with p.open("r") as f:
            d["replicate"].append(name)
            # read "key: value" lines until the dashed separator
            for line in f:
                if line.startswith("-"):
                    break
                left, right = line.strip().split(": ")
                d[left].append(right)
    df = pd.DataFrame(d)
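The same parsing logic can be illustrated outside Snakemake. The sketch below fabricates one stats file whose layout ("key: value" lines ending at a dashed separator) mirrors what the loop above expects; the file name and the stat keys are made up for the example:

import pandas as pd
from collections import defaultdict
from pathlib import Path

# fabricate a per-replicate stats file in the expected layout
Path("sample1.stats.txt").write_text(
    "reads: 1000\njunctions: 250\n----\nanything after the dashes is ignored\n"
)

d = defaultdict(list)
for replicate in ["sample1.stats.txt"]:
    p = Path(replicate)
    d["replicate"].append(Path(p.stem).stem)  # "sample1.stats.txt" -> "sample1"
    with p.open("r") as f:
        for line in f:
            if line.startswith("-"):  # dashed line ends the header block
                break
            left, right = line.strip().split(": ")
            d[left].append(right)

print(pd.DataFrame(d))  # one row per replicate, one column per statistic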
# filter junctions by entropy and total count (optionally GT/AG sites only)
shell:
    "python3 -m workflow.scripts.filter "
    "-i {input.stranded_junctions} "
    "-e {params.entropy} "
    "-c {params.total_count} "
    "{params.gtag} "
    "-o {output.filtered_junctions}"
# merge the stranded junction files of all samples
shell:
    "python3 -m workflow.scripts.merge_junctions "
    "{input.stranded_junctions} "
    "-o {output.merged_junctions}"
# count poly(A) reads in the BAM file
shell:
    "python3 -m workflow.scripts.count_polyA "
    "-i {input.bam} "
    "-o {output.polyA} "
    "{params.primary} {params.unique} "
    "-t {params.threads}"
# aggregate poly(A) counts, filtering by minimum overhang
shell:
    "python3 -m workflow.scripts.aggregate_polyA "
    "-i {input.polyA} "
    "-s {input.library_stats} "
    "-o {output.aggregated_polyA} "
    "--min_overhang {params.min_overhang}"
# count splice sites supported by the junctions (pooled across samples)
shell:
    "python3 -m workflow.scripts.count_sites "
    "-i {input.bam} "
    "-j {input.junctions} "
    "-s {input.stats} "
    "-o {output.pooled_sites} "
    "{params.primary} {params.unique} "
    "-t {params.threads}"
# aggregate the pooled site counts, filtering by minimum offset
shell:
    "python3 -m workflow.scripts.aggregate_sites "
    "-i {input.sites} "
    "-s {input.stats} "
    "-o {output.aggregated_pooled_sites} "
    "-m {params.min_offset}"
# filter the pooled sites by entropy and total count
shell:
    "python3 -m workflow.scripts.filter "
    "-i {input.aggregated_pooled_sites} "
    "--sites "
    "-e {params.entropy} "
    "-c {params.total_count} "
    "-o {output.filtered_pooled_sites}"
# count splice sites per sample
shell:
    "python3 -m workflow.scripts.count_sites "
    "-i {input.bam} "
    "-j {input.junctions} "
    "-s {input.stats} "
    "-o {output.sites} "
    "{params.primary} {params.unique} "
    "-t {params.threads}"
# aggregate the per-sample site counts, filtering by minimum offset
shell:
    "python3 -m workflow.scripts.aggregate_sites "
    "-i {input.sites} "
    "-s {input.stats} "
    "-o {output.aggregated_sites} "
    "-m {params.min_offset}"
# filter the per-sample sites by entropy and total count
shell:
    "python3 -m workflow.scripts.filter "
    "-i {input.aggregated_sites} "
    "--sites "
    "-e {params.entropy} "
    "-c {params.total_count} "
    "-o {output.filtered_sites}"
# compute splicing rates from the filtered junctions and sites
shell:
    "python3 -m workflow.scripts.compute_rates "
    "-j {input.filtered_junctions} "
    "-s {input.filtered_sites} "
    "-o {output.rates}"
# compute splicing rates from the filtered junctions and pooled sites
shell:
    "python3 -m workflow.scripts.compute_rates "
    "-j {input.filtered_junctions} "
    "-s {input.filtered_pooled_sites} "
    "-o {output.rates}"
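Each snippet above is only the shell: or run: body of a rule; in the Snakefile it sits inside a rule that declares its input, output and params. As an illustration, a junction-counting rule could be wired up roughly as follows (the rule name, file paths and params values are assumptions for this sketch, not taken from the repository; the shell body is the one shown above):

rule count_junctions:
    input:
        bam="input/{sample}.bam",            # hypothetical path layout
        known="known_SJ/{genome}.ss.tsv.gz", # hypothetical path layout
    output:
        junctions="output/J1/{sample}.J1.gz",
        library_stats="output/J1/{sample}.library_stats.txt",
    params:
        primary="--primary",  # hypothetical flag passed through to the script
        unique="--unique",    # hypothetical flag passed through to the script
        threads=4,
    shell:
        "python3 -m workflow.scripts.count_junctions "
        "-i {input.bam} "
        "-k {input.known} "
        "-o {output.junctions} "
        "-l {output.library_stats} "
        "{params.primary} {params.unique} "
        "-t {params.threads}"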
Support
- Future updates