Alignment and antibody assembly pipelines for Croote et al. (Science, 2018)
Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output, operation, topic
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
Alignment and assembly workflows associated with Croote et al. (2018). These workflows were developed to analyze single human B cells isolated from the peripheral blood of food-allergic individuals and assume single-cell RNA-sequencing were data generated using the Smart-seq2 chemistry (10X and drop-seq data will not assemble).
About
This repository contains two snakemake -based workflows for processing scRNA-seq data. The alignment (mapping) workflow uses STAR and htseq whereas the antibody heavy / light chain assembly workflow uses BASIC followed by the immcantation framework.
Configuration
Environment
-
These workflows assume you have
conda
(a popular package management system ) in yourPATH
. If you do not haveconda
installed, I would recommending downloading miniconda and respondingyes
when prompted to prepend the install location to yourPATH
. To test the installation run:conda list
. -
To launch the workflows, you need a python 3
conda
environment withsnakemake
andpandas
installed. This can be created by running the following:conda create -q -n snakemake_env python=3.5 snakemake-minimal=5.2.4 pandas
The environment can then be activated by running:
source activate snakemake_env
. -
To avoid the challenges of installing and configuring IgBLAST with the necessary IMGT germline databases for V, (D), and J gene segment calling, this workflow uses the
kleinstein/immcantation
docker / singularity image . The image will automatically be pulled as part of the workflow, simply specify whether to use thedocker
orsingularity
image in theconfig.yaml
file.
Samplesheet
The samplesheet is a two column, tab-delimited file. For each row, the
base
column specifies the path to the directory containing the folder
samplename
. It is assumed that each
samplename
is unique and that within each
base/samplename
directory there is one
*R1*.fastq.gz
file and one
*R2*.fastq.gz
file.
base | samplename |
---|---|
/path/to/seq_run_1 | cell_1 |
/path/to/seq_run_1 | cell_2 |
... | ... |
/path/to/seq_run_2 | cell_500 |
Config file
Edit the
config.yaml
file to point to your samplesheet and modify it as necessary for your sequencing data and computing environment.
Running
Once configured, source your
conda
environment and launch the workflows with the following:
snakemake --use-conda
For cluster computing (e.g. on a SLURM cluster), the following can serve as a template:
snakemake --use-conda --cluster "sbatch --ntasks=1 \
--job-name={params.name} --cpus-per-task={threads} --partition={params.partition} \
--mem={resources.mem_mb} -o logs/{params.name}.%j.log" -j 10 -w 200 -k
If you would like to run only one of the workflows, edit the Snakefile
all
rule or add the desired output e.g.
combined_assemblies.tsv
to the end of the snakemake call.
Expected outputs
Assembly workflow
The final output is a tab-delimited file containing parsed assembly data from all cells. For example, here are some of the columns:
SEQUENCE_ID | FUNCTIONAL | V_CALL | J_CALL | CDR3_IMGT | C_CALL | C_LEN | SAMPLENAME |
---|---|---|---|---|---|---|---|
cell_id=transcripts;heavy_chain;BASIC | T | IGHV3-23 01,IGHV3-23D 01 | IGHJ4*02 | GCGAAAGATGAGTGGAAACCACCTCGCCGCGTTGACTAC | IGHA1*01 | 1059 | PA16P1B11 |
cell_id=transcripts;light_chain;BASIC | T | IGKV1-9*01 | IGKJ1*01 | CAACAGCTTAATTTTTATCCGTGGACG | IGKC*01 | 321 | PA16P1B11 |
Alignment workflow
The final output is a counts table where each row is an Ensembl gene and each column is a cell.
Testing
Travis CI
is used for automated continuous integration testing of the assembly workflow with scRNA-seq reads downsampled from two human plasmablasts. Unfortunately, the large memory requirements of STAR prevent similar testing of the alignment workflow. While not originally developed for species other than human, these workflows will also work for mouse by changing
species
in
config.yaml
.
If you would like to test the assembly workflow, edit
.test/config.yaml
according to your singularity / docker environment and then run the following from the current directory:
snakemake --use-conda --directory .test combined_assemblies.tsv
Code Snippets
22 23 24 25 26 27 28 29 30 31 | shell: "fa=resources/star/{params.fasta_name}.fa.gz && " "gtf=resources/star/{params.gtf_name}.gtf.gz && " "wget {params.fasta_url} -O $fa && " "wget {params.gtf_url} -O $gtf && " "zcat $fa > {output.fasta} && " "zcat $gtf > {output.gtf} && " "if [ {params.include_ERCCs} = true ]; then " "cat resources/star/ERCC92.fa >> {output.fasta} && " "cat resources/star/ERCC92.gtf >> {output.gtf}; fi" |
57 58 59 60 61 62 63 64 65 66 67 | shell: "genomedir=$(dirname {output}) && " "mkdir -p $genomedir && " "cd $(dirname $genomedir) && " "STAR --runMode genomeGenerate " "--genomeDir $genomedir " "--genomeFastaFiles {input.fasta} " "--sjdbGTFfile {input.gtf} " "--sjdbOverhang {params.star_read_len} " "--sjdbFileChrStartEnd {input.sjadditional} " "--genomeSAsparseD 2 --runThreadN {threads}" |
96 97 98 99 100 101 102 103 104 105 106 107 108 109 | shell: "wdir=$(dirname {output}) && " "echo $wdir && " "mkdir -p $wdir && " "STAR " "--genomeDir $(dirname {input[0]}) " "--readFilesIn {input[1]} {input[2]} " "--readFilesCommand gunzip -c " "--outSAMmapqUnique 60 " "--outFilterMismatchNmax 999 " "--outFilterMismatchNoverReadLmax 0.1 " "--twopassMode Basic " "--runThreadN {threads} " "--outSAMtype BAM Unsorted " "--outFileNamePrefix $wdir/" |
133 134 135 | shell: "mkdir -p $(dirname {output}) && " "htseq-count -s no -r name -f bam -m intersection-nonempty " "{input.bam} {input.gtf} > {output}" |
156 157 | shell: "python {params.scripts_dir}/combine_counts.py {config[samplesheet]} {output}" |
22 23 24 25 26 27 28 | shell: "outdir=$(dirname {output}) && " "cd $(dirname $outdir) && " "BASIC.py -b $(which bowtie2) " "-g {config[species]} -PE_1 $(basename {input[0]}) " "-PE_2 $(basename {input[1]}) " "-p {threads} -n transcripts -i {config[receptor]} " "-t {params.scratch} -a -o basic" |
52 53 54 55 | shell: 'blastn ' '-db {params.const_db_dir}/imgt_{config[species]}_{params.ig_receptor}_combined.fasta ' '-query {input} -outfmt "6 qseqid sacc length pident mismatch gaps qstart qend score evalue" ' '-out {output} -task megablast -word_size 20 -num_threads {threads} -max_target_seqs 3' |
78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 | run: if os.stat(str(input.assembly)).st_size == 0: print('Input assembly fasta is empty. Touching empty outputs') for out in output: with open(str(out), 'w') as outfile: pass else: if params.container_type == 'docker': # docker needs to bind an absolute path abs_dir_path = os.path.dirname(os.path.abspath(str(input.assembly))) assembly_name = os.path.basename(str(input.assembly)) # run docker with current user:group (avoids permissions issues) shell( 'docker run -v {abs_dir_path}:/data:z -u `stat -c "%u:%g" $PWD` ' 'kleinstein/immcantation:2.6.0 changeo-igblast ' '-o /data -s /data/{assembly_name} -n igblast ' '-p 1 -g {config[species]} -t ig') else: # singularity shell( "{params.singularity_pre_cmd} " "singularity exec -B $(dirname {output[0]}):/data " "{input.img} changeo-igblast " "-o /data -s {input.assembly} -n igblast " "-p 1 -g {config[species]} -t ig") |
119 120 121 | shell: "python {params.scripts_dir}/merge_changeo_constant.py " "{input.changeo} {input.constant} {output}" |
142 143 | shell: "python {params.scripts_dir}/combine_assemblies.py {config[samplesheet]} {output}" |
17 18 19 | shell: "docker pull kleinstein/immcantation:2.6.0 && " "echo $(date) > {output}" |
35 36 37 38 39 40 | shell: "{params.singularity_pre_cmd} " "cd $(dirname {output}) && " "img=$(basename {output}) && " "singularity pull --name $img " "docker://kleinstein/immcantation:2.6.0" |
Support
- Future updates
Related Workflows





