Multimodal chromatin profiling using nanobody-based single-cell CUT&Tag
Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output, operation, topic
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
Multimodal chromatin profiling using nanobody-based single-cell CUT&Tag
Marek Bartosovic, Goncalo Castelo-Branco
Code repository related to preprint.
Multimodal chromatin profiling using nanobody-based single-cell CUT&Tag Marek Bartosovic, Gonçalo Castelo-Branco bioRxiv 2022.03.08.483459; doi: https://doi.org/10.1101/2022.03.08.483459
Data availability
Processed files - Seurat objects, fragments file (cellranger), bigwig tracks per cluster and .h5 matrices are available as supplementary files in the GEO repository https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE198467
Reproducing the analysis
Step 1: prepare environment
conda environment is provided in env/environment.yaml
conda env create -f env/environment.yaml
Additional package dependencies that need to be installed:
R
install.packages(c('argparse','ggplot2','funr','Signac','scales','Seurat','rmarkdown','mclust','GGally','BiocManager','patchwork','markdown','UpSetR','pheatmap','viridis','purrr','Rmagic','devtools','raster'))
BiocManager::install(c('ensembldb','EnsDb.Mmusculus.v79','GenomeInfoDb', 'GenomicRanges', 'IRanges', 'Rsamtools','BiocGenerics','rtracklayer','limma','slingshot','BiocGenerics', 'DelayedArray', 'DelayedMatrixStats','limma', 'S4Vectors', 'SingleCellExperiment','SummarizedExperiment', 'batchelor', 'Matrix.utils')))
Have seqtk installed in $PATH
https://github.com/lh3/seqtk
Install papermill for cli for jupyter notebooks
python3 -m pip install papermill
Step 2: Download the data
use fasterq-dump or alternative to download the fastq files
https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump
GEO repository
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE198467
Step 3: Clone the github repo with analysis code
git clone https://github.com/mardzix/bcd_nano_CUTnTag
Step 4: Modify config
Change config/config.yaml and specify path to
-
Absolute path to fastq files
-
Specific path to the tmp folder (Create any folder, e.g. in home)
-
Specify conda environment to use
-
Modify path to cellranger reference
Step 5: Run the pipeline
Pipeline is implemented in workflow management software Snakemake
Change the cluster-specific profile to your preference (e.g. slurm, condor etc.), or run without profile
For some example profiles see:
-
https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles
-
https://github.com/Snakemake-Profiles/slurm
-
https://github.com/Snakemake-Profiles/htcondor
snakemake --snakefile code/workflow/Snakefile_single_modality.smk --cores 16 --profile htcondor -p
Johannes Köster, Sven Rahmann, Snakemake—a scalable bioinformatics workflow engine, Bioinformatics, Volume 28, Issue 19, 1 October 2012, Pages 2520–2522, https://doi.org/10.1093/bioinformatics/bts480
Code Snippets
25 26 | shell: 'Rscript {input.script} -i {input.seurat} -m {wildcards.combination} -o {output}' |
39 40 41 42 43 | shell: "Rscript -e \"rmarkdown::render(input='{input.notebook}', " " output_file = '{params.report}', " " params=list(seurat = '{params.seurat_in}', " " output = '{params.seurat_out}'))\" " |
52 53 | shell: "Rscript {input.script} --seurat {input.seurat} --clusters OPC MOL --reduction wnn.umap --out {output.seurat}" |
68 69 | shell: 'Rscript {input.script} -i {input.seurat} -o {output.seurat} {params.clusters} -d {params.idents} -m {params.modalities} -a {params.assay} -t {input.seurat_pt}' |
39 40 | shell: "fasterq-dump -t {params.tmp} -f -e {threads} --split-files --include-technical -o {params.out} {wildcards.SRA}" |
47 48 | shell: "gzip {input}" |
55 56 | shell: "mv {input} {output}" |
70 71 72 73 | shell: 'rm -r results/nbiotech_data/cellranger/{wildcards.sample}/; ' 'cd results/nbiotech_data/cellranger/; ' '/data/bin/cellranger-atac count --id {wildcards.sample} --sample {params.sample} --reference {params.cellranger_ref} --fastqs {params.fastq_dir}' |
82 83 84 | shell: 'bamCoverage -b {input.cellranger_bam} -o {output.bigwig} -p {threads} --minMappingQuality 5 ' ' --binSize 50 --centerReads --smoothLength 250 --normalizeUsing RPKM --ignoreDuplicates --extendReads' |
93 94 95 | shell: 'macs2 callpeak -t {input.cellranger_bam} -g mm -f BAMPE -n {wildcards.sample} ' '--outdir {params.macs_outdir} --keep-dup=1 --llocal 100000 --cutoff-analysis --min-length 1000 --max-gap 1000 2>&1 ' |
104 105 106 | shell: 'macs2 callpeak -t {input.cellranger_bam} -g mm -f BAMPE -n {wildcards.sample} ' '--outdir {params.macs_outdir} --keep-dup=1 --llocal {wildcards.llocal} 2>&1 ' |
115 116 117 118 | shell: 'macs2 callpeak -t {input} -g mm -f BAMPE -n {wildcards.sample} ' '--outdir {params.macs_outdir} --llocal 100000 --keep-dup=1 --broad-cutoff=0.1 ' '--min-length 1000 --max-gap 1000 --broad 2>&1 ' |
125 126 | shell: "wget -O {output} {params.url}" |
134 135 | shell: "bedtools genomecov -bg -g {input.genome} -i {input.fragments} > {output}" |
144 145 | shell: "~/bin/SEACR/SEACR_1.3.sh {input} 0.01 norm relaxed {params.out_prefix}" |
155 156 157 | shell: 'python3 {params.script} {input.fragments} {wildcards.sample} | bgzip > {output.fragments}; ' 'tabix -p bed {output.fragments}' |
169 170 171 172 | shell: 'bedtools intersect -abam {input.bam} -b {input.peaks} -u | samtools view -f2 | ' 'awk -f {params.get_cell_barcode} | sed "s/CB:Z://g" | python3 {params.add_sample_to_list} {wildcards.sample} | ' 'sort -T {params.tmpdir} | uniq -c > {output.overlap} && [[ -s {output.overlap} ]] ; ' |
183 184 185 186 | shell: ' samtools view -f2 {input.bam}| ' 'awk -f {params.get_cell_barcode} | sed "s/CB:Z://g" | python3 {params.add_sample_to_list} {wildcards.sample} | ' 'sort -T {params.tmpdir} | uniq -c > {output.all_bcd} && [[ -s {output.all_bcd} ]] ; ' |
205 206 | shell: "Rscript {params.script} --metadata {input.metadata} --fragments {input.fragments} --bcd_all {input.bcd_all} --bcd_peak {input.bcd_peak} --sample {wildcards.sample} --sample {wildcards.sample} --out_prefix {params.out_prefix}" |
220 221 | shell: "Rscript {input.script} --sample {wildcards.sample} --antibody {params.antibody} --metadata {input.metadata} --fragments {input.fragments} --peaks {input.peaks} --out_prefix {params.out_prefix} --window {wildcards.binwidth} --genome_version {params.genome}" |
228 229 | shell: 'wget -O {output} {params.url}' |
238 239 240 | shell: 'cd {params.dirname}; ' 'tar -xvzf `basename {input.archive}`' |
247 248 | shell: 'wget -O {output.bw_tar} {params.url}' |
287 288 289 | shell: 'cd {params.dirname}; ' 'tar -xvzf `basename {input.bw_tar}`' |
64 65 | shell: "python3 {input.script} -i {input.fastq} -o {params.out_folder} --single_cell --barcode {wildcards.barcode} 2>&1" |
80 81 82 83 | shell: 'rm -rf results/multimodal_data/{wildcards.sample}/cellranger/{wildcards.sample}_{wildcards.antibody}_{wildcards.barcode}/; ' 'cd results/multimodal_data/{wildcards.sample}/cellranger/; ' '/data/bin/cellranger-atac count --id {wildcards.sample}_{wildcards.antibody}_{wildcards.barcode} --reference {params.cellranger_ref} --fastqs {params.fastq_folder}' |
91 92 93 | shell: 'bamCoverage -b {input.cellranger_bam} -o {output.bigwig} -p {threads} --minMappingQuality 5 ' ' --binSize 50 --centerReads --smoothLength 250 --normalizeUsing RPKM --ignoreDuplicates --extendReads' |
102 103 104 | shell: 'macs2 callpeak -t {input.cellranger_bam} -g mm -f BAMPE -n {wildcards.antibody} ' '--outdir {params.macs_outdir} --keep-dup=1 --llocal 100000 --cutoff-analysis --min-length 1000 --max-gap 1000 2>&1 ' |
113 114 115 | shell: 'macs2 callpeak -t {input.cellranger_bam} -g mm -f BAMPE -n {wildcards.antibody} ' '--outdir {params.macs_outdir} --keep-dup=1 --llocal {wildcards.llocal} 2>&1 ' |
124 125 126 127 | shell: 'macs2 callpeak -t {input} -g mm -f BAMPE -n {wildcards.antibody} ' '--outdir {params.macs_outdir} --llocal 100000 --keep-dup=1 --broad-cutoff=0.1 ' '--min-length 1000 --max-gap 1000 --broad 2>&1 ' |
134 135 | shell: 'cut -f1,2 {params.faidx} > {output}' |
143 144 | shell: "bedtools genomecov -bg -g {input.genome} -i {input.fragments} > {output}" |
153 154 | shell: "~/bin/SEACR/SEACR_1.3.sh {input} 0.01 norm relaxed {params.out_prefix}" |
164 165 166 | shell: 'python3 {params.script} {input.fragments} {wildcards.sample} | bgzip > {output.fragments}; ' 'tabix -p bed {output.fragments}' |
178 179 180 181 | shell: 'bedtools intersect -abam {input.bam} -b {input.peaks} -u | samtools view -f2 | ' 'awk -f {params.get_cell_barcode} | sed "s/CB:Z://g" | python3 {params.add_sample_to_list} {wildcards.sample} | ' 'sort -T {params.tmpdir} | uniq -c > {output.overlap} && [[ -s {output.overlap} ]] ; ' |
192 193 194 195 | shell: ' samtools view -f2 {input.bam}| ' 'awk -f {params.get_cell_barcode} | sed "s/CB:Z://g" | python3 {params.add_sample_to_list} {wildcards.sample} | ' 'sort -T {params.tmpdir} | uniq -c > {output.all_bcd} && [[ -s {output.all_bcd} ]] ; ' |
214 215 | shell: "Rscript {params.script} --metadata {input.metadata} --fragments {input.fragments} --bcd_all {input.bcd_all} --bcd_peak {input.bcd_peak} --antibody {wildcards.modality} --sample {wildcards.sample} --out_prefix {params.out_prefix}" |
227 228 | shell: "Rscript {input.script} --sample {wildcards.sample} --antibody {wildcards.antibody} --metadata {input.metadata} --fragments {input.fragments} --out_prefix {params.out_prefix} --window {wildcards.binwidth} --genome_version {params.genome}" |
242 243 244 | shell: "Rscript {input.script} --sample {wildcards.sample} --antibody {wildcards.modality} --metadata {input.metadata} --fragments {input.fragments} " \ " --peaks {input.peaks} --out_prefix {params.out_prefix} --genome_version {params.genome}" |
255 256 | shell: 'zcat {input.fragments} | sort -T {params.tmpdir} -k1,1 -k2,2n | bgzip > {output.fragments_merged} && tabix -p bed {output.fragments_merged}' |
268 269 270 271 272 | shell: "bedToBam -i {input.fragments} -g {input.chrom_sizes} > {output.bam} && " "samtools sort -@ {threads} -o {output.bam_sorted} {output.bam} &&" "samtools index {output.bam_sorted} && " "bamCoverage -b {output.bam_sorted} -o {output.bigwig} -p {threads} --minMappingQuality 5 --binSize 50 --smoothLength 250 --normalizeUsing RPKM --ignoreDuplicates" |
281 282 283 284 | shell: 'macs2 callpeak -t {input} -g mm -f BED -n {wildcards.modality} ' '--outdir {params.macs_outdir} --llocal 100000 --keep-dup=1 --broad-cutoff=0.1 ' '--min-length 1000 --max-gap 1000 --broad --nomodel 2>&1 ' |
293 294 | shell: 'Rscript {input.script} -i {input.seurat} -o {output.seurat}' |
307 308 | shell: 'Rscript {params.script} -i {input.seurat} -o {output.seurat} -a {wildcards.feature} -d 40 -g {params.plot_group} ' |
54 55 | shell: "Rscript {input.script} -i {input.seurat} -r {input.rna} -o {output}" |
68 69 70 71 72 73 74 75 76 | shell: "Rscript -e \"rmarkdown::render(input='{input.notebook}', " " output_file = '{params.report}', " " params=list(out_prefix = '{params.out_prefix}', " " modality = '{wildcards.modality}', " " feature = '{wildcards.feature}', " " input = '{params.out_prefix}{input.seurat}', " " integrated = '{params.out_prefix}{input.integrated}', " # TODO - fix absolute paths integration " output = '{params.out_prefix}{output.seurat}'))\" " |
85 86 | shell: 'Rscript {input.script} --input {input.seurat} --output {output.seurat}' |
95 96 | shell: "Rscript {input.script} --input {input.seurat} --fragments {input.fragments} --output_folder {output.bigwig} --idents {wildcards.idents} " |
105 106 | shell: "Rscript {input.script} -i {input.seurat} -o {output.markers} --idents {wildcards.idents}" |
114 115 | shell: 'Rscript {input.script} -i {input.seurat} -o {output}' |
123 124 | shell: "Rscript {input.script} -i {input.seurat} -o {output.csv} -d {wildcards.ident}" |
132 133 | shell: "python3 {input.script} {input.bam} {wildcards.sample} {output.bam}" |
151 152 | shell: "samtools merge -@ {threads} {output.bam} {input.bam}" |
160 161 162 | shell: 'samtools index {input.bam}; ' 'bamCoverage -b {input.bam} -o {output.bw} -p {threads} --normalizeUsing RPKM' |
172 173 | shell: "python3 {input.script} {input.bam} {input.table} NA {output.bam_files}" |
183 184 | shell: 'sh {input.script} {input.bam} {output.bw} {threads}' |
194 195 196 | shell: 'cat {input.csv} | grep {wildcards.sample} > {output.csv_per_sample}; ' 'python3 {input.script} {input.bam} {output.csv_per_sample} NA {output.bam_per_cluster}' |
205 206 | shell: 'sh {input.script} {input.bam} {output.bw} {threads}' |
214 215 | shell: 'sh {input.script} {input.bam} {output.peaks}' |
225 226 | shell: 'Rscript {input.script} --input {input.csv} --nmarkers {params.nmarkers} --output {output.bed}' |
236 237 | shell: 'python3 {params.script} {input} > {output}' |
Support
- Future updates
Related Workflows





