Introduction
nf-core/scrnaseq is a bioinformatics best-practice analysis pipeline for processing 10x Genomics single-cell RNA-seq data.
This is a community effort in building a pipeline capable to support:
-
Alevin-Fry + AlevinQC
-
STARSolo
-
Kallisto + BUStools
-
Cellranger
-
UniverSC
Documentation
The nf-core/scrnaseq pipeline comes with documentation about the pipeline usage , parameters and output .
Usage
Note If you are new to Nextflow and nf-core, please refer to this page on how to set-up Nextflow. Make sure to test your setup with
-profile test
before running the workflow on actual data.
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv
:
sample,fastq_1,fastq_2,expected_cells
pbmc8k,pbmc8k_S1_L007_R1_001.fastq.gz,pbmc8k_S1_L007_R2_001.fastq.gz,"10000"
pbmc8k,pbmc8k_S1_L008_R1_001.fastq.gz,pbmc8k_S1_L008_R2_001.fastq.gz,"10000"
Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
Now, you can run the pipeline using:
nextflow run nf-core/scrnaseq \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--genome_fasta GRCm38.p6.genome.chr19.fa \
--gtf gencode.vM19.annotation.chr19.gtf \
--protocol 10XV2 \
--aligner <alevin/kallisto/star/cellranger/universc> \
--outdir <OUTDIR>
Warning: Please provide pipeline parameters via the CLI or Nextflow
-params-file
option. Custom config files including those provided by the-c
Nextflow option can be used to provide any configuration except for parameters ; see docs .
For more details and further functionality, please refer to the usage documentation and the parameter documentation .
Decision Tree for users
The nf-core/scrnaseq pipeline features several paths to analyze your single cell data. Future additions will also be done soon, e.g. the addition of multi-ome analysis types. To aid users in analyzing their data, we have added a decision tree to help people decide on what type of analysis they want to run and how to choose appropriate parameters for that.
graph TD
A[sc RNA] -->|alevin-fry| B(h5ad/seurat/mtx matrices)
A[sc RNA] -->|CellRanger| B(h5ad/seurat/mtx matrices)
A[sc RNA] -->|kbpython| B(h5ad/seurat/mtx matrices)
A[sc RNA] -->|STARsolo| B(h5ad/seurat/mtx matrices)
A[sc RNA] -->|Universc| B(h5ad/seurat/mtx matrices)
Options for the respective alignment method can be found here to choose between methods.
Pipeline output
To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation .
Credits
nf-core/scrnaseq was originally written by Bailey PJ, Botvinnik O, Marques de Almeida F, Gabernet G, Peltzer A, Sturm G.
We thank the following people for their extensive assistance in the development of this pipeline:
-
@heylf
-
@KevinMenden
-
@FloWuenne
-
@rob-p
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines .
For further information or help, don't hesitate to get in touch on the
Slack
#scrnaseq
channel
(you can join with
this invite
).
Citations
If you use nf-core/scrnaseq for your analysis, please cite it using the following doi: 10.5281/zenodo.3568187
The basic benchmarks that were used as motivation for incorporating the three available modular workflows can be found in this publication .
We offer all three paths for the processing of scRNAseq data so it remains up to the user to decide which pipeline workflow is chosen for a particular analysis question.
An extensive list of references for the tools used by the pipeline can be found in the
CITATIONS.md
file.
Code Snippets
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | """ #!/usr/bin/env Rscript require(alevinQC) alevinFryQCReport( mapDir = "${alevin_results}/af_map", quantDir = "${alevin_results}/af_quant", permitDir= "${alevin_results}/af_quant", sampleId = "${prefix}", outputFile = "alevin_report_${meta.id}.html", outputFormat = "html_document", outputDir = "./", forceOverwrite = TRUE ) yaml::write_yaml( list( '${task.process}'=list( 'alevinqc' = paste(packageVersion('alevinQC'), collapse='.') ) ), "versions.yml" ) """ |
20 21 22 23 24 25 | """ concat_h5ad.py \\ --input $samplesheet \\ --out combined_matrix.h5ad \\ --suffix "_matrix.h5ad" """ |
28 29 30 | """ touch combined_matrix.h5ad """ |
30 31 32 33 | """ $unzip cat $name | t2g.py --use_version > transcripts_to_genes.txt """ |
22 23 24 25 26 27 28 29 | """ gffread -F $gtf -w "${genome_fasta}.transcriptome.fa" -g $genome_fasta cat <<-END_VERSIONS > versions.yml "${task.process}": gffread: \$(gffread --version 2>&1) END_VERSIONS """ |
22 23 24 25 26 27 28 29 30 31 | """ filter_gtf_for_genes_in_genome.py \\ --gtf $gtf \\ --fasta $fasta \\ -o ${fasta.baseName}_genes.gtf cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') END_VERSIONS """ |
45 46 47 48 49 50 51 52 | """ # convert file types mtx_to_h5ad.py \\ --aligner ${params.aligner} \\ --input filtered_feature_bc_matrix.h5 \\ --sample ${meta.id} \\ --out ${meta.id}/${meta.id}_matrix.h5ad """ |
55 56 57 58 59 60 61 62 63 64 65 66 67 68 | """ # convert file types for input_type in spliced unspliced ; do mtx_to_h5ad.py \\ --aligner ${params.aligner} \\ --sample ${meta.id} \\ --input *count/counts_unfiltered/\${input_type}.mtx \\ --barcode *count/counts_unfiltered/\${input_type}.barcodes.txt \\ --feature *count/counts_unfiltered/\${input_type}.genes.txt \\ --txp2gene ${txp2gene} \\ --star_index ${star_index} \\ --out ${meta.id}/${meta.id}_\${input_type}_matrix.h5ad ; done """ |
71 72 73 74 75 76 77 78 79 80 81 82 83 | """ # convert file types mtx_to_h5ad.py \\ --task_process ${task.process} \\ --aligner ${params.aligner} \\ --sample ${meta.id} \\ --input $mtx_matrix \\ --barcode $barcodes_tsv \\ --feature $features_tsv \\ --txp2gene ${txp2gene} \\ --star_index ${star_index} \\ --out ${meta.id}/${meta.id}_matrix.h5ad """ |
86 87 88 89 90 | """ mkdir ${meta.id} touch ${meta.id}/${meta.id}_matrix.h5ad touch versions.yml """ |
39 40 41 | """ mkdir ${meta.id} """ |
44 45 46 47 48 49 50 51 52 53 54 | """ # convert file types for input_type in spliced unspliced ; do mtx_to_seurat.R \\ *count/counts_unfiltered/\${input_type}.mtx \\ *count/counts_unfiltered/\${input_type}.barcodes.txt \\ *count/counts_unfiltered/\${input_type}.genes.txt \\ ${meta.id}/${meta.id}_\${input_type}_matrix.rds \\ ${aligner} done """ |
57 58 59 60 61 62 63 64 | """ mtx_to_seurat.R \\ $matrix \\ $barcodes \\ $features \\ ${meta.id}/${meta.id}_matrix.rds \\ ${aligner} """ |
67 68 69 70 71 | """ mkdir ${meta.id} touch ${meta.id}/${meta.id}_matrix.rds touch versions.yml """ |
21 22 23 24 25 26 27 28 29 30 | """ check_samplesheet.py \\ $samplesheet \\ samplesheet.valid.csv cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') END_VERSIONS """ |
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | """ # export required var export ALEVIN_FRY_HOME=. # prep simpleaf simpleaf set-paths # run simpleaf index simpleaf \\ index \\ --threads $task.cpus \\ --fasta $genome_fasta \\ $seq_inputs \\ $args \\ -o salmon cat <<-END_VERSIONS > versions.yml "${task.process}": simpleaf: \$(simpleaf -V | tr -d '\\n' | cut -d ' ' -f 2) salmon: \$(salmon --version | sed -e "s/salmon //g") END_VERSIONS """ |
50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 | """ # export required var export ALEVIN_FRY_HOME=. # prep simpleaf simpleaf set-paths # run simpleaf quant gzip -dcf $whitelist > whitelist.uncompressed.txt simpleaf quant \\ -1 ${forward.join( "," )} \\ -2 ${reverse.join( "," )} \\ -i ${index} \\ -o ${prefix}_alevin_results \\ -m $txp2gene \\ -t $task.cpus \\ -c $protocol \\ $expect_cells \\ $unfiltered_command \\ $args $save_whitelist [[ ! -f ${prefix}_alevin_results/af_quant/all_freq.bin ]] && cp ${prefix}_alevin_results/af_quant/permit_freq.bin ${prefix}_alevin_results/af_quant/all_freq.bin cat <<-END_VERSIONS > versions.yml "${task.process}": simpleaf: \$(simpleaf -V | tr -d '\\n' | cut -d ' ' -f 2) salmon: \$(salmon --version | sed -e "s/salmon //g") END_VERSIONS """ |
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 | """ STAR \\ --genomeDir $index \\ --readFilesIn ${reverse.join( "," )} ${forward.join( "," )} \\ --runThreadN $task.cpus \\ --outFileNamePrefix $prefix. \\ --soloCBwhitelist <(gzip -cdf $whitelist) \\ --soloType $protocol \\ --soloFeatures $star_feature \\ $other_10x_parameters \\ $out_sam_type \\ $ignore_gtf \\ $seq_center \\ $cell_filter \\ $args \\ $mv_unsorted_bam if [ -f ${prefix}.Unmapped.out.mate1 ]; then mv ${prefix}.Unmapped.out.mate1 ${prefix}.unmapped_1.fastq gzip ${prefix}.unmapped_1.fastq fi if [ -f ${prefix}.Unmapped.out.mate2 ]; then mv ${prefix}.Unmapped.out.mate2 ${prefix}.unmapped_2.fastq gzip ${prefix}.unmapped_2.fastq fi if [ -d ${prefix}.Solo.out ]; then # Backslashes still need to be escaped (https://github.com/nextflow-io/nextflow/issues/67) find ${prefix}.Solo.out \\( -name "*.tsv" -o -name "*.mtx" \\) -exec gzip {} \\; fi cat <<-END_VERSIONS > versions.yml "${task.process}": star: \$(STAR --version | sed -e "s/STAR_//g") END_VERSIONS """ |
33 34 35 36 37 38 39 40 41 | """ mkdir -p "${prefix}/outs/" touch ${prefix}/outs/fake_file.txt cat <<-END_VERSIONS > versions.yml "${task.process}": cellranger: \$(echo \$( cellranger --version 2>&1) | sed 's/^.*[^0-9]\\([0-9]*\\.[0-9]*\\.[0-9]*\\).*\$/\\1/' ) END_VERSIONS """ |
24 25 26 27 28 29 30 31 32 33 34 35 | """ cellranger \\ mkgtf \\ $gtf \\ ${prefix}.gtf \\ $args cat <<-END_VERSIONS > versions.yml "${task.process}": cellranger: \$(echo \$( cellranger --version 2>&1) | sed 's/^.*[^0-9]\\([0-9]*\\.[0-9]*\\.[0-9]*\\).*\$/\\1/' ) END_VERSIONS """ |
25 26 27 28 29 30 31 32 33 34 35 36 37 | """ cellranger \\ mkref \\ --genome=$reference_name \\ --fasta=$fasta \\ --genes=$gtf \\ $args cat <<-END_VERSIONS > versions.yml "${task.process}": cellranger: \$(echo \$( cellranger --version 2>&1) | sed 's/^.*[^0-9]\\([0-9]*\\.[0-9]*\\.[0-9]*\\).*\$/\\1/' ) END_VERSIONS """ |
28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 | """ printf "%s %s\\n" $rename_to | while read old_name new_name; do [ -f "\${new_name}" ] || ln -s \$old_name \$new_name done fastqc \\ $args \\ --threads $task.cpus \\ $renamed_files cat <<-END_VERSIONS > versions.yml "${task.process}": fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" ) END_VERSIONS """ |
46 47 48 49 50 51 52 53 54 | """ touch ${prefix}.html touch ${prefix}.zip cat <<-END_VERSIONS > versions.yml "${task.process}": fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" ) END_VERSIONS """ |
23 24 25 26 27 28 29 30 31 32 | """ gffread \\ $gff \\ $args \\ -o ${prefix}.gtf cat <<-END_VERSIONS > versions.yml "${task.process}": gffread: \$(gffread --version 2>&1) END_VERSIONS """ |
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | """ # Not calling gunzip itself because it creates files # with the original group ownership rather than the # default one for that user / the work directory gzip \\ -cd \\ $args \\ $archive \\ > $gunzip cat <<-END_VERSIONS > versions.yml "${task.process}": gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//') END_VERSIONS """ |
41 42 43 44 45 46 47 | """ touch $gunzip cat <<-END_VERSIONS > versions.yml "${task.process}": gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//') END_VERSIONS """ |
32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 | """ kb \\ count \\ -t $task.cpus \\ -i $index \\ -g $t2g \\ $cdna \\ $introns \\ -x $technology \\ $args \\ -o ${prefix}.count \\ ${reads.join( " " )} \\ -m ${memory}G cat <<-END_VERSIONS > versions.yml "${task.process}": kallistobustools: \$(echo \$(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*\$//') END_VERSIONS """ |
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 | """ kb \\ ref \\ -i kb_ref_out.idx \\ -g t2g.txt \\ -f1 cdna.fa \\ --workflow $workflow_mode \\ $fasta \\ $gtf cat <<-END_VERSIONS > versions.yml "${task.process}": kallistobustools: \$(echo \$(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*\$//') END_VERSIONS """ |
46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 | """ kb \\ ref \\ -i kb_ref_out.idx \\ -g t2g.txt \\ -f1 cdna.fa \\ -f2 intron.fa \\ -c1 cdna_t2c.txt \\ -c2 intron_t2c.txt \\ --workflow $workflow_mode \\ $fasta \\ $gtf cat <<-END_VERSIONS > versions.yml "${task.process}": kallistobustools: \$(echo \$(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*\$//') END_VERSIONS """ |
28 29 30 31 32 33 34 35 36 37 38 39 40 | """ multiqc \\ --force \\ $args \\ $config \\ $extra_config \\ . cat <<-END_VERSIONS > versions.yml "${task.process}": multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" ) END_VERSIONS """ |
43 44 45 46 47 48 49 50 51 52 | """ touch multiqc_data touch multiqc_plots touch multiqc_report.html cat <<-END_VERSIONS > versions.yml "${task.process}": multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" ) END_VERSIONS """ |
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | """ mkdir star STAR \\ --runMode genomeGenerate \\ --genomeDir star/ \\ --genomeFastaFiles $fasta \\ --sjdbGTFfile $gtf \\ --runThreadN $task.cpus \\ $memory \\ $args cat <<-END_VERSIONS > versions.yml "${task.process}": star: \$(STAR --version | sed -e "s/STAR_//g") samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//') END_VERSIONS """ |
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 | """ samtools faidx $fasta NUM_BASES=`gawk '{sum = sum + \$2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' ${fasta}.fai` mkdir star STAR \\ --runMode genomeGenerate \\ --genomeDir star/ \\ --genomeFastaFiles $fasta \\ --sjdbGTFfile $gtf \\ --runThreadN $task.cpus \\ --genomeSAindexNbases \$NUM_BASES \\ $memory \\ $args cat <<-END_VERSIONS > versions.yml "${task.process}": star: \$(STAR --version | sed -e "s/STAR_//g") samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//') END_VERSIONS """ |
70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 | """ mkdir star touch star/Genome touch star/Log.out touch star/SA touch star/SAindex touch star/chrLength.txt touch star/chrName.txt touch star/chrNameLength.txt touch star/chrStart.txt touch star/exonGeTrInfo.tab touch star/exonInfo.tab touch star/geneInfo.tab touch star/genomeParameters.txt touch star/sjdbInfo.txt touch star/sjdbList.fromGTF.out.tab touch star/sjdbList.out.tab touch star/transcriptInfo.tab cat <<-END_VERSIONS > versions.yml "${task.process}": star: \$(STAR --version | sed -e "s/STAR_//g") samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//') END_VERSIONS """ |
38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 | """ universc \\ --id 'sample-${meta.id}' \\ ${input_reads} \\ --technology '${meta.technology}' \\ --chemistry '${meta.chemistry}' \\ --reference ${reference_name} \\ --description ${sample_arg} \\ --jobmode "local" \\ --localcores ${task.cpus} \\ --localmem ${task.memory.toGiga()} \\ --per-cell-data \\ $args 1> _log 2> _err # save log files echo !! > sample-${meta.id}/outs/_invocation cp _log sample-${meta.id}/outs/_log cp _err sample-${meta.id}/outs/_err cat <<-END_VERSIONS > versions.yml "${task.process}": cellranger: \$(echo \$(cellranger count --version 2>&1 | head -n 2 | tail -n 1 | sed 's/^.* //g' | sed 's/(//g' | sed 's/)//g' )) universc: \$(echo \$(bash /universc/launch_universc.sh --version | grep version | grep universc | sed 's/^.* //g' )) END_VERSIONS """ |
66 67 68 69 70 71 72 73 74 75 | """ mkdir -p "sample-${meta.id}/outs/" touch sample-${meta.id}/outs/fake_file.txt cat <<-END_VERSIONS > versions.yml "${task.process}": cellranger: \$(echo \$(cellranger count --version 2>&1 | head -n 2 | tail -n 1 | sed 's/^.* //g' | sed 's/(//g' | sed 's/)//g' )) universc: \$(echo \$(bash /universc/launch_universc.sh --version | grep version | grep universc | sed 's/^.* //g' )) END_VERSIONS """ |
Support
- Future updates
Related Workflows





