Automated quantitative analysis of DIA proteomics mass spectrometry measurements.
Introduction
nf-core/diaproteomics is a bioinformatics analysis pipeline for the quantitative processing of data-independent acquisition (DIA) proteomics data (preprint available here).
The workflow is based on the OpenSwathWorkflow for SWATH-MS proteomics data. DIA raw files (mzML) serve as input, and a library search is performed against a given input spectral library. Optionally, spectral libraries can be generated (EasyPQP) from multiple matched DDA measurements and their respective search results. Generated libraries can then be aligned by pairwise RT alignment and concatenated into a single large library. In the same way, internal retention time standards (iRTs) can either be supplied or generated by the workflow in order to align the library and the DIA measurements into the same retention time space. FDR rescoring is applied using PyProphet based on a competitive target-decoy approach at the peak-group level or at the global peptide and protein level. In a last step, chromatogram alignment and quantification are carried out with DIAlignR, and a CSV of peptide quantities, MSstats-based protein statistics and several visualisations are exported.
(The pipeline overview chart was created with the help of Lucidchart.)
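The pipeline takes its inputs from tab-separated sample sheets referenced by --input, --input_spectral_library and --irts (see the Quick Start below). As a rough orientation, a minimal --input sheet might be assembled as in the following sketch; the column names (Sample, BatchID, MSstats_Condition, Spectra_Filepath) and the file paths are illustrative assumptions, so please verify the exact format against the usage docs.

# Hedged sketch of a minimal --input sample sheet; column names are assumptions
# for illustration only and should be checked against the usage documentation.
printf 'Sample\tBatchID\tMSstats_Condition\tSpectra_Filepath\n'  > sample_sheet.tsv
printf 'sample1\tbatch1\ttreated\t/data/dia/sample1_dia.mzML\n' >> sample_sheet.tsv
printf 'sample2\tbatch1\tcontrol\t/data/dia/sample2_dia.mzML\n' >> sample_sheet.tsv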
Quick Start
- Install Nextflow
- Install any of Docker, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (please only use Conda as a last resort; see docs)
- Download the pipeline and test it on a minimal dataset with a single command:
nextflow run nf-core/diaproteomics -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute>
Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
- Start running your own analysis!
nextflow run nf-core/diaproteomics -profile <docker/singularity/podman/conda/institute> --input 'sample_sheet.tsv' --input_spectral_library 'library_sheet.tsv' --irts 'irt_sheet.tsv' --mz_extraction_window 30 --mz_extraction_window_unit ppm --rt_extraction_window 600 --pyprophet_global_fdr_level 'protein' --pyprophet_protein_fdr 0.01
Alternatively, create spectral libraries and iRTs:
nextflow run nf-core/diaproteomics -profile <docker/singularity/podman/conda/institute> --input 'sample_sheet.tsv' --generate_spectral_library --input_sheet_dda 'dda_sheet.tsv' --generate_pseudo_irts --merge_libraries --align_libraries
See usage docs for all of the available options when running the pipeline.
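Beyond the pipeline-specific parameters shown above, two general Nextflow options are often useful: -r to pin a specific pipeline release and -resume to reuse cached results after an interrupted run. The sketch below combines them with the first example command; <release> is a placeholder, and --outdir is assumed here to follow the usual nf-core output-directory convention, so please confirm it in the usage docs.

# Pin a release (-r), resume a previous run (-resume) and set an output directory.
nextflow run nf-core/diaproteomics -r <release> \
    -profile docker \
    --input 'sample_sheet.tsv' \
    --input_spectral_library 'library_sheet.tsv' \
    --irts 'irt_sheet.tsv' \
    --outdir './results' \
    -resume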
Pipeline Summary
By default, the pipeline currently performs the following:
- Optional spectral library generation from DDA input ('EasyPQP')
- DIA targeted extraction ('OpenSwathWorkflow')
- False discovery rate estimation ('PyProphet'; a parameter sketch follows this list)
- Chromatogram alignment ('DIAlignR')
- Statistical postprocessing ('MSstats')
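For the FDR estimation step, the relevant thresholds can be set on the command line. The sketch below simply gathers the PyProphet-related flags that appear in the Quick Start example and in the code snippets further down; the numeric values are illustrative, not recommendations.

# Illustrative FDR settings: global FDR level plus peak-group, peptide and protein q-value cut-offs
nextflow run nf-core/diaproteomics -profile docker \
    --input 'sample_sheet.tsv' \
    --input_spectral_library 'library_sheet.tsv' \
    --pyprophet_global_fdr_level 'protein' \
    --pyprophet_peakgroup_fdr 0.01 \
    --pyprophet_peptide_fdr 0.01 \
    --pyprophet_protein_fdr 0.01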
Documentation
The nf-core/diaproteomics pipeline comes with documentation covering both usage and output.
Credits
nf-core/diaproteomics was originally written by Leon Bichmann.
We thank the following people for their extensive assistance in the development of this pipeline:
Shubham Gupta, George Rosenberger, Leon Kuchenbecker, Timo Sachsenberg, Oliver Alka, Julianus Pfeuffer and the nf-core team.
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines .
For further information or help, don't hesitate to get in touch on the Slack #diaproteomics channel (you can join with this invite).
Citations
If you apply DIAproteomics to your data, please cite:
DIAproteomics: A multi-functional data analysis pipeline for data-independent-acquisition proteomics and peptidomics
Leon Bichmann, Shubham Gupta, George Rosenberger, Leon Kuchenbecker, Timo Sachsenberg, Oliver Alka, Julianus Pfeuffer, Oliver Kohlbacher & Hannes Rost.
bioRxiv: https://www.biorxiv.org/content/10.1101/2020.12.08.415844v1
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
In addition, references of tools and data used in this pipeline are as follows:
OpenSwathWorkflow
Röst H. et al, Nat Biotechnol. 2014 Mar;32(3):219-23. doi: 10.1038/nbt.2841.
PyProphet
Rosenberger G. et al, Nat Methods 2017 Sep;14(9):921-927. doi: 10.1038/nmeth.4398. Epub 2017 Aug 21.
DIAlignR
Gupta S. et al, Mol Cell Proteomics 2019 Apr;18(4):806-817. doi: 10.1074/mcp.TIR118.001132. Epub 2019 Jan 31.
MSstats
Choi M. et al, Bioinformatics 2014 Sep 1;30(17):2524-6. doi: 10.1093/bioinformatics/btu305. Epub 2014 May 2.
Code Snippets
"""
# Collect pipeline, Nextflow and tool version information for reporting
echo $workflow.manifest.version > v_pipeline.txt
echo $workflow.nextflow.version > v_nextflow.txt
FileInfo --help &> v_openms.txt
pyprophet --version &> v_pyprophet.txt
scrape_software_versions.py &> software_versions_mqc.yaml
"""
"""
# Convert Thermo RAW files to mzML with ThermoRawFileParser
ThermoRawFileParser.sh -i=${raw_file} -f=2 -b=${raw_file.baseName}.mzML
"""
"""
# Convert DDA identification files to idXML with IDFileConverter
IDFileConverter -in ${dda_id_file} -out ${id}_${sample}_peptide_ids.idXML -threads ${task.cpus}
"""
"""
# Generate a run-level spectral library from DDA spectra and peptide IDs with EasyPQP
easypqp convert \\
    --unimod ${unimod_file} \\
    --pepxml ${idxml_file} \\
    --spectra ${dda_mzml_file}

easypqp library \\
    --out ${dda_mzml_file.baseName}_run_peaks.tsv \\
    --rt_psm_fdr_threshold ${params.library_rt_fdr} \\
    --nofdr \\
    ${dda_mzml_file.baseName}.psmpkl \\
    ${dda_mzml_file.baseName}.peakpkl

mv ${dda_mzml_file.baseName}_run_peaks.tsv ${id}_${sample}_library.tsv
"""
"""
# Convert the library to TSV and generate an assay library with transition limits
TargetedFileConverter \\
    -in ${lib_file_na} \\
    -out ${lib_file_na.baseName}.tsv \\
    -threads ${task.cpus}

OpenSwathAssayGenerator \\
    -in ${lib_file_na.baseName}.tsv \\
    -min_transitions ${params.min_transitions} \\
    -max_transitions ${params.max_transitions} \\
    -out ${id}_${sample}_assay.tsv \\
    -threads ${task.cpus}
"""
"""
# Merge individual run libraries, optionally aligning retention times
merge_and_align_libraries_from_easypqp.py \\
    --input_libraries ${lib_files_for_merging} \\
    --min_overlap ${params.min_overlap_for_merging} \\
    --rsq_threshold 0.75 \\
    --output ${sample}_library_merged.tsv \\
    ${align_flag}
"""
"""
# Select pseudo iRT peptides from the library and convert them to PQP format
select_pseudo_irts_from_lib.py \\
    --input_libraries ${lib_file_assay_irt} \\
    --min_rt 0 \\
    --n_irts ${params.n_irts} \\
    --max_rt 100 \\
    --output ${lib_file_assay_irt.baseName}_pseudo_irts.tsv \\
    ${quant_flag}

TargetedFileConverter \\
    -in ${lib_file_assay_irt.baseName}_pseudo_irts.tsv \\
    -out ${lib_file_assay_irt.baseName}_pseudo_irts.pqp \\
    -threads ${task.cpus}
"""
"""
# Convert the library to PQP and append decoys with OpenSwathDecoyGenerator
TargetedFileConverter \\
    -in ${lib_file_nd} \\
    -out ${lib_file_nd.baseName}.pqp \\
    -threads ${task.cpus}

OpenSwathDecoyGenerator \\
    -in ${lib_file_nd.baseName}.pqp \\
    -method ${params.decoy_method} \\
    -out ${lib_file_nd.baseName}_decoy.pqp \\
    -threads ${task.cpus}
"""
"""
# Convert Thermo RAW files to mzML with ThermoRawFileParser
ThermoRawFileParser.sh -i=${raw_file} -f=2 -b=${raw_file.baseName}.mzML
"""
"""
# Convert library and iRTs to PQP, then run targeted DIA extraction and scoring with OpenSwathWorkflow
mkdir tmp

TargetedFileConverter \\
    -in ${lib_file} \\
    -out ${lib_file.baseName}.pqp \\
    -threads ${task.cpus}

TargetedFileConverter \\
    -in ${irt_file} \\
    -out ${irt_file.baseName}.pqp \\
    -threads ${task.cpus}

OpenSwathWorkflow \\
    -in ${mzml_file} \\
    -tr ${lib_file.baseName}.pqp \\
    -sort_swath_maps \\
    -tr_irt ${irt_file.baseName}.pqp \\
    -min_rsq ${params.irt_min_rsq} \\
    -out_osw ${mzml_file.baseName}.osw \\
    -out_chrom ${mzml_file.baseName}_chrom.mzML \\
    -mz_extraction_window ${params.mz_extraction_window} \\
    -mz_extraction_window_ms1 ${params.mz_extraction_window_ms1} \\
    -mz_extraction_window_unit ${params.mz_extraction_window_unit} \\
    -mz_extraction_window_ms1_unit ${params.mz_extraction_window_ms1_unit} \\
    -rt_extraction_window ${params.rt_extraction_window} \\
    -min_upper_edge_dist ${params.min_upper_edge_dist} \\
    -RTNormalization:alignmentMethod ${params.irt_alignment_method} \\
    -RTNormalization:estimateBestPeptides \\
    -RTNormalization:outlierMethod none \\
    -RTNormalization:NrRTBins ${params.irt_n_bins} \\
    -RTNormalization:MinBinsFilled ${params.irt_min_bins_covered} \\
    -mz_correction_function quadratic_regression_delta_ppm \\
    -Scoring:stop_report_after_feature 5 \\
    -Scoring:TransitionGroupPicker:compute_peak_quality false \\
    -Scoring:TransitionGroupPicker:peak_integration 'original' \\
    -Scoring:TransitionGroupPicker:background_subtraction 'none' \\
    -Scoring:TransitionGroupPicker:PeakPickerMRM:sgolay_frame_length 11 \\
    -Scoring:TransitionGroupPicker:PeakPickerMRM:sgolay_polynomial_order 3 \\
    -Scoring:TransitionGroupPicker:PeakPickerMRM:gauss_width 30 \\
    -Scoring:TransitionGroupPicker:PeakPickerMRM:use_gauss 'false' \\
    -Scoring:TransitionGroupPicker:PeakIntegrator:integration_type 'intensity_sum' \\
    -Scoring:TransitionGroupPicker:PeakIntegrator:baseline_type 'base_to_base' \\
    -Scoring:TransitionGroupPicker:PeakIntegrator:fit_EMG 'false' \\
    -batchSize 1000 \\
    -readOptions ${params.cache_option} \\
    -tempDirectory tmp \\
    -Scoring:DIAScoring:dia_nr_isotopes 3 \\
    -enable_uis_scoring \\
    -Scoring:uis_threshold_sn -1 \\
    -threads ${task.cpus} \\
    ${force_option} ${ms1_option} ${ms1_scoring} ${ms1_mi}
"""
"""
# Merge per-run OSW results into a single file, using the library as template
pyprophet merge \\
    --template=${lib_file_template} \\
    --out=${sample}_osw_file_merged.osw \\
    --no-same_run \\
    ${all_osws}
"""
"""
# PyProphet scoring with custom pi0 lambda settings, followed by peptide- and global-FDR-level
# error rate estimation in run-specific, experiment-wide and global contexts
pyprophet score \\
    --in=${scored_osw} \\
    --level=${params.pyprophet_fdr_ms_level} \\
    --out=${scored_osw.baseName}_scored.osw \\
    --classifier=${params.pyprophet_classifier} \\
    --pi0_lambda ${params.pyprophet_pi0_start} ${params.pyprophet_pi0_end} ${params.pyprophet_pi0_steps} \\
    --threads=${task.cpus}

pyprophet peptide \\
    --in=${scored_osw.baseName}_scored.osw \\
    --out=${scored_osw.baseName}_global_merged.osw \\
    --context=run-specific

pyprophet peptide --in=${scored_osw.baseName}_global_merged.osw --context=experiment-wide
pyprophet peptide --in=${scored_osw.baseName}_global_merged.osw --context=global

pyprophet ${params.pyprophet_global_fdr_level} --in=${scored_osw.baseName}_global_merged.osw --context=run-specific
pyprophet ${params.pyprophet_global_fdr_level} --in=${scored_osw.baseName}_global_merged.osw --context=experiment-wide
pyprophet ${params.pyprophet_global_fdr_level} --in=${scored_osw.baseName}_global_merged.osw --context=global
"""
"""
# Same PyProphet scoring and error rate estimation, but with default pi0 lambda settings
pyprophet score \\
    --in=${scored_osw} \\
    --level=${params.pyprophet_fdr_ms_level} \\
    --out=${scored_osw.baseName}_scored.osw \\
    --classifier=${params.pyprophet_classifier} \\
    --threads=${task.cpus}

pyprophet peptide \\
    --in=${scored_osw.baseName}_scored.osw \\
    --out=${scored_osw.baseName}_global_merged.osw \\
    --context=run-specific

pyprophet peptide --in=${scored_osw.baseName}_global_merged.osw --context=experiment-wide
pyprophet peptide --in=${scored_osw.baseName}_global_merged.osw --context=global

pyprophet ${params.pyprophet_global_fdr_level} --in=${scored_osw.baseName}_global_merged.osw --context=run-specific
pyprophet ${params.pyprophet_global_fdr_level} --in=${scored_osw.baseName}_global_merged.osw --context=experiment-wide
pyprophet ${params.pyprophet_global_fdr_level} --in=${scored_osw.baseName}_global_merged.osw --context=global
"""
"""
# Export results filtered at the configured peak-group, peptide and protein q-value thresholds
pyprophet export \\
    --in=${global_osw} \\
    --max_rs_peakgroup_qvalue=${params.pyprophet_peakgroup_fdr} \\
    --max_global_peptide_qvalue=${params.pyprophet_peptide_fdr} \\
    --max_global_protein_qvalue=${params.pyprophet_protein_fdr} \\
    --out=legacy.tsv
"""
"""
# Convert extracted chromatograms to indexed mzML and cache them as sqMass
FileConverter \\
    -in ${chrom_file_noindex} \\
    -process_lowmemory \\
    -out ${chrom_file_noindex.baseName.split('_chrom')[0]}.chrom.mzML

OpenSwathMzMLFileCacher \\
    -in ${chrom_file_noindex.baseName.split('_chrom')[0]}.chrom.mzML \\
    -lossy_compression false \\
    -process_lowmemory \\
    -lowmem_batchsize 50000 \\
    -out ${chrom_file_noindex.baseName.split('_chrom')[0]}.chrom.sqMass
"""
"""
# Organise inputs and run DIAlignR for cross-run chromatogram alignment and quantification
mkdir osw
mv ${pyresults} osw/
mkdir xics
mv *.chrom.sqMass xics/

DIAlignR.R \\
    ${params.dialignr_global_align_fdr} \\
    ${params.dialignr_analyte_fdr} \\
    ${params.dialignr_unalign_fdr} \\
    ${params.dialignr_align_fdr} \\
    ${params.dialignr_query_fdr} \\
    ${params.pyprophet_global_fdr_level} \\
    ${params.dialignr_xicfilter} \\
    ${dialignr_parallel} \\
    ${task.cpus}

mv DIAlignR.tsv ${sample}_peptide_quantities.csv
"""
"""
# Convert the library to TSV and reformat DIAlignR results for MSstats (fdr_level set to "none")
TargetedFileConverter \\
    -in ${lib_file} \\
    -out ${lib_file.baseName}.tsv

reformat_output_for_msstats.py \\
    --input ${dialignr_file} \\
    --exp_design ${exp_design} \\
    --library ${lib_file.baseName}.tsv \\
    --fdr_level "none" \\
    --output "${sample}_${condition}.csv"
"""
"""
# Reformat DIAlignR results for MSstats using the configured global FDR level
TargetedFileConverter -in ${lib_file} -out ${lib_file.baseName}.tsv

reformat_output_for_msstats.py \\
    --input ${dialignr_file} \\
    --exp_design ${exp_design} \\
    --library ${lib_file.baseName}.tsv \\
    --fdr_level ${params.pyprophet_global_fdr_level} \\
    --output "${sample}_${condition}.csv"
"""
"""
# Write an mzTab export annotated with the main analysis parameters
TargetedFileConverter -in ${lib_file} -out ${lib_file.baseName}.tsv

mztab_output.py \\
    --input ${dialignr_file} \\
    --exp_design ${exp_design} \\
    --library ${lib_file.baseName}.tsv \\
    --fdr_level ${params.pyprophet_global_fdr_level} \\
    --fdr_threshold_pep ${params.pyprophet_peptide_fdr} \\
    --fdr_threshold_prot ${params.pyprophet_protein_fdr} \\
    --ms1_scoring ${params.use_ms1} \\
    --rt_extraction_window ${params.rt_extraction_window} \\
    --mz_extraction_window ${params.mz_extraction_window} \\
    --mz_extraction_window_ms1 ${params.mz_extraction_window_ms1} \\
    --mz_extraction_unit ${params.mz_extraction_window_unit} \\
    --mz_extraction_unit_ms1 ${params.mz_extraction_window_ms1_unit} \\
    --dialignr_global_align_fdr ${params.dialignr_global_align_fdr} \\
    --dialignr_analyte_fdr ${params.dialignr_analyte_fdr} \\
    --dialignr_unalign_fdr ${params.dialignr_unalign_fdr} \\
    --dialignr_align_fdr ${params.dialignr_align_fdr} \\
    --dialignr_query_fdr ${params.dialignr_query_fdr} \\
    --workflow_version $workflow.manifest.version \\
    --output "${sample}_${condition}.mzTab"
"""
"""
# Optional MSstats analysis; a failure here is reported but does not abort the pipeline
msstats.R > msstats.log || echo "Optional MSstats step failed. Please check logs and re-run or do a manual statistical analysis."
"""
"""
# Plot peptide quantities and identification counts per sample
plot_quantities_and_counts.R ${sample}
"""
"""
# Convert the output documentation from Markdown to HTML
markdown_to_html.py $output_docs -o results_description.html
"""