Proteomics label-free quantification (LFQ) analysis pipeline
Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.
Introduction
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.
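Because both the pipeline code and its containers are versioned, a run can be pinned to a specific release for reproducibility using Nextflow's -r option. A minimal sketch, assuming Docker is available (the release tag and file paths are placeholders):
# Fetch or update the pipeline code from the nf-core GitHub organisation.
nextflow pull nf-core/proteomicslfq
# Pin a release with -r so reruns use exactly the same pipeline version
# (release tag and input paths below are placeholders).
nextflow run nf-core/proteomicslfq -r <release-tag> -profile docker \
    --input '*.mzml' --database 'myProteinDB.fasta' --expdesign 'myDesign.tsv'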
Quick Start
- Install nextflow
- Install either Docker or Singularity for full pipeline reproducibility (please only use Conda as a last resort; see docs)
- Download the pipeline and test it on a minimal dataset with a single command:
  nextflow run nf-core/proteomicslfq -profile test,<docker/singularity/conda/institute>
  Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
- Start running your own analysis!
  nextflow run nf-core/proteomicslfq \
      -profile <docker/singularity/conda/institute> \
      --input '*.mzml' \
      --database 'myProteinDB.fasta' \
      --expdesign 'myDesign.tsv'
See the usage docs for all of the available options when running the pipeline, or configure the pipeline via nf-core launch from the web or the command line.
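For repeated runs it can be convenient to keep the pipeline options in a parameters file and pass it with Nextflow's -params-file option instead of individual flags. A minimal sketch (file name and values are placeholders; the full list of parameters is in the usage docs):
# Write the pipeline parameters to a YAML file (placeholder values).
cat > params.yml << 'EOF'
input: '*.mzml'
database: 'myProteinDB.fasta'
expdesign: 'myDesign.tsv'
EOF
# Run with the parameters file; -profile still selects the container engine.
nextflow run nf-core/proteomicslfq -profile docker -params-file params.yml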
Documentation
The nf-core/proteomicslfq pipeline comes with documentation about the pipeline, which you can read at https://nf-co.re/proteomicslfq or partly find in the docs/ directory.
It performs conversion to indexed mzML, database search (with multiple search engines), re-scoring (e.g. with Percolator), merging, FDR filtering, modification localization with Luciphor2 (e.g. of phospho-sites), protein inference and grouping, as well as label-free quantification by either spectral counting or feature-based alignment and integration. Downstream processing includes statistical post-processing with MSstats and quality control with PTXQC. For more info, see the output docs.
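The --expdesign file tells the pipeline how MS runs map to samples, conditions and replicates for the MSstats step. As a purely hypothetical illustration of such a tab-separated design (the column names follow the OpenMS experimental design convention and are an assumption here; the usage docs describe the authoritative format):
# Hypothetical sketch of an experimental design TSV for --expdesign.
# Column names and values are illustrative assumptions only.
{
  printf 'Fraction_Group\tFraction\tSpectra_Filepath\tLabel\tSample\n'
  printf '1\t1\tcontrol_rep1.mzML\t1\t1\n'
  printf '2\t1\ttreated_rep1.mzML\t1\t2\n'
  printf '\n'
  printf 'Sample\tMSstats_Condition\tMSstats_BioReplicate\n'
  printf '1\tcontrol\t1\n'
  printf '2\ttreated\t1\n'
} > myDesign.tsv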
Credits
nf-core/proteomicslfq was originally written by Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg, Yasset Perez-Riverol.
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #proteomicslfq channel (you can join with this invite).
Citation
If you use nf-core/proteomicslfq for your analysis, please cite it using the following doi: 10.5281/zenodo.XXXXXX
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x. ReadCube: Full Access Link
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
Code Snippets
"""
## Convert the SDRF sample annotation for use with OpenMS
## -t2 since the one-table format parser is broken in OpenMS2.5
## -l for legacy behavior to always add sample columns
parse_sdrf convert-openms -t2 -l -s ${sdrf} > sdrf_parsing.log
"""
"""
## Convert Thermo raw files to mzML
ThermoRawFileParser.sh -i=${rawfile} -f=2 -o=./ > ${rawfile}_conversion.log
"""
"""
## Re-write mzML files as indexed mzML
mkdir out
FileConverter -in ${mzmlfile} -out out/${mzmlfile.baseName}.mzML > ${mzmlfile.baseName}_mzmlindexing.log
"""
"""
## Append decoy sequences to the protein database
DecoyDatabase -in ${mydatabase} \\
    -out ${mydatabase.baseName}_decoy.fasta \\
    -decoy_string ${params.decoy_affix} \\
    -decoy_string_position ${params.affix_type} \\
    > ${mydatabase.baseName}_decoy_database.log
"""
"""
## Centroid profile spectra (peak picking)
mkdir out
PeakPickerHiRes -in ${mzml_file} \\
    -out out/${mzml_file.baseName}.mzML \\
    -threads ${task.cpus} \\
    -debug ${params.pp_debug} \\
    -processOption ${in_mem} \\
    ${lvls} \\
    > ${mzml_file.baseName}_pp.log
"""
"""
## Peptide database search with MS-GF+
MSGFPlusAdapter -in ${mzml_file} \\
    -out ${mzml_file.baseName}_msgf.idXML \\
    -threads ${task.cpus} \\
    -java_memory ${task.memory.toMega()} \\
    -database "${database}" \\
    -instrument ${inst} \\
    -protocol "${params.protocol}" \\
    -matches_per_spec ${params.num_hits} \\
    -min_precursor_charge ${params.min_precursor_charge} \\
    -max_precursor_charge ${params.max_precursor_charge} \\
    -min_peptide_length ${params.min_peptide_length} \\
    -max_peptide_length ${params.max_peptide_length} \\
    -enzyme "${enzyme}" \\
    -tryptic ${params.num_enzyme_termini} \\
    -precursor_mass_tolerance ${prec_tol} \\
    -precursor_error_units ${prec_tol_unit} \\
    -fixed_modifications ${fixed.tokenize(',').collect { "'${it}'" }.join(" ") } \\
    -variable_modifications ${variable.tokenize(',').collect { "'${it}'" }.join(" ") } \\
    -max_mods ${params.max_mods} \\
    -debug ${params.db_debug} \\
    > ${mzml_file.baseName}_msgf.log
"""
"""
## Peptide database search with Comet
CometAdapter -in ${mzml_file} \\
    -out ${mzml_file.baseName}_comet.idXML \\
    -threads ${task.cpus} \\
    -database "${database}" \\
    -instrument ${inst} \\
    -missed_cleavages ${params.allowed_missed_cleavages} \\
    -num_hits ${params.num_hits} \\
    -num_enzyme_termini ${params.num_enzyme_termini} \\
    -enzyme "${enzyme}" \\
    -precursor_charge ${params.min_precursor_charge}:${params.max_precursor_charge} \\
    -fixed_modifications ${fixed.tokenize(',').collect { "'${it}'" }.join(" ") } \\
    -variable_modifications ${variable.tokenize(',').collect { "'${it}'" }.join(" ") } \\
    -max_variable_mods_in_peptide ${params.max_mods} \\
    -precursor_mass_tolerance ${prec_tol} \\
    -precursor_error_units ${prec_tol_unit} \\
    -fragment_mass_tolerance ${bin_tol} \\
    -fragment_bin_offset ${bin_offset} \\
    -debug ${params.db_debug} \\
    -force \\
    > ${mzml_file.baseName}_comet.log
"""
"""
## Map peptide identifications to proteins in the (decoy) database
PeptideIndexer -in ${id_file} \\
    -out ${id_file.baseName}_idx.idXML \\
    -threads ${task.cpus} \\
    -fasta ${database} \\
    -enzyme:name "${enzyme}" \\
    -enzyme:specificity ${pepidx_num_enzyme_termini} \\
    ${il} \\
    ${allow_um} \\
    > ${id_file.baseName}_index_peptides.log
"""
"""
## Add PSM features for Percolator re-scoring
PSMFeatureExtractor -in ${id_file} \\
    -out ${id_file.baseName}_feat.idXML \\
    -threads ${task.cpus} \\
    > ${id_file.baseName}_extract_percolator_features.log
"""
"""
## Re-score PSMs with Percolator
## Percolator does not have a threads parameter. Set it via OpenMP env variable,
## to honor threads on clusters
OMP_NUM_THREADS=${task.cpus} PercolatorAdapter \\
    -in ${id_file} \\
    -out ${id_file.baseName}_perc.idXML \\
    -threads ${task.cpus} \\
    -subset_max_train ${params.subset_max_train} \\
    -decoy_pattern ${params.decoy_affix} \\
    -post_processing_tdc \\
    -score_type pep \\
    > ${id_file.baseName}_percolator.log
"""
"""
## Estimate PSM-level FDR (keeping decoy hits annotated)
FalseDiscoveryRate -in ${id_file} \\
    -out ${id_file.baseName}_fdr.idXML \\
    -threads ${task.cpus} \\
    -protein false \\
    -algorithm:add_decoy_peptides \\
    -algorithm:add_decoy_proteins \\
    > ${id_file.baseName}_fdr.log
"""
"""
## Fit posterior error probabilities for PSM scores
IDPosteriorErrorProbability -in ${id_file} \\
    -out ${id_file.baseName}_idpep.idXML \\
    -fit_algorithm:outlier_handling ${params.outlier_handling} \\
    -threads ${task.cpus} \\
    > ${id_file.baseName}_idpep.log
"""
"""
## Switch the main PSM score from posterior error probability to q-value
IDScoreSwitcher -in ${id_file} \\
    -out ${id_file.baseName}_switched.idXML \\
    -threads ${task.cpus} \\
    -old_score "Posterior Error Probability" \\
    -new_score ${qval_score} \\
    -new_score_type q-value \\
    -new_score_orientation lower_better \\
    > ${id_file.baseName}_scoreswitcher_qval.log
"""
"""
## Combine PSMs from the different search engines per spectrum
ConsensusID -in ${id_files_from_ses} \\
    -out ${mzml_id}_consensus.idXML \\
    -per_spectrum \\
    -threads ${task.cpus} \\
    -algorithm ${params.consensusid_algorithm} \\
    -filter:min_support ${params.min_consensus_support} \\
    -filter:considered_hits ${params.consensusid_considered_top_hits} \\
    > ${mzml_id}_consensusID.log
"""
"""
## Estimate PSM-level FDR (keeping decoy hits annotated)
FalseDiscoveryRate -in ${id_file} \\
    -out ${id_file.baseName}_fdr.idXML \\
    -threads ${task.cpus} \\
    -protein false \\
    -algorithm:add_decoy_peptides \\
    -algorithm:add_decoy_proteins \\
    > ${id_file.baseName}_fdr.log
"""
"""
## Filter identifications at the configured PSM FDR cutoff
IDFilter -in ${id_file} \\
    -out ${id_file.baseName}_filter.idXML \\
    -threads ${task.cpus} \\
    -score:pep ${params.psm_pep_fdr_cutoff} \\
    > ${id_file.baseName}_idfilter.log
"""
"""
## Switch scores back to posterior error probability for Luciphor2
IDScoreSwitcher -in ${id_file} \\
    -out ${id_file.baseName}_pep.idXML \\
    -threads ${task.cpus} \\
    -old_score "q-value" \\
    -new_score "Posterior Error Probability_score" \\
    -new_score_type "Posterior Error Probability" \\
    -new_score_orientation lower_better \\
    > ${id_file.baseName}_switch_pep_for_luciphor.log
"""
"""
## Localize modification sites (e.g. phospho-sites) with Luciphor2
LuciphorAdapter -id ${id_file} \\
    -in ${mzml_file} \\
    -out ${id_file.baseName}_luciphor.idXML \\
    -threads ${task.cpus} \\
    -num_threads ${task.cpus} \\
    -target_modifications ${params.mod_localization.tokenize(',').collect { "'${it}'" }.join(" ") } \\
    -fragment_method ${frag_method} \\
    ${losses} \\
    ${dec_mass} \\
    ${dec_losses} \\
    -max_charge_state ${params.max_precursor_charge} \\
    -max_peptide_length ${params.max_peptide_length} \\
    -debug ${params.luciphor_debug} \\
    > ${id_file.baseName}_luciphor.log
"""
"""
## Protein inference and label-free quantification
ProteomicsLFQ -in ${(mzmls as List).join(' ')} \\
    -ids ${(id_files as List).join(' ')} \\
    -design ${expdes} \\
    -fasta ${fasta} \\
    -protein_inference ${params.protein_inference} \\
    -quantification_method ${params.quantification_method} \\
    -targeted_only ${params.targeted_only} \\
    -mass_recalibration ${params.mass_recalibration} \\
    -transfer_ids ${params.transfer_ids} \\
    -protein_quantification ${params.protein_quant} \\
    -out out.mzTab \\
    -threads ${task.cpus} \\
    ${msstats_present} \\
    -out_cxml out.consensusXML \\
    -proteinFDR ${params.protein_level_fdr_cutoff} \\
    -debug ${params.inf_quant_debug} \\
    > proteomicslfq.log
"""
"""
## Statistical post-processing with MSstats (optional step)
msstats_plfq.R ${csv} ${mztab} > msstats.log || echo "Optional MSstats step failed. Please check logs and re-run or do a manual statistical analysis."
"""
"""
## Generate the PTXQC quality control report
ptxqc.R ${mzTab} > ptxqc.log
"""
"""
## Collect software versions for the pipeline report
echo $workflow.manifest.version > v_pipeline.txt
echo $workflow.nextflow.version > v_nextflow.txt
ThermoRawFileParser.sh --version &> v_thermorawfileparser.txt
echo \$(FileConverter 2>&1) > v_fileconverter.txt || true
echo \$(DecoyDatabase 2>&1) > v_decoydatabase.txt || true
echo \$(MSGFPlusAdapter 2>&1) > v_msgfplusadapter.txt || true
echo \$(msgf_plus 2>&1) > v_msgfplus.txt || true
echo \$(CometAdapter 2>&1) > v_cometadapter.txt || true
echo \$(comet 2>&1) > v_comet.txt || true
echo \$(PeptideIndexer 2>&1) > v_peptideindexer.txt || true
echo \$(PSMFeatureExtractor 2>&1) > v_psmfeatureextractor.txt || true
echo \$(PercolatorAdapter 2>&1) > v_percolatoradapter.txt || true
percolator -h &> v_percolator.txt
echo \$(IDFilter 2>&1) > v_idfilter.txt || true
echo \$(IDScoreSwitcher 2>&1) > v_idscoreswitcher.txt || true
echo \$(FalseDiscoveryRate 2>&1) > v_falsediscoveryrate.txt || true
echo \$(IDPosteriorErrorProbability 2>&1) > v_idposteriorerrorprobability.txt || true
echo \$(ProteomicsLFQ 2>&1) > v_proteomicslfq.txt || true
echo $workflow.manifest.version &> v_msstats_plfq.txt
scrape_software_versions.py &> software_versions_mqc.yaml
"""
"""
## Render the results description Markdown to HTML
markdown_to_html.py $output_docs -o results_description.html
"""