pepVet

Evaluate proteolytic digests for LC-MS/MS proteomics. Score peptides, compare enzymes, triage proteins, and plan your workflow before any sample reaches the instrument.

What pepVet does

Enzyme choice is the first and most consequential decision in a bottom-up proteomics experiment. Cut too aggressively and you drown in tiny fragments below the detection threshold. Cut too conservatively and overlong peptides fail to fly, fragment, or resolve on the column. Most tools stop after listing which peptides an enzyme could produce. pepVet goes further: it scores each peptide for LC-MS/MS suitability, ranks enzymes by digest quality, and triages proteins by expected difficulty, all before any sample touches the instrument.

Quick start

library(pepVet)

bsa <- system.file("extdata", "P02769.fasta", package = "pepVet")

# One-call evaluation with styled console report
result <- pepvet_check(bsa, enzyme = "trypsin", missed_cleavages = 1L)
result$scores

# Multi-enzyme comparison
comp <- compare_digests(bsa,
  enzymes = c("trypsin", "lysc", "glutamyl endopeptidase", "asp-n endopeptidase")
)
digest_report(comp)
recommend_enzyme(bsa, enzymes = c("trypsin", "lysc"))

Visualization

pepVet provides 12 ggplot2-based plot functions for digest diagnostics, enzyme comparison, physicochemical distributions, and proteome-scale overviews. Every function returns a ggplot or patchwork object that can be customized further.

Single-protein diagnostic: plot_digest_profile() called on BSA (P02769.fasta) digested with trypsin at one missed cleavage gives a four-panel figure showing length distribution, GRAVY hydrophobicity, sequence coverage, and component scores.

Proteome-scale enzyme comparison: plot_batch_comparison() called on the 50-protein fixture (small_proteome_50_proteins.fasta) evaluated against 10 enzymes (trypsin, Lys-C, chymotrypsin, Asp-N, Glu-C, Arg-C, thermolysin, pepsin, Staphylococcal peptidase I, proteinase K) gives verdict summaries, score distributions, component heatmaps, and per-protein win rates.

See the Visualising Digest Quality article for a full walkthrough of all plot functions.

Features

Digest simulation

digest_protein() cleaves any protein sequence with any of 40 cleaver-compatible enzyme rules and returns a peptide tibble with coordinates and missed-cleavage counts.
annotate_cleavage_sites() labels each trypsin-family cleavage site as high, medium, or low efficiency using local P1-P1' sequence context.
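
To make the peptide table concrete, here is a minimal base-R sketch of a trypsin-like digest with missed cleavages. It is not pepVet's implementation (digest_protein() relies on cleaver rules); the helper name simple_tryptic_digest() and the simplified rule (cleave after K or R unless followed by P) are assumptions made for illustration.

```r
# Hypothetical helper, not part of pepVet: a base-R trypsin-like digest.
# Assumed rule: cleave C-terminal to K or R unless the next residue is P.
simple_tryptic_digest <- function(sequence, missed_cleavages = 0L) {
  residues <- strsplit(sequence, "")[[1]]
  n <- length(residues)
  # A cleavage site sits after position i when residue i is K/R and i + 1 is not P
  cuts <- which(residues[-n] %in% c("K", "R") & residues[-1] != "P")
  starts <- c(1L, cuts + 1L)
  ends <- c(cuts, n)
  rows <- list()
  for (i in seq_along(starts)) {
    # Join the fully cleaved fragment with up to `missed_cleavages` neighbours
    for (k in 0:missed_cleavages) {
      j <- i + k
      if (j > length(ends)) break
      rows[[length(rows) + 1L]] <- data.frame(
        sequence = substr(sequence, starts[i], ends[j]),
        start = starts[i],
        end = ends[j],
        missed_cleavages = k,
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, rows)
}

peptides <- simple_tryptic_digest("AKRPGKC", missed_cleavages = 1L)
print(peptides)
```

Each row carries 1-based start and end coordinates plus the number of internal sites left uncut, which mirrors the kind of information the peptide tibble is described as holding.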

Scoring

score_peptides() summarises a peptide set into five orthogonal component scores (S_length, S_coverage, S_count, S_hydro, S_charge) plus an optional sixth (S_unique) when a background proteome digest is supplied.
pepvet_preset() returns workflow-specific parameter sets for DDA, DIA, targeted, membrane, FFPE/degraded, and fractionated workflows.

Evaluation and comparison

evaluate_digest() wraps digest and scoring into one call and returns a named list with scores, peptides, and resolved parameters.
compare_digests() runs the evaluation across a vector of enzymes for a single protein and returns a ranked tibble.
recommend_enzyme() returns the name of the best-scoring enzyme.
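
The ranking step can be pictured with a small base-R sketch. The composite values below are placeholders invented for illustration, not pepVet output, and the column names are assumptions rather than the actual columns returned by compare_digests().

```r
# Placeholder composite scores, for illustration only (not pepVet output)
comparison <- data.frame(
  enzyme = c("trypsin", "lysc", "chymotrypsin"),
  composite = c(0.71, 0.58, 0.64),
  stringsAsFactors = FALSE
)

# Rank enzymes from highest to lowest composite score
ranked <- comparison[order(comparison$composite, decreasing = TRUE), ]
ranked$rank <- seq_len(nrow(ranked))

# The "recommended" enzyme is simply the top-ranked row
best <- ranked$enzyme[1]
print(ranked)
print(best)
```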

Batch workflows

batch_evaluate() evaluates every protein in a multi-FASTA independently and returns a flat tibble with one row per protein, including all score columns, verdicts, and four difficulty flags.
summarize_batch() computes proteome-level verdict distribution, composite score statistics, per-component means, and heuristic enzyme-switch candidates.
triage_proteins() appends an action column (proceed, consider_alternative, try_other_enzyme, skip) to the batch tibble.

Reporting and export

digest_report() renders a colour-coded console summary for single-protein or multi-enzyme results.
export_peptide_list() filters valid peptides and exports as Skyline-compatible CSV, generic annotated CSV, or FASTA.

Peptide properties

calculate_peptide_mass() computes monoisotopic neutral mass and m/z.
calculate_pI() computes isoelectric point using a Lehninger-style pKa set.
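
As a rough illustration of the mass calculation, the sketch below sums standard monoisotopic residue masses, adds one water, and converts to m/z for a given charge. The function names are hypothetical, and the sketch assumes unmodified residues; it is not the code behind calculate_peptide_mass().

```r
# Monoisotopic residue masses (Da) for the 20 standard amino acids
residue_mass <- c(
  G = 57.02146, A = 71.03711, S = 87.03203, P = 97.05276, V = 99.06841,
  T = 101.04768, C = 103.00919, L = 113.08406, I = 113.08406, N = 114.04293,
  D = 115.02694, Q = 128.05858, K = 128.09496, E = 129.04259, M = 131.04049,
  H = 137.05891, F = 147.06841, R = 156.10111, Y = 163.06333, W = 186.07931
)
water <- 18.01056   # one H2O for the free N- and C-termini
proton <- 1.007276  # mass of a proton

# Hypothetical helper: neutral monoisotopic mass of an unmodified peptide
peptide_neutral_mass <- function(peptide) {
  residues <- strsplit(peptide, "")[[1]]
  stopifnot(all(residues %in% names(residue_mass)))
  sum(residue_mass[residues]) + water
}

# Hypothetical helper: m/z of the [M + zH]z+ ion
peptide_mz <- function(peptide, charge = 2L) {
  (peptide_neutral_mass(peptide) + charge * proton) / charge
}

peptide_neutral_mass("PEPTIDEK")
peptide_mz("PEPTIDEK", charge = 2L)
```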

Scoring model

Six components, one weighted composite, one advisory verdict.

Score	What it measures	Why it matters
`S_length`	Fraction of peptides in the active length window [7, 25] aa	Short and long peptides lower identification rates
`S_coverage`	Fraction of the protein covered by valid peptides	Dark regions weaken protein-level inference
`S_count`	Valid count relative to enzyme-aware expected density	Too few weakens evidence; too many signals over-digestion
`S_hydro`	Fraction of valid peptides in the active GRAVY window [-1.0, 0.6]	Extreme hydrophobicity or hydrophilicity hurts LC retention
`S_charge`	Fraction of valid peptides with a non-terminal K/R/H	Proxy for multi-charge potential and fragment ion richness
`S_unique`	Fraction of valid peptides unique in a supplied proteome	Shared peptides cannot distinguish isoforms or paralogs
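
Two of the components in the table can be sketched in a few lines of base R. This is an illustration under assumptions, not pepVet's code: the function names are hypothetical, GRAVY is taken as the mean Kyte-Doolittle hydropathy, and a "valid" peptide is assumed to mean one inside the length window.

```r
# Kyte-Doolittle hydropathy index per residue
kd <- c(
  A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5, Q = -3.5, E = -3.5,
  G = -0.4, H = -3.2, I = 4.5, L = 3.8, K = -3.9, M = 1.9, F = 2.8,
  P = -1.6, S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2
)

# GRAVY: mean hydropathy over all residues of one peptide
gravy <- function(peptide) {
  mean(kd[strsplit(peptide, "")[[1]]])
}

# Fraction of peptides whose length falls inside the window (S_length-like)
length_fraction <- function(peptides, window = c(7, 25)) {
  len <- nchar(peptides)
  mean(len >= window[1] & len <= window[2])
}

# Fraction of length-valid peptides inside the GRAVY window (S_hydro-like)
hydro_fraction <- function(peptides,
                           length_window = c(7, 25),
                           gravy_window = c(-1.0, 0.6)) {
  len <- nchar(peptides)
  valid <- peptides[len >= length_window[1] & len <= length_window[2]]
  if (length(valid) == 0L) return(NA_real_)
  g <- vapply(valid, gravy, numeric(1))
  mean(g >= gravy_window[1] & g <= gravy_window[2])
}

toy_peptides <- c("GGGGGGG", "LLLLLLL", "AK")
length_fraction(toy_peptides)
hydro_fraction(toy_peptides)
```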

Default weights, derived with the analytic hierarchy process (AHP; consistency ratio 0.028) and summing to 1: S_length 0.200, S_coverage 0.348, S_count 0.226, S_hydro 0.138, S_charge 0.088.

Verdict thresholds: Good >= 0.65, Moderate >= 0.40, Poor < 0.40. These are heuristic ranking labels, not calibrated probabilities.
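
The weights and thresholds above can be combined in a short base-R sketch. It assumes the composite is a plain weighted mean of the component scores, which this README does not spell out; the function names and the example scores are hypothetical.

```r
# Default weights as listed above
default_weights <- c(
  S_length = 0.200, S_coverage = 0.348, S_count = 0.226,
  S_hydro = 0.138, S_charge = 0.088
)

# Assumption: the composite is a weighted mean of the component scores
composite_score <- function(scores, weights = default_weights) {
  stopifnot(all(names(weights) %in% names(scores)))
  sum(weights * scores[names(weights)]) / sum(weights)
}

# Map a composite score to the advisory verdict labels
verdict <- function(composite) {
  if (composite >= 0.65) {
    "Good"
  } else if (composite >= 0.40) {
    "Moderate"
  } else {
    "Poor"
  }
}

# Illustrative component scores, not real pepVet output
example_scores <- c(
  S_length = 0.8, S_coverage = 0.9, S_count = 0.7,
  S_hydro = 0.6, S_charge = 0.5
)
score <- composite_score(example_scores)
print(score)
print(verdict(score))
```

Dividing by the sum of the weights keeps the composite on a 0 to 1 scale even if a custom weight vector does not sum exactly to 1.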

Workflow presets

Each preset adjusts the valid-length window, GRAVY range, and component weights together.

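# Fetch the targeted-workflow parameters, then pass them to evaluate_digest() as named arguments
# (bsa is the FASTA path defined in Quick start)
preset <- pepvet_preset("targeted")
do.call(evaluate_digest, c(list(sequence = bsa, enzyme = "trypsin"), preset))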

Preset	Best fit	Key shift	Source
`standard`	Routine DDA	[7,25] aa, GRAVY [-1,0.6], AHP defaults	Tabb 2008
`dia`	DIA and SWATH	[7,30] aa, GRAVY [-1,0.8], high coverage weight	Ludwig 2018
`targeted`	SRM, PRM, MRM	[8,20] aa, GRAVY [-0.8,0.4], S_unique 30%	Lange 2008, Picotti 2012
`membrane`	Hydrophobic proteins	GRAVY [-1.0,2.0], S_hydro 5%	Vit & Petrak 2017
`ffpe_degraded`	Degraded samples	[6,30] aa, high S_count weight	Coscia 2020, Buczak 2023
`fractionated`	SCX / high-pH RP	Same as standard, include_pI = TRUE	-
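
The do.call() idiom used with presets works because a named list can be spliced into a function call as named arguments. The self-contained sketch below shows the mechanism with a toy function; toy_evaluate(), its argument names, and the preset contents are hypothetical stand-ins, not pepVet's actual parameters.

```r
# Hypothetical stand-in for an evaluation function; argument names are illustrative
toy_evaluate <- function(sequence, enzyme, min_length = 7L, max_length = 25L) {
  list(
    enzyme = enzyme,
    min_length = min_length,
    max_length = max_length,
    n_residues = nchar(sequence)
  )
}

# A preset is just a named list of parameter overrides
toy_preset <- list(min_length = 8L, max_length = 20L)

# c() concatenates the fixed arguments with the preset; do.call() then
# invokes the function with every list element as a named argument
args <- c(list(sequence = "MKWVTFISLLK", enzyme = "trypsin"), toy_preset)
result <- do.call(toy_evaluate, args)
str(result)
```

Keeping presets as plain named lists means a single parameter can be overridden before the call, for example with modifyList(toy_preset, list(max_length = 22L)).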

Installation

pepVet depends on Bioconductor packages. Install them first:

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("Biostrings", "IRanges", "cleaver"))

if (!requireNamespace("remotes", quietly = TRUE))
  install.packages("remotes")
remotes::install_github("LangeLab/pepVet", dependencies = TRUE)

Reference FASTA fixtures

The package ships pinned FASTA files for reproducible examples and regression tests.

File	Protein	Use
`P02769.fasta`	BSA (607 aa)	Canonical positive-control digest
`P68431.fasta`	Histone H3.1 (136 aa)	Exposes trypsin over-digestion on basic proteins
`P56817.fasta`	BACE1 (501 aa)	Membrane protein with mixed hydrophobicity
`P00698.fasta`	Lysozyme C (147 aa)	Small protein, well-characterised digest
`Q8WZ42.fasta`	Titin (34350 aa)	Very large protein for scale testing
`P0CG48.fasta`	Ubiquitin (685 aa)	Short protein edge case
`P37840_isoforms.fasta`	Alpha-synuclein isoforms (3 seqs)	Proteome-aware uniqueness example
`small_proteome_50_proteins.fasta`	50 human proteins	Batch workflow fixture

Scope

pepVet is not a peptide detectability predictor. It is a rule-based, multi-criteria digest-ranking model for pre-acquisition planning. Scores are interpretable rankings within a given enzyme-workflow combination, not calibrated probabilities. The model does not account for PTMs, chromatographic gradients, or instrument-specific fragmentation parameters.

Documentation

Full documentation site: langelab.github.io/pepVet
Getting started: pepVet-introduction
Choosing an enzyme: enzyme-selection
Workflow presets: workflow-presets
Scoring model: scoring-model
Visualization gallery: visualisation
Changelog: NEWS.md
Bug reports and questions: GitHub Issues

Citation

citation("pepVet")

License

MIT. See LICENSE.md.

Contributing

Pull requests, bug reports, and documentation fixes are welcome. See CONTRIBUTING.md for the review workflow and CODE_OF_CONDUCT.md for community standards.
