chore(llmobs): add GEPA as alternative prompt optimization backend #16599

Open
tillwf wants to merge 4 commits into main from till.wohlfarth/MLOB-5317/prompt_optimization_gepa

tillwf (Contributor) commented Feb 20, 2026

Summary

  • Add GEPA (reflective evolutionary optimizer) as an alternative method for LLMObs._prompt_optimization(), alongside the existing "metaprompting" default
  • Extract load_optimization_system_prompt() as a reusable module-level function so both metaprompting and GEPA can share the system prompt template
  • Add LLMObsGEPAAdapter that bridges our task/evaluators/optimization_task interface to GEPA's GEPAAdapter protocol
  • Add gepa>=0.0.26 as an optional dependency (pip install ddtrace[gepa])

Motivation

The current metaprompting loop is sequential: run experiment, call optimizer, repeat. GEPA adds an evolutionary approach with Pareto selection, batch sampling, and candidate
tracking — potentially finding better prompts with fewer iterations. By implementing GEPA's propose_new_texts adapter method, prompt generation still goes through the user's
existing optimization_task function, so no script changes are needed beyond adding method="gepa".
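
To make the Pareto-selection idea concrete, here is a minimal sketch. It is a classic dominance-based Pareto front, not GEPA's actual implementation, and GEPA's own selection rule may differ. It assumes each candidate prompt has one score per dataset example (higher is better). The names `dominates`, `pareto_front`, and the sample scores are hypothetical.

```python
from typing import Dict, List


def dominates(a: List[float], b: List[float]) -> bool:
    """Return True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, List[float]]) -> List[str]:
    """Keep candidates whose per-example scores are not dominated by any other candidate."""
    return [
        name
        for name, vec in scores.items()
        if not any(
            dominates(other, vec)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


# Hypothetical per-example scores for three candidate prompts.
candidate_scores = {
    "prompt_a": [1.0, 0.0, 1.0],
    "prompt_b": [0.0, 1.0, 1.0],
    "prompt_c": [0.0, 0.0, 1.0],  # never better than prompt_a or prompt_b
}
front = pareto_front(candidate_scores)
```

Here `prompt_a` and `prompt_b` each win on a different example, so both stay on the front, while `prompt_c` is dropped. A sequential loop that tracks only the single best average score would have to pick one of the two.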

Changes

New files

  • ddtrace/llmobs/_optimizers/__init__.py — Package init
  • ddtrace/llmobs/_optimizers/gepa_strategy.py: LLMObsGEPAAdapter, implementing GEPA's protocol:
    • evaluate() — runs user's task + evaluators on a batch, returns EvaluationBatch
    • make_reflective_dataset() — builds feedback from trajectories for GEPA's reflection
    • propose_new_texts() — wraps user's optimization_task via shared system prompt template
    • _to_numeric_score() — converts any evaluator return type to float
    • _dataset_to_gepa_format() — converts Dataset records for GEPA
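
As an illustration of what a score-coercion helper like `_to_numeric_score()` might do, here is a sketch under stated assumptions. The PR text only says the helper converts any evaluator return type to float, so every mapping rule below (zero for unparseable strings, the `score`/`value` dict keys, a `.value` attribute on result objects) is an assumption, and `FakeEvaluatorResult` is a hypothetical stand-in, not the real `EvaluatorResult` class.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeEvaluatorResult:
    """Hypothetical stand-in for an evaluator result object; the real class may differ."""

    value: Any


def to_numeric_score(result: Any) -> float:
    """Coerce an evaluator return value to a float (illustrative rules, all assumed)."""
    if isinstance(result, bool):  # check bool before int: bool is a subclass of int
        return 1.0 if result else 0.0
    if isinstance(result, (int, float)):
        return float(result)
    if isinstance(result, str):
        try:
            return float(result)
        except ValueError:
            return 0.0  # assumption: unparseable strings score zero
    if isinstance(result, dict):
        for key in ("score", "value"):  # assumption: conventional key names
            if key in result:
                return to_numeric_score(result[key])
        return 0.0
    if hasattr(result, "value"):  # assumption: result objects expose `.value`
        return to_numeric_score(result.value)
    return 0.0
```

Checking `bool` before `int` matters because `isinstance(True, int)` is true in Python. The order of the first two branches keeps that dependency explicit.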

Modified files

  • ddtrace/llmobs/_prompt_optimization.py:
    • Extracted load_optimization_system_prompt(config) from OptimizationIteration._load_system_prompt() (method now delegates)
    • Added method: str = "metaprompting" parameter to PromptOptimization.__init__()
    • Added routing in run() to GEPA paths when method="gepa"
    • Added _run_gepa_without_split(), _run_gepa_with_split(), _run_gepa_core() methods
  • ddtrace/llmobs/_llmobs.py — Added method parameter to _prompt_optimization() classmethod, passed through to constructor
  • pyproject.toml — Added gepa = ["gepa>=0.0.26"] optional dependency
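
The routing described above could look roughly like the following sketch. It is not the code from `_prompt_optimization.py`: `select_backend` and `VALID_METHODS` are hypothetical names, and the real `run()` dispatches to methods rather than returning a string. Only the two method names and the ImportError message are taken from this PR.

```python
import importlib

VALID_METHODS = ("metaprompting", "gepa")  # hypothetical constant


def select_backend(method: str = "metaprompting") -> str:
    """Validate `method` and make sure its optional dependency is importable."""
    if method not in VALID_METHODS:
        raise ValueError(f"Unknown method {method!r}; expected one of {VALID_METHODS}")
    if method == "gepa":
        try:
            # Deferred import: the optional package is only needed on this path.
            importlib.import_module("gepa")
        except ImportError:
            raise ImportError(
                "gepa package is required for method='gepa'. Install with: pip install ddtrace[gepa]"
            ) from None
    return method
```

Deferring the import to the GEPA branch keeps the default metaprompting path free of the optional dependency.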

What stays unchanged

  • All existing metaprompting paths (_run_without_split, _run_with_split)
  • OptimizationIteration, OptimizationResult, TestPhaseResult classes
  • _run_experiment(), _create_split_datasets(), all validation functions
  • User-facing API (only adds optional method kwarg at the end)

Usage

# Only change: add method="gepa"
opt = LLMObs._prompt_optimization(
    name="my_opt",
    task=my_task,
    optimization_task=my_opt_fn,  # SAME function, used via propose_new_texts
    dataset=dataset,
    evaluators=[my_evaluator],
    summary_evaluators=[my_summary],
    compute_score=my_score_fn,
    method="gepa",
    config={
        "prompt": "...",
        "max_metric_calls": 150,  # optional GEPA budget
    },
)
result = opt.run()

Test plan

  • Verify existing metaprompting tests still pass (scripts/run-tests tests/llmobs/)
  • Verify method="invalid" raises ValueError
  • Verify method="gepa" without gepa installed raises ImportError with install instructions
  • Unit test _to_numeric_score() with all input types (float, bool, dict, string, EvaluatorResult)
  • Unit test _dataset_to_gepa_format() conversion
  • Unit test propose_new_texts correctly calls optimization_task
  • Integration test with mocked gepa.optimize() for full GEPA run path (with and without split)
  • Manual test with real GEPA against live dataset
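
One way to exercise the "gepa not installed" case without touching the environment is sketched below. `require_gepa` is a hypothetical helper standing in for the guarded import, not a function from this PR. The sketch relies on standard Python behavior: a `None` entry in `sys.modules` makes importing that name raise `ImportError`.

```python
import importlib
import sys
from unittest import mock


def require_gepa():
    """Hypothetical helper: import gepa or fail with install instructions."""
    try:
        return importlib.import_module("gepa")
    except ImportError:
        raise ImportError(
            "gepa package is required for method='gepa'. Install with: pip install ddtrace[gepa]"
        ) from None


def check_missing_gepa_raises() -> str:
    """Simulate an environment without gepa and return the resulting error message."""
    # patch.dict restores sys.modules on exit, so a real gepa install is unaffected.
    with mock.patch.dict(sys.modules, {"gepa": None}):
        try:
            require_gepa()
        except ImportError as exc:
            return str(exc)
    raise AssertionError("expected ImportError")


message = check_missing_gepa_raises()
```

The same patching approach works inside a pytest test, and it behaves identically whether or not the `gepa` extra is installed on the machine running the suite.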

tillwf requested a review from ncybul February 20, 2026 13:35
tillwf requested review from a team as code owners February 20, 2026 13:35
chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1dd5df9aae

except ImportError:
    raise ImportError("gepa package is required for method='gepa'. Install with: pip install ddtrace[gepa]")

from ddtrace.llmobs._optimizers.gepa_strategy import LLMObsGEPAAdapter

P1: Ship GEPA adapter module before using it

The new GEPA path imports LLMObsGEPAAdapter from ddtrace.llmobs._optimizers.gepa_strategy, but this commit does not add that module anywhere in the tree, so calling PromptOptimization.run() with method="gepa" will fail immediately with ModuleNotFoundError even when the gepa extra is installed. This blocks the entire feature path introduced in this change.

pr-commenter bot commented Feb 20, 2026

Performance SLOs

Comparing candidate till.wohlfarth/MLOB-5317/prompt_optimization_gepa (5f2d201) with baseline main (b5b77a2)

📈 Performance Regressions (2 suites)
📈 iastaspects - 117/117

✅ add_aspect

Time: ✅ 103.843µs (SLO: <130.000µs 📉 -20.1%) vs baseline: +3.5%

Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +5.0%


✅ add_inplace_aspect

Time: ✅ 100.738µs (SLO: <130.000µs 📉 -22.5%) vs baseline: -0.4%

Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +5.0%


✅ add_inplace_noaspect

Time: ✅ 28.423µs (SLO: <40.000µs 📉 -28.9%) vs baseline: +1.0%

Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +5.0%


✅ add_noaspect

Time: ✅ 48.792µs (SLO: <70.000µs 📉 -30.3%) vs baseline: ~same

Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.8%


✅ bytearray_aspect

Time: ✅ 252.004µs (SLO: <400.000µs 📉 -37.0%) vs baseline: +0.7%

Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +5.0%


✅ bytearray_extend_aspect

Time: ✅ 634.997µs (SLO: <800.000µs 📉 -20.6%) vs baseline: -0.4%

Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.8%


✅ bytearray_extend_noaspect

Time: ✅ 263.702µs (SLO: <400.000µs 📉 -34.1%) vs baseline: -0.1%

Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.8%


✅ bytearray_noaspect

Time: ✅ 136.339µs (SLO: <300.000µs 📉 -54.6%) vs baseline: -0.3%

Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.8%


✅ bytes_aspect

Time: ✅ 218.889µs (SLO: <300.000µs 📉 -27.0%) vs baseline: -0.1%

Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.7%


✅ bytes_noaspect

Time: ✅ 133.096µs (SLO: <200.000µs 📉 -33.5%) vs baseline: +0.2%

Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.9%


✅ bytesio_aspect

Time: ✅ 3.774ms (SLO: <5.000ms 📉 -24.5%) vs baseline: +0.4%

Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.9%


✅ bytesio_noaspect

Time: ✅ 314.859µs (SLO: <420.000µs 📉 -25.0%) vs baseline: -0.8%

Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +5.0%


✅ capitalize_aspect

Time: ✅ 89.057µs (SLO: <300.000µs 📉 -70.3%) vs baseline: -0.7%

Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.9%


✅ capitalize_noaspect

Time: ✅ 249.674µs (SLO: <300.000µs 📉 -16.8%) vs baseline: +0.4%

Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.7%


✅ casefold_aspect

Time: ✅ 89.089µs (SLO: <500.000µs 📉 -82.2%) vs baseline: +0.2%

Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.9%


✅ casefold_noaspect

Time: ✅ 307.912µs (SLO: <500.000µs 📉 -38.4%) vs baseline: +1.0%

Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +5.0%


✅ decode_aspect

Time: ✅ 86.539µs (SLO: <100.000µs 📉 -13.5%) vs baseline: ~same

Memory: ✅ 42.861MB (SLO: <46.000MB -6.8%) vs baseline: +4.8%


✅ decode_noaspect

Time: ✅ 151.762µs (SLO: <210.000µs 📉 -27.7%) vs baseline: -0.5%

Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +5.0%


✅ encode_aspect

Time: ✅ 84.030µs (SLO: <200.000µs 📉 -58.0%) vs baseline: -0.3%

Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9%


✅ encode_noaspect

Time: ✅ 139.381µs (SLO: <200.000µs 📉 -30.3%) vs baseline: ~same

Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.8%


✅ format_aspect

Time: ✅ 14.574ms (SLO: <19.200ms 📉 -24.1%) vs baseline: ~same

Memory: ✅ 43.096MB (SLO: <46.000MB -6.3%) vs baseline: +4.9%


✅ format_map_aspect

Time: ✅ 16.406ms (SLO: <21.500ms 📉 -23.7%) vs baseline: -0.5%

Memory: ✅ 43.037MB (SLO: <46.000MB -6.4%) vs baseline: +5.0%


✅ format_map_noaspect

Time: ✅ 370.873µs (SLO: <500.000µs 📉 -25.8%) vs baseline: ~same

Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.9%


✅ format_noaspect

Time: ✅ 301.610µs (SLO: <500.000µs 📉 -39.7%) vs baseline: -0.2%

Memory: ✅ 42.861MB (SLO: <46.000MB -6.8%) vs baseline: +4.8%


✅ index_aspect

Time: ✅ 125.793µs (SLO: <300.000µs 📉 -58.1%) vs baseline: +4.7%

Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.9%


✅ index_noaspect

Time: ✅ 40.208µs (SLO: <300.000µs 📉 -86.6%) vs baseline: -0.1%

Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.8%


✅ join_aspect

Time: ✅ 210.269µs (SLO: <300.000µs 📉 -29.9%) vs baseline: -0.7%

Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9%


✅ join_noaspect

Time: ✅ 141.734µs (SLO: <300.000µs 📉 -52.8%) vs baseline: -0.8%

Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.7%


✅ ljust_aspect

Time: ✅ 579.333µs (SLO: <700.000µs 📉 -17.2%) vs baseline: 📈 +16.6%

Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +5.0%


✅ ljust_noaspect

Time: ✅ 261.335µs (SLO: <300.000µs 📉 -12.9%) vs baseline: +1.9%

Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.9%


✅ lower_aspect

Time: ✅ 293.094µs (SLO: <500.000µs 📉 -41.4%) vs baseline: -0.5%

Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.7%


✅ lower_noaspect

Time: ✅ 236.614µs (SLO: <300.000µs 📉 -21.1%) vs baseline: +0.4%

Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9%


✅ lstrip_aspect

Time: ✅ 0.269ms (SLO: <3.000ms 📉 -91.0%) vs baseline: -0.5%

Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.8%


✅ lstrip_noaspect

Time: ✅ 0.178ms (SLO: <3.000ms 📉 -94.1%) vs baseline: +0.7%

Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.7%


✅ modulo_aspect

Time: ✅ 14.269ms (SLO: <18.750ms 📉 -23.9%) vs baseline: ~same

Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +4.6%


✅ modulo_aspect_for_bytearray_bytearray

Time: ✅ 14.775ms (SLO: <19.350ms 📉 -23.6%) vs baseline: ~same

Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +5.0%


✅ modulo_aspect_for_bytes

Time: ✅ 14.393ms (SLO: <18.900ms 📉 -23.8%) vs baseline: +0.3%

Memory: ✅ 43.037MB (SLO: <46.000MB -6.4%) vs baseline: +4.8%


✅ modulo_aspect_for_bytes_bytearray

Time: ✅ 14.677ms (SLO: <19.150ms 📉 -23.4%) vs baseline: +0.7%

Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +4.8%


✅ modulo_noaspect

Time: ✅ 0.362ms (SLO: <3.000ms 📉 -87.9%) vs baseline: +1.0%

Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.9%


✅ replace_aspect

Time: ✅ 18.333ms (SLO: <24.000ms 📉 -23.6%) vs baseline: -0.7%

Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.6%


✅ replace_noaspect

Time: ✅ 280.376µs (SLO: <300.000µs -6.5%) vs baseline: +0.3%

Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9%


✅ repr_aspect

Time: ✅ 309.690µs (SLO: <420.000µs 📉 -26.3%) vs baseline: -0.9%

Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.9%


✅ repr_noaspect

Time: ✅ 46.918µs (SLO: <90.000µs 📉 -47.9%) vs baseline: +0.8%


✅ rstrip_aspect

Time: ✅ 384.653µs (SLO: <500.000µs 📉 -23.1%) vs baseline: +0.3%

Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.8%


✅ rstrip_noaspect

Time: ✅ 184.674µs (SLO: <300.000µs 📉 -38.4%) vs baseline: +0.2%

Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.7%


✅ slice_aspect

Time: ✅ 184.098µs (SLO: <300.000µs 📉 -38.6%) vs baseline: +0.1%

Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +5.0%


✅ slice_noaspect

Time: ✅ 54.074µs (SLO: <90.000µs 📉 -39.9%) vs baseline: -0.5%

Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.8%


✅ stringio_aspect

Time: ✅ 4.364ms (SLO: <5.000ms 📉 -12.7%) vs baseline: 📈 +14.7%

Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +5.1%


✅ stringio_noaspect

Time: ✅ 345.533µs (SLO: <500.000µs 📉 -30.9%) vs baseline: +0.5%

Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9%


✅ strip_aspect

Time: ✅ 269.984µs (SLO: <350.000µs 📉 -22.9%) vs baseline: ~same

Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +5.0%


✅ strip_noaspect

Time: ✅ 176.784µs (SLO: <240.000µs 📉 -26.3%) vs baseline: -0.3%

Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.8%


✅ swapcase_aspect

Time: ✅ 334.315µs (SLO: <500.000µs 📉 -33.1%) vs baseline: +0.2%

Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +5.0%


✅ swapcase_noaspect

Time: ✅ 271.454µs (SLO: <400.000µs 📉 -32.1%) vs baseline: +0.3%

Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +5.0%


✅ title_aspect

Time: ✅ 316.375µs (SLO: <500.000µs 📉 -36.7%) vs baseline: -1.2%

Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.8%


✅ title_noaspect

Time: ✅ 259.186µs (SLO: <400.000µs 📉 -35.2%) vs baseline: -0.4%

Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +5.0%


✅ translate_aspect

Time: ✅ 491.217µs (SLO: <700.000µs 📉 -29.8%) vs baseline: ~same

Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.7%


✅ translate_noaspect

Time: ✅ 424.190µs (SLO: <500.000µs 📉 -15.2%) vs baseline: -0.7%

Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9%


✅ upper_aspect

Time: ✅ 294.003µs (SLO: <500.000µs 📉 -41.2%) vs baseline: -0.8%

Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +5.0%


✅ upper_noaspect

Time: ✅ 234.559µs (SLO: <400.000µs 📉 -41.4%) vs baseline: -0.2%

Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +5.0%


📈 iastaspectsospath - 24/24

✅ ospathbasename_aspect

Time: ✅ 508.743µs (SLO: <700.000µs 📉 -27.3%) vs baseline: 📈 +19.4%

Memory: ✅ 42.743MB (SLO: <46.000MB -7.1%) vs baseline: +4.6%


✅ ospathbasename_noaspect

Time: ✅ 431.973µs (SLO: <700.000µs 📉 -38.3%) vs baseline: -0.3%

Memory: ✅ 42.605MB (SLO: <46.000MB -7.4%) vs baseline: +5.1%


✅ ospathjoin_aspect

Time: ✅ 627.798µs (SLO: <700.000µs 📉 -10.3%) vs baseline: ~same

Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +5.0%


✅ ospathjoin_noaspect

Time: ✅ 634.515µs (SLO: <700.000µs -9.4%) vs baseline: -0.5%

Memory: ✅ 42.664MB (SLO: <46.000MB -7.3%) vs baseline: +4.4%


✅ ospathnormcase_aspect

Time: ✅ 348.275µs (SLO: <700.000µs 📉 -50.2%) vs baseline: -1.3%

Memory: ✅ 42.546MB (SLO: <46.000MB -7.5%) vs baseline: +3.9%


✅ ospathnormcase_noaspect

Time: ✅ 358.655µs (SLO: <700.000µs 📉 -48.8%) vs baseline: -0.3%

Memory: ✅ 42.605MB (SLO: <46.000MB -7.4%) vs baseline: +5.0%


✅ ospathsplit_aspect

Time: ✅ 486.752µs (SLO: <700.000µs 📉 -30.5%) vs baseline: -1.0%

Memory: ✅ 42.644MB (SLO: <46.000MB -7.3%) vs baseline: +4.6%


✅ ospathsplit_noaspect

Time: ✅ 501.130µs (SLO: <700.000µs 📉 -28.4%) vs baseline: -0.3%

Memory: ✅ 42.625MB (SLO: <46.000MB -7.3%) vs baseline: +4.2%


✅ ospathsplitdrive_aspect

Time: ✅ 375.629µs (SLO: <700.000µs 📉 -46.3%) vs baseline: +0.1%

Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +5.0%


✅ ospathsplitdrive_noaspect

Time: ✅ 72.982µs (SLO: <700.000µs 📉 -89.6%) vs baseline: +0.9%

Memory: ✅ 42.546MB (SLO: <46.000MB -7.5%) vs baseline: +3.8%


✅ ospathsplitext_aspect

Time: ✅ 461.712µs (SLO: <700.000µs 📉 -34.0%) vs baseline: +1.2%

Memory: ✅ 42.802MB (SLO: <46.000MB -7.0%) vs baseline: +4.4%


✅ ospathsplitext_noaspect

Time: ✅ 463.433µs (SLO: <700.000µs 📉 -33.8%) vs baseline: +0.1%

Memory: ✅ 42.644MB (SLO: <46.000MB -7.3%) vs baseline: +4.9%

🟡 Near SLO Breach (1 suite)
🟡 tracer - 6/6

✅ large

Time: ✅ 31.356ms (SLO: <32.950ms -4.8%) vs baseline: -0.7%

Memory: ✅ 36.766MB (SLO: <39.250MB -6.3%) vs baseline: +5.2%


✅ medium

Time: ✅ 3.113ms (SLO: <3.200ms -2.7%) vs baseline: -1.7%

Memory: ✅ 35.527MB (SLO: <38.750MB -8.3%) vs baseline: +4.8%


✅ small

Time: ✅ 364.476µs (SLO: <370.000µs 🟡 -1.5%) vs baseline: +3.8%

Memory: ✅ 35.606MB (SLO: <38.750MB -8.1%) vs baseline: +4.9%
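
The report does not say how the percentage next to each SLO limit is derived. The figures are consistent with the signed distance from the limit, expressed as a fraction of the limit. The sketch below checks that reading against the three tracer timings. The formula and the name `slo_margin_pct` are assumptions, not taken from the benchmark tooling.

```python
def slo_margin_pct(measured: float, slo_limit: float) -> float:
    """Signed distance from the SLO limit as a percentage of the limit (negative = under the limit)."""
    return (measured - slo_limit) / slo_limit * 100.0


# Time values copied from the tracer suite (ms for large and medium, µs for small).
margins = {
    "large": round(slo_margin_pct(31.356, 32.950), 1),
    "medium": round(slo_margin_pct(3.113, 3.200), 1),
    "small": round(slo_margin_pct(364.476, 370.000), 1),
}
```

Under this reading, the `small` scenario sits about 1.5% below its limit, which matches the value the report marks with 🟡.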

⚠️ Unstable Tests (2 suites)
⚠️ coreapiscenario - 10/10 (1 unstable)

⚠️ context_with_data_listeners

Time: ⚠️ 13.253µs (SLO: <20.000µs 📉 -33.7%) vs baseline: -0.4%

Memory: ✅ 35.606MB (SLO: <38.000MB -6.3%) vs baseline: +4.9%


✅ context_with_data_no_listeners

Time: ✅ 3.279µs (SLO: <10.000µs 📉 -67.2%) vs baseline: +0.1%

Memory: ✅ 35.507MB (SLO: <38.000MB -6.6%) vs baseline: +4.6%


✅ get_item_exists

Time: ✅ 0.578µs (SLO: <10.000µs 📉 -94.2%) vs baseline: -0.7%

Memory: ✅ 35.586MB (SLO: <38.000MB -6.4%) vs baseline: +5.2%


✅ get_item_missing

Time: ✅ 0.630µs (SLO: <10.000µs 📉 -93.7%) vs baseline: -0.2%

Memory: ✅ 35.606MB (SLO: <38.000MB -6.3%) vs baseline: +5.2%


✅ set_item

Time: ✅ 24.017µs (SLO: <30.000µs 📉 -19.9%) vs baseline: +0.4%

Memory: ✅ 35.606MB (SLO: <38.000MB -6.3%) vs baseline: +4.9%


⚠️ packagesupdateimporteddependencies - 24/24 (1 unstable)

✅ import_many

Time: ✅ 156.562µs (SLO: <170.000µs -7.9%) vs baseline: +0.6%

Memory: ✅ 40.339MB (SLO: <46.000MB 📉 -12.3%) vs baseline: +5.0%


✅ import_many_cached

Time: ✅ 121.646µs (SLO: <130.000µs -6.4%) vs baseline: -0.2%

Memory: ✅ 40.164MB (SLO: <46.000MB 📉 -12.7%) vs baseline: +4.3%


✅ import_many_stdlib

Time: ✅ 0.767ms (SLO: <1.750ms 📉 -56.2%) vs baseline: +0.3%

Memory: ✅ 40.419MB (SLO: <46.000MB 📉 -12.1%) vs baseline: +4.4%


⚠️ import_many_stdlib_cached

Time: ⚠️ 0.174ms (SLO: <1.100ms 📉 -84.2%) vs baseline: -0.3%

Memory: ✅ 40.398MB (SLO: <46.000MB 📉 -12.2%) vs baseline: +5.2%


✅ import_many_unknown

Time: ✅ 833.976µs (SLO: <890.000µs -6.3%) vs baseline: +0.4%

Memory: ✅ 40.300MB (SLO: <46.000MB 📉 -12.4%) vs baseline: +4.8%


✅ import_many_unknown_cached

Time: ✅ 796.931µs (SLO: <870.000µs -8.4%) vs baseline: -0.3%

Memory: ✅ 40.414MB (SLO: <46.000MB 📉 -12.1%) vs baseline: +5.0%


✅ import_one

Time: ✅ 19.806µs (SLO: <30.000µs 📉 -34.0%) vs baseline: +0.6%

Memory: ✅ 40.259MB (SLO: <46.000MB 📉 -12.5%) vs baseline: +5.3%


✅ import_one_cache

Time: ✅ 6.307µs (SLO: <10.000µs 📉 -36.9%) vs baseline: +0.4%

Memory: ✅ 40.307MB (SLO: <46.000MB 📉 -12.4%) vs baseline: +5.4%


✅ import_one_stdlib

Time: ✅ 18.770µs (SLO: <20.000µs -6.1%) vs baseline: +1.0%

Memory: ✅ 40.291MB (SLO: <46.000MB 📉 -12.4%) vs baseline: +5.5%


✅ import_one_stdlib_cache

Time: ✅ 6.298µs (SLO: <10.000µs 📉 -37.0%) vs baseline: +0.1%

Memory: ✅ 40.389MB (SLO: <46.000MB 📉 -12.2%) vs baseline: +5.4%


✅ import_one_unknown

Time: ✅ 45.430µs (SLO: <50.000µs -9.1%) vs baseline: ~same

Memory: ✅ 40.533MB (SLO: <46.000MB 📉 -11.9%) vs baseline: +5.6%


✅ import_one_unknown_cache

Time: ✅ 6.276µs (SLO: <10.000µs 📉 -37.2%) vs baseline: -0.1%

Memory: ✅ 40.290MB (SLO: <43.000MB -6.3%) vs baseline: +5.5%

✅ All Tests Passing (20 suites)
djangosimple - 30/30

✅ appsec

Time: ✅ 19.423ms (SLO: <22.300ms 📉 -12.9%) vs baseline: -0.4%

Memory: ✅ 67.633MB (SLO: <73.500MB -8.0%) vs baseline: +4.9%


✅ exception-replay-enabled

Time: ✅ 1.383ms (SLO: <1.450ms -4.6%) vs baseline: +0.2%

Memory: ✅ 65.746MB (SLO: <71.500MB -8.0%) vs baseline: +4.7%


✅ iast

Time: ✅ 19.482ms (SLO: <22.250ms 📉 -12.4%) vs baseline: -0.4%

Memory: ✅ 67.633MB (SLO: <75.000MB -9.8%) vs baseline: +4.9%


✅ profiler

Time: ✅ 15.351ms (SLO: <16.550ms -7.2%) vs baseline: +0.2%

Memory: ✅ 58.812MB (SLO: <61.000MB -3.6%) vs baseline: +4.8%


✅ resource-renaming

Time: ✅ 19.483ms (SLO: <21.750ms 📉 -10.4%) vs baseline: -0.4%

Memory: ✅ 67.594MB (SLO: <73.500MB -8.0%) vs baseline: +4.8%


✅ span-code-origin

Time: ✅ 19.950ms (SLO: <28.200ms 📉 -29.3%) vs baseline: +1.3%

Memory: ✅ 67.633MB (SLO: <75.000MB -9.8%) vs baseline: +4.9%


✅ tracer

Time: ✅ 19.613ms (SLO: <21.750ms -9.8%) vs baseline: +0.7%

Memory: ✅ 67.633MB (SLO: <75.000MB -9.8%) vs baseline: +4.9%


✅ tracer-and-profiler

Time: ✅ 21.093ms (SLO: <23.500ms 📉 -10.2%) vs baseline: +0.5%

Memory: ✅ 69.127MB (SLO: <75.000MB -7.8%) vs baseline: +4.9%


✅ tracer-dont-create-db-spans

Time: ✅ 19.601ms (SLO: <21.500ms -8.8%) vs baseline: -0.2%

Memory: ✅ 67.633MB (SLO: <75.000MB -9.8%) vs baseline: +4.9%


✅ tracer-minimal

Time: ✅ 16.909ms (SLO: <17.500ms -3.4%) vs baseline: +0.5%

Memory: ✅ 67.613MB (SLO: <75.000MB -9.8%) vs baseline: +4.8%


✅ tracer-native

Time: ✅ 19.425ms (SLO: <21.750ms 📉 -10.7%) vs baseline: -0.2%

Memory: ✅ 67.633MB (SLO: <72.500MB -6.7%) vs baseline: +4.9%


✅ tracer-no-caches

Time: ✅ 17.510ms (SLO: <19.650ms 📉 -10.9%) vs baseline: -0.5%

Memory: ✅ 67.633MB (SLO: <75.000MB -9.8%) vs baseline: +4.9%


✅ tracer-no-databases

Time: ✅ 19.203ms (SLO: <20.100ms -4.5%) vs baseline: -0.5%

Memory: ✅ 67.633MB (SLO: <75.000MB -9.8%) vs baseline: +4.9%


✅ tracer-no-middleware

Time: ✅ 19.282ms (SLO: <21.500ms 📉 -10.3%) vs baseline: +0.4%

Memory: ✅ 67.653MB (SLO: <75.000MB -9.8%) vs baseline: +4.9%


✅ tracer-no-templates

Time: ✅ 19.411ms (SLO: <22.000ms 📉 -11.8%) vs baseline: -0.5%

Memory: ✅ 67.645MB (SLO: <73.500MB -8.0%) vs baseline: +4.9%


errortrackingdjangosimple - 6/6

✅ errortracking-enabled-all

Time: ✅ 16.337ms (SLO: <19.850ms 📉 -17.7%) vs baseline: -0.2%

Memory: ✅ 67.154MB (SLO: <75.000MB 📉 -10.5%) vs baseline: +4.9%


✅ errortracking-enabled-user

Time: ✅ 16.354ms (SLO: <19.400ms 📉 -15.7%) vs baseline: ~same

Memory: ✅ 67.139MB (SLO: <75.000MB 📉 -10.5%) vs baseline: +4.9%


✅ tracer-enabled

Time: ✅ 16.311ms (SLO: <19.450ms 📉 -16.1%) vs baseline: ~same

Memory: ✅ 67.156MB (SLO: <75.000MB 📉 -10.5%) vs baseline: +4.8%


errortrackingflasksqli - 6/6

✅ errortracking-enabled-all

Time: ✅ 2.106ms (SLO: <2.300ms -8.4%) vs baseline: ~same

Memory: ✅ 55.011MB (SLO: <60.000MB -8.3%) vs baseline: +4.8%


✅ errortracking-enabled-user

Time: ✅ 2.115ms (SLO: <2.250ms -6.0%) vs baseline: ~same

Memory: ✅ 55.070MB (SLO: <60.000MB -8.2%) vs baseline: +4.9%


✅ tracer-enabled

Time: ✅ 2.101ms (SLO: <2.300ms -8.7%) vs baseline: -0.2%

Memory: ✅ 54.972MB (SLO: <60.000MB -8.4%) vs baseline: +4.7%


flasksimple - 18/18

✅ appsec-get

Time: ✅ 3.410ms (SLO: <4.750ms 📉 -28.2%) vs baseline: -0.5%

Memory: ✅ 54.668MB (SLO: <66.500MB 📉 -17.8%) vs baseline: +4.7%


✅ appsec-post

Time: ✅ 2.899ms (SLO: <6.750ms 📉 -57.0%) vs baseline: +0.2%

Memory: ✅ 55.082MB (SLO: <66.500MB 📉 -17.2%) vs baseline: +4.8%


✅ appsec-telemetry

Time: ✅ 3.439ms (SLO: <4.750ms 📉 -27.6%) vs baseline: +0.9%

Memory: ✅ 54.827MB (SLO: <66.500MB 📉 -17.6%) vs baseline: +4.9%


✅ debugger

Time: ✅ 1.867ms (SLO: <2.000ms -6.6%) vs baseline: ~same

Memory: ✅ 48.236MB (SLO: <51.500MB -6.3%) vs baseline: +4.7%


✅ iast-get

Time: ✅ 1.865ms (SLO: <2.000ms -6.8%) vs baseline: +0.5%

Memory: ✅ 45.287MB (SLO: <49.000MB -7.6%) vs baseline: +5.2%


✅ profiler

Time: ✅ 1.904ms (SLO: <2.100ms -9.3%) vs baseline: +0.1%

Memory: ✅ 51.382MB (SLO: <52.500MB -2.1%) vs baseline: +5.1%


✅ resource-renaming

Time: ✅ 3.395ms (SLO: <3.650ms -7.0%) vs baseline: -0.2%

Memory: ✅ 54.708MB (SLO: <60.000MB -8.8%) vs baseline: +4.9%


✅ tracer

Time: ✅ 3.414ms (SLO: <3.650ms -6.5%) vs baseline: +0.3%

Memory: ✅ 54.765MB (SLO: <60.000MB -8.7%) vs baseline: +5.0%


✅ tracer-native

Time: ✅ 3.417ms (SLO: <3.650ms -6.4%) vs baseline: +0.4%

Memory: ✅ 54.787MB (SLO: <60.000MB -8.7%) vs baseline: +4.8%


flasksqli - 6/6

✅ appsec-enabled

Time: ✅ 2.096ms (SLO: <4.200ms 📉 -50.1%) vs baseline: ~same

Memory: ✅ 55.011MB (SLO: <66.000MB 📉 -16.7%) vs baseline: +4.8%


✅ iast-enabled

Time: ✅ 2.110ms (SLO: <2.800ms 📉 -24.6%) vs baseline: +0.2%

Memory: ✅ 55.031MB (SLO: <62.500MB 📉 -12.0%) vs baseline: +5.1%


✅ tracer-enabled

Time: ✅ 2.095ms (SLO: <2.250ms -6.9%) vs baseline: ~same

Memory: ✅ 55.011MB (SLO: <60.000MB -8.3%) vs baseline: +4.8%


forktime - 4/4

✅ baseline

Time: ✅ 1.984ms (SLO: <3.000ms 📉 -33.9%) vs baseline: +5.7%

Memory: ✅ 29.334MB (SLO: <33.000MB 📉 -11.1%) vs baseline: +5.1%


✅ configured

Time: ✅ 8.629ms (SLO: <13.000ms 📉 -33.6%) vs baseline: ~same

Memory: ✅ 54.995MB (SLO: <60.000MB -8.3%) vs baseline: +4.9%


httppropagationextract - 60/60

✅ all_styles_all_headers

Time: ✅ 78.861µs (SLO: <100.000µs 📉 -21.1%) vs baseline: +4.9%

Memory: ✅ 35.704MB (SLO: <38.000MB -6.0%) vs baseline: +4.9%


✅ b3_headers

Time: ✅ 12.829µs (SLO: <20.000µs 📉 -35.9%) vs baseline: +0.2%

Memory: ✅ 35.704MB (SLO: <38.000MB -6.0%) vs baseline: +5.0%


✅ b3_single_headers

Time: ✅ 11.856µs (SLO: <20.000µs 📉 -40.7%) vs baseline: -0.3%

Memory: ✅ 35.606MB (SLO: <38.000MB -6.3%) vs baseline: +4.7%


✅ datadog_tracecontext_tracestate_not_propagated_on_trace_id_no_match

Time: ✅ 60.643µs (SLO: <80.000µs 📉 -24.2%) vs baseline: +0.2%

Memory: ✅ 35.724MB (SLO: <38.000MB -6.0%) vs baseline: +5.2%


✅ datadog_tracecontext_tracestate_propagated_on_trace_id_match

Time: ✅ 62.369µs (SLO: <80.000µs 📉 -22.0%) vs baseline: ~same

Memory: ✅ 35.645MB (SLO: <38.000MB -6.2%) vs baseline: +5.0%


✅ empty_headers

Time: ✅ 1.312µs (SLO: <10.000µs 📉 -86.9%) vs baseline: +0.1%

Memory: ✅ 35.586MB (SLO: <38.000MB -6.4%) vs baseline: +4.8%


✅ full_t_id_datadog_headers

Time: ✅ 20.752µs (SLO: <30.000µs 📉 -30.8%) vs baseline: -0.5%

Memory: ✅ 35.645MB (SLO: <38.000MB -6.2%) vs baseline: +4.7%


✅ invalid_priority_header

Time: ✅ 5.920µs (SLO: <10.000µs 📉 -40.8%) vs baseline: ~same

Memory: ✅ 35.507MB (SLO: <38.000MB -6.6%) vs baseline: +4.5%


✅ invalid_span_id_header

Time: ✅ 5.912µs (SLO: <10.000µs 📉 -40.9%) vs baseline: -0.4%

Memory: ✅ 35.606MB (SLO: <38.000MB -6.3%) vs baseline: +4.7%


✅ invalid_tags_header

Time: ✅ 5.926µs (SLO: <10.000µs 📉 -40.7%) vs baseline: -0.2%

Memory: ✅ 35.704MB (SLO: <38.000MB -6.0%) vs baseline: +4.8%


✅ invalid_trace_id_header

Time: ✅ 5.928µs (SLO: <10.000µs 📉 -40.7%) vs baseline: ~same

Memory: ✅ 35.783MB (SLO: <38.000MB -5.8%) vs baseline: +5.4%


✅ large_header_no_matches

Time: ✅ 26.885µs (SLO: <30.000µs 📉 -10.4%) vs baseline: +0.4%

Memory: ✅ 35.743MB (SLO: <38.000MB -5.9%) vs baseline: +5.1%


✅ large_valid_headers_all

Time: ✅ 28.112µs (SLO: <40.000µs 📉 -29.7%) vs baseline: -0.2%

Memory: ✅ 35.586MB (SLO: <38.000MB -6.4%) vs baseline: +4.6%


✅ medium_header_no_matches

Time: ✅ 9.221µs (SLO: <20.000µs 📉 -53.9%) vs baseline: -0.2%

Memory: ✅ 35.645MB (SLO: <38.000MB -6.2%) vs baseline: +4.8%


✅ medium_valid_headers_all

Time: ✅ 10.679µs (SLO: <20.000µs 📉 -46.6%) vs baseline: +0.1%

Memory: ✅ 35.684MB (SLO: <38.000MB -6.1%) vs baseline: +5.2%


✅ none_propagation_style

Time: ✅ 1.398µs (SLO: <10.000µs 📉 -86.0%) vs baseline: -0.3%

Memory: ✅ 35.606MB (SLO: <38.000MB -6.3%) vs baseline: +4.6%


✅ tracecontext_headers

Time: ✅ 32.669µs (SLO: <40.000µs 📉 -18.3%) vs baseline: -0.6%

Memory: ✅ 35.684MB (SLO: <38.000MB -6.1%) vs baseline: +5.0%


✅ valid_headers_all

Time: ✅ 5.919µs (SLO: <10.000µs 📉 -40.8%) vs baseline: -0.3%

Memory: ✅ 35.645MB (SLO: <38.000MB -6.2%) vs baseline: +4.9%


✅ valid_headers_basic

Time: ✅ 5.483µs (SLO: <10.000µs 📉 -45.2%) vs baseline: -0.9%

Memory: ✅ 35.743MB (SLO: <38.000MB -5.9%) vs baseline: +5.1%


✅ wsgi_empty_headers

Time: ✅ 1.317µs (SLO: <10.000µs 📉 -86.8%) vs baseline: +0.8%

Memory: ✅ 35.724MB (SLO: <38.000MB -6.0%) vs baseline: +5.1%


✅ wsgi_invalid_priority_header

Time: ✅ 5.965µs (SLO: <10.000µs 📉 -40.4%) vs baseline: -0.3%

Memory: ✅ 35.625MB (SLO: <38.000MB -6.2%) vs baseline: +4.8%


✅ wsgi_invalid_span_id_header

Time: ✅ 1.309µs (SLO: <10.000µs 📉 -86.9%) vs baseline: +0.4%

Memory: ✅ 35.684MB (SLO: <38.000MB -6.1%) vs baseline: +4.8%


✅ wsgi_invalid_tags_header

Time: ✅ 5.968µs (SLO: <10.000µs 📉 -40.3%) vs baseline: ~same

Memory: ✅ 35.625MB (SLO: <38.000MB -6.2%) vs baseline: +4.7%


✅ wsgi_invalid_trace_id_header

Time: ✅ 5.947µs (SLO: <10.000µs 📉 -40.5%) vs baseline: -0.7%

Memory: ✅ 35.684MB (SLO: <38.000MB -6.1%) vs baseline: +5.0%


✅ wsgi_large_header_no_matches

Time: ✅ 28.018µs (SLO: <40.000µs 📉 -30.0%) vs baseline: -0.5%

Memory: ✅ 35.527MB (SLO: <38.000MB -6.5%) vs baseline: +4.3%


✅ wsgi_large_valid_headers_all

Time: ✅ 29.270µs (SLO: <40.000µs 📉 -26.8%) vs baseline: +0.2%

Memory: ✅ 35.606MB (SLO: <38.000MB -6.3%) vs baseline: +4.7%


✅ wsgi_medium_header_no_matches

Time: ✅ 9.531µs (SLO: <20.000µs 📉 -52.3%) vs baseline: ~same

Memory: ✅ 35.586MB (SLO: <38.000MB -6.4%) vs baseline: +4.9%


✅ wsgi_medium_valid_headers_all

Time: ✅ 11.063µs (SLO: <20.000µs 📉 -44.7%) vs baseline: +0.6%

Memory: ✅ 35.586MB (SLO: <38.000MB -6.4%) vs baseline: +4.7%


✅ wsgi_valid_headers_all

Time: ✅ 5.965µs (SLO: <10.000µs 📉 -40.3%) vs baseline: -0.5%

Memory: ✅ 35.547MB (SLO: <38.000MB -6.5%) vs baseline: +4.5%


✅ wsgi_valid_headers_basic

Time: ✅ 5.518µs (SLO: <10.000µs 📉 -44.8%) vs baseline: -0.5%

Memory: ✅ 35.724MB (SLO: <38.000MB -6.0%) vs baseline: +5.0%


httppropagationinject - 16/16

✅ ids_only

Time: ✅ 19.759µs (SLO: <30.000µs 📉 -34.1%) vs baseline: +4.0%

Memory: ✅ 35.665MB (SLO: <38.000MB -6.1%) vs baseline: +4.8%


✅ with_all

Time: ✅ 25.229µs (SLO: <40.000µs 📉 -36.9%) vs baseline: ~same

Memory: ✅ 35.743MB (SLO: <38.000MB -5.9%) vs baseline: +4.9%


✅ with_dd_origin

Time: ✅ 22.362µs (SLO: <30.000µs 📉 -25.5%) vs baseline: -1.0%

Memory: ✅ 35.665MB (SLO: <38.000MB -6.1%) vs baseline: +4.5%


✅ with_priority_and_origin

Time: ✅ 21.874µs (SLO: <40.000µs 📉 -45.3%) vs baseline: -0.8%

Memory: ✅ 35.704MB (SLO: <38.000MB -6.0%) vs baseline: +4.8%


✅ with_sampling_priority

Time: ✅ 18.982µs (SLO: <30.000µs 📉 -36.7%) vs baseline: -0.3%

Memory: ✅ 35.645MB (SLO: <38.000MB -6.2%) vs baseline: +4.7%


✅ with_tags

Time: ✅ 23.368µs (SLO: <40.000µs 📉 -41.6%) vs baseline: ~same

Memory: ✅ 35.704MB (SLO: <38.000MB -6.0%) vs baseline: +4.9%


✅ with_tags_invalid

Time: ✅ 24.622µs (SLO: <40.000µs 📉 -38.4%) vs baseline: -0.1%

Memory: ✅ 35.743MB (SLO: <38.000MB -5.9%) vs baseline: +5.0%


✅ with_tags_max_size

Time: ✅ 23.675µs (SLO: <40.000µs 📉 -40.8%) vs baseline: -0.2%

Memory: ✅ 35.684MB (SLO: <38.000MB -6.1%) vs baseline: +5.1%


iastaspectssplit - 12/12

✅ rsplit_aspect

Time: ✅ 154.057µs (SLO: <250.000µs 📉 -38.4%) vs baseline: +3.0%

Memory: ✅ 42.467MB (SLO: <46.000MB -7.7%) vs baseline: +3.8%


✅ rsplit_noaspect

Time: ✅ 157.041µs (SLO: <250.000µs 📉 -37.2%) vs baseline: -0.3%

Memory: ✅ 42.546MB (SLO: <46.000MB -7.5%) vs baseline: +4.8%


✅ split_aspect

Time: ✅ 150.066µs (SLO: <250.000µs 📉 -40.0%) vs baseline: +0.8%

Memory: ✅ 42.644MB (SLO: <46.000MB -7.3%) vs baseline: +4.0%


✅ split_noaspect

Time: ✅ 153.403µs (SLO: <250.000µs 📉 -38.6%) vs baseline: ~same

Memory: ✅ 42.487MB (SLO: <46.000MB -7.6%) vs baseline: +4.4%


✅ splitlines_aspect

Time: ✅ 146.900µs (SLO: <250.000µs 📉 -41.2%) vs baseline: +0.7%

Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +5.8%


✅ splitlines_noaspect

Time: ✅ 150.382µs (SLO: <250.000µs 📉 -39.8%) vs baseline: +0.3%

Memory: ✅ 42.723MB (SLO: <46.000MB -7.1%) vs baseline: +5.3%


iastpropagation - 8/8

✅ no-propagation

Time: ✅ 48.635µs (SLO: <60.000µs 📉 -18.9%) vs baseline: +0.2%

Memory: ✅ 38.909MB (SLO: <42.000MB -7.4%) vs baseline: +4.5%


✅ propagation_enabled

Time: ✅ 135.761µs (SLO: <190.000µs 📉 -28.5%) vs baseline: -0.1%

Memory: ✅ 38.987MB (SLO: <42.000MB -7.2%) vs baseline: +4.8%


✅ propagation_enabled_100

Time: ✅ 1.569ms (SLO: <2.300ms 📉 -31.8%) vs baseline: -0.4%

Memory: ✅ 38.909MB (SLO: <42.000MB -7.4%) vs baseline: +4.5%


✅ propagation_enabled_1000

Time: ✅ 29.317ms (SLO: <34.550ms 📉 -15.1%) vs baseline: +1.5%

Memory: ✅ 39.105MB (SLO: <42.000MB -6.9%) vs baseline: +5.1%


otelsdkspan - 24/24

✅ add-event

Time: ✅ 38.862ms (SLO: <42.000ms -7.5%) vs baseline: +0.4%

Memory: ✅ 38.260MB (SLO: <40.750MB -6.1%) vs baseline: +5.7%


✅ add-link

Time: ✅ 34.935ms (SLO: <38.550ms -9.4%) vs baseline: +0.2%

Memory: ✅ 38.063MB (SLO: <40.750MB -6.6%) vs baseline: +4.8%


✅ add-metrics

Time: ✅ 217.242ms (SLO: <232.000ms -6.4%) vs baseline: -0.3%

Memory: ✅ 38.181MB (SLO: <40.750MB -6.3%) vs baseline: +5.3%


✅ add-tags

Time: ✅ 209.771ms (SLO: <221.600ms -5.3%) vs baseline: -0.3%

Memory: ✅ 38.004MB (SLO: <40.750MB -6.7%) vs baseline: +5.0%


✅ get-context

Time: ✅ 27.987ms (SLO: <31.300ms 📉 -10.6%) vs baseline: +0.3%

Memory: ✅ 38.122MB (SLO: <40.750MB -6.4%) vs baseline: +4.7%


✅ is-recording

Time: ✅ 27.910ms (SLO: <31.000ms -10.0%) vs baseline: +0.2%

Memory: ✅ 37.945MB (SLO: <40.750MB -6.9%) vs baseline: +4.6%


✅ record-exception

Time: ✅ 61.147ms (SLO: <65.850ms -7.1%) vs baseline: +0.1%

Memory: ✅ 38.122MB (SLO: <40.750MB -6.4%) vs baseline: +5.1%


✅ set-status

Time: ✅ 30.682ms (SLO: <34.150ms 📉 -10.2%) vs baseline: +0.4%

Memory: ✅ 38.044MB (SLO: <40.750MB -6.6%) vs baseline: +5.0%


✅ start

Time: ✅ 28.282ms (SLO: <30.150ms -6.2%) vs baseline: +1.8%

Memory: ✅ 38.181MB (SLO: <40.750MB -6.3%) vs baseline: +4.9%


✅ start-finish

Time: ✅ 32.734ms (SLO: <35.350ms -7.4%) vs baseline: ~same

Memory: ✅ 38.122MB (SLO: <40.750MB -6.4%) vs baseline: +5.1%


✅ start-finish-telemetry

Time: ✅ 32.616ms (SLO: <35.450ms -8.0%) vs baseline: -0.3%

Memory: ✅ 37.965MB (SLO: <40.750MB -6.8%) vs baseline: +4.6%


✅ update-name

Time: ✅ 29.884ms (SLO: <33.400ms 📉 -10.5%) vs baseline: ~same

Memory: ✅ 38.162MB (SLO: <40.750MB -6.4%) vs baseline: +5.2%


otelspan - 22/22

✅ add-event

Time: ✅ 39.421ms (SLO: <47.150ms 📉 -16.4%) vs baseline: +0.3%

Memory: ✅ 40.486MB (SLO: <47.000MB 📉 -13.9%) vs baseline: +4.8%


✅ add-metrics

Time: ✅ 249.746ms (SLO: <344.800ms 📉 -27.6%) vs baseline: -0.3%

Memory: ✅ 44.986MB (SLO: <47.500MB -5.3%) vs baseline: +4.9%


✅ add-tags

Time: ✅ 304.268ms (SLO: <330.000ms -7.8%) vs baseline: ~same

Memory: ✅ 44.974MB (SLO: <47.500MB -5.3%) vs baseline: +4.8%


✅ get-context

Time: ✅ 79.162ms (SLO: <92.350ms 📉 -14.3%) vs baseline: ~same

Memory: ✅ 40.953MB (SLO: <46.500MB 📉 -11.9%) vs baseline: +5.0%


✅ is-recording

Time: ✅ 35.634ms (SLO: <44.500ms 📉 -19.9%) vs baseline: +0.2%

Memory: ✅ 40.393MB (SLO: <47.500MB 📉 -15.0%) vs baseline: +4.5%


✅ record-exception

Time: ✅ 57.969ms (SLO: <67.650ms 📉 -14.3%) vs baseline: -0.5%

Memory: ✅ 41.086MB (SLO: <47.000MB 📉 -12.6%) vs baseline: +5.4%


✅ set-status

Time: ✅ 42.013ms (SLO: <50.400ms 📉 -16.6%) vs baseline: ~same

Memory: ✅ 40.359MB (SLO: <47.000MB 📉 -14.1%) vs baseline: +4.7%


✅ start

Time: ✅ 36.810ms (SLO: <43.450ms 📉 -15.3%) vs baseline: +4.9%

Memory: ✅ 40.388MB (SLO: <47.000MB 📉 -14.1%) vs baseline: +4.8%


✅ start-finish

Time: ✅ 82.930ms (SLO: <90.000ms -7.9%) vs baseline: ~same

Memory: ✅ 38.142MB (SLO: <46.500MB 📉 -18.0%) vs baseline: +4.9%


✅ start-finish-telemetry

Time: ✅ 84.430ms (SLO: <91.000ms -7.2%) vs baseline: -0.2%

Memory: ✅ 38.181MB (SLO: <46.500MB 📉 -17.9%) vs baseline: +4.9%


✅ update-name

Time: ✅ 36.740ms (SLO: <45.150ms 📉 -18.6%) vs baseline: +0.4%

Memory: ✅ 40.487MB (SLO: <47.000MB 📉 -13.9%) vs baseline: +4.6%


packagespackageforrootmodulemapping - 4/4

✅ cache_off

Time: ✅ 341.865ms (SLO: <354.300ms -3.5%) vs baseline: +0.2%

Memory: ✅ 41.260MB (SLO: <46.000MB 📉 -10.3%) vs baseline: +5.7%


✅ cache_on

Time: ✅ 0.383µs (SLO: <10.000µs 📉 -96.2%) vs baseline: -0.3%

Memory: ✅ 40.324MB (SLO: <46.000MB 📉 -12.3%) vs baseline: +5.0%


rand - 2/2

✅ rand128bits

Time: ✅ 0.191µs (SLO: <21.000µs 📉 -99.1%) vs baseline: ~same


✅ rand64bits

Time: ✅ 0.126µs (SLO: <15.000µs 📉 -99.2%) vs baseline: ~same


ratelimiter - 12/12

✅ defaults

Time: ✅ 2.325µs (SLO: <10.000µs 📉 -76.7%) vs baseline: +0.2%

Memory: ✅ 35.645MB (SLO: <38.000MB -6.2%) vs baseline: +5.2%


✅ high_rate_limit

Time: ✅ 2.405µs (SLO: <10.000µs 📉 -76.0%) vs baseline: +1.7%

Memory: ✅ 35.547MB (SLO: <38.000MB -6.5%) vs baseline: +4.6%


✅ long_window

Time: ✅ 2.324µs (SLO: <10.000µs 📉 -76.8%) vs baseline: +0.4%

Memory: ✅ 35.547MB (SLO: <38.000MB -6.5%) vs baseline: +4.5%


✅ low_rate_limit

Time: ✅ 2.348µs (SLO: <10.000µs 📉 -76.5%) vs baseline: +0.7%

Memory: ✅ 35.606MB (SLO: <38.000MB -6.3%) vs baseline: +5.0%


✅ no_rate_limit

Time: ✅ 0.823µs (SLO: <10.000µs 📉 -91.8%) vs baseline: +1.1%

Memory: ✅ 35.586MB (SLO: <38.000MB -6.4%) vs baseline: +4.9%


✅ short_window

Time: ✅ 2.458µs (SLO: <10.000µs 📉 -75.4%) vs baseline: +0.1%

Memory: ✅ 35.665MB (SLO: <38.000MB -6.1%) vs baseline: +4.9%


recursivecomputation - 8/8

✅ deep

Time: ✅ 309.373ms (SLO: <320.950ms -3.6%) vs baseline: ~same

Memory: ✅ 36.353MB (SLO: <38.750MB -6.2%) vs baseline: +4.6%


✅ deep-profiled

Time: ✅ 329.636ms (SLO: <359.150ms -8.2%) vs baseline: ~same

Memory: ✅ 42.448MB (SLO: <46.000MB -7.7%) vs baseline: +5.2%


✅ medium

Time: ✅ 7.037ms (SLO: <7.400ms -4.9%) vs baseline: -0.4%

Memory: ✅ 35.547MB (SLO: <38.000MB -6.5%) vs baseline: +5.0%


✅ shallow

Time: ✅ 0.970ms (SLO: <1.050ms -7.7%) vs baseline: +2.2%

Memory: ✅ 35.527MB (SLO: <38.000MB -6.5%) vs baseline: +4.7%


samplingrules - 8/8

✅ average_match

Time: ✅ 147.071µs (SLO: <290.000µs 📉 -49.3%) vs baseline: -1.1%

Memory: ✅ 35.665MB (SLO: <38.000MB -6.1%) vs baseline: +4.9%


✅ high_match

Time: ✅ 190.776µs (SLO: <480.000µs 📉 -60.3%) vs baseline: -0.4%

Memory: ✅ 35.547MB (SLO: <38.000MB -6.5%) vs baseline: +5.1%


✅ low_match

Time: ✅ 100.074µs (SLO: <120.000µs 📉 -16.6%) vs baseline: -0.6%

Memory: ✅ 700.815MB (SLO: <780.000MB 📉 -10.2%) vs baseline: +4.9%


✅ very_low_match

Time: ✅ 2.884ms (SLO: <8.500ms 📉 -66.1%) vs baseline: +0.6%

Memory: ✅ 77.977MB (SLO: <85.000MB -8.3%) vs baseline: +4.8%


sethttpmeta - 32/32

✅ all-disabled

Time: ✅ 10.521µs (SLO: <20.000µs 📉 -47.4%) vs baseline: +0.4%

Memory: ✅ 36.019MB (SLO: <38.750MB -7.0%) vs baseline: +4.8%


✅ all-enabled

Time: ✅ 40.927µs (SLO: <50.000µs 📉 -18.1%) vs baseline: +2.2%

Memory: ✅ 36.137MB (SLO: <38.750MB -6.7%) vs baseline: +5.2%


✅ collectipvariant_exists

Time: ✅ 40.861µs (SLO: <50.000µs 📉 -18.3%) vs baseline: +0.2%

Memory: ✅ 36.038MB (SLO: <38.750MB -7.0%) vs baseline: +4.6%


✅ no-collectipvariant

Time: ✅ 40.061µs (SLO: <50.000µs 📉 -19.9%) vs baseline: +0.3%

Memory: ✅ 36.038MB (SLO: <38.750MB -7.0%) vs baseline: +4.9%


✅ no-useragentvariant

Time: ✅ 38.792µs (SLO: <50.000µs 📉 -22.4%) vs baseline: +0.5%

Memory: ✅ 36.097MB (SLO: <38.750MB -6.8%) vs baseline: +4.7%


✅ obfuscation-no-query

Time: ✅ 40.585µs (SLO: <50.000µs 📉 -18.8%) vs baseline: +0.2%

Memory: ✅ 36.078MB (SLO: <38.750MB -6.9%) vs baseline: +5.1%


✅ obfuscation-regular-case-explicit-query

Time: ✅ 76.367µs (SLO: <90.000µs 📉 -15.1%) vs baseline: +0.1%

Memory: ✅ 36.313MB (SLO: <38.750MB -6.3%) vs baseline: +4.5%


✅ obfuscation-regular-case-implicit-query

Time: ✅ 76.589µs (SLO: <90.000µs 📉 -14.9%) vs baseline: -0.3%

Memory: ✅ 36.451MB (SLO: <38.750MB -5.9%) vs baseline: +5.1%


✅ obfuscation-send-querystring-disabled

Time: ✅ 155.357µs (SLO: <170.000µs -8.6%) vs baseline: +0.2%

Memory: ✅ 36.333MB (SLO: <38.750MB -6.2%) vs baseline: +4.6%


✅ obfuscation-worst-case-explicit-query

Time: ✅ 149.838µs (SLO: <160.000µs -6.4%) vs baseline: +0.3%

Memory: ✅ 36.274MB (SLO: <38.750MB -6.4%) vs baseline: +4.5%


✅ obfuscation-worst-case-implicit-query

Time: ✅ 155.667µs (SLO: <170.000µs -8.4%) vs baseline: ~same

Memory: ✅ 36.392MB (SLO: <38.750MB -6.1%) vs baseline: +4.8%


✅ useragentvariant_exists_1

Time: ✅ 39.546µs (SLO: <50.000µs 📉 -20.9%) vs baseline: +0.4%

Memory: ✅ 36.038MB (SLO: <38.750MB -7.0%) vs baseline: +4.2%


✅ useragentvariant_exists_2

Time: ✅ 40.647µs (SLO: <50.000µs 📉 -18.7%) vs baseline: -0.1%

Memory: ✅ 36.038MB (SLO: <38.750MB -7.0%) vs baseline: +5.0%


✅ useragentvariant_exists_3

Time: ✅ 40.039µs (SLO: <50.000µs 📉 -19.9%) vs baseline: +0.2%

Memory: ✅ 35.960MB (SLO: <38.750MB -7.2%) vs baseline: +4.4%


✅ useragentvariant_not_exists_1

Time: ✅ 39.541µs (SLO: <50.000µs 📉 -20.9%) vs baseline: +0.2%

Memory: ✅ 35.999MB (SLO: <38.750MB -7.1%) vs baseline: +4.8%


✅ useragentvariant_not_exists_2

Time: ✅ 39.559µs (SLO: <50.000µs 📉 -20.9%) vs baseline: +0.3%

Memory: ✅ 35.920MB (SLO: <38.750MB -7.3%) vs baseline: +4.6%


span - 26/26

✅ add-event

Time: ✅ 18.504ms (SLO: <22.500ms 📉 -17.8%) vs baseline: -0.9%

Memory: ✅ 37.618MB (SLO: <53.000MB 📉 -29.0%) vs baseline: +4.8%


✅ add-metrics

Time: ✅ 88.308ms (SLO: <93.500ms -5.6%) vs baseline: -0.3%

Memory: ✅ 42.290MB (SLO: <53.000MB 📉 -20.2%) vs baseline: +5.1%


✅ add-tags

Time: ✅ 144.133ms (SLO: <155.000ms -7.0%) vs baseline: -0.8%

Memory: ✅ 42.049MB (SLO: <53.000MB 📉 -20.7%) vs baseline: +4.5%


✅ get-context

Time: ✅ 17.008ms (SLO: <20.500ms 📉 -17.0%) vs baseline: ~same

Memory: ✅ 37.513MB (SLO: <53.000MB 📉 -29.2%) vs baseline: +4.7%


✅ is-recording

Time: ✅ 17.086ms (SLO: <20.500ms 📉 -16.7%) vs baseline: -0.2%

Memory: ✅ 37.513MB (SLO: <53.000MB 📉 -29.2%) vs baseline: +4.5%


✅ record-exception

Time: ✅ 38.156ms (SLO: <41.000ms -6.9%) vs baseline: +0.5%

Memory: ✅ 38.136MB (SLO: <53.000MB 📉 -28.0%) vs baseline: +5.1%


✅ set-status

Time: ✅ 18.912ms (SLO: <22.000ms 📉 -14.0%) vs baseline: +0.5%

Memory: ✅ 37.513MB (SLO: <53.000MB 📉 -29.2%) vs baseline: +4.7%


✅ start

Time: ✅ 17.990ms (SLO: <20.500ms 📉 -12.2%) vs baseline: +6.1%

Memory: ✅ 37.552MB (SLO: <53.000MB 📉 -29.1%) vs baseline: +5.0%


✅ start-finish

Time: ✅ 53.592ms (SLO: <56.000ms -4.3%) vs baseline: +0.5%

Memory: ✅ 35.566MB (SLO: <38.000MB -6.4%) vs baseline: +4.7%


✅ start-finish-telemetry

Time: ✅ 54.518ms (SLO: <58.000ms -6.0%) vs baseline: -0.8%

Memory: ✅ 35.547MB (SLO: <38.000MB -6.5%) vs baseline: +4.9%


✅ start-finish-traceid128

Time: ✅ 56.246ms (SLO: <60.000ms -6.3%) vs baseline: +0.1%

Memory: ✅ 35.507MB (SLO: <38.000MB -6.6%) vs baseline: +4.7%


✅ start-traceid128

Time: ✅ 16.940ms (SLO: <22.500ms 📉 -24.7%) vs baseline: ~same

Memory: ✅ 37.513MB (SLO: <53.000MB 📉 -29.2%) vs baseline: +5.0%


✅ update-name

Time: ✅ 17.600ms (SLO: <22.000ms 📉 -20.0%) vs baseline: +0.3%

Memory: ✅ 37.532MB (SLO: <53.000MB 📉 -29.2%) vs baseline: +4.5%


telemetryaddmetric - 30/30

✅ 1-count-metric-1-times

Time: ✅ 2.253µs (SLO: <20.000µs 📉 -88.7%) vs baseline: +7.0%

Memory: ✅ 35.566MB (SLO: <38.000MB -6.4%) vs baseline: +4.9%


✅ 1-count-metrics-100-times

Time: ✅ 148.184µs (SLO: <220.000µs 📉 -32.6%) vs baseline: -0.6%

Memory: ✅ 35.547MB (SLO: <38.000MB -6.5%) vs baseline: +4.9%


✅ 1-distribution-metric-1-times

Time: ✅ 2.446µs (SLO: <20.000µs 📉 -87.8%) vs baseline: -0.9%

Memory: ✅ 35.507MB (SLO: <38.000MB -6.6%) vs baseline: +4.9%


✅ 1-distribution-metrics-100-times

Time: ✅ 164.038µs (SLO: <230.000µs 📉 -28.7%) vs baseline: +0.3%

Memory: ✅ 35.645MB (SLO: <38.000MB -6.2%) vs baseline: +5.3%


✅ 1-gauge-metric-1-times

Time: ✅ 1.928µs (SLO: <20.000µs 📉 -90.4%) vs baseline: -1.2%

Memory: ✅ 35.645MB (SLO: <38.000MB -6.2%) vs baseline: +5.1%


✅ 1-gauge-metrics-100-times

Time: ✅ 136.173µs (SLO: <150.000µs -9.2%) vs baseline: -1.7%

Memory: ✅ 35.566MB (SLO: <38.000MB -6.4%) vs baseline: +4.8%


✅ 1-rate-metric-1-times

Time: ✅ 2.237µs (SLO: <20.000µs 📉 -88.8%) vs baseline: +1.0%

Memory: ✅ 35.606MB (SLO: <38.000MB -6.3%) vs baseline: +5.1%


✅ 1-rate-metrics-100-times

Time: ✅ 162.160µs (SLO: <250.000µs 📉 -35.1%) vs baseline: -0.6%

Memory: ✅ 35.645MB (SLO: <38.000MB -6.2%) vs baseline: +5.1%


✅ 100-count-metrics-100-times

Time: ✅ 15.133ms (SLO: <22.000ms 📉 -31.2%) vs baseline: -0.6%

Memory: ✅ 35.586MB (SLO: <38.000MB -6.4%) vs baseline: +4.9%


✅ 100-distribution-metrics-100-times

Time: ✅ 1.761ms (SLO: <2.550ms 📉 -30.9%) vs baseline: +1.0%

Memory: ✅ 35.606MB (SLO: <38.000MB -6.3%) vs baseline: +5.1%


✅ 100-gauge-metrics-100-times

Time: ✅ 1.403ms (SLO: <1.550ms -9.5%) vs baseline: ~same

Memory: ✅ 35.625MB (SLO: <38.000MB -6.2%) vs baseline: +5.0%


✅ 100-rate-metrics-100-times

Time: ✅ 1.691ms (SLO: <2.550ms 📉 -33.7%) vs baseline: ~same

Memory: ✅ 35.665MB (SLO: <38.000MB -6.1%) vs baseline: +5.1%


✅ flush-1-metric

Time: ✅ 3.625µs (SLO: <20.000µs 📉 -81.9%) vs baseline: +0.3%

Memory: ✅ 35.566MB (SLO: <38.000MB -6.4%) vs baseline: +5.2%


✅ flush-100-metrics

Time: ✅ 175.232µs (SLO: <250.000µs 📉 -29.9%) vs baseline: +0.1%

Memory: ✅ 35.606MB (SLO: <38.000MB -6.3%) vs baseline: +5.0%


✅ flush-1000-metrics

Time: ✅ 2.187ms (SLO: <2.500ms 📉 -12.5%) vs baseline: -0.5%

Memory: ✅ 36.294MB (SLO: <38.750MB -6.3%) vs baseline: +4.9%

ℹ️ Scenarios Missing SLO Configuration (46 scenarios)

The following scenarios exist in candidate data but have no SLO thresholds configured:

  • coreapiscenario-core_dispatch_listeners
  • coreapiscenario-core_dispatch_no_listeners
  • coreapiscenario-core_dispatch_with_results_listeners
  • coreapiscenario-core_dispatch_with_results_no_listeners
  • djangosimple-baseline
  • errortrackingdjangosimple-baseline
  • errortrackingflasksqli-baseline
  • flasksimple-baseline
  • flasksqli-baseline
  • iast_aspects-re_expand_aspect
  • iast_aspects-re_expand_noaspect
  • iast_aspects-re_findall_aspect
  • iast_aspects-re_findall_noaspect
  • iast_aspects-re_finditer_aspect
  • iast_aspects-re_finditer_noaspect
  • iast_aspects-re_fullmatch_aspect
  • iast_aspects-re_fullmatch_noaspect
  • iast_aspects-re_group_aspect
  • iast_aspects-re_group_noaspect
  • iast_aspects-re_groups_aspect
  • iast_aspects-re_groups_noaspect
  • iast_aspects-re_match_aspect
  • iast_aspects-re_match_noaspect
  • iast_aspects-re_search_aspect
  • iast_aspects-re_search_noaspect
  • iast_aspects-re_sub_aspect
  • iast_aspects-re_sub_noaspect
  • iast_aspects-re_subn_aspect
  • iast_aspects-re_subn_noaspect
  • sethttpmeta-obfuscation-disabled
  • startup-baseline
  • startup-baseline_django
  • startup-baseline_flask
  • startup-ddtrace_run
  • startup-ddtrace_run_appsec
  • startup-ddtrace_run_profiling
  • startup-ddtrace_run_runtime_metrics
  • startup-ddtrace_run_send_span
  • startup-ddtrace_run_telemetry_disabled
  • startup-ddtrace_run_telemetry_enabled
  • startup-import_ddtrace
  • startup-import_ddtrace_auto
  • startup-import_ddtrace_auto_django
  • startup-import_ddtrace_auto_flask
  • startup-import_ddtrace_django
  • startup-import_ddtrace_flask

@cit-pr-commenter-54b7da

cit-pr-commenter-54b7da bot commented Feb 20, 2026

Codeowners resolved as

ddtrace/llmobs/_optimizers/__init__.py                                  @DataDog/ml-observability
ddtrace/llmobs/_optimizers/gepa_strategy.py                             @DataDog/ml-observability
ddtrace/llmobs/_prompt_optimization.py                                  @DataDog/ml-observability
