Skip to content

fix(telemetry): show last attempted key labels for failed requests#3042

Open
loss-and-quick wants to merge 1 commit intomaximhq:mainfrom
loss-and-quick:fix/telemetry-selected-key-on-error
Open

fix(telemetry): show last attempted key labels for failed requests#3042
loss-and-quick wants to merge 1 commit intomaximhq:mainfrom
loss-and-quick:fix/telemetry-selected-key-on-error

Conversation

@loss-and-quick
Copy link
Copy Markdown
Contributor

@loss-and-quick loss-and-quick commented Apr 25, 2026

Summary

When an LLM request fails, core/bifrost.go intentionally clears BifrostContextKeySelectedKeyID/Name so selected_key only reflects a key that served a successful response. As a side effect, the telemetry plugin recorded empty strings for selected_key_id/selected_key_name labels on failed requests, making it impossible to attribute failures to specific keys in metrics dashboards.

Changes

  • In plugins/telemetry/main.go: when selectedKeyID is empty, fall back to the last entry in attemptTrail to recover key ID and name before recording metrics labels.
  • Design note: attemptTrail is the authoritative record of attempted keys per core/bifrost.go. The last entry represents the key associated with the final (failed) attempt. Metric labels for failed requests will now show this key rather than an empty string.

Type of change

  • Bug fix

Affected areas

  • Plugins

How to test

  1. Configure a provider with an invalid/expired key.
  2. Send a request through Bifrost — it should fail.
  3. Check Prometheus metrics (e.g. bifrost_requests_total) — verify selected_key_id and selected_key_name labels are populated for the failed request.
cd plugins/telemetry
go test ./...

Breaking changes

  • No

Related issues

Related to #3040 (same root cause, telemetry counterpart of the logging fix).

Security considerations

None. Key ID and name are already stored in attemptTrail; this change only reads from it.

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I verified builds succeed (Go and UI)

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 25, 2026

📝 Walkthrough

Walkthrough

The PostLLMHook in the telemetry plugin now backfills selected_key_id and selected_key_name label values from the final attemptTrail entry when the context-provided selectedKeyID is empty, ensuring metric labels are consistently populated.

Changes

Cohort / File(s) Summary
Telemetry backfill logic
plugins/telemetry/main.go
Added conditional backfilling in PostLLMHook to extract selected key ID and name from the final attempt trail entry when context value is absent.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related issues

Poem

🐰 A label was lost in the metrics' deep warren,
But this clever patch made our telemetry sparren!
From trails left behind, the keys now take flight,
Selected and tracked, shining metrics bright! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: fixing telemetry to show the last attempted key labels for failed requests, directly addressing the root problem.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description comprehensively covers all required template sections with clear, specific details about the problem, solution, testing approach, and security implications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 25, 2026

Confidence Score: 5/5

Safe to merge — single-file change with correct fallback logic and no risk of data loss or incorrect label overwriting on success paths.

The only finding is a P2 suggestion about adding unit tests; no logic errors, security issues, or correctness problems were identified. The fallback condition is gated on selectedKeyID == "", so successful requests are unaffected.

No files require special attention.

Important Files Changed

Filename Overview
plugins/telemetry/main.go Adds a fallback to populate selectedKeyID/selectedKeyName from the last attemptTrail entry when the context values are empty (i.e., on failed requests); logic is correct and correctly placed before label map construction.

Reviews (1): Last reviewed commit: "fix(telemetry): show last attempted key ..." | Re-trigger Greptile

Comment thread plugins/telemetry/main.go
Comment on lines +405 to +409
if selectedKeyID == "" && len(attemptTrail) > 0 {
last := attemptTrail[len(attemptTrail)-1]
selectedKeyID = last.KeyID
selectedKeyName = last.KeyName
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No unit tests for the fallback path

The PR description references go test ./... in plugins/telemetry, but there are no *_test.go files in that directory. The fallback logic (lines 405–409) is straightforward, but a table-driven test covering (a) failed request with non-empty attemptTrail, (b) failed request with empty attemptTrail, and (c) successful request where selectedKeyID is already set would make the invariant explicit and catch regressions if KeyAttemptRecord changes shape in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant