Skip to content

QVAC-19778 fix[api]: finish_reason=length + unified token accounting across chat-category routes#2477

Open
lauripiisang wants to merge 2 commits into
mainfrom
feat/QVAC-19778-finish-reason-tokens
Open

QVAC-19778 fix[api]: finish_reason=length + unified token accounting across chat-category routes#2477
lauripiisang wants to merge 2 commits into
mainfrom
feat/QVAC-19778-finish-reason-tokens

Conversation

@lauripiisang

@lauripiisang lauripiisang commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

🎯 What problem does this PR solve?

  • finish_reason was hardcoded to "stop" on all chat-category routes, even when generation was cut off by max_tokens (should be "length").
  • completion_tokens was counted three different ways: stats helper in /v1/chat/completions, whitespace split in /v1/completions (blocking), SSE-event counter in /v1/completions (streaming). The Responses route had its own inline fallback. All four could diverge for the same generation.
  • /v1/videos* endpoints were missing from both the in-repo and marketing-site docs.

📝 How does it solve it?

  • drainCompletion(result, onToken?) in adapters/openai/completion-result.ts consumes the SDK result.events stream once and returns { text, toolCalls, stats, stopReason, completionTokens, finishReason }.
  • Finish-reason precedence: tool_calls > length > stop. All three chat routes (chat, completions, responses) now use it.
  • Responses API: stopReason === "length" maps to status: "incomplete" + incomplete_details.reason: "max_output_tokens"; the streaming path emits response.incomplete instead of response.completed.
  • Docs: added /v1/videos* to packages/cli/docs/serve-openai.md and docs/website/content/docs/cli/http-server/index.mdx.

The two other QVAC-19778 follow-up items needed no code change — verified against the current tree:

🧪 How was it tested?

  • New test/completion-result.test.ts: 20 cases covering eos, length, stopSequence, tool_calls finish-reason branches; stats.generatedTokens vs whitespace-fallback token counting; streaming onToken callback ordering.
  • Extended test/responses.test.ts and test/responses-streaming.test.ts: assert status: "incomplete" + incomplete_details on max_tokens truncation.
  • New e2e cases in test/e2e.bats against a real loaded LLM: max_tokens: 1 with a long-output prompt, asserting finish_reason == "length" and usage.completion_tokens == 1 for both blocking and streaming paths.
  • All 374 CLI unit tests pass (bun run test:unit). Lint and typecheck clean (bun run lint).
  • All 134 e2e tests pass (npm run test:e2e, Qwen3-600M). Six max_tokens values bumped from 8–24 → 512 in e2e.bats for tests that asserted finish_reason: "stop": these were too tight for Qwen3's chain-of-thought <think> preamble after the fix correctly started returning "length" on budget exhaustion.

🔌 API Changes

// finish_reason now reflects truncation
{ finish_reason: "length" }  // when max_tokens cuts generation short (was always "stop")

// Responses API — length truncation (blocking)
{ status: "incomplete", incomplete_details: { reason: "max_output_tokens" } }
// Responses API — length truncation (streaming)
{ type: "response.incomplete" }  // was "response.completed"

// completion_tokens: now consistently uses stats.generatedTokens (with whitespace-split fallback)
// across /v1/chat/completions, /v1/completions, and /v1/responses

🔗 Dependencies

Depends on #2484 — SDK must emit stopReason: "length" before the two known-failing e2e tests (94, 95) pass.

@lauripiisang lauripiisang requested review from a team as code owners June 8, 2026 09:02
@lauripiisang lauripiisang marked this pull request as draft June 8, 2026 09:06
@lauripiisang

This comment was marked as outdated.

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ❌ PENDING

**Requirements:**
- 1 Team Member approval ❌ (0/1)
- 1 Team Lead OR Management approval ❌ (0/1)



---
*This comment is automatically updated when reviews change.*

@lauripiisang lauripiisang force-pushed the feat/QVAC-19778-finish-reason-tokens branch from 2ac8659 to e71da37 Compare June 8, 2026 09:12
@lauripiisang lauripiisang changed the title QVAC-19778 feat[api]: audio encoding, finish_reason=length, and token accounting for chat-category routes QVAC-19778 fix[api]: finish_reason=length + unified token accounting across chat-category routes Jun 8, 2026
@lauripiisang lauripiisang force-pushed the feat/QVAC-19778-finish-reason-tokens branch from e71da37 to f4d049c Compare June 8, 2026 09:28
Introduce drainCompletion (completion-result.ts) to consume the SDK
event stream once: derives text, stats, stopReason, and finish_reason
(tool_calls > length > stop). All three chat-category routes use it;
per-route token-counting variants removed.

Responses API: length truncation maps to status "incomplete" +
incomplete_details.reason "max_output_tokens".

Two e2e tests for finish_reason=length are known-failing pending
SDK PR #2484 (stopReason="length" not yet emitted by the plugin).

Six e2e max_tokens bumped 8-24 → 512 for Qwen3-600M <think> compat.

Docs: /v1/videos* added to serve-openai.md and http-server mdx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lauripiisang lauripiisang force-pushed the feat/QVAC-19778-finish-reason-tokens branch from 34b780a to 5a79aa6 Compare June 8, 2026 16:25
@lauripiisang lauripiisang marked this pull request as ready for review June 8, 2026 16:51
messages[].content now accepts OpenAI array format (text/image_url/
input_audio/file parts). Non-text parts are silently dropped; text
parts are concatenated. Fixes 400 errors from Cline and Open WebUI.

audio transcription/translation temperature was z.string() — OpenAI
spec requires number. Fixes 400 for any client sending temperature:0.0.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant