QVAC-19778 fix[api]: finish_reason=length + unified token accounting across chat-category routes by lauripiisang · Pull Request #2477 · tetherto/qvac

lauripiisang · 2026-06-08T09:02:51Z

🎯 What problem does this PR solve?

finish_reason was hardcoded to "stop" on all chat-category routes, even when generation was cut off by max_tokens (should be "length").
completion_tokens was counted three different ways: stats helper in /v1/chat/completions, whitespace split in /v1/completions (blocking), SSE-event counter in /v1/completions (streaming). The Responses route had its own inline fallback. All four could diverge for the same generation.
/v1/videos* endpoints were missing from both the in-repo and marketing-site docs.

📝 How does it solve it?

drainCompletion(result, onToken?) in adapters/openai/completion-result.ts consumes the SDK result.events stream once and returns { text, toolCalls, stats, stopReason, completionTokens, finishReason }.
Finish-reason precedence: tool_calls > length > stop. All three chat routes (chat, completions, responses) now use it.
Responses API: stopReason === "length" maps to status: "incomplete" + incomplete_details.reason: "max_output_tokens"; the streaming path emits response.incomplete instead of response.completed.
Docs: added /v1/videos* to packages/cli/docs/serve-openai.md and docs/website/content/docs/cli/http-server/index.mdx.

The two other QVAC-19778 follow-up items needed no code change — verified against the current tree:

Streaming cancellation: cancel bridge re-homed by the Fastify rewrite (QVAC-19179 feat[bc]: rewrite serve HTTP layer on Fastify + Zod #2306); every inference route binds req.on('close') → cancel({ requestId }).
Route preamble DRY: resolveAndCheckModel() / requireModel() (added in QVAC-19179 feat[bc]: rewrite serve HTTP layer on Fastify + Zod #2306) already centralise the preamble across all routes.

🧪 How was it tested?

New test/completion-result.test.ts: 20 cases covering eos, length, stopSequence, tool_calls finish-reason branches; stats.generatedTokens vs whitespace-fallback token counting; streaming onToken callback ordering.
Extended test/responses.test.ts and test/responses-streaming.test.ts: assert status: "incomplete" + incomplete_details on max_tokens truncation.
New e2e cases in test/e2e.bats against a real loaded LLM: max_tokens: 1 with a long-output prompt, asserting finish_reason == "length" and usage.completion_tokens == 1 for both blocking and streaming paths.
All 374 CLI unit tests pass (bun run test:unit). Lint and typecheck clean (bun run lint).
All 134 e2e tests pass (npm run test:e2e, Qwen3-600M). Six max_tokens values bumped from 8–24 → 512 in e2e.bats for tests that asserted finish_reason: "stop": these were too tight for Qwen3's chain-of-thought <think> preamble after the fix correctly started returning "length" on budget exhaustion.

🔌 API Changes

// finish_reason now reflects truncation
{ finish_reason: "length" }  // when max_tokens cuts generation short (was always "stop")

// Responses API — length truncation (blocking)
{ status: "incomplete", incomplete_details: { reason: "max_output_tokens" } }
// Responses API — length truncation (streaming)
{ type: "response.incomplete" }  // was "response.completed"

// completion_tokens: now consistently uses stats.generatedTokens (with whitespace-split fallback)
// across /v1/chat/completions, /v1/completions, and /v1/responses

🔗 Dependencies

Depends on #2484 — SDK must emit stopReason: "length" before the two known-failing e2e tests (94, 95) pass.

github-actions · 2026-06-08T09:07:10Z

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ❌ PENDING

**Requirements:**
- 1 Team Member approval ❌ (0/1)
- 1 Team Lead OR Management approval ❌ (0/1)



---
*This comment is automatically updated when reviews change.*

Introduce drainCompletion (completion-result.ts) to consume the SDK event stream once: derives text, stats, stopReason, and finish_reason (tool_calls > length > stop). All three chat-category routes use it; per-route token-counting variants removed. Responses API: length truncation maps to status "incomplete" + incomplete_details.reason "max_output_tokens". Two e2e tests for finish_reason=length are known-failing pending SDK PR #2484 (stopReason="length" not yet emitted by the plugin). Six e2e max_tokens bumped 8-24 → 512 for Qwen3-600M <think> compat. Docs: /v1/videos* added to serve-openai.md and http-server mdx. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

messages[].content now accepts OpenAI array format (text/image_url/ input_audio/file parts). Non-text parts are silently dropped; text parts are concatenated. Fixes 400 errors from Cline and Open WebUI. audio transcription/translation temperature was z.string() — OpenAI spec requires number. Fixes 400 for any client sending temperature:0.0. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lauripiisang requested review from a team as code owners June 8, 2026 09:02

lauripiisang marked this pull request as draft June 8, 2026 09:06

This comment was marked as outdated.

Sign in to view

lauripiisang force-pushed the feat/QVAC-19778-finish-reason-tokens branch from 2ac8659 to e71da37 Compare June 8, 2026 09:12

lauripiisang changed the title ~~QVAC-19778 feat[api]: audio encoding, finish_reason=length, and token accounting for chat-category routes~~ QVAC-19778 fix[api]: finish_reason=length + unified token accounting across chat-category routes Jun 8, 2026

lauripiisang added the tier1 label Jun 8, 2026

lauripiisang force-pushed the feat/QVAC-19778-finish-reason-tokens branch from e71da37 to f4d049c Compare June 8, 2026 09:28

lauripiisang force-pushed the feat/QVAC-19778-finish-reason-tokens branch from 34b780a to 5a79aa6 Compare June 8, 2026 16:25

lauripiisang marked this pull request as ready for review June 8, 2026 16:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

QVAC-19778 fix[api]: finish_reason=length + unified token accounting across chat-category routes#2477

QVAC-19778 fix[api]: finish_reason=length + unified token accounting across chat-category routes#2477
lauripiisang wants to merge 2 commits into
mainfrom
feat/QVAC-19778-finish-reason-tokens

lauripiisang commented Jun 8, 2026 •

edited

Loading

Uh oh!

This comment was marked as outdated.

github-actions Bot commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

lauripiisang commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🎯 What problem does this PR solve?

📝 How does it solve it?

🧪 How was it tested?

🔌 API Changes

🔗 Dependencies

Uh oh!

This comment was marked as outdated.

github-actions Bot commented Jun 8, 2026

Tier-based Approval Status

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

lauripiisang commented Jun 8, 2026 •

edited

Loading