QVAC-20424 feat[api]: img2vid for POST /v1/videos #2481

Draft
lauripiisang wants to merge 9 commits into main from worktree-qvac-20424

lauripiisang commented Jun 8, 2026

Contributor

🎯 What problem does this PR solve?

  • POST /v1/videos only supported text-to-video; callers had no way to animate a still image via the CLI OpenAI server.

📝 How does it solve it?

  • Accepts multipart/form-data on POST /v1/videos alongside the existing JSON body (no breaking change — JSON txt2vid continues to work unchanged).
  • Mode is inferred from the presence of an init_image file field: provided → img2vid, absent → txt2vid.
  • strength (0–1) controls denoise intensity for img2vid; it is coerced from a string because multipart fields arrive as text.
  • Invalid strength values return 400 invalid_strength.
  • Updated packages/cli/docs/serve-openai.md and docs/website/content/docs/ai-capabilities/video-generation.mdx to document both modes and the I2V model family.
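
The mode inference and strength handling described in the list above might look roughly like the following. This is a minimal sketch under stated assumptions, not the PR's implementation: `inferVideoParams`, the shape of `fields`, and the returned error object are hypothetical. Only the `init_image` and `strength` field names, the 0–1 range, and the 400 `invalid_strength` response come from the description; treating the bounds as inclusive is an assumption.

```javascript
// Hypothetical helper; this is NOT the PR's extractVideoCreateParams.
// `fields` is assumed to be an already-parsed request body (JSON or multipart).
// Multipart values arrive as strings; init_image is assumed to be a file-like object.
function inferVideoParams(fields) {
  const params = { model: fields.model, prompt: fields.prompt };

  // Mode is inferred from the presence of init_image.
  const hasImage = fields.init_image !== undefined && fields.init_image !== null;
  params.mode = hasImage ? 'img2vid' : 'txt2vid';
  if (hasImage) params.init_image = fields.init_image;

  if (fields.strength !== undefined) {
    // Multipart sends "0.85" as text; JSON may send 0.85 as a number.
    const raw = fields.strength;
    const strength =
      typeof raw === 'string' && raw.trim() !== '' ? Number(raw) : raw;
    const valid =
      typeof strength === 'number' &&
      Number.isFinite(strength) &&
      strength >= 0 &&
      strength <= 1; // inclusive bounds are an assumption
    if (!valid) {
      return { ok: false, status: 400, code: 'invalid_strength' };
    }
    params.strength = strength;
  }

  return { ok: true, params };
}
```

Returning a result object instead of throwing keeps the HTTP status next to the error code, so a route handler can map it straight onto a response.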

🧪 How was it tested?

  • Unit tests cover schema acceptance/rejection of init_image and strength, extractVideoCreateParams mode selection, strength coercion and range validation (371/371 pass).
  • TypeScript compilation was verified against the dev SDK build produced by the merge run of PR #2436 (QVAC-19845 feat[bc|api]: add img2vid (image-to-video) support to video generation in SDK). The dev build was installed locally as a package alias and not committed (npm install @qvac/sdk@npm:@tetherto/sdk-mono@0.12.2-tmp.runid-27142936775), and npx tsc --noEmit passed clean. package.json was restored before committing.
  • E2E / functional tests are not included: actual img2vid execution requires a model loaded with clipVisionModelSrc, which needs hardware. These should be added as a follow-up.

🔌 API Changes

POST /v1/videos — new optional fields (multipart only for init_image):

// txt2vid (unchanged):
fetch('/v1/videos', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ model: 'wan', prompt: 'a cat surfing' })
})

// img2vid (new):
const form = new FormData()
form.append('model', 'wan')
form.append('prompt', 'the subject slowly turns and smiles')
form.append('init_image', imageBlob, 'frame.png')
form.append('strength', '0.85')
fetch('/v1/videos', { method: 'POST', body: form })
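
The img2vid snippet above leaves `imageBlob` undefined. One way to produce it in Node.js 18 or later (where `fetch`, `FormData`, and `Blob` are globals) is to read the file from disk and wrap the bytes in a `Blob`. This is a sketch, not part of the PR: `buildImg2VidForm`, `createVideoFromImage`, the `baseUrl` argument, and the `image/png` MIME type are illustrative assumptions. The response shape is not specified here, so the raw `Response` is returned unparsed.

```javascript
// Hypothetical helper: builds the multipart body for an img2vid request.
async function buildImg2VidForm({ model, prompt, imagePath, strength }) {
  // Dynamic imports keep the snippet usable from both CommonJS and ES modules.
  const { readFile } = await import('node:fs/promises');
  const { basename } = await import('node:path');

  const bytes = await readFile(imagePath);
  const form = new FormData();
  form.append('model', model);
  form.append('prompt', prompt);
  // The MIME type is assumed; adjust it for the actual image format.
  form.append(
    'init_image',
    new Blob([bytes], { type: 'image/png' }),
    basename(imagePath)
  );
  if (strength !== undefined) {
    // Multipart fields are text, so the number is sent as a string.
    form.append('strength', String(strength));
  }
  return form;
}

// baseUrl is an assumption; point it at wherever the server is listening.
async function createVideoFromImage(baseUrl, options) {
  const form = await buildImg2VidForm(options);
  // Content-Type is left unset so fetch can add the multipart boundary itself.
  return fetch(`${baseUrl}/v1/videos`, { method: 'POST', body: form });
}
```

Omitting `strength` leaves the field out of the form entirely, which lets the server apply whatever default it uses.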

⚠️ Merge blocker

Do not merge until @qvac/sdk is published with img2vid support (from PR #2436). The CLI routes VideoClientParams at runtime through the SDK — without the released types and execution pipeline, img2vid requests will fail at the SDK call site. Once a public SDK release including those changes is available, update @qvac/sdk in packages/cli/package.json to that version and re-run bun run build before merging.

lauripiisang and others added 9 commits June 4, 2026 19:29
…ST /v1/audio/speech

Adds GET /v1/audio/voices and /v1/audio/models discovery endpoints (QVAC-17706
Open WebUI gap) alongside the encoding feature.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…iption, discovery endpoint smoke tests

- Correct audio/x-pcm → audio/L16; rate=<sr>; channels=1 in OpenAPI description
- Update serve-openai.md: response_format table, headers table, error table, route index, add /v1/audio/voices and /v1/audio/models sections
- Remove stale "wav + pcm only" caveat from README.md
- Add explicit BATS smoke tests for GET /v1/audio/models and GET /v1/audio/voices

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>