Skip to content

fix(anthropic): preserve base URL path prefix for PDF document requests#325

Open
wangwllu wants to merge 2 commits into
steipete:mainfrom
wangwllu:fix/anthropic-document-base-url-prefix
Open

fix(anthropic): preserve base URL path prefix for PDF document requests#325
wangwllu wants to merge 2 commits into
steipete:mainfrom
wangwllu:fix/anthropic-document-base-url-prefix

Conversation

@wangwllu

@wangwllu wangwllu commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Fixes #324.

Problem

completeAnthropicDocument() (the PDF document path for Anthropic models) built its request URL with:

const url = new URL("/v1/messages", baseUrl);

new URL("/v1/messages", base) treats the path as root-relative, discarding any path prefix on a custom ANTHROPIC_BASE_URL. A gateway exposed at https://host/anthropic would lose /anthropic and POST to https://host/v1/messages → 400/404. The default https://api.anthropic.com (no prefix) is unaffected, and the streaming/text path (via pi-ai, which concatenates baseURL + path) was never affected — only this hand-rolled fetch.

Fix

Join onto the base instead, mirroring completeGoogleDocument in src/llm/providers/google.ts which already does this correctly:

const url = new URL(`${baseUrl.replace(/\/$/, "")}/v1/messages`);
  • https://api.anthropic.comhttps://api.anthropic.com/v1/messages (unchanged)
  • https://host/anthropichttps://host/anthropic/v1/messages (fixed)
  • https://host/anthropic/ → same (trailing slash stripped)

This keeps the existing contract shared with google.ts: ANTHROPIC_BASE_URL is a provider/gateway base (host + optional prefix) onto which the code appends /v1/messages, not a fully-versioned Messages endpoint. The default remains exactly correct.

Test

Adds tests/llm.anthropic-document-url.test.ts (mirroring the tests/llm.google-document.test.ts mockFetch style) asserting the captured request URL for:

  • a path-prefixed override (https://gateway.example/anthropic),
  • a trailing-slash override,
  • the default base (no override).

Verification

  • vitest run — new test (3) + llm.google-document (4) pass.
  • typecheck (tsgo) + oxlint clean.
  • Built dist and tested end-to-end against a live Anthropic-compatible gateway exposed under /anthropic: PDF summarize went from Anthropic API error (400) (before) to a real summary (after).

Scope

One-line change in src/llm/providers/anthropic.ts + a regression test. I deliberately kept this minimal rather than refactoring google.ts and anthropic.ts into a shared joinUrl() helper; happy to do that as a follow-up if you'd prefer.


Behavior proof (redacted, generic example values)

A local logging proxy stood in for the gateway and printed the path it received before forwarding. Same PDF input, same config; only the build differs:

# BEFORE (current main): new URL("/v1/messages", base) drops the /anthropic prefix
ANTHROPIC_BASE_URL=http://<proxy>/anthropic  summarize acme.pdf
  GATEWAY RECEIVED PATH: /v1/messages
  GATEWAY RESPONSE: 400
  CLI: "Anthropic API error (400)."

# AFTER (this PR): prefix preserved
ANTHROPIC_BASE_URL=http://<proxy>/anthropic  summarize acme.pdf
  GATEWAY RECEIVED PATH: /anthropic/v1/messages
  GATEWAY RESPONSE: 200
  CLI: "## Acme Corp Q3 2026 Summary ..." (real summary)

(Tested against @anthropic-ai/sdk@0.91.1 / pi-ai@0.80.2, the versions this PR ships with.)

On the /v1-base shape

This PR intentionally does not special-case a base already ending in /v1. The streaming/text path stores ANTHROPIC_BASE_URL verbatim as the model baseUrl (resolveAnthropicModel, src/llm/providers/models.ts) and lets @anthropic-ai/sdk append /v1/messages, so it produces the same URLs the document path now produces — including /v1/v1/messages for a /v1 base. Verified end-to-end against the real SDK:

ANTHROPIC_BASE_URL Anthropic SDK / text path document path (this PR)
https://api.anthropic.com …/v1/messages same
https://anthropic.example/v1 …/v1/v1/messages same
https://gateway.example/anthropic …/anthropic/v1/messages same
https://gateway.example/anthropic/ …/anthropic/v1/messages same

So /v1 was never a working end-to-end contract on the dominant Anthropic path; the document path only appeared to tolerate it because the old new URL(absolute, base) discarded the path. A document-only /v1-as-root heuristic would make PDF summaries diverge from text/streaming for the same base. The added test documents this parity (including the /v1 case) so it's explicit, not accidental. Happy to normalize the base once for both paths as a separate change if maintainers prefer supporting endpoint-style /v1 overrides.

completeAnthropicDocument built its request URL with
new URL("/v1/messages", baseUrl), which treats the path as
root-relative and discards any prefix on a custom ANTHROPIC_BASE_URL.
A gateway exposed at https://host/anthropic would lose /anthropic and
POST to https://host/v1/messages, returning 400/404.

Join onto the base instead, mirroring completeGoogleDocument in
google.ts. The default https://api.anthropic.com path is unchanged.

Adds a regression test asserting the request URL for a path-prefixed
override, a trailing-slash override, and the default base.
@clawsweeper

clawsweeper Bot commented Jun 24, 2026

Copy link
Copy Markdown

Codex review: needs maintainer review before merge. Reviewed June 24, 2026, 11:18 AM ET / 15:18 UTC.

Summary
The PR changes Anthropic PDF document request URL construction to append /v1/messages onto the configured base URL and adds URL regression tests for prefixed, trailing-slash, default, and already-versioned Anthropic bases.

Reproducibility: yes. source inspection plus a focused Node URL check confirms current main drops /anthropic from a path-prefixed Anthropic document base. I did not run a live gateway PDF request, so this is source-reproducible rather than fully runtime-reproduced here.

Review metrics: 2 noteworthy metrics.

  • Changed files: 2 files affected. The patch is small, but it touches provider request routing where upgrade behavior matters.
  • URL cases covered: 4 Anthropic document URL cases added. The tests document prefixed, trailing-slash, already-versioned, and default base behavior for the maintainer contract decision.

Root-cause cluster
Relationship: fixed_by_candidate
Canonical: #324
Summary: This PR is the active fix candidate for the linked Anthropic PDF path-prefix issue.

Members:

Proposal only: this assessment does not dispatch repair, suppress jobs, mutate sibling items, close, or merge anything.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Get maintainer confirmation on whether SDK/text-path parity for already-versioned Anthropic bases is the intended contract, or normalize both paths before merge.

Risk before merge

  • [P1] Merging intentionally changes Anthropic PDF document routing for an already-versioned ANTHROPIC_BASE_URL from /v1/messages to /v1/v1/messages; the PR argues this matches the text path, but maintainers need to confirm that as the supported contract.

Maintainer options:

  1. Accept SDK parity
    Merge the PR as written if maintainers want ANTHROPIC_BASE_URL to be a provider or gateway base that always receives /v1/messages from Anthropic request code.
  2. Normalize both Anthropic paths
    Before merge, introduce shared Anthropic base URL normalization so text and PDF paths handle endpoint-style /v1 bases the same way without duplicating /v1.
  3. Pause for provider contract review
    Hold the PR if maintainers want a broader provider base URL contract decision before changing PDF-only upgrade behavior.

Next step before merge

  • [P2] Maintainer review should decide the Anthropic base URL contract before merge; there is no separate narrow automated fix that is safe without that product/API choice.

Security
Cleared: The diff only changes Anthropic request URL construction and adds tests; it does not alter dependencies, CI, secrets, publishing, or downloaded code paths.

Review details

Best possible solution:

Preserve the prefixed-gateway PDF fix while making the Anthropic base URL contract explicit: merge as written for SDK/text-path parity, or normalize already-versioned bases consistently across both text and document paths before merge.

Do we have a high-confidence way to reproduce the issue?

Yes: source inspection plus a focused Node URL check confirms current main drops /anthropic from a path-prefixed Anthropic document base. I did not run a live gateway PDF request, so this is source-reproducible rather than fully runtime-reproduced here.

Is this the best way to solve the issue?

Mostly yes: the changed line is the narrow right place for the path-prefix bug and matches the existing Google document join pattern. The remaining question is whether SDK/text-path parity for already-versioned Anthropic bases is the contract maintainers want to ship.

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against 75c8fb594629.

Label changes

Label justifications:

  • P2: This is a provider-specific PDF request routing bug fix with limited blast radius to Anthropic-compatible custom gateways.
  • merge-risk: 🚨 compatibility: The PR intentionally changes how already-versioned Anthropic PDF base URL overrides resolve during upgrade.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes redacted copied terminal/gateway output showing the before path, after path, HTTP result, and CLI summary result for the live PDF flow.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes redacted copied terminal/gateway output showing the before path, after path, HTTP result, and CLI summary result for the live PDF flow.
Evidence reviewed

What I checked:

  • Repository policy read: The target AGENTS.md was read fully; its read-only checkout guidance and validation notes were applied to this review. (AGENTS.md:1, 75c8fb594629)
  • Current main drops path prefixes: Current main constructs the Anthropic document request URL with new URL("/v1/messages", baseUrl), which uses a root-relative path against the configured base. (src/llm/providers/anthropic.ts:186, 75c8fb594629)
  • Focused URL behavior check: Node resolves new URL("/v1/messages", "https://host/anthropic") to https://host/v1/messages, while the PR-style append preserves https://host/anthropic/v1/messages and turns an already-versioned base into /v1/v1/messages. (75c8fb594629)
  • Existing comparable provider pattern: The Google document path appends its API route onto a trailing-slash-normalized configured base URL, preserving base path prefixes. (src/llm/providers/google.ts:161, 75c8fb594629)
  • Text path keeps Anthropic override verbatim: resolveAnthropicModel stores the resolved ANTHROPIC_BASE_URL override directly as model.baseUrl, supporting the contributor's parity argument for text/streaming requests. (src/llm/providers/models.ts:270, 75c8fb594629)
  • PR diff and tests: The diff changes only completeAnthropicDocument URL construction and adds tests for path-prefixed, trailing-slash, already-versioned, and default Anthropic document base URLs. (tests/llm.anthropic-document-url.test.ts:30, de43f352e530)

Likely related people:

  • steipete: Recent Anthropic provider history includes custom-gateway support work, and local blame for the current provider implementation points to the v0.20.1 release/import authored by Peter Steinberger. (role: recent area contributor; confidence: high; commits: 70b030d24743, cfecb4aef0b8, 0589a2d6bd35; files: src/llm/providers/anthropic.ts, src/llm/providers/models.ts, src/llm/generate-text-document.ts)
  • wanglu241: Recent custom Anthropic gateway reasoning work touched the same provider file and custom ANTHROPIC_BASE_URL behavior. (role: adjacent custom-gateway contributor; confidence: medium; commits: 77bb0ddc0c83; files: src/llm/providers/anthropic.ts)
  • wangwllu: Prior merged Anthropic provider work touched thinking/custom-base routing, separate from this PR's authorship. (role: adjacent provider contributor; confidence: medium; commits: f495679c7a85; files: src/llm/providers/anthropic.ts, tests/llm.anthropic-reasoning.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. labels Jun 24, 2026
… path

Adds a regression case asserting that an ANTHROPIC_BASE_URL already
ending in /v1 resolves to /v1/v1/messages, matching what the streaming
text path produces via @anthropic-ai/sdk (which concatenates
baseURL + /v1/messages). This documents that the document path keeps
parity with the SDK path rather than special-casing /v1 only for PDFs.
@wangwllu

Copy link
Copy Markdown
Contributor Author

Thanks for the review. Addressed both points:

  1. Proof — added redacted terminal/gateway output to the PR body showing the live repro: before, the document path posts to /v1/messages and the gateway returns 400; after, it posts to /anthropic/v1/messages and returns 200 with a real summary.

  2. The /v1 "regression" — I don't think this PR introduces one. The normal Anthropic text/streaming path doesn't use this hand-rolled fetch; it resolves the model via resolveAnthropicModel (src/llm/providers/models.ts), which keeps ANTHROPIC_BASE_URL verbatim as the model baseUrl, then goes through pi-ai → @anthropic-ai/sdk, whose buildURL concatenates baseURL + /v1/messages. I verified the effective URLs end-to-end against the real SDK (table in the PR body): a https://anthropic.example/v1 base already yields …/v1/v1/messages on the text path today. The document path only looked like it tolerated /v1 because the old new URL("/v1/messages", base) discarded the path entirely. After this PR the document path is byte-for-byte identical to the SDK/text path for all four base shapes.

So a document-only /v1-as-root heuristic would make PDF summaries diverge from text/streaming summaries for the same configured base, and wouldn't fix the text path anyway. I kept the fix minimal + added a test documenting the /v1 parity explicitly. If you'd rather support endpoint-style /v1 overrides, I'm happy to do that as a separate change that normalizes the base once for both paths.

@wangwllu

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 24, 2026

Copy link
Copy Markdown

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. P2 Normal priority bug or improvement with limited blast radius. proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PDF document path drops ANTHROPIC_BASE_URL path prefix (new URL("/v1/messages", base)) → 400 on sub-path gateways

1 participant