
fix: avoid Groq token-limit 413 for small prompts#449

Open
CarlosAlexandredevv wants to merge 1 commit into Gitlawb:main from CarlosAlexandredevv:fix/groq-token-limit-337

Conversation

@CarlosAlexandredevv

Summary

  • Fixes Groq requests that failed with HTTP 413 due to token budget overflow (TPM), even for short prompts.
  • Adds Groq-specific payload compaction by estimated prompt tokens.
  • Caps max_tokens dynamically for Groq based on estimated prompt size.
  • Improves 413 error mapping: when provider response indicates token/rate-limit overflow, show a rate-limit guidance message instead of generic "Request too large".
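The dynamic `max_tokens` cap can be sketched roughly as below. This is an illustrative sketch, not the PR's actual code: `GROQ_TPM_BUDGET`, `GROQ_MIN_COMPLETION_TOKENS`, and the bytes-per-token heuristic are assumptions.

```typescript
// Hypothetical sketch of clamping the completion budget for Groq.
// Constants and the ~4-bytes-per-token estimate are assumptions,
// not the identifiers used in openaiShim.ts.
const GROQ_TPM_BUDGET = 6000 // assumed tokens-per-minute budget
const GROQ_MIN_COMPLETION_TOKENS = 256

function estimatePromptTokens(payload: unknown): number {
  // Rough heuristic: ~4 bytes of serialized JSON per token.
  const bytes = new TextEncoder().encode(JSON.stringify(payload)).length
  return Math.ceil(bytes / 4)
}

function clampMaxTokens(requested: number, promptTokens: number): number {
  // Leave whatever TPM budget remains after the prompt for the completion,
  // but never go below a small floor.
  const remaining = GROQ_TPM_BUDGET - promptTokens
  return Math.max(GROQ_MIN_COMPLETION_TOKENS, Math.min(requested, remaining))
}
```

With these assumed numbers, a 5000-token prompt would shrink a requested 4096-token completion down to 1000 tokens instead of letting the total exceed the budget.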

Why

A short prompt like "oi" could still fail with 413 on Groq because the outgoing request (tools + context + completion budget) exceeded token limits, not necessarily byte-size upload limits.

Validation

  • bun test src/services/api/openaiShim.test.ts
  • bun run build
  • bun run start --print "oi" (reproduced previously failing path; now succeeds)

Closes #337

Copilot AI review requested due to automatic review settings April 6, 2026 18:59

Copilot AI left a comment


Pull request overview

This PR addresses Groq HTTP 413 failures caused by token budget overflow (TPM) by shrinking Groq-bound OpenAI-compatible requests and dynamically reducing completion token budgets, plus improving how certain 413 responses are surfaced to users.

Changes:

  • Add Groq detection and request payload compaction (strip tool schema descriptions, trim messages, and disable tools as needed).
  • Dynamically clamp Groq max_tokens based on an estimated prompt token size.
  • Improve 413 error mapping to show token/rate-limit guidance when the provider response indicates token budget overflow.
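The improved 413 mapping might look something like the following sketch. The `ProviderError` shape, function name, and regex are assumptions; only the behavior (token-overflow 413s get rate-limit guidance, others keep the byte-size message) comes from the PR description.

```typescript
// Hypothetical sketch of the 413 mapping in errors.ts. The shapes and
// messages below are illustrative assumptions.
interface ProviderError {
  status: number
  body: string
}

function mapProviderError(err: ProviderError): string {
  if (err.status === 413) {
    // Groq's token-budget 413s mention tokens/TPM/rate limits in the body,
    // unlike plain byte-size upload rejections.
    const tokenOverflow = /token|TPM|rate[_ ]?limit/i.test(err.body)
    if (tokenOverflow) {
      return 'Request exceeded the provider token budget (TPM). Try a shorter prompt, fewer tools, or retry later.'
    }
    return 'Request too large (max 20MB). Try with a smaller file.'
  }
  return err.body
}
```

The key point is that the status code alone is ambiguous; the response body has to be inspected to pick the right guidance.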

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/services/api/openaiShim.ts Adds Groq-specific payload compaction and max_tokens clamping before sending requests.
src/services/api/openaiShim.test.ts Adds tests asserting Groq payload compaction and max_tokens clamping behavior.
src/services/api/errors.ts Improves 413 handling to map token/rate-limit-style 413s to a more helpful message.


Comment on lines +171 to +179
while (promptTokens > GROQ_TARGET_PROMPT_TOKENS) {
  const firstNonSystemIndex = messages.findIndex(
    (message, index) =>
      message.role !== 'system' && index < messages.length - 1,
  )
  if (firstNonSystemIndex === -1) break

  messages.splice(firstNonSystemIndex, 1)
  body.messages = messages

Copilot AI Apr 6, 2026


The message-trimming loop removes the first non-system message without accounting for tool-call/message pairing (assistant tool_calls ↔ subsequent role:'tool' messages). This can leave orphaned tool results or tool calls in body.messages, which OpenAI-compatible APIs typically reject (400) and would negate the intended 413 mitigation. Consider trimming whole “turn” segments and preserving tool_call/tool_result adjacency (e.g., when removing an assistant message with tool_calls, also remove the following tool messages for those ids; or only remove complete user+assistant(+tool) groups from the front).
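The pairing-preserving trim the reviewer suggests could be sketched as below. The `ChatMessage` type follows the OpenAI chat format; the function name and grouping logic are illustrative, not the PR's code.

```typescript
// Hypothetical sketch: remove the oldest non-system message, and if it is
// an assistant message with tool_calls, also remove the role:'tool'
// messages that answer those calls, so no orphans are left behind.
type ChatMessage = {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content?: string
  tool_calls?: { id: string }[]
  tool_call_id?: string
}

function removeOldestTurn(messages: ChatMessage[]): ChatMessage[] {
  const start = messages.findIndex(
    (m, i) => m.role !== 'system' && i < messages.length - 1,
  )
  if (start === -1) return messages

  let end = start + 1
  const first = messages[start]
  if (first.role === 'assistant' && first.tool_calls?.length) {
    // Extend the removal window over the tool results paired with this turn.
    const ids = new Set(first.tool_calls.map((c) => c.id))
    while (
      end < messages.length &&
      messages[end].role === 'tool' &&
      ids.has(messages[end].tool_call_id ?? '')
    ) {
      end++
    }
  }
  return [...messages.slice(0, start), ...messages.slice(end)]
}
```

Calling this repeatedly from the compaction loop (instead of `splice`-ing a single message) would keep every remaining `tool_calls` entry matched to its `role:'tool'` reply.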

Comment on lines +132 to +136
function estimateJsonBytes(value: unknown): number {
return new TextEncoder().encode(JSON.stringify(value)).length
}

function estimateGroqPromptTokens(value: unknown): number {

Copilot AI Apr 6, 2026


estimateJsonBytes allocates a new TextEncoder and stringifies the entire payload on every call, and compactPayloadForGroq calls this repeatedly (including inside a loop). For large payloads this can become a noticeable CPU/memory hotspot. Consider reusing a module-scoped TextEncoder and reducing full-body JSON.stringify calls (e.g., estimate only the prompt-bearing fields or cache the serialized form between compaction steps).
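A minimal version of the suggested optimization, assuming the helper names from the snippet above; the per-message summation is one possible way to avoid re-stringifying the whole body, not the PR's implementation.

```typescript
// Module-scoped encoder: TextEncoder is stateless, so one instance can be
// shared instead of allocating a new one per call.
const encoder = new TextEncoder()

function estimateJsonBytes(value: unknown): number {
  return encoder.encode(JSON.stringify(value)).length
}

// Estimate only the prompt-bearing field (messages) and sum per message,
// so a trimming loop can subtract a removed message's size incrementally
// rather than re-serializing the entire request body on every iteration.
function estimateMessagesBytes(messages: unknown[]): number {
  return messages.reduce<number>((sum, m) => sum + estimateJsonBytes(m), 0)
}
```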



Development

Successfully merging this pull request may close these issues.

Request too large (max 20MB). Double press esc to go back and try with a smaller file.
