
Conversation

@MohammadErfan-Jabbari

Summary

Fixes streaming tool call detection for clients like Claude Code when using Gemini models via the antigravity translator.

The Problem:
When streaming responses with tool calls, the finish_reason was being incorrectly overwritten:

  1. Chunk 1 contains functionCall → should set finish_reason: "tool_calls"
  2. Chunk 2 contains finishReason: "STOP" + usage → overwrites to finish_reason: "stop"

Clients like Claude Code see the final "stop" and think the conversation ended normally, breaking the tool call flow.

Root Cause:
The hasFunctionCall variable was local to each chunk's processing, so no state was carried across chunks. When chunk 2 arrived with finishReason: "STOP" but no functionCall, it overwrote the previously correct value.

The Fix:

  • Add SawToolCall bool to track tool calls across the entire stream
  • Add UpstreamFinishReason string to cache the finish reason
  • Only emit finish_reason on the final chunk (detected by presence of both finishReason AND usageMetadata)
  • Apply priority: tool_calls > max_tokens > stop
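Sketched out, the end-of-stream logic is roughly the following (field names match the bullets above; the helper function is illustrative only, and the actual translator works on gjson/sjson JSON strings, so this is simplified):

// Simplified sketch of the new state and priority logic described above.
// Other existing fields of the params struct are omitted.
type convertCliResponseToOpenAIChatParams struct {
    SawToolCall          bool   // set once any chunk contains a functionCall
    UpstreamFinishReason string // cached upstream finishReason, e.g. "STOP" or "MAX_TOKENS"
}

// finalFinishReason (hypothetical helper) is only evaluated on the final chunk,
// i.e. the one carrying both finishReason and usageMetadata, and applies the
// priority tool_calls > max_tokens > stop.
func finalFinishReason(p *convertCliResponseToOpenAIChatParams) string {
    switch {
    case p.SawToolCall:
        return "tool_calls"
    case p.UpstreamFinishReason == "MAX_TOKENS":
        return "max_tokens"
    default:
        return "stop"
    }
}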

Changes

  • internal/translator/antigravity/openai/chat-completions/antigravity_openai_response.go: State tracking and final chunk logic
  • internal/translator/antigravity/openai/chat-completions/antigravity_openai_response_test.go: 5 unit tests (new file)

Test Plan

  • Unit tests pass (go test ./internal/translator/antigravity/openai/chat-completions/...)
  • Manual testing with Claude Code + Gemini models via local server
  • Verified tool calls complete successfully
  • Verified normal text responses still get finish_reason: "stop"
  • Verified MAX_TOKENS responses get finish_reason: "max_tokens"

Test Results

| Scenario | Expected | Actual |
| --- | --- | --- |
| Tool call stream | "tool_calls" | "tool_calls" |
| Normal text stream | "stop" | "stop" |
| MAX_TOKENS | "max_tokens" | "max_tokens" |
| Intermediate chunks | null | null |

Related Issues

This addresses the streaming tool call issues reported when using Claude Code with Gemini models through CLIProxyAPI.

🤖 Generated with Claude Code

…chunks

When streaming responses with tool calls, the finish_reason was being
overwritten. The upstream sends functionCall in chunk 1, then
finishReason: STOP in chunk 2. The old code would set finish_reason
from every chunk, causing "tool_calls" to be overwritten by "stop".

This broke clients like Claude Code that rely on finish_reason to
detect when tool calls are complete.

Changes:
- Add SawToolCall bool to track tool calls across entire stream
- Add UpstreamFinishReason to cache the finish reason
- Only emit finish_reason on final chunk (has both finishReason + usage)
- Priority: tool_calls > max_tokens > stop

Includes 5 unit tests covering:
- Tool calls not overwritten by subsequent STOP
- Normal text gets "stop"
- MAX_TOKENS without tool calls gets "max_tokens"
- Tool calls take priority over MAX_TOKENS
- Intermediate chunks have no finish_reason

Fixes streaming tool call detection for Claude Code + Gemini models.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@gemini-code-assist
Contributor

Summary of Changes

Hello @MohammadErfan-Jabbari, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical bug in the antigravity translator that led to incorrect finish_reason reporting during streaming responses, particularly impacting tool call detection for clients like Claude Code when using Gemini models. The solution involves enhancing state management across streaming chunks and refining the logic for determining and emitting the final finish_reason with a defined priority, ensuring robust and accurate API translation.

Highlights

  • Corrected Finish Reason Logic: The pull request fixes an issue where the finish_reason was incorrectly overwritten during streaming responses, especially for tool calls, causing clients like Claude Code to misinterpret the stream's conclusion.
  • State Tracking Across Chunks: Introduced SawToolCall and UpstreamFinishReason fields on convertCliResponseToOpenAIChatParams to maintain state across streaming chunks, ensuring accurate finish_reason determination.
  • Delayed Finish Reason Emission: The finish_reason is now only emitted on the final streaming chunk (identified by the presence of both finishReason and usageMetadata), preventing premature or incorrect reporting.
  • Finish Reason Priority: Implemented a priority system for finish_reason: tool_calls takes precedence over max_tokens, which in turn takes precedence over stop.
  • Comprehensive Unit Tests: Added a new test file with 5 unit tests to cover various scenarios, including tool call preservation, normal text, max tokens, priority handling, and intermediate chunk behavior.
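As an illustration of the final-chunk condition mentioned above (the helper name and package are hypothetical; the gjson paths mirror the test fixtures in this PR):

package chatcompletions // illustrative package name

import "github.com/tidwall/gjson"

// isFinalChunk reports whether an upstream chunk carries both a finishReason
// and usageMetadata, the condition this PR waits for before emitting
// finish_reason. Hypothetical helper; paths follow the fixtures in the tests.
func isFinalChunk(rawJSON []byte) bool {
    finish := gjson.GetBytes(rawJSON, "response.candidates.0.finishReason")
    usage := gjson.GetBytes(rawJSON, "response.usageMetadata")
    return finish.Exists() && usage.Exists()
}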



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes a bug where the finish_reason for tool calls was being overwritten in streaming responses. The approach of using state variables (SawToolCall, UpstreamFinishReason) to persist information across chunks is sound. The logic to determine the final finish_reason is only executed on the final chunk and correctly prioritizes tool_calls. The addition of comprehensive unit tests is excellent and thoroughly covers the fix and related edge cases. I've included a couple of suggestions to enhance code readability and maintainability.

Comment on lines 84 to 86
if finishReasonResult := gjson.GetBytes(rawJSON, "response.candidates.0.finishReason"); finishReasonResult.Exists() {
    template, _ = sjson.Set(template, "choices.0.finish_reason", strings.ToLower(finishReasonResult.String()))
    template, _ = sjson.Set(template, "choices.0.native_finish_reason", strings.ToLower(finishReasonResult.String()))
    (*param).(*convertCliResponseToOpenAIChatParams).UpstreamFinishReason = strings.ToUpper(finishReasonResult.String())
}

medium

The repeated type assertion (*param).(*convertCliResponseToOpenAIChatParams) is used in several places (here, line 141, etc.), which can make the code harder to read. You've already extracted this into a params variable on line 194. It would be cleaner to perform the type assertion once at the beginning of the function, right after the *param nil check, and reuse the resulting variable throughout the function. This would improve readability and reduce redundancy.
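For illustration, the suggested shape looks like this (the nil-check branch and its return value are placeholders for whatever error handling the function already has; the point is reusing the asserted params value):

// Assert the concrete params type once, right after the existing *param nil
// check, and reuse the result instead of repeating the assertion per field.
params, ok := (*param).(*convertCliResponseToOpenAIChatParams)
if !ok {
    return nil // placeholder: keep the function's existing error handling here
}

if finishReasonResult := gjson.GetBytes(rawJSON, "response.candidates.0.finishReason"); finishReasonResult.Exists() {
    template, _ = sjson.Set(template, "choices.0.finish_reason", strings.ToLower(finishReasonResult.String()))
    template, _ = sjson.Set(template, "choices.0.native_finish_reason", strings.ToLower(finishReasonResult.String()))
    params.UpstreamFinishReason = strings.ToUpper(finishReasonResult.String())
}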

Comment on lines +10 to +103
func TestFinishReasonToolCallsNotOverwritten(t *testing.T) {
    ctx := context.Background()
    var param any

    // Chunk 1: Contains functionCall - should set SawToolCall = true
    chunk1 := []byte(`{"response":{"candidates":[{"content":{"parts":[{"functionCall":{"name":"list_files","args":{"path":"."}}}]}}]}}`)
    result1 := ConvertAntigravityResponseToOpenAI(ctx, "model", nil, nil, chunk1, &param)

    // Verify chunk1 has no finish_reason (null)
    if len(result1) != 1 {
        t.Fatalf("Expected 1 result from chunk1, got %d", len(result1))
    }
    fr1 := gjson.Get(result1[0], "choices.0.finish_reason")
    if fr1.Exists() && fr1.String() != "" && fr1.Type.String() != "Null" {
        t.Errorf("Expected finish_reason to be null in chunk1, got: %v", fr1.String())
    }

    // Chunk 2: Contains finishReason STOP + usage (final chunk, no functionCall)
    // This simulates what the upstream sends AFTER the tool call chunk
    chunk2 := []byte(`{"response":{"candidates":[{"finishReason":"STOP"}],"usageMetadata":{"promptTokenCount":10,"candidatesTokenCount":20,"totalTokenCount":30}}}`)
    result2 := ConvertAntigravityResponseToOpenAI(ctx, "model", nil, nil, chunk2, &param)

    // Verify chunk2 has finish_reason: "tool_calls" (not "stop")
    if len(result2) != 1 {
        t.Fatalf("Expected 1 result from chunk2, got %d", len(result2))
    }
    fr2 := gjson.Get(result2[0], "choices.0.finish_reason").String()
    if fr2 != "tool_calls" {
        t.Errorf("Expected finish_reason 'tool_calls', got: %s", fr2)
    }

    // Verify native_finish_reason is lowercase upstream value
    nfr2 := gjson.Get(result2[0], "choices.0.native_finish_reason").String()
    if nfr2 != "stop" {
        t.Errorf("Expected native_finish_reason 'stop', got: %s", nfr2)
    }
}

func TestFinishReasonStopForNormalText(t *testing.T) {
    ctx := context.Background()
    var param any

    // Chunk 1: Text content only
    chunk1 := []byte(`{"response":{"candidates":[{"content":{"parts":[{"text":"Hello world"}]}}]}}`)
    ConvertAntigravityResponseToOpenAI(ctx, "model", nil, nil, chunk1, &param)

    // Chunk 2: Final chunk with STOP
    chunk2 := []byte(`{"response":{"candidates":[{"finishReason":"STOP"}],"usageMetadata":{"promptTokenCount":10,"candidatesTokenCount":5,"totalTokenCount":15}}}`)
    result2 := ConvertAntigravityResponseToOpenAI(ctx, "model", nil, nil, chunk2, &param)

    // Verify finish_reason is "stop" (no tool calls were made)
    fr := gjson.Get(result2[0], "choices.0.finish_reason").String()
    if fr != "stop" {
        t.Errorf("Expected finish_reason 'stop', got: %s", fr)
    }
}

func TestFinishReasonMaxTokens(t *testing.T) {
    ctx := context.Background()
    var param any

    // Chunk 1: Text content
    chunk1 := []byte(`{"response":{"candidates":[{"content":{"parts":[{"text":"Hello"}]}}]}}`)
    ConvertAntigravityResponseToOpenAI(ctx, "model", nil, nil, chunk1, &param)

    // Chunk 2: Final chunk with MAX_TOKENS
    chunk2 := []byte(`{"response":{"candidates":[{"finishReason":"MAX_TOKENS"}],"usageMetadata":{"promptTokenCount":10,"candidatesTokenCount":100,"totalTokenCount":110}}}`)
    result2 := ConvertAntigravityResponseToOpenAI(ctx, "model", nil, nil, chunk2, &param)

    // Verify finish_reason is "max_tokens"
    fr := gjson.Get(result2[0], "choices.0.finish_reason").String()
    if fr != "max_tokens" {
        t.Errorf("Expected finish_reason 'max_tokens', got: %s", fr)
    }
}

func TestToolCallTakesPriorityOverMaxTokens(t *testing.T) {
    ctx := context.Background()
    var param any

    // Chunk 1: Contains functionCall
    chunk1 := []byte(`{"response":{"candidates":[{"content":{"parts":[{"functionCall":{"name":"test","args":{}}}]}}]}}`)
    ConvertAntigravityResponseToOpenAI(ctx, "model", nil, nil, chunk1, &param)

    // Chunk 2: Final chunk with MAX_TOKENS (but we had a tool call, so tool_calls should win)
    chunk2 := []byte(`{"response":{"candidates":[{"finishReason":"MAX_TOKENS"}],"usageMetadata":{"promptTokenCount":10,"candidatesTokenCount":100,"totalTokenCount":110}}}`)
    result2 := ConvertAntigravityResponseToOpenAI(ctx, "model", nil, nil, chunk2, &param)

    // Verify finish_reason is "tool_calls" (takes priority over max_tokens)
    fr := gjson.Get(result2[0], "choices.0.finish_reason").String()
    if fr != "tool_calls" {
        t.Errorf("Expected finish_reason 'tool_calls', got: %s", fr)
    }
}

medium

These tests are great and cover all the necessary scenarios. To improve maintainability and reduce code duplication, consider refactoring them into a single table-driven test. Each entry in the table could represent a test case with a name, a sequence of input chunks, and the expected final finish_reason and native_finish_reason. This would make the test file more concise and adding new test cases in the future would be easier.
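For example, a rough table-driven shape could look like this (the chunk payload variables such as toolCallChunk are placeholders for the JSON fixtures already used above, and the []string return type is inferred from the existing tests):

func TestFinishReason(t *testing.T) {
    cases := []struct {
        name       string
        chunks     [][]byte // upstream chunks fed to the converter in order
        wantFinish string   // expected finish_reason on the final chunk
    }{
        {"tool call then STOP", [][]byte{toolCallChunk, stopFinalChunk}, "tool_calls"},
        {"normal text then STOP", [][]byte{textChunk, stopFinalChunk}, "stop"},
        {"MAX_TOKENS without tool call", [][]byte{textChunk, maxTokensFinalChunk}, "max_tokens"},
        {"tool call beats MAX_TOKENS", [][]byte{toolCallChunk, maxTokensFinalChunk}, "tool_calls"},
    }
    for _, tc := range cases {
        t.Run(tc.name, func(t *testing.T) {
            var param any
            var results []string
            for _, chunk := range tc.chunks {
                results = ConvertAntigravityResponseToOpenAI(context.Background(), "model", nil, nil, chunk, &param)
            }
            if got := gjson.Get(results[0], "choices.0.finish_reason").String(); got != tc.wantFinish {
                t.Errorf("finish_reason = %q, want %q", got, tc.wantFinish)
            }
        })
    }
}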

Ptah-CT pushed a commit to Ptah-CT/CLIProxyAPIPlusPlus that referenced this pull request Jan 8, 2026
…chunks

Cherry-picked from upstream PR router-for-me#874
Fixes incorrect finish_reason reporting during streaming for tool calls with Gemini
piexian added a commit to piexian/CLIProxyAPI that referenced this pull request Jan 20, 2026
Merged content:
- router-for-me#874: Fix finish_reason "tool_calls" being overwritten in antigravity streaming responses
- feat/usage-statistics-persistence: Add Kiro and GitHub Copilot support plus usage statistics persistence

Additional fixes:
- Add the missing 'encoding/hex' import to auth_files.go
- Pre-create usage_stats.json in the Dockerfile and set write permissions
- Add a usage_stats.json volume mount to docker-compose.yml

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>