feat(caching): implement Claude prompt caching with multi-turn support #1295

SchneeMart · 2026-01-29T00:00:35Z

Add ensureCacheControl() to auto-inject cache breakpoints
Cache tools (last tool), system (last element), and messages (2nd-to-last user turn)
Add prompt-caching-2024-07-31 beta header
Include verification test for caching logic

Enables up to 90% cost reduction on cached tokens.
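
The message breakpoint rule above can be illustrated without the PR's gjson/sjson code. This is a minimal stdlib-only sketch, not the PR's implementation: `message` and `secondToLastUserIndex` are hypothetical names, and it assumes "2nd-to-last user turn" means the user-role message that precedes the final user-role message.

```go
package main

import "fmt"

// message is a hypothetical, simplified stand-in for one entry of the
// "messages" array; only the role matters for choosing the breakpoint.
type message struct {
	Role string
}

// secondToLastUserIndex returns the index of the second-to-last message
// whose role is "user", or -1 when fewer than two user turns exist.
func secondToLastUserIndex(msgs []message) int {
	seen := 0
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].Role != "user" {
			continue
		}
		seen++
		if seen == 2 {
			return i
		}
	}
	return -1
}

func main() {
	conv := []message{{"user"}, {"assistant"}, {"user"}, {"assistant"}, {"user"}}
	fmt.Println(secondToLastUserIndex(conv))     // user turns at 0, 2, 4
	fmt.Println(secondToLastUserIndex(conv[:1])) // a single user turn
}
```

Marking the turn before the newest one would let the next request reuse the prefix that was already sent, which is presumably why the newest user message itself is left unmarked.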

Written with Claude Opus 4.5

Co-Authored-By: Claude Opus 4.5 <[email protected]>

gemini-code-assist · 2026-01-29T00:00:50Z

Summary of Changes

Hello @SchneeMart, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant optimization for Claude API interactions by implementing automatic prompt caching. It intelligently injects cache_control directives into request payloads for tools, system prompts, and multi-turn messages, ensuring that frequently used prompt components are cached. This enhancement aims to drastically reduce API costs and improve efficiency for applications interacting with Claude models, particularly those with repetitive or conversational patterns.

Highlights

Automatic Cache Control Injection: Implemented an ensureCacheControl() function to automatically inject cache_control breakpoints into Claude API requests, optimizing prompt caching.
Multi-Turn Caching Support: Configured caching for the last tool in the tools array, the last element in the system array, and the second-to-last user turn in messages to support multi-turn conversations effectively.
Beta Header Activation: Added the prompt-caching-2024-07-31 beta header to enable the new caching feature within the Claude API calls.
Comprehensive Testing: Included a new verification test file (caching_verify_test.go) with extensive unit tests to validate the ensureCacheControl logic across various prompt structures and scenarios.
Cost Reduction: This feature is projected to enable up to a 90% cost reduction on cached tokens by leveraging Anthropic's caching mechanisms.
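
The "up to 90%" figure can be put in context with a rough calculation. The sketch below is only an illustration under stated assumptions: it assumes a cache write is billed at 1.25x the base input price and a cache read at 0.1x (the multipliers Anthropic has published for the default ephemeral cache), that the entire prompt prefix is cacheable, and that the cache never expires between requests. `relativeCost` is a hypothetical helper, not part of the PR.

```go
package main

import "fmt"

const (
	cacheWriteMultiplier = 1.25 // assumed: cache write vs. base input price
	cacheReadMultiplier  = 0.10 // assumed: cache read vs. base input price
)

// relativeCost returns the cost of sending the same cacheable prefix n times
// with caching, as a fraction of the cost without caching. The first request
// pays the write price; the remaining n-1 requests pay the read price.
// Values of n below 1 are clamped to 1.
func relativeCost(n int) float64 {
	if n < 1 {
		n = 1
	}
	cached := cacheWriteMultiplier + cacheReadMultiplier*float64(n-1)
	return cached / float64(n)
}

func main() {
	for _, n := range []int{1, 2, 10, 100} {
		fmt.Printf("requests=%d relative cost=%.3f\n", n, relativeCost(n))
	}
}
```

Under these assumptions a single request costs more than it would without caching, and the saving only approaches 90% as the number of cache hits grows.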

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a valuable prompt caching feature for Claude, which will help reduce costs. The implementation correctly identifies the parts of the prompt to cache and adds comprehensive verification tests.

I've identified a critical bug in the error handling when modifying the JSON payload, which could lead to corrupted data being sent to the API. I've also provided several suggestions to enhance code clarity and maintainability by simplifying some of the JSON parsing logic. Additionally, the new test file contains some elements that should be cleaned up, and its language should be aligned with the rest of the codebase for consistency.

gemini-code-assist · 2026-01-29T00:04:56Z

internal/runtime/executor/claude_executor.go

+	if content.IsArray() {
+		// Add cache_control to the last content block of this message
+		contentCount := int(content.Get("#").Int())
+		if contentCount > 0 {
+			cacheControlPath := fmt.Sprintf("messages.%d.content.%d.cache_control", secondToLastUserIdx, contentCount-1)
+			var err error
+			payload, err = sjson.SetBytes(payload, cacheControlPath, map[string]string{"type": "ephemeral"})
+			if err != nil {
+				log.Warnf("failed to inject cache_control into messages: %v", err)
+			}
+		}
+	} else if content.Type == gjson.String {
+		// Convert string content to array with cache_control
+		text := content.String()
+		newContent := []map[string]interface{}{
+			{
+				"type": "text",
+				"text": text,
+				"cache_control": map[string]string{
+					"type": "ephemeral",
+				},
+			},
+		}
+		var err error
+		payload, err = sjson.SetBytes(payload, contentPath, newContent)
+		if err != nil {
+			log.Warnf("failed to inject cache_control into message string content: %v", err)
+		}
+	}


There's a potential bug in how errors from sjson.SetBytes are handled. According to the sjson documentation, if an error occurs, the destination byte slice's contents are undefined. The current code logs the error but continues execution, potentially using a corrupted payload. This could lead to malformed requests being sent to the upstream API.

To fix this, you should check the error and return the original, unmodified payload immediately if an error occurs. This ensures that a failure during this optimization step doesn't break the request.

	if content.IsArray() {
		// Add cache_control to the last content block of this message
		contentCount := int(content.Get("#").Int())
		if contentCount > 0 {
			cacheControlPath := fmt.Sprintf("messages.%d.content.%d.cache_control", secondToLastUserIdx, contentCount-1)
			newPayload, err := sjson.SetBytes(payload, cacheControlPath, map[string]string{"type": "ephemeral"})
			if err != nil {
				log.Warnf("failed to inject cache_control into messages: %v", err)
				return payload
			}
			payload = newPayload
		}
	} else if content.Type == gjson.String {
		// Convert string content to array with cache_control
		text := content.String()
		newContent := []map[string]interface{}{
			{
				"type": "text",
				"text": text,
				"cache_control": map[string]string{
					"type": "ephemeral",
				},
			},
		}
		newPayload, err := sjson.SetBytes(payload, contentPath, newContent)
		if err != nil {
			log.Warnf("failed to inject cache_control into message string content: %v", err)
			return payload
		}
		payload = newPayload
	}
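
The "keep the original payload on failure" pattern requested here could also be factored into a small helper so that each injection site handles errors the same way. The following is a stdlib-only sketch under stated assumptions: `applyOrKeep` and `markLastTool` are hypothetical names, and `markLastTool` uses `encoding/json` as a stand-in for the `sjson.SetBytes` call, so unlike sjson it re-serializes the document and does not preserve key order or whitespace.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// applyOrKeep runs mutate on payload and returns its result only on success.
// On failure the original payload is returned untouched.
func applyOrKeep(payload []byte, mutate func([]byte) ([]byte, error)) []byte {
	updated, err := mutate(payload)
	if err != nil {
		return payload
	}
	return updated
}

// markLastTool is a hypothetical stand-in for the sjson.SetBytes call:
// it sets cache_control on the last entry of the "tools" array.
func markLastTool(payload []byte) ([]byte, error) {
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(payload, &doc); err != nil {
		return nil, err
	}
	var tools []map[string]interface{}
	if err := json.Unmarshal(doc["tools"], &tools); err != nil {
		return nil, err
	}
	if len(tools) == 0 {
		return nil, errors.New("no tools to mark")
	}
	last := tools[len(tools)-1]
	if last == nil {
		return nil, errors.New("last tool is not an object")
	}
	last["cache_control"] = map[string]string{"type": "ephemeral"}
	raw, err := json.Marshal(tools)
	if err != nil {
		return nil, err
	}
	doc["tools"] = raw
	return json.Marshal(doc)
}

func main() {
	ok := []byte(`{"tools":[{"name":"a"},{"name":"b"}]}`)
	fmt.Println(string(applyOrKeep(ok, markLastTool)))

	bad := []byte(`{"tools":`) // truncated JSON: the mutation fails
	fmt.Println(string(applyOrKeep(bad, markLastTool)))
}
```

Because the result is assigned to a separate variable before the error check, a failed mutation can never overwrite the payload that is eventually sent upstream.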

gemini-code-assist · 2026-01-29T00:04:56Z

internal/runtime/executor/claude_executor.go

+	// Add cache_control to the last tool
+	lastToolPath := fmt.Sprintf("tools.%d.cache_control", toolCount-1)
+	var err error
+	payload, err = sjson.SetBytes(payload, lastToolPath, map[string]string{"type": "ephemeral"})
+	if err != nil {
+		log.Warnf("failed to inject cache_control into tools array: %v", err)
+	}
+
+	return payload


There's an issue with error handling for sjson.SetBytes. If this call fails, the payload variable becomes undefined, but the function proceeds to return it. This can lead to sending corrupted data upstream.

You should handle the error by returning the original payload immediately, preventing the undefined payload from being used.

	// Add cache_control to the last tool
	lastToolPath := fmt.Sprintf("tools.%d.cache_control", toolCount-1)
	newPayload, err := sjson.SetBytes(payload, lastToolPath, map[string]string{"type": "ephemeral"})
	if err != nil {
		log.Warnf("failed to inject cache_control into tools array: %v", err)
		return payload
	}

	return newPayload

gemini-code-assist · 2026-01-29T00:04:56Z

internal/runtime/executor/claude_executor.go

+		// Add cache_control to the last system element
+		lastSystemPath := fmt.Sprintf("system.%d.cache_control", count-1)
+		var err error
+		payload, err = sjson.SetBytes(payload, lastSystemPath, map[string]string{"type": "ephemeral"})
+		if err != nil {
+			log.Warnf("failed to inject cache_control into system array: %v", err)
+		}
+	} else if system.Type == gjson.String {
+		// Convert string system prompt to array with cache_control
+		// "system": "text" -> "system": [{"type": "text", "text": "text", "cache_control": {"type": "ephemeral"}}]
+		text := system.String()
+		newSystem := []map[string]interface{}{
+			{
+				"type": "text",
+				"text": text,
+				"cache_control": map[string]string{
+					"type": "ephemeral",
+				},
+			},
+		}
+		var err error
+		payload, err = sjson.SetBytes(payload, "system", newSystem)
+		if err != nil {
+			log.Warnf("failed to inject cache_control into system string: %v", err)
+		}
+	}
+
+	return payload


This function also has the same error handling issue with sjson.SetBytes as noted in other comments. If sjson.SetBytes fails, the payload becomes undefined and is returned at the end of the function. This could cause malformed requests.

Please ensure that on error, the original, unmodified payload is returned.

		// Add cache_control to the last system element
		lastSystemPath := fmt.Sprintf("system.%d.cache_control", count-1)
		newPayload, err := sjson.SetBytes(payload, lastSystemPath, map[string]string{"type": "ephemeral"})
		if err != nil {
			log.Warnf("failed to inject cache_control into system array: %v", err)
			return payload
		}
		payload = newPayload
	} else if system.Type == gjson.String {
		// Convert string system prompt to array with cache_control
		// "system": "text" -> "system": [{"type": "text", "text": "text", "cache_control": {"type": "ephemeral"}}]
		text := system.String()
		newSystem := []map[string]interface{}{
			{
				"type": "text",
				"text": text,
				"cache_control": map[string]string{
					"type": "ephemeral",
				},
			},
		}
		newPayload, err := sjson.SetBytes(payload, "system", newSystem)
		if err != nil {
			log.Warnf("failed to inject cache_control into system string: %v", err)
			return payload
		}
		payload = newPayload
	}

	return payload

gemini-code-assist · 2026-01-29T00:04:56Z

internal/runtime/executor/caching_verify_test.go

+func TestEnsureCacheControl(t *testing.T) {
+	// Test Fall 1: System Prompt als String
+	t.Run("String System Prompt", func(t *testing.T) {
+		input := []byte(`{"model": "claude-3-5-sonnet", "system": "Dies ist ein langer System Prompt", "messages": []}`)
+		output := ensureCacheControl(input)
+
+		res := gjson.GetBytes(output, "system.0.cache_control.type")
+		if res.String() != "ephemeral" {
+			t.Errorf("cache_control nicht im System-String gefunden. Output: %s", string(output))
+		}
+	})
+
+	// Test Fall 2: System Prompt als Array
+	t.Run("Array System Prompt", func(t *testing.T) {
+		input := []byte(`{"model": "claude-3-5-sonnet", "system": [{"type": "text", "text": "Teil 1"}, {"type": "text", "text": "Teil 2"}], "messages": []}`)
+		output := ensureCacheControl(input)
+
+		// cache_control sollte nur am LETZTEN Element sein
+		res0 := gjson.GetBytes(output, "system.0.cache_control")
+		res1 := gjson.GetBytes(output, "system.1.cache_control.type")
+
+		if res0.Exists() {
+			t.Errorf("cache_control sollte NICHT am ersten Element sein")
+		}
+		if res1.String() != "ephemeral" {
+			t.Errorf("cache_control nicht am letzten System-Element gefunden. Output: %s", string(output))
+		}
+	})
+
+	// Test Fall 3: Tools werden gecached
+	t.Run("Tools Caching", func(t *testing.T) {
+		input := []byte(`{
+			"model": "claude-3-5-sonnet",
+			"tools": [
+				{"name": "tool1", "description": "First tool", "input_schema": {"type": "object"}},
+				{"name": "tool2", "description": "Second tool", "input_schema": {"type": "object"}}
+			],
+			"system": "System prompt",
+			"messages": []
+		}`)
+		output := ensureCacheControl(input)
+
+		// cache_control sollte nur am LETZTEN Tool sein
+		tool0Cache := gjson.GetBytes(output, "tools.0.cache_control")
+		tool1Cache := gjson.GetBytes(output, "tools.1.cache_control.type")
+
+		if tool0Cache.Exists() {
+			t.Errorf("cache_control sollte NICHT am ersten Tool sein")
+		}
+		if tool1Cache.String() != "ephemeral" {
+			t.Errorf("cache_control nicht am letzten Tool gefunden. Output: %s", string(output))
+		}
+
+		// System sollte auch cache_control haben
+		systemCache := gjson.GetBytes(output, "system.0.cache_control.type")
+		if systemCache.String() != "ephemeral" {
+			t.Errorf("cache_control nicht im System gefunden. Output: %s", string(output))
+		}
+	})
+
+	// Test Fall 4: Tools und System sind UNABHÄNGIGE Breakpoints
+	// Per Anthropic Docs: Bis zu 4 Breakpoints erlaubt, Tools und System werden separat gecached
+	t.Run("Independent Cache Breakpoints", func(t *testing.T) {
+		input := []byte(`{
+			"model": "claude-3-5-sonnet",
+			"tools": [
+				{"name": "tool1", "description": "First tool", "input_schema": {"type": "object"}, "cache_control": {"type": "ephemeral"}}
+			],
+			"system": [{"type": "text", "text": "System"}],
+			"messages": []
+		}`)
+		output := ensureCacheControl(input)
+
+		// Tool hat bereits cache_control - sollte nicht geändert werden
+		tool0Cache := gjson.GetBytes(output, "tools.0.cache_control.type")
+		if tool0Cache.String() != "ephemeral" {
+			t.Errorf("Existierendes cache_control wurde fälschlicherweise entfernt")
+		}
+
+		// System SOLLTE cache_control bekommen, weil es ein UNABHÄNGIGER Breakpoint ist
+		// Tools und System sind separate Cache-Ebenen in der Hierarchie
+		systemCache := gjson.GetBytes(output, "system.0.cache_control.type")
+		if systemCache.String() != "ephemeral" {
+			t.Errorf("System sollte eigenen cache_control Breakpoint haben (unabhängig von Tools)")
+		}
+	})
+
+	// Test Fall 5: Nur Tools, kein System
+	t.Run("Only Tools No System", func(t *testing.T) {
+		input := []byte(`{
+			"model": "claude-3-5-sonnet",
+			"tools": [
+				{"name": "tool1", "description": "Tool", "input_schema": {"type": "object"}}
+			],
+			"messages": [{"role": "user", "content": "Hi"}]
+		}`)
+		output := ensureCacheControl(input)
+
+		toolCache := gjson.GetBytes(output, "tools.0.cache_control.type")
+		if toolCache.String() != "ephemeral" {
+			t.Errorf("cache_control nicht am Tool gefunden. Output: %s", string(output))
+		}
+	})
+
+	// Test Fall 6: Viele Tools (Claude Code Szenario)
+	t.Run("Many Tools (Claude Code Scenario)", func(t *testing.T) {
+		// Simuliere Claude Code mit vielen Tools
+		toolsJSON := `[`
+		for i := 0; i < 50; i++ {
+			if i > 0 {
+				toolsJSON += ","
+			}
+			toolsJSON += fmt.Sprintf(`{"name": "tool%d", "description": "Tool %d", "input_schema": {"type": "object"}}`, i, i)
+		}
+		toolsJSON += `]`
+
+		input := []byte(fmt.Sprintf(`{
+			"model": "claude-3-5-sonnet",
+			"tools": %s,
+			"system": [{"type": "text", "text": "You are Claude Code"}],
+			"messages": [{"role": "user", "content": "Hello"}]
+		}`, toolsJSON))
+
+		output := ensureCacheControl(input)
+
+		// Nur das letzte Tool (index 49) sollte cache_control haben
+		for i := 0; i < 49; i++ {
+			path := fmt.Sprintf("tools.%d.cache_control", i)
+			if gjson.GetBytes(output, path).Exists() {
+				t.Errorf("Tool %d sollte KEIN cache_control haben", i)
+			}
+		}
+
+		lastToolCache := gjson.GetBytes(output, "tools.49.cache_control.type")
+		if lastToolCache.String() != "ephemeral" {
+			t.Errorf("Letztes Tool (49) hat kein cache_control")
+		}
+
+		// System sollte auch cache_control haben
+		systemCache := gjson.GetBytes(output, "system.0.cache_control.type")
+		if systemCache.String() != "ephemeral" {
+			t.Errorf("System hat kein cache_control")
+		}
+
+		fmt.Println("Test 6 (50 Tools) erfolgreich - cache_control nur am letzten Tool!")
+	})
+
+	// Test Fall 7: Leeres Tools-Array
+	t.Run("Empty Tools Array", func(t *testing.T) {
+		input := []byte(`{"model": "claude-3-5-sonnet", "tools": [], "system": "Test", "messages": []}`)
+		output := ensureCacheControl(input)
+
+		// System sollte trotzdem cache_control bekommen
+		systemCache := gjson.GetBytes(output, "system.0.cache_control.type")
+		if systemCache.String() != "ephemeral" {
+			t.Errorf("System sollte cache_control haben auch bei leerem Tools-Array")
+		}
+	})
+}
+
+// TestCacheControlOrder prüft die korrekte Reihenfolge: tools -> system -> messages
+func TestCacheControlOrder(t *testing.T) {
+	input := []byte(`{
+		"model": "claude-sonnet-4",
+		"tools": [
+			{"name": "Read", "description": "Read file", "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}}},
+			{"name": "Write", "description": "Write file", "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}}}
+		],
+		"system": [
+			{"type": "text", "text": "You are Claude Code, Anthropic's official CLI for Claude."},
+			{"type": "text", "text": "Additional instructions here..."}
+		],
+		"messages": [
+			{"role": "user", "content": "Hello"}
+		]
+	}`)
+
+	output := ensureCacheControl(input)
+
+	// Verifiziere die Cache-Breakpoints
+	// 1. Letztes Tool hat cache_control
+	if gjson.GetBytes(output, "tools.1.cache_control.type").String() != "ephemeral" {
+		t.Error("Letztes Tool sollte cache_control haben")
+	}
+
+	// 2. Erstes Tool hat KEIN cache_control
+	if gjson.GetBytes(output, "tools.0.cache_control").Exists() {
+		t.Error("Erstes Tool sollte KEIN cache_control haben")
+	}
+
+	// 3. Letztes System-Element hat cache_control
+	if gjson.GetBytes(output, "system.1.cache_control.type").String() != "ephemeral" {
+		t.Error("Letztes System-Element sollte cache_control haben")
+	}
+
+	// 4. Erstes System-Element hat KEIN cache_control
+	if gjson.GetBytes(output, "system.0.cache_control").Exists() {
+		t.Error("Erstes System-Element sollte KEIN cache_control haben")
+	}
+
+	fmt.Println("Cache-Reihenfolge korrekt: tools -> system")
+}


For consistency with the rest of the codebase, which is in English, it would be best to write these tests in English as well. This includes test names (e.g., t.Run names), comments, and error messages in t.Errorf.

For example:
t.Run("String System Prompt", ...)
t.Errorf("cache_control not found in system string. Output: %s", string(output))

gemini-code-assist · 2026-01-29T00:04:56Z

internal/runtime/executor/caching_verify_test.go

+			t.Errorf("System hat kein cache_control")
+		}
+
+		fmt.Println("Test 6 (50 Tools) erfolgreich - cache_control nur am letzten Tool!")


fmt.Println statements should not be present in test code as they pollute test output. Please remove this line. A similar statement on line 210 should also be removed. If you need to output information during a test, use t.Log or t.Logf, which are only shown when tests are run with the -v flag.

gemini-code-assist · 2026-01-29T00:04:56Z

internal/runtime/executor/claude_executor.go

+	// Check if ANY message content already has cache_control
+	hasCacheControlInMessages := false
+	messages.ForEach(func(_, msg gjson.Result) bool {
+		content := msg.Get("content")
+		if content.IsArray() {
+			content.ForEach(func(_, item gjson.Result) bool {
+				if item.Get("cache_control").Exists() {
+					hasCacheControlInMessages = true
+					return false
+				}
+				return true
+			})
+		}
+		return !hasCacheControlInMessages
+	})
+	if hasCacheControlInMessages {
+		return payload
+	}


The logic to check for an existing cache_control in messages can be significantly simplified by using a gjson path query. This improves readability and reduces the amount of code.

The path messages.#.content.#[cache_control] will efficiently check if any message has a content block with cache_control.

	// Check if ANY message content already has cache_control
	if gjson.GetBytes(payload, "messages.#.content.#[cache_control]").Exists() {
		return payload
	}
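
Before adopting the one-line query, it may be worth pinning down the expected behavior with a reference check. As far as I can tell, a gjson path that contains `#.` collects its matches into an array result, which could make `Exists()` true even when nothing matched, so a test against a payload without any cache_control seems advisable. The sketch below is a stdlib-only reference for the intended semantics, not the PR's code; `hasMessageCacheControl` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasMessageCacheControl reports whether any array-form content block in
// "messages" carries a cache_control key. String-form content is skipped,
// mirroring the IsArray() guard in the loop under review.
func hasMessageCacheControl(payload []byte) bool {
	var doc struct {
		Messages []struct {
			Content json.RawMessage `json:"content"`
		} `json:"messages"`
	}
	if err := json.Unmarshal(payload, &doc); err != nil {
		return false
	}
	for _, msg := range doc.Messages {
		var blocks []map[string]json.RawMessage
		if err := json.Unmarshal(msg.Content, &blocks); err != nil {
			continue // content is a plain string or absent, not an array of objects
		}
		for _, block := range blocks {
			if _, ok := block["cache_control"]; ok {
				return true
			}
		}
	}
	return false
}

func main() {
	plain := []byte(`{"messages":[{"role":"user","content":"Hi"}]}`)
	marked := []byte(`{"messages":[{"role":"user","content":[{"type":"text","text":"Hi","cache_control":{"type":"ephemeral"}}]}]}`)
	fmt.Println(hasMessageCacheControl(plain), hasMessageCacheControl(marked))
}
```

The same three inputs (string content, unmarked array content, marked array content) would make reasonable table-driven cases for whichever gjson expression ends up in the executor.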

gemini-code-assist · 2026-01-29T00:04:56Z

internal/runtime/executor/claude_executor.go

+	// Check if ANY tool already has cache_control - if so, don't modify tools
+	hasCacheControlInTools := false
+	tools.ForEach(func(_, tool gjson.Result) bool {
+		if tool.Get("cache_control").Exists() {
+			hasCacheControlInTools = true
+			return false
+		}
+		return true
+	})
+	if hasCacheControlInTools {
+		return payload
+	}


This loop to check for an existing cache_control can be simplified to a single line using a gjson path query. This makes the code more concise and easier to understand.

	// Check if ANY tool already has cache_control - if so, don't modify tools
	if gjson.GetBytes(payload, "tools.#[cache_control]").Exists() {
		return payload
	}

gemini-code-assist · 2026-01-29T00:04:56Z

internal/runtime/executor/claude_executor.go

+		// Check if ANY system element already has cache_control
+		hasCacheControlInSystem := false
+		system.ForEach(func(_, item gjson.Result) bool {
+			if item.Get("cache_control").Exists() {
+				hasCacheControlInSystem = true
+				return false
+			}
+			return true
+		})
+		if hasCacheControlInSystem {
+			return payload
+		}


The loop for checking if cache_control exists in the system prompt can be simplified using a gjson path query. This will make the code more readable and concise.

		// Check if ANY system element already has cache_control
		if gjson.GetBytes(payload, "system.#[cache_control]").Exists() {
			return payload
		}
