feat(caching): implement Claude prompt caching with multi-turn support #1295
- Add ensureCacheControl() to auto-inject cache breakpoints
- Cache tools (last tool), system (last element), and messages (2nd-to-last user turn)
- Add prompt-caching-2024-07-31 beta header
- Include verification test for caching logic

Enables up to 90% cost reduction on cached tokens.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
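The message breakpoint described above sits on the second-to-last user turn. The diff refers to a secondToLastUserIdx variable but does not show how it is computed, so the following is only a minimal sketch of one way to find that index. The message type and the secondToLastUserIndex helper are hypothetical names introduced for illustration; they are not taken from the PR.

```go
package main

import "fmt"

// message is a hypothetical, minimal stand-in for one entry of the
// "messages" array; only the role matters for choosing the breakpoint.
type message struct {
	Role string
}

// secondToLastUserIndex returns the index of the second-to-last message
// whose role is "user", or -1 when fewer than two user turns exist.
func secondToLastUserIndex(msgs []message) int {
	seen := 0
	// Walk backwards so the scan can stop at the second user turn found.
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].Role != "user" {
			continue
		}
		seen++
		if seen == 2 {
			return i
		}
	}
	return -1
}

func main() {
	conversation := []message{
		{Role: "user"}, {Role: "assistant"},
		{Role: "user"}, {Role: "assistant"},
		{Role: "user"},
	}
	fmt.Println(secondToLastUserIndex(conversation)) // prints 2
}
```

Returning -1 for short conversations lets the caller skip the message breakpoint entirely instead of guessing at a position.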
Summary of Changes

Hello @SchneeMart, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant optimization for Claude API interactions by implementing automatic prompt caching. It intelligently injects ...
Code Review
This pull request introduces a valuable prompt caching feature for Claude, which will help reduce costs. The implementation correctly identifies the parts of the prompt to cache and adds comprehensive verification tests.
I've identified a critical bug in the error handling when modifying the JSON payload, which could lead to corrupted data being sent to the API. I've also provided several suggestions to enhance code clarity and maintainability by simplifying some of the JSON parsing logic. Additionally, the new test file contains some elements that should be cleaned up, and its language should be aligned with the rest of the codebase for consistency.
if content.IsArray() {
	// Add cache_control to the last content block of this message
	contentCount := int(content.Get("#").Int())
	if contentCount > 0 {
		cacheControlPath := fmt.Sprintf("messages.%d.content.%d.cache_control", secondToLastUserIdx, contentCount-1)
		var err error
		payload, err = sjson.SetBytes(payload, cacheControlPath, map[string]string{"type": "ephemeral"})
		if err != nil {
			log.Warnf("failed to inject cache_control into messages: %v", err)
		}
	}
} else if content.Type == gjson.String {
	// Convert string content to array with cache_control
	text := content.String()
	newContent := []map[string]interface{}{
		{
			"type": "text",
			"text": text,
			"cache_control": map[string]string{
				"type": "ephemeral",
			},
		},
	}
	var err error
	payload, err = sjson.SetBytes(payload, contentPath, newContent)
	if err != nil {
		log.Warnf("failed to inject cache_control into message string content: %v", err)
	}
}
There's a potential bug in how errors from sjson.SetBytes are handled. According to the sjson documentation, if an error occurs, the destination byte slice's contents are undefined. The current code logs the error but continues execution, potentially using a corrupted payload. This could lead to malformed requests being sent to the upstream API.
To fix this, you should check the error and return the original, unmodified payload immediately if an error occurs. This ensures that a failure during this optimization step doesn't break the request.
if content.IsArray() {
	// Add cache_control to the last content block of this message
	contentCount := int(content.Get("#").Int())
	if contentCount > 0 {
		cacheControlPath := fmt.Sprintf("messages.%d.content.%d.cache_control", secondToLastUserIdx, contentCount-1)
		newPayload, err := sjson.SetBytes(payload, cacheControlPath, map[string]string{"type": "ephemeral"})
		if err != nil {
			log.Warnf("failed to inject cache_control into messages: %v", err)
			return payload
		}
		payload = newPayload
	}
} else if content.Type == gjson.String {
	// Convert string content to array with cache_control
	text := content.String()
	newContent := []map[string]interface{}{
		{
			"type": "text",
			"text": text,
			"cache_control": map[string]string{
				"type": "ephemeral",
			},
		},
	}
	newPayload, err := sjson.SetBytes(payload, contentPath, newContent)
	if err != nil {
		log.Warnf("failed to inject cache_control into message string content: %v", err)
		return payload
	}
	payload = newPayload
}

// Add cache_control to the last tool
lastToolPath := fmt.Sprintf("tools.%d.cache_control", toolCount-1)
var err error
payload, err = sjson.SetBytes(payload, lastToolPath, map[string]string{"type": "ephemeral"})
if err != nil {
	log.Warnf("failed to inject cache_control into tools array: %v", err)
}

return payload
There's an issue with error handling for sjson.SetBytes. If this call fails, the payload variable becomes undefined, but the function proceeds to return it. This can lead to sending corrupted data upstream.
You should handle the error by returning the original payload immediately, preventing the undefined payload from being used.
// Add cache_control to the last tool
lastToolPath := fmt.Sprintf("tools.%d.cache_control", toolCount-1)
newPayload, err := sjson.SetBytes(payload, lastToolPath, map[string]string{"type": "ephemeral"})
if err != nil {
	log.Warnf("failed to inject cache_control into tools array: %v", err)
	return payload
}
return newPayload

	// Add cache_control to the last system element
	lastSystemPath := fmt.Sprintf("system.%d.cache_control", count-1)
	var err error
	payload, err = sjson.SetBytes(payload, lastSystemPath, map[string]string{"type": "ephemeral"})
	if err != nil {
		log.Warnf("failed to inject cache_control into system array: %v", err)
	}
} else if system.Type == gjson.String {
	// Convert string system prompt to array with cache_control
	// "system": "text" -> "system": [{"type": "text", "text": "text", "cache_control": {"type": "ephemeral"}}]
	text := system.String()
	newSystem := []map[string]interface{}{
		{
			"type": "text",
			"text": text,
			"cache_control": map[string]string{
				"type": "ephemeral",
			},
		},
	}
	var err error
	payload, err = sjson.SetBytes(payload, "system", newSystem)
	if err != nil {
		log.Warnf("failed to inject cache_control into system string: %v", err)
	}
}

return payload
This function also has the same error handling issue with sjson.SetBytes as noted in other comments. If sjson.SetBytes fails, the payload becomes undefined and is returned at the end of the function. This could cause malformed requests.
Please ensure that on error, the original, unmodified payload is returned.
	// Add cache_control to the last system element
	lastSystemPath := fmt.Sprintf("system.%d.cache_control", count-1)
	newPayload, err := sjson.SetBytes(payload, lastSystemPath, map[string]string{"type": "ephemeral"})
	if err != nil {
		log.Warnf("failed to inject cache_control into system array: %v", err)
		return payload
	}
	payload = newPayload
} else if system.Type == gjson.String {
	// Convert string system prompt to array with cache_control
	// "system": "text" -> "system": [{"type": "text", "text": "text", "cache_control": {"type": "ephemeral"}}]
	text := system.String()
	newSystem := []map[string]interface{}{
		{
			"type": "text",
			"text": text,
			"cache_control": map[string]string{
				"type": "ephemeral",
			},
		},
	}
	newPayload, err := sjson.SetBytes(payload, "system", newSystem)
	if err != nil {
		log.Warnf("failed to inject cache_control into system string: %v", err)
		return payload
	}
	payload = newPayload
}
return payload

func TestEnsureCacheControl(t *testing.T) {
	// Test Fall 1: System Prompt als String
	t.Run("String System Prompt", func(t *testing.T) {
		input := []byte(`{"model": "claude-3-5-sonnet", "system": "Dies ist ein langer System Prompt", "messages": []}`)
		output := ensureCacheControl(input)

		res := gjson.GetBytes(output, "system.0.cache_control.type")
		if res.String() != "ephemeral" {
			t.Errorf("cache_control nicht im System-String gefunden. Output: %s", string(output))
		}
	})

	// Test Fall 2: System Prompt als Array
	t.Run("Array System Prompt", func(t *testing.T) {
		input := []byte(`{"model": "claude-3-5-sonnet", "system": [{"type": "text", "text": "Teil 1"}, {"type": "text", "text": "Teil 2"}], "messages": []}`)
		output := ensureCacheControl(input)

		// cache_control sollte nur am LETZTEN Element sein
		res0 := gjson.GetBytes(output, "system.0.cache_control")
		res1 := gjson.GetBytes(output, "system.1.cache_control.type")

		if res0.Exists() {
			t.Errorf("cache_control sollte NICHT am ersten Element sein")
		}
		if res1.String() != "ephemeral" {
			t.Errorf("cache_control nicht am letzten System-Element gefunden. Output: %s", string(output))
		}
	})

	// Test Fall 3: Tools werden gecached
	t.Run("Tools Caching", func(t *testing.T) {
		input := []byte(`{
			"model": "claude-3-5-sonnet",
			"tools": [
				{"name": "tool1", "description": "First tool", "input_schema": {"type": "object"}},
				{"name": "tool2", "description": "Second tool", "input_schema": {"type": "object"}}
			],
			"system": "System prompt",
			"messages": []
		}`)
		output := ensureCacheControl(input)

		// cache_control sollte nur am LETZTEN Tool sein
		tool0Cache := gjson.GetBytes(output, "tools.0.cache_control")
		tool1Cache := gjson.GetBytes(output, "tools.1.cache_control.type")

		if tool0Cache.Exists() {
			t.Errorf("cache_control sollte NICHT am ersten Tool sein")
		}
		if tool1Cache.String() != "ephemeral" {
			t.Errorf("cache_control nicht am letzten Tool gefunden. Output: %s", string(output))
		}

		// System sollte auch cache_control haben
		systemCache := gjson.GetBytes(output, "system.0.cache_control.type")
		if systemCache.String() != "ephemeral" {
			t.Errorf("cache_control nicht im System gefunden. Output: %s", string(output))
		}
	})

	// Test Fall 4: Tools und System sind UNABHÄNGIGE Breakpoints
	// Per Anthropic Docs: Bis zu 4 Breakpoints erlaubt, Tools und System werden separat gecached
	t.Run("Independent Cache Breakpoints", func(t *testing.T) {
		input := []byte(`{
			"model": "claude-3-5-sonnet",
			"tools": [
				{"name": "tool1", "description": "First tool", "input_schema": {"type": "object"}, "cache_control": {"type": "ephemeral"}}
			],
			"system": [{"type": "text", "text": "System"}],
			"messages": []
		}`)
		output := ensureCacheControl(input)

		// Tool hat bereits cache_control - sollte nicht geändert werden
		tool0Cache := gjson.GetBytes(output, "tools.0.cache_control.type")
		if tool0Cache.String() != "ephemeral" {
			t.Errorf("Existierendes cache_control wurde fälschlicherweise entfernt")
		}

		// System SOLLTE cache_control bekommen, weil es ein UNABHÄNGIGER Breakpoint ist
		// Tools und System sind separate Cache-Ebenen in der Hierarchie
		systemCache := gjson.GetBytes(output, "system.0.cache_control.type")
		if systemCache.String() != "ephemeral" {
			t.Errorf("System sollte eigenen cache_control Breakpoint haben (unabhängig von Tools)")
		}
	})

	// Test Fall 5: Nur Tools, kein System
	t.Run("Only Tools No System", func(t *testing.T) {
		input := []byte(`{
			"model": "claude-3-5-sonnet",
			"tools": [
				{"name": "tool1", "description": "Tool", "input_schema": {"type": "object"}}
			],
			"messages": [{"role": "user", "content": "Hi"}]
		}`)
		output := ensureCacheControl(input)

		toolCache := gjson.GetBytes(output, "tools.0.cache_control.type")
		if toolCache.String() != "ephemeral" {
			t.Errorf("cache_control nicht am Tool gefunden. Output: %s", string(output))
		}
	})

	// Test Fall 6: Viele Tools (Claude Code Szenario)
	t.Run("Many Tools (Claude Code Scenario)", func(t *testing.T) {
		// Simuliere Claude Code mit vielen Tools
		toolsJSON := `[`
		for i := 0; i < 50; i++ {
			if i > 0 {
				toolsJSON += ","
			}
			toolsJSON += fmt.Sprintf(`{"name": "tool%d", "description": "Tool %d", "input_schema": {"type": "object"}}`, i, i)
		}
		toolsJSON += `]`

		input := []byte(fmt.Sprintf(`{
			"model": "claude-3-5-sonnet",
			"tools": %s,
			"system": [{"type": "text", "text": "You are Claude Code"}],
			"messages": [{"role": "user", "content": "Hello"}]
		}`, toolsJSON))

		output := ensureCacheControl(input)

		// Nur das letzte Tool (index 49) sollte cache_control haben
		for i := 0; i < 49; i++ {
			path := fmt.Sprintf("tools.%d.cache_control", i)
			if gjson.GetBytes(output, path).Exists() {
				t.Errorf("Tool %d sollte KEIN cache_control haben", i)
			}
		}

		lastToolCache := gjson.GetBytes(output, "tools.49.cache_control.type")
		if lastToolCache.String() != "ephemeral" {
			t.Errorf("Letztes Tool (49) hat kein cache_control")
		}

		// System sollte auch cache_control haben
		systemCache := gjson.GetBytes(output, "system.0.cache_control.type")
		if systemCache.String() != "ephemeral" {
			t.Errorf("System hat kein cache_control")
		}

		fmt.Println("Test 6 (50 Tools) erfolgreich - cache_control nur am letzten Tool!")
	})

	// Test Fall 7: Leeres Tools-Array
	t.Run("Empty Tools Array", func(t *testing.T) {
		input := []byte(`{"model": "claude-3-5-sonnet", "tools": [], "system": "Test", "messages": []}`)
		output := ensureCacheControl(input)

		// System sollte trotzdem cache_control bekommen
		systemCache := gjson.GetBytes(output, "system.0.cache_control.type")
		if systemCache.String() != "ephemeral" {
			t.Errorf("System sollte cache_control haben auch bei leerem Tools-Array")
		}
	})
}

// TestCacheControlOrder prüft die korrekte Reihenfolge: tools -> system -> messages
func TestCacheControlOrder(t *testing.T) {
	input := []byte(`{
		"model": "claude-sonnet-4",
		"tools": [
			{"name": "Read", "description": "Read file", "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}}},
			{"name": "Write", "description": "Write file", "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}}}
		],
		"system": [
			{"type": "text", "text": "You are Claude Code, Anthropic's official CLI for Claude."},
			{"type": "text", "text": "Additional instructions here..."}
		],
		"messages": [
			{"role": "user", "content": "Hello"}
		]
	}`)

	output := ensureCacheControl(input)

	// Verifiziere die Cache-Breakpoints
	// 1. Letztes Tool hat cache_control
	if gjson.GetBytes(output, "tools.1.cache_control.type").String() != "ephemeral" {
		t.Error("Letztes Tool sollte cache_control haben")
	}

	// 2. Erstes Tool hat KEIN cache_control
	if gjson.GetBytes(output, "tools.0.cache_control").Exists() {
		t.Error("Erstes Tool sollte KEIN cache_control haben")
	}

	// 3. Letztes System-Element hat cache_control
	if gjson.GetBytes(output, "system.1.cache_control.type").String() != "ephemeral" {
		t.Error("Letztes System-Element sollte cache_control haben")
	}

	// 4. Erstes System-Element hat KEIN cache_control
	if gjson.GetBytes(output, "system.0.cache_control").Exists() {
		t.Error("Erstes System-Element sollte KEIN cache_control haben")
	}

	fmt.Println("Cache-Reihenfolge korrekt: tools -> system")
}
For consistency with the rest of the codebase, which is in English, it would be best to write these tests in English as well. This includes test names (e.g., t.Run names), comments, and error messages in t.Errorf.
For example:
t.Run("String System Prompt", ...)
t.Errorf("cache_control not found in system string. Output: %s", string(output))
			t.Errorf("System hat kein cache_control")
		}

		fmt.Println("Test 6 (50 Tools) erfolgreich - cache_control nur am letzten Tool!")
// Check if ANY message content already has cache_control
hasCacheControlInMessages := false
messages.ForEach(func(_, msg gjson.Result) bool {
	content := msg.Get("content")
	if content.IsArray() {
		content.ForEach(func(_, item gjson.Result) bool {
			if item.Get("cache_control").Exists() {
				hasCacheControlInMessages = true
				return false
			}
			return true
		})
	}
	return !hasCacheControlInMessages
})
if hasCacheControlInMessages {
	return payload
}
The logic to check for an existing cache_control in messages can be significantly simplified by using a gjson path query. This improves readability and reduces the amount of code.
The path messages.#.content.#[cache_control] will efficiently check if any message has a content block with cache_control.
// Check if ANY message content already has cache_control
if gjson.GetBytes(payload, "messages.#.content.#[cache_control]").Exists() {
	return payload
}

// Check if ANY tool already has cache_control - if so, don't modify tools
hasCacheControlInTools := false
tools.ForEach(func(_, tool gjson.Result) bool {
	if tool.Get("cache_control").Exists() {
		hasCacheControlInTools = true
		return false
	}
	return true
})
if hasCacheControlInTools {
	return payload
}
This loop to check for an existing cache_control can be simplified to a single line using a gjson path query. This makes the code more concise and easier to understand.
// Check if ANY tool already has cache_control - if so, don't modify tools
if gjson.GetBytes(payload, "tools.#[cache_control]").Exists() {
	return payload
}

// Check if ANY system element already has cache_control
hasCacheControlInSystem := false
system.ForEach(func(_, item gjson.Result) bool {
	if item.Get("cache_control").Exists() {
		hasCacheControlInSystem = true
		return false
	}
	return true
})
if hasCacheControlInSystem {
	return payload
}
The loop for checking if cache_control exists in the system prompt can be simplified using a gjson path query. This will make the code more readable and concise.
// Check if ANY system element already has cache_control
if gjson.GetBytes(payload, "system.#[cache_control]").Exists() {
	return payload
}
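
Before replacing the explicit loops with path queries, it may help to pin down the expected behavior with a plain reference check that both versions can be compared against. The sketch below is one such check, written against encoding/json rather than gjson so that it runs standalone. hasCacheControl is a hypothetical helper, not part of the PR, and it only covers top-level arrays such as tools and system.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasCacheControl reports whether any object in the JSON array stored under
// key already carries a "cache_control" field. Payloads that do not parse,
// or whose value under key is not an array of objects, yield false.
func hasCacheControl(payload []byte, key string) bool {
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(payload, &doc); err != nil {
		return false
	}
	var items []map[string]json.RawMessage
	if err := json.Unmarshal(doc[key], &items); err != nil {
		return false
	}
	for _, item := range items {
		if _, ok := item["cache_control"]; ok {
			return true
		}
	}
	return false
}

func main() {
	marked := []byte(`{"tools":[{"name":"a"},{"name":"b","cache_control":{"type":"ephemeral"}}]}`)
	plain := []byte(`{"tools":[{"name":"a"}],"system":"plain text"}`)

	fmt.Println(hasCacheControl(marked, "tools"))  // true
	fmt.Println(hasCacheControl(plain, "tools"))   // false
	fmt.Println(hasCacheControl(plain, "system"))  // false
}
```

Running the same inputs through a query-based version, including the empty-array and string-valued cases, is a cheap way to confirm the simplification preserves behavior.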
Enables up to 90% cost reduction on cached tokens.
Written with Claude Opus 4.5