@SchneeMart

  • Add ensureCacheControl() to auto-inject cache breakpoints
  • Cache tools (last tool), system (last element), and messages (2nd-to-last user turn)
  • Add prompt-caching-2024-07-31 beta header
  • Include verification test for caching logic

Enables up to 90% cost reduction on cached tokens.

Written with Claude Opus 4.5

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@gemini-code-assist
Contributor

Summary of Changes

Hello @SchneeMart, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant optimization for Claude API interactions by implementing automatic prompt caching. It intelligently injects cache_control directives into request payloads for tools, system prompts, and multi-turn messages, ensuring that frequently used prompt components are cached. This enhancement aims to drastically reduce API costs and improve efficiency for applications interacting with Claude models, particularly those with repetitive or conversational patterns.
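
To make the injection concrete, here is a minimal sketch of the system-prompt case using only the Go standard library. It is not the PR's code (which uses gjson/sjson); `markSystemCacheable` is a hypothetical name, and the choice to skip empty prompts and to leave the payload alone when any block already has `cache_control` is an assumption modeled on the behavior described in this review.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// markSystemCacheable is a hypothetical helper, not the PR's implementation.
// It turns a string "system" field into a one-element text block array with
// cache_control, or tags the last block when "system" is already an array.
// On any failure, or if a block already has cache_control, the input is
// returned unchanged.
func markSystemCacheable(payload []byte) []byte {
	var req map[string]json.RawMessage
	if err := json.Unmarshal(payload, &req); err != nil {
		return payload
	}
	raw, ok := req["system"]
	if !ok {
		return payload
	}

	var blocks []map[string]interface{}
	var text string
	if err := json.Unmarshal(raw, &text); err == nil {
		if text == "" {
			return payload // nothing worth caching (also covers JSON null)
		}
		blocks = []map[string]interface{}{{"type": "text", "text": text}}
	} else if err := json.Unmarshal(raw, &blocks); err != nil || len(blocks) == 0 {
		return payload
	}

	// Respect breakpoints the caller has already placed.
	for _, b := range blocks {
		if _, exists := b["cache_control"]; exists {
			return payload
		}
	}
	last := blocks[len(blocks)-1]
	if last == nil {
		return payload
	}
	last["cache_control"] = map[string]string{"type": "ephemeral"}

	newSystem, err := json.Marshal(blocks)
	if err != nil {
		return payload
	}
	req["system"] = newSystem
	out, err := json.Marshal(req)
	if err != nil {
		return payload
	}
	return out
}

func main() {
	out := markSystemCacheable([]byte(`{"system": "You are helpful.", "messages": []}`))
	fmt.Println(string(out))
}
```

Note that re-marshalling through a map sorts the top-level keys, which a path-based setter such as sjson avoids; the sketch only illustrates the shape of the transformation.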

Highlights

  • Automatic Cache Control Injection: Implemented an ensureCacheControl() function to automatically inject cache_control breakpoints into Claude API requests, optimizing prompt caching.
  • Multi-Turn Caching Support: Configured caching for the last tool in the tools array, the last element in the system array, and the second-to-last user turn in messages to support multi-turn conversations effectively.
  • Beta Header Activation: Added the prompt-caching-2024-07-31 beta header to enable the new caching feature within the Claude API calls.
  • Comprehensive Testing: Included a new verification test file (caching_verify_test.go) with extensive unit tests to validate the ensureCacheControl logic across various prompt structures and scenarios.
  • Cost Reduction: This feature is projected to enable up to a 90% cost reduction on cached tokens by leveraging Anthropic's caching mechanisms.
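
The "second-to-last user turn" rule in the highlights above can be sketched as follows. This is a standard-library illustration, not the PR's gjson-based code; `secondToLastUserIndex` is a hypothetical name, and returning -1 when fewer than two user turns exist is an assumption about the intended behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// secondToLastUserIndex returns the index within "messages" of the
// second-to-last message whose role is "user". It returns -1 when the payload
// cannot be parsed or contains fewer than two user turns (assumed behavior).
func secondToLastUserIndex(payload []byte) int {
	var req struct {
		Messages []struct {
			Role string `json:"role"`
		} `json:"messages"`
	}
	if err := json.Unmarshal(payload, &req); err != nil {
		return -1
	}
	seen := 0
	// Walk backwards so the first user turn found is the latest one.
	for i := len(req.Messages) - 1; i >= 0; i-- {
		if req.Messages[i].Role == "user" {
			seen++
			if seen == 2 {
				return i
			}
		}
	}
	return -1
}

func main() {
	payload := []byte(`{"messages": [
		{"role": "user", "content": "Hi"},
		{"role": "assistant", "content": "Hello"},
		{"role": "user", "content": "What is prompt caching?"}
	]}`)
	fmt.Println(secondToLastUserIndex(payload)) // prints 0
}
```

Placing the breakpoint on the earlier user turn means the prefix up to that turn is identical on the next request, so it can be read from cache while the newest turn is still changing.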

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable prompt caching feature for Claude, which will help reduce costs. The implementation correctly identifies the parts of the prompt to cache and adds comprehensive verification tests.

I've identified a critical bug in the error handling when modifying the JSON payload, which could lead to corrupted data being sent to the API. I've also provided several suggestions to enhance code clarity and maintainability by simplifying some of the JSON parsing logic. Additionally, the new test file contains some elements that should be cleaned up, and its language should be aligned with the rest of the codebase for consistency.

Comment on lines +1078 to +1106
if content.IsArray() {
    // Add cache_control to the last content block of this message
    contentCount := int(content.Get("#").Int())
    if contentCount > 0 {
        cacheControlPath := fmt.Sprintf("messages.%d.content.%d.cache_control", secondToLastUserIdx, contentCount-1)
        var err error
        payload, err = sjson.SetBytes(payload, cacheControlPath, map[string]string{"type": "ephemeral"})
        if err != nil {
            log.Warnf("failed to inject cache_control into messages: %v", err)
        }
    }
} else if content.Type == gjson.String {
    // Convert string content to array with cache_control
    text := content.String()
    newContent := []map[string]interface{}{
        {
            "type": "text",
            "text": text,
            "cache_control": map[string]string{
                "type": "ephemeral",
            },
        },
    }
    var err error
    payload, err = sjson.SetBytes(payload, contentPath, newContent)
    if err != nil {
        log.Warnf("failed to inject cache_control into message string content: %v", err)
    }
}

high

There's a potential bug in how errors from sjson.SetBytes are handled. According to the sjson documentation, if an error occurs, the destination byte slice's contents are undefined. The current code logs the error but continues execution, potentially using a corrupted payload. This could lead to malformed requests being sent to the upstream API.

To fix this, you should check the error and return the original, unmodified payload immediately if an error occurs. This ensures that a failure during this optimization step doesn't break the request.

    if content.IsArray() {
        // Add cache_control to the last content block of this message
        contentCount := int(content.Get("#").Int())
        if contentCount > 0 {
            cacheControlPath := fmt.Sprintf("messages.%d.content.%d.cache_control", secondToLastUserIdx, contentCount-1)
            newPayload, err := sjson.SetBytes(payload, cacheControlPath, map[string]string{"type": "ephemeral"})
            if err != nil {
                log.Warnf("failed to inject cache_control into messages: %v", err)
                return payload
            }
            payload = newPayload
        }
    } else if content.Type == gjson.String {
        // Convert string content to array with cache_control
        text := content.String()
        newContent := []map[string]interface{}{
            {
                "type": "text",
                "text": text,
                "cache_control": map[string]string{
                    "type": "ephemeral",
                },
            },
        }
        newPayload, err := sjson.SetBytes(payload, contentPath, newContent)
        if err != nil {
            log.Warnf("failed to inject cache_control into message string content: %v", err)
            return payload
        }
        payload = newPayload
    }
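
Since the same "keep the original on error" pattern recurs in the tools and system paths as well, it could be factored into a small helper. The sketch below is only an illustration under assumptions: `setter`, `applyOrKeep`, `alwaysFail`, and `addFlag` are hypothetical names, and `setter` stands in for a call such as `sjson.SetBytes` with its path and value already bound.

```go
package main

import (
	"errors"
	"fmt"
)

// setter stands in for a payload-modifying call (for example sjson.SetBytes
// with path and value bound in a closure). Hypothetical type for illustration.
type setter func(payload []byte) ([]byte, error)

// applyOrKeep runs set and returns its result. If set fails, the original
// payload is returned untouched and ok is false, so the caller can log and
// stop applying further optimizations.
func applyOrKeep(payload []byte, set setter) (result []byte, ok bool) {
	updated, err := set(payload)
	if err != nil {
		return payload, false
	}
	return updated, true
}

// alwaysFail and addFlag are toy setters used only for the demonstration.
func alwaysFail(p []byte) ([]byte, error) {
	return nil, errors.New("simulated failure")
}

func addFlag(p []byte) ([]byte, error) {
	return []byte(`{"cached":true}`), nil
}

func main() {
	original := []byte(`{"a":1}`)

	out, ok := applyOrKeep(original, alwaysFail)
	fmt.Println(string(out), ok) // prints {"a":1} false

	out, ok = applyOrKeep(original, addFlag)
	fmt.Println(string(out), ok) // prints {"cached":true} true
}
```

The key point is that the result of the setter is never assigned over `payload` until the error has been checked.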

Comment on lines +1138 to +1146
// Add cache_control to the last tool
lastToolPath := fmt.Sprintf("tools.%d.cache_control", toolCount-1)
var err error
payload, err = sjson.SetBytes(payload, lastToolPath, map[string]string{"type": "ephemeral"})
if err != nil {
    log.Warnf("failed to inject cache_control into tools array: %v", err)
}

return payload

high

There's an issue with error handling for sjson.SetBytes. If this call fails, the payload variable becomes undefined, but the function proceeds to return it. This can lead to sending corrupted data upstream.

You should handle the error by returning the original payload immediately, preventing the undefined payload from being used.

    // Add cache_control to the last tool
    lastToolPath := fmt.Sprintf("tools.%d.cache_control", toolCount-1)
    newPayload, err := sjson.SetBytes(payload, lastToolPath, map[string]string{"type": "ephemeral"})
    if err != nil {
        log.Warnf("failed to inject cache_control into tools array: %v", err)
        return payload
    }

    return newPayload

Comment on lines +1177 to +1204
    // Add cache_control to the last system element
    lastSystemPath := fmt.Sprintf("system.%d.cache_control", count-1)
    var err error
    payload, err = sjson.SetBytes(payload, lastSystemPath, map[string]string{"type": "ephemeral"})
    if err != nil {
        log.Warnf("failed to inject cache_control into system array: %v", err)
    }
} else if system.Type == gjson.String {
    // Convert string system prompt to array with cache_control
    // "system": "text" -> "system": [{"type": "text", "text": "text", "cache_control": {"type": "ephemeral"}}]
    text := system.String()
    newSystem := []map[string]interface{}{
        {
            "type": "text",
            "text": text,
            "cache_control": map[string]string{
                "type": "ephemeral",
            },
        },
    }
    var err error
    payload, err = sjson.SetBytes(payload, "system", newSystem)
    if err != nil {
        log.Warnf("failed to inject cache_control into system string: %v", err)
    }
}

return payload

high

This function also has the same error handling issue with sjson.SetBytes as noted in other comments. If sjson.SetBytes fails, the payload becomes undefined and is returned at the end of the function. This could cause malformed requests.

Please ensure that on error, the original, unmodified payload is returned.

        // Add cache_control to the last system element
        lastSystemPath := fmt.Sprintf("system.%d.cache_control", count-1)
        newPayload, err := sjson.SetBytes(payload, lastSystemPath, map[string]string{"type": "ephemeral"})
        if err != nil {
            log.Warnf("failed to inject cache_control into system array: %v", err)
            return payload
        }
        payload = newPayload
    } else if system.Type == gjson.String {
        // Convert string system prompt to array with cache_control
        // "system": "text" -> "system": [{"type": "text", "text": "text", "cache_control": {"type": "ephemeral"}}]
        text := system.String()
        newSystem := []map[string]interface{}{
            {
                "type": "text",
                "text": text,
                "cache_control": map[string]string{
                    "type": "ephemeral",
                },
            },
        }
        newPayload, err := sjson.SetBytes(payload, "system", newSystem)
        if err != nil {
            log.Warnf("failed to inject cache_control into system string: %v", err)
            return payload
        }
        payload = newPayload
    }

    return payload

Comment on lines +10 to +211
func TestEnsureCacheControl(t *testing.T) {
    // Test Fall 1: System Prompt als String
    t.Run("String System Prompt", func(t *testing.T) {
        input := []byte(`{"model": "claude-3-5-sonnet", "system": "Dies ist ein langer System Prompt", "messages": []}`)
        output := ensureCacheControl(input)

        res := gjson.GetBytes(output, "system.0.cache_control.type")
        if res.String() != "ephemeral" {
            t.Errorf("cache_control nicht im System-String gefunden. Output: %s", string(output))
        }
    })

    // Test Fall 2: System Prompt als Array
    t.Run("Array System Prompt", func(t *testing.T) {
        input := []byte(`{"model": "claude-3-5-sonnet", "system": [{"type": "text", "text": "Teil 1"}, {"type": "text", "text": "Teil 2"}], "messages": []}`)
        output := ensureCacheControl(input)

        // cache_control sollte nur am LETZTEN Element sein
        res0 := gjson.GetBytes(output, "system.0.cache_control")
        res1 := gjson.GetBytes(output, "system.1.cache_control.type")

        if res0.Exists() {
            t.Errorf("cache_control sollte NICHT am ersten Element sein")
        }
        if res1.String() != "ephemeral" {
            t.Errorf("cache_control nicht am letzten System-Element gefunden. Output: %s", string(output))
        }
    })

    // Test Fall 3: Tools werden gecached
    t.Run("Tools Caching", func(t *testing.T) {
        input := []byte(`{
            "model": "claude-3-5-sonnet",
            "tools": [
                {"name": "tool1", "description": "First tool", "input_schema": {"type": "object"}},
                {"name": "tool2", "description": "Second tool", "input_schema": {"type": "object"}}
            ],
            "system": "System prompt",
            "messages": []
        }`)
        output := ensureCacheControl(input)

        // cache_control sollte nur am LETZTEN Tool sein
        tool0Cache := gjson.GetBytes(output, "tools.0.cache_control")
        tool1Cache := gjson.GetBytes(output, "tools.1.cache_control.type")

        if tool0Cache.Exists() {
            t.Errorf("cache_control sollte NICHT am ersten Tool sein")
        }
        if tool1Cache.String() != "ephemeral" {
            t.Errorf("cache_control nicht am letzten Tool gefunden. Output: %s", string(output))
        }

        // System sollte auch cache_control haben
        systemCache := gjson.GetBytes(output, "system.0.cache_control.type")
        if systemCache.String() != "ephemeral" {
            t.Errorf("cache_control nicht im System gefunden. Output: %s", string(output))
        }
    })

    // Test Fall 4: Tools und System sind UNABHÄNGIGE Breakpoints
    // Per Anthropic Docs: Bis zu 4 Breakpoints erlaubt, Tools und System werden separat gecached
    t.Run("Independent Cache Breakpoints", func(t *testing.T) {
        input := []byte(`{
            "model": "claude-3-5-sonnet",
            "tools": [
                {"name": "tool1", "description": "First tool", "input_schema": {"type": "object"}, "cache_control": {"type": "ephemeral"}}
            ],
            "system": [{"type": "text", "text": "System"}],
            "messages": []
        }`)
        output := ensureCacheControl(input)

        // Tool hat bereits cache_control - sollte nicht geändert werden
        tool0Cache := gjson.GetBytes(output, "tools.0.cache_control.type")
        if tool0Cache.String() != "ephemeral" {
            t.Errorf("Existierendes cache_control wurde fälschlicherweise entfernt")
        }

        // System SOLLTE cache_control bekommen, weil es ein UNABHÄNGIGER Breakpoint ist
        // Tools und System sind separate Cache-Ebenen in der Hierarchie
        systemCache := gjson.GetBytes(output, "system.0.cache_control.type")
        if systemCache.String() != "ephemeral" {
            t.Errorf("System sollte eigenen cache_control Breakpoint haben (unabhängig von Tools)")
        }
    })

    // Test Fall 5: Nur Tools, kein System
    t.Run("Only Tools No System", func(t *testing.T) {
        input := []byte(`{
            "model": "claude-3-5-sonnet",
            "tools": [
                {"name": "tool1", "description": "Tool", "input_schema": {"type": "object"}}
            ],
            "messages": [{"role": "user", "content": "Hi"}]
        }`)
        output := ensureCacheControl(input)

        toolCache := gjson.GetBytes(output, "tools.0.cache_control.type")
        if toolCache.String() != "ephemeral" {
            t.Errorf("cache_control nicht am Tool gefunden. Output: %s", string(output))
        }
    })

    // Test Fall 6: Viele Tools (Claude Code Szenario)
    t.Run("Many Tools (Claude Code Scenario)", func(t *testing.T) {
        // Simuliere Claude Code mit vielen Tools
        toolsJSON := `[`
        for i := 0; i < 50; i++ {
            if i > 0 {
                toolsJSON += ","
            }
            toolsJSON += fmt.Sprintf(`{"name": "tool%d", "description": "Tool %d", "input_schema": {"type": "object"}}`, i, i)
        }
        toolsJSON += `]`

        input := []byte(fmt.Sprintf(`{
            "model": "claude-3-5-sonnet",
            "tools": %s,
            "system": [{"type": "text", "text": "You are Claude Code"}],
            "messages": [{"role": "user", "content": "Hello"}]
        }`, toolsJSON))

        output := ensureCacheControl(input)

        // Nur das letzte Tool (index 49) sollte cache_control haben
        for i := 0; i < 49; i++ {
            path := fmt.Sprintf("tools.%d.cache_control", i)
            if gjson.GetBytes(output, path).Exists() {
                t.Errorf("Tool %d sollte KEIN cache_control haben", i)
            }
        }

        lastToolCache := gjson.GetBytes(output, "tools.49.cache_control.type")
        if lastToolCache.String() != "ephemeral" {
            t.Errorf("Letztes Tool (49) hat kein cache_control")
        }

        // System sollte auch cache_control haben
        systemCache := gjson.GetBytes(output, "system.0.cache_control.type")
        if systemCache.String() != "ephemeral" {
            t.Errorf("System hat kein cache_control")
        }

        fmt.Println("Test 6 (50 Tools) erfolgreich - cache_control nur am letzten Tool!")
    })

    // Test Fall 7: Leeres Tools-Array
    t.Run("Empty Tools Array", func(t *testing.T) {
        input := []byte(`{"model": "claude-3-5-sonnet", "tools": [], "system": "Test", "messages": []}`)
        output := ensureCacheControl(input)

        // System sollte trotzdem cache_control bekommen
        systemCache := gjson.GetBytes(output, "system.0.cache_control.type")
        if systemCache.String() != "ephemeral" {
            t.Errorf("System sollte cache_control haben auch bei leerem Tools-Array")
        }
    })
}

// TestCacheControlOrder prüft die korrekte Reihenfolge: tools -> system -> messages
func TestCacheControlOrder(t *testing.T) {
    input := []byte(`{
        "model": "claude-sonnet-4",
        "tools": [
            {"name": "Read", "description": "Read file", "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}}},
            {"name": "Write", "description": "Write file", "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}}}
        ],
        "system": [
            {"type": "text", "text": "You are Claude Code, Anthropic's official CLI for Claude."},
            {"type": "text", "text": "Additional instructions here..."}
        ],
        "messages": [
            {"role": "user", "content": "Hello"}
        ]
    }`)

    output := ensureCacheControl(input)

    // Verifiziere die Cache-Breakpoints
    // 1. Letztes Tool hat cache_control
    if gjson.GetBytes(output, "tools.1.cache_control.type").String() != "ephemeral" {
        t.Error("Letztes Tool sollte cache_control haben")
    }

    // 2. Erstes Tool hat KEIN cache_control
    if gjson.GetBytes(output, "tools.0.cache_control").Exists() {
        t.Error("Erstes Tool sollte KEIN cache_control haben")
    }

    // 3. Letztes System-Element hat cache_control
    if gjson.GetBytes(output, "system.1.cache_control.type").String() != "ephemeral" {
        t.Error("Letztes System-Element sollte cache_control haben")
    }

    // 4. Erstes System-Element hat KEIN cache_control
    if gjson.GetBytes(output, "system.0.cache_control").Exists() {
        t.Error("Erstes System-Element sollte KEIN cache_control haben")
    }

    fmt.Println("Cache-Reihenfolge korrekt: tools -> system")
}

medium

For consistency with the rest of the codebase, which is in English, it would be best to write these tests in English as well. This includes test names (e.g., t.Run names), comments, and error messages in t.Errorf.

For example:
t.Run("String System Prompt", ...)
t.Errorf("cache_control not found in system string. Output: %s", string(output))

t.Errorf("System hat kein cache_control")
}

fmt.Println("Test 6 (50 Tools) erfolgreich - cache_control nur am letzten Tool!")

medium

fmt.Println statements should not be present in test code as they pollute test output. Please remove this line. A similar statement on line 210 should also be removed. If you need to output information during a test, use t.Log or t.Logf, which are only shown when tests are run with the -v flag.

Comment on lines +1038 to +1055
// Check if ANY message content already has cache_control
hasCacheControlInMessages := false
messages.ForEach(func(_, msg gjson.Result) bool {
    content := msg.Get("content")
    if content.IsArray() {
        content.ForEach(func(_, item gjson.Result) bool {
            if item.Get("cache_control").Exists() {
                hasCacheControlInMessages = true
                return false
            }
            return true
        })
    }
    return !hasCacheControlInMessages
})
if hasCacheControlInMessages {
    return payload
}

medium

The logic to check for an existing cache_control in messages can be significantly simplified by using a gjson path query. This improves readability and reduces the amount of code.

The path messages.#.content.#[cache_control] will efficiently check if any message has a content block with cache_control.

    // Check if ANY message content already has cache_control
    if gjson.GetBytes(payload, "messages.#.content.#[cache_control]").Exists() {
        return payload
    }
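
Before adopting the one-line query, it may be worth confirming its behavior in a test. As far as I recall, nested `#` paths in gjson collect their matches into a JSON array, so the result could still report as existing (an empty `[]`) when no block carries cache_control; gjson's documentation has also preferred the `#(...)` query form over `#[...]` since v1.3.0. As a library-independent cross-check, the intended semantics of the original loop can be sketched with the standard library alone. `hasMessageCacheControl` is a hypothetical name, not part of the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasMessageCacheControl reports whether any content block of any message
// carries a "cache_control" key. String content and unparsable payloads are
// treated as having no breakpoint. Hypothetical helper for illustration.
func hasMessageCacheControl(payload []byte) bool {
	var req struct {
		Messages []struct {
			Content json.RawMessage `json:"content"`
		} `json:"messages"`
	}
	if err := json.Unmarshal(payload, &req); err != nil {
		return false
	}
	for _, msg := range req.Messages {
		var blocks []map[string]json.RawMessage
		if err := json.Unmarshal(msg.Content, &blocks); err != nil {
			continue // string (or missing) content has no blocks to inspect
		}
		for _, block := range blocks {
			if _, ok := block["cache_control"]; ok {
				return true
			}
		}
	}
	return false
}

func main() {
	with := []byte(`{"messages": [{"role": "user", "content": [
		{"type": "text", "text": "a", "cache_control": {"type": "ephemeral"}}
	]}]}`)
	without := []byte(`{"messages": [{"role": "user", "content": "plain text"}]}`)

	fmt.Println(hasMessageCacheControl(with))    // prints true
	fmt.Println(hasMessageCacheControl(without)) // prints false
}
```

A table-driven test that feeds the same payloads to both this function and the gjson query would show quickly whether the two agree on the "no match" case.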

Comment on lines +1125 to +1136
// Check if ANY tool already has cache_control - if so, don't modify tools
hasCacheControlInTools := false
tools.ForEach(func(_, tool gjson.Result) bool {
    if tool.Get("cache_control").Exists() {
        hasCacheControlInTools = true
        return false
    }
    return true
})
if hasCacheControlInTools {
    return payload
}

medium

This loop to check for an existing cache_control can be simplified to a single line using a gjson path query. This makes the code more concise and easier to understand.

    // Check if ANY tool already has cache_control - if so, don't modify tools
    if gjson.GetBytes(payload, "tools.#[cache_control]").Exists() {
        return payload
    }

Comment on lines +1164 to +1175
// Check if ANY system element already has cache_control
hasCacheControlInSystem := false
system.ForEach(func(_, item gjson.Result) bool {
    if item.Get("cache_control").Exists() {
        hasCacheControlInSystem = true
        return false
    }
    return true
})
if hasCacheControlInSystem {
    return payload
}

medium

The loop for checking if cache_control exists in the system prompt can be simplified using a gjson path query. This will make the code more readable and concise.

        // Check if ANY system element already has cache_control
        if gjson.GetBytes(payload, "system.#[cache_control]").Exists() {
            return payload
        }
