Conversation


@FNZ1961 FNZ1961 commented Jan 26, 2026

Summary

  • Add OpenAI-compatible /v1/embeddings endpoint that translates requests to Gemini embedContent API
  • Implement round-robin API key selection for load balancing across multiple Gemini API keys
  • Add usage tracking for embedding requests to Dashboard statistics
  • Fix async context race condition in logger_plugin that caused false failure reports

Changes

New Features

  • Embedding Endpoint: /v1/embeddings accepts OpenAI-format requests and returns OpenAI-format responses
  • Model Mapping: Automatically maps OpenAI embedding models (text-embedding-ada-002, text-embedding-3-small/large) to Gemini text-embedding-004
  • API Key Load Balancing: Round-robin selection across configured gemini-api-key entries
  • Usage Tracking: Embedding requests now appear in Dashboard model statistics
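The model mapping described above can be sketched as follows. This is an illustrative Go snippet, not code from the PR; the variable and function names (`embeddingModelMap`, `mapEmbeddingModel`) are mine, and the fallback-to-default behaviour for unknown models is an assumption the PR does not state.

```go
package main

import "fmt"

// embeddingModelMap sketches the OpenAI -> Gemini model name mapping the
// feature list describes; names here are illustrative, not from the PR.
var embeddingModelMap = map[string]string{
	"text-embedding-ada-002": "text-embedding-004",
	"text-embedding-3-small": "text-embedding-004",
	"text-embedding-3-large": "text-embedding-004",
}

// mapEmbeddingModel returns the Gemini model for a given OpenAI model name.
func mapEmbeddingModel(openAIModel string) string {
	if gemini, ok := embeddingModelMap[openAIModel]; ok {
		return gemini
	}
	// Assumed fallback: route unknown names to the single Gemini model.
	return "text-embedding-004"
}

func main() {
	fmt.Println(mapEmbeddingModel("text-embedding-ada-002")) // text-embedding-004
}
```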

Bug Fixes

  • Logger Plugin: Remove resolveSuccess(ctx) check that caused false failures due to async context race condition

Files Changed

  • sdk/api/handlers/openai/openai_embedding_handlers.go - Main embedding handler
  • sdk/api/handlers/openai/embedding_types.go - Request/response types
  • internal/config/sdk_config.go - Added EmbeddingAPIKeys field
  • internal/api/server.go - Populate EmbeddingAPIKeys from config
  • internal/usage/logger_plugin.go - Fix async race condition

Test Plan

  • Verify embedding requests return valid OpenAI-format responses
  • Verify round-robin distributes requests across multiple API keys
  • Verify Dashboard model statistics show successful embedding requests
  • Verify existing Chat/Completion functionality unchanged

FNZ1961 and others added 3 commits January 26, 2026 01:30
Add support for the OpenAI Embeddings API by bridging requests to
Google Gemini's embedContent API.

Changes:
- sdk/api/handlers/openai/embedding_types.go: Data structures for
  OpenAI and Gemini embedding request/response formats
- sdk/api/handlers/openai/openai_embedding_handlers.go: Handler
  implementation with concurrent batch processing
- internal/api/server.go: Register /v1/embeddings route

Features:
- OpenAI /v1/embeddings request/response format compatibility
- Automatic model mapping (text-embedding-ada-002 -> text-embedding-004)
- Batch input support with concurrent Gemini API calls
- Token usage estimation (Gemini doesn't return token counts)
- Support for Authorization header and x-api-key authentication

This enables OpenAI SDK clients (like graphiti-core, LangChain) to use
embedding functionality through CLIProxyAPI.

Note: Gemini text-embedding-004 returns 768-dimensional vectors.
Downstream applications may need to configure their embedding dimension
accordingly (e.g., set EMBEDDING_DIM=768 for graphiti-core).
- Add shared embeddingHTTPClient at package level with connection pooling
- Implement cancellable context to cancel all goroutines on first error
- Use atomic.AddInt64 for thread-safe token counting
- Rename extractAPIKey to extractEmbeddingAPIKey to avoid conflicts
- Rename callGeminiEmbedContent to callGeminiEmbedContentWithClient

This addresses the following review issues:
1. HTTP client reuse (High) - fixed with shared client
2. Error cancellation (High) - fixed with context.WithCancel
3. Architecture consistency - improved function naming

Co-Authored-By: Claude <[email protected]>
- Add round-robin API key selection for Embedding requests
- Add usage.PublishRecord for Dashboard statistics
- Include APIKey field in usage record for proper grouping
- Fix async context race condition in logger_plugin
- Add EmbeddingAPIKeys field to SDKConfig for runtime key access
- Copy GeminiKey to SDKConfig.EmbeddingAPIKeys on server init/update

This enables:
- Load balancing across multiple Gemini API keys
- Proper usage tracking in Dashboard
- Support for dummy client keys with server-side credentials

Co-Authored-By: Claude <[email protected]>
@gemini-code-assist

Summary of Changes

Hello @FNZ1961, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the API gateway by introducing an OpenAI-compatible /v1/embeddings endpoint, which seamlessly integrates with the Gemini embedContent API. It improves resilience and scalability through round-robin API key load balancing for Gemini requests and provides better operational visibility by incorporating embedding usage into dashboard statistics. Additionally, a critical bug in the logging mechanism related to asynchronous context handling has been addressed, ensuring more accurate failure reporting.

Highlights

  • OpenAI-compatible Embeddings Endpoint: A new /v1/embeddings endpoint has been added, designed to accept OpenAI-formatted requests and translate them to the Gemini embedContent API, returning responses in an OpenAI-compatible format.
  • API Key Load Balancing: The system now supports round-robin selection for Gemini API keys, enabling load balancing across multiple configured keys for embedding requests.
  • Usage Tracking for Embeddings: Embedding requests are now tracked and reported in the Dashboard statistics, providing visibility into their usage.
  • Logger Plugin Race Condition Fix: An asynchronous context race condition in the logger_plugin that caused false failure reports has been resolved by directly trusting the record.Failed status.
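The shape of the logger fix in the last highlight can be sketched as below. The types and function are simplified stand-ins, not the project's real `logger_plugin.go` code; the point is that the asynchronous consumer trusts `record.Failed` directly instead of re-reading request (gin) context state, which may already be recycled by the time the record is processed.

```go
package main

import "fmt"

// UsageRecord is a simplified stand-in for the project's usage record type.
type UsageRecord struct {
	Model  string
	Failed bool
}

// classify trusts record.Failed directly; re-checking the gin context here
// would race, since records are consumed after the request has finished.
func classify(rec UsageRecord) string {
	if rec.Failed {
		return "failure"
	}
	return "success"
}

func main() {
	fmt.Println(classify(UsageRecord{Model: "text-embedding-004", Failed: false})) // success
}
```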




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces a new OpenAI-compatible /v1/embeddings endpoint, translating requests to the Gemini embedContent API. It also includes API key load balancing and usage tracking. The fix for the async context race condition in logger_plugin.go is a welcome improvement. The overall implementation of the new embedding endpoint is well-structured, handling input parsing, API key selection, concurrent requests, and error handling effectively.

Comment on lines 236 to 244
// Copy Gemini API keys to SDKConfig.EmbeddingAPIKeys for embedding handler access
if len(cfg.GeminiKey) > 0 {
	keys := make([]string, 0, len(cfg.GeminiKey))
	for _, gk := range cfg.GeminiKey {
		if gk.APIKey != "" {
			keys = append(keys, gk.APIKey)
		}
	}
	cfg.SDKConfig.EmbeddingAPIKeys = keys

medium

The logic for copying Gemini API keys to SDKConfig.EmbeddingAPIKeys is duplicated in both the NewServer function (lines 236-244) and the UpdateClients function (lines 994-1005). To improve maintainability and avoid potential inconsistencies, this logic should be extracted into a dedicated helper function and called from both locations.

func (s *Server) populateEmbeddingAPIKeys(cfg *config.Config) {
	if len(cfg.GeminiKey) > 0 {
		keys := make([]string, 0, len(cfg.GeminiKey))
		for _, gk := range cfg.GeminiKey {
			if gk.APIKey != "" {
				keys = append(keys, gk.APIKey)
			}
		}
		cfg.SDKConfig.EmbeddingAPIKeys = keys
	} else {
		cfg.SDKConfig.EmbeddingAPIKeys = nil
	}
}

// In NewServer:
// ...
// envAdminPassword = strings.TrimSpace(envAdminPassword)
// envManagementSecret := envAdminPasswordSet && envAdminPassword != ""
// s.populateEmbeddingAPIKeys(cfg)
//
// // Create server instance
// s := &Server{
// ...

// In UpdateClients:
// ...
// s.oldConfigYaml, _ = yaml.Marshal(cfg)
// s.populateEmbeddingAPIKeys(cfg)
// s.handlers.UpdateClients(&cfg.SDKConfig)
// ...

Comment on lines 171 to 173
// 直接信任 record.Failed,不再依赖 gin context 状态
// 原因:usage records 是异步处理的,此时 gin context 可能已失效或被回收
// 这要求所有 Handler/Executor 必须正确设置 Failed 字段

medium

The comments on these lines are in Chinese. For consistency and maintainability within an English codebase, all comments should be in English.

Suggested change
-// 直接信任 record.Failed,不再依赖 gin context 状态
-// 原因:usage records 是异步处理的,此时 gin context 可能已失效或被回收
-// 这要求所有 Handler/Executor 必须正确设置 Failed 字段
+// Directly trust record.Failed, no longer relying on gin context status.
+// Reason: usage records are processed asynchronously, and the gin context may be invalid or reclaimed by then.
+// This requires all Handler/Executor implementations to correctly set the Failed field.

Content: GeminiContent{
	Parts: []GeminiPart{{Text: text}},
},
TaskType: "RETRIEVAL_DOCUMENT", // Optimized for document retrieval

medium

The TaskType for the Gemini embedContent API is currently hardcoded to "RETRIEVAL_DOCUMENT". While this is a common use case for embeddings, it might be beneficial to make this configurable or dynamically determined in the future if other task types become relevant or if there's a way to infer it from the OpenAI request.
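One way to act on this suggestion is a resolver that accepts a task type from config or the request and falls back to the current hardcoded value. The function below is a hypothetical sketch, not proposed PR code; the string constants are the task types documented for Gemini's embedContent API.

```go
package main

import "fmt"

// resolveTaskType is a hypothetical sketch of making taskType configurable:
// accept a requested value (e.g. from config) and fall back to the current
// hardcoded RETRIEVAL_DOCUMENT behaviour for anything unrecognised.
func resolveTaskType(requested string) string {
	switch requested {
	case "RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY",
		"CLASSIFICATION", "CLUSTERING":
		return requested
	default:
		return "RETRIEVAL_DOCUMENT" // current behaviour as the fallback
	}
}

func main() {
	fmt.Println(resolveTaskType(""))                // RETRIEVAL_DOCUMENT
	fmt.Println(resolveTaskType("RETRIEVAL_QUERY")) // RETRIEVAL_QUERY
}
```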

- Extract duplicated EmbeddingAPIKeys population logic into
  populateEmbeddingAPIKeys helper function (server.go)
- Translate Chinese comments to English for codebase consistency
  (logger_plugin.go)

Co-Authored-By: Claude <[email protected]>
@dinhkarate

I like this, I need this
