feat: add OpenAI-compatible /v1/embeddings endpoint with API key load balancing #1241
base: main
Add support for the OpenAI Embeddings API by bridging requests to Google Gemini's embedContent API.

Changes:
- sdk/api/handlers/openai/embedding_types.go: Data structures for OpenAI and Gemini embedding request/response formats
- sdk/api/handlers/openai/openai_embedding_handlers.go: Handler implementation with concurrent batch processing
- internal/api/server.go: Register /v1/embeddings route

Features:
- OpenAI /v1/embeddings request/response format compatibility
- Automatic model mapping (text-embedding-ada-002 -> text-embedding-004)
- Batch input support with concurrent Gemini API calls
- Token usage estimation (Gemini doesn't return token counts)
- Support for Authorization header and x-api-key authentication

This enables OpenAI SDK clients (like graphiti-core, LangChain) to use embedding functionality through CLIProxyAPI.

Note: Gemini text-embedding-004 returns 768-dimensional vectors. Downstream applications may need to configure their embedding dimension accordingly (e.g., set EMBEDDING_DIM=768 for graphiti-core).
- Add shared embeddingHTTPClient at package level with connection pooling
- Implement cancellable context to cancel all goroutines on first error
- Use atomic.AddInt64 for thread-safe token counting
- Rename extractAPIKey to extractEmbeddingAPIKey to avoid conflicts
- Rename callGeminiEmbedContent to callGeminiEmbedContentWithClient

This addresses the following review issues:
1. HTTP client reuse (High) - fixed with shared client
2. Error cancellation (High) - fixed with context.WithCancel
3. Architecture consistency - improved function naming

Co-Authored-By: Claude <[email protected]>
- Add round-robin API key selection for Embedding requests
- Add usage.PublishRecord for Dashboard statistics
- Include APIKey field in usage record for proper grouping
- Fix async context race condition in logger_plugin
- Add EmbeddingAPIKeys field to SDKConfig for runtime key access
- Copy GeminiKey to SDKConfig.EmbeddingAPIKeys on server init/update

This enables:
- Load balancing across multiple Gemini API keys
- Proper usage tracking in Dashboard
- Support for dummy client keys with server-side credentials

Co-Authored-By: Claude <[email protected]>
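Round-robin selection over the configured keys needs only an atomic counter. The `keyRotator` type below is hypothetical and assumes Go 1.19+ for `atomic.Uint64`; the PR's own selection code may differ.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// keyRotator hands out API keys in round-robin order and is safe for
// concurrent use. Create it once and share it by pointer.
type keyRotator struct {
	keys []string
	next atomic.Uint64
}

// Pick returns the next key, wrapping around at the end of the list.
func (r *keyRotator) Pick() (string, error) {
	if len(r.keys) == 0 {
		return "", errors.New("no embedding API keys configured")
	}
	n := r.next.Add(1) - 1 // claim a unique slot
	return r.keys[n%uint64(len(r.keys))], nil
}
```

Each call claims a distinct counter value, so concurrent requests spread evenly across keys. If keys are hot-reloaded, building a new rotator and swapping it in (for example through an `atomic.Pointer`) avoids mutating the slice while requests are reading it.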
Summary of Changes

Hello @FNZ1961, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the API gateway by introducing an OpenAI-compatible
Code Review
The pull request introduces a new OpenAI-compatible /v1/embeddings endpoint, translating requests to the Gemini embedContent API. It also includes API key load balancing and usage tracking. The fix for the async context race condition in logger_plugin.go is a welcome improvement. The overall implementation of the new embedding endpoint is well-structured, handling input parsing, API key selection, concurrent requests, and error handling effectively.
internal/api/server.go
```go
// Copy Gemini API keys to SDKConfig.EmbeddingAPIKeys for embedding handler access
if len(cfg.GeminiKey) > 0 {
	keys := make([]string, 0, len(cfg.GeminiKey))
	for _, gk := range cfg.GeminiKey {
		if gk.APIKey != "" {
			keys = append(keys, gk.APIKey)
		}
	}
	cfg.SDKConfig.EmbeddingAPIKeys = keys
}
```
The logic for copying Gemini API keys to SDKConfig.EmbeddingAPIKeys is duplicated in both the NewServer function (lines 236-244) and the UpdateClients function (lines 994-1005). To improve maintainability and avoid potential inconsistencies, this logic should be extracted into a dedicated helper function and called from both locations.
```go
// populateEmbeddingAPIKeys copies the non-empty Gemini API keys into
// cfg.SDKConfig.EmbeddingAPIKeys, or clears the list when none are
// configured so that keys removed on reload do not linger. It uses no
// Server state, so it is a plain function: NewServer can call it before
// the Server value s is declared.
func populateEmbeddingAPIKeys(cfg *config.Config) {
	if len(cfg.GeminiKey) > 0 {
		keys := make([]string, 0, len(cfg.GeminiKey))
		for _, gk := range cfg.GeminiKey {
			if gk.APIKey != "" {
				keys = append(keys, gk.APIKey)
			}
		}
		cfg.SDKConfig.EmbeddingAPIKeys = keys
	} else {
		cfg.SDKConfig.EmbeddingAPIKeys = nil
	}
}

// In NewServer:
// ...
// envAdminPassword = strings.TrimSpace(envAdminPassword)
// envManagementSecret := envAdminPasswordSet && envAdminPassword != ""
// populateEmbeddingAPIKeys(cfg)
//
// // Create server instance
// s := &Server{
// ...

// In UpdateClients:
// ...
// s.oldConfigYaml, _ = yaml.Marshal(cfg)
// populateEmbeddingAPIKeys(cfg)
// s.handlers.UpdateClients(&cfg.SDKConfig)
// ...
```
internal/usage/logger_plugin.go
```go
// 直接信任 record.Failed,不再依赖 gin context 状态
// 原因:usage records 是异步处理的,此时 gin context 可能已失效或被回收
// 这要求所有 Handler/Executor 必须正确设置 Failed 字段
```
The comments on these lines are in Chinese. For consistency and maintainability within an English codebase, all comments should be in English.
```diff
-// 直接信任 record.Failed,不再依赖 gin context 状态
-// 原因:usage records 是异步处理的,此时 gin context 可能已失效或被回收
-// 这要求所有 Handler/Executor 必须正确设置 Failed 字段
+// Trust record.Failed directly instead of relying on the gin context state.
+// Reason: usage records are processed asynchronously, so the gin context may already be invalid or recycled by then.
+// This requires every Handler/Executor to set the Failed field correctly.
```
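The fix rests on a general rule: gin recycles `*gin.Context` values between requests, and its documentation advises handing goroutines a read-only copy (`c.Copy()`) rather than the original context. The sketch below shows the pattern the new comment describes, with hypothetical names (`usageRecord`, `usageStats`, `consume`): the handler copies everything the consumer needs into a plain value, including `Failed`, and the asynchronous consumer decides success from that field alone.

```go
package main

import "sync"

// usageRecord is a plain value: everything the async consumer needs is
// copied in by the handler, which sets Failed from the upstream outcome
// (for example Failed: err != nil) before publishing.
type usageRecord struct {
	APIKey string
	Model  string
	Tokens int64
	Failed bool
}

// usageStats aggregates per-key outcomes from a channel of records.
type usageStats struct {
	mu        sync.Mutex
	successes map[string]int
	failures  map[string]int
}

func newUsageStats() *usageStats {
	return &usageStats{successes: map[string]int{}, failures: map[string]int{}}
}

// consume drains records until the channel closes, then closes done.
// Success is read from rec.Failed only; no request context is consulted,
// because it may already have been recycled for another request.
func (s *usageStats) consume(records <-chan usageRecord, done chan<- struct{}) {
	defer close(done)
	for rec := range records {
		s.mu.Lock()
		if rec.Failed {
			s.failures[rec.APIKey]++
		} else {
			s.successes[rec.APIKey]++
		}
		s.mu.Unlock()
	}
}
```

The trade-off is the one the comment names: the consumer becomes simpler and race-free, but a handler that forgets to set `Failed` is counted as a success.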
```go
Content: GeminiContent{
	Parts: []GeminiPart{{Text: text}},
},
TaskType: "RETRIEVAL_DOCUMENT", // Optimized for document retrieval
```
The TaskType for the Gemini embedContent API is currently hardcoded to "RETRIEVAL_DOCUMENT". While this is a common use case for embeddings, it might be beneficial to make this configurable or dynamically determined in the future if other task types become relevant or if there's a way to infer it from the OpenAI request.
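One way to make the task type configurable is sketched below, under assumptions: the override source (a config field, a request header, or similar) is left open because OpenAI's embeddings request has no task-type field, `resolveTaskType` is a hypothetical helper, and the allow-list is a subset of the task types in the Gemini API reference rather than the complete set.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultTaskType keeps the current hard-coded behavior.
const defaultTaskType = "RETRIEVAL_DOCUMENT"

// allowedTaskTypes is a subset of the task types listed in the Gemini API
// reference; extend it as needed.
var allowedTaskTypes = map[string]bool{
	"RETRIEVAL_QUERY":     true,
	"RETRIEVAL_DOCUMENT":  true,
	"SEMANTIC_SIMILARITY": true,
	"CLASSIFICATION":      true,
	"CLUSTERING":          true,
}

// resolveTaskType returns the task type to send upstream: a validated,
// case-normalized override when one is supplied, otherwise the default.
func resolveTaskType(override string) (string, error) {
	t := strings.ToUpper(strings.TrimSpace(override))
	if t == "" {
		return defaultTaskType, nil
	}
	if !allowedTaskTypes[t] {
		return "", fmt.Errorf("unsupported embedding task type %q", override)
	}
	return t, nil
}
```

Keeping `RETRIEVAL_DOCUMENT` as the default preserves today's behavior, while a client embedding search queries could opt into `RETRIEVAL_QUERY`, which Gemini's documentation pairs with document embeddings for retrieval.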
- Extract duplicated EmbeddingAPIKeys population logic into populateEmbeddingAPIKeys helper function (server.go)
- Translate Chinese comments to English for codebase consistency (logger_plugin.go)

Co-Authored-By: Claude <[email protected]>
I like this, I need this.
Summary
- `/v1/embeddings` endpoint that translates requests to Gemini embedContent API

Changes
New Features
- `/v1/embeddings` accepts OpenAI-format requests and returns OpenAI-format responses
- … `gemini-api-key` entries

Bug Fixes
- `resolveSuccess(ctx)` check that caused false failures due to async context race condition

Files Changed
- `sdk/api/handlers/openai/openai_embedding_handlers.go` - Main embedding handler
- `sdk/api/handlers/openai/embedding_types.go` - Request/response types
- `internal/config/sdk_config.go` - Added `EmbeddingAPIKeys` field
- `internal/api/server.go` - Populate `EmbeddingAPIKeys` from config
- `internal/usage/logger_plugin.go` - Fix async race condition

Test Plan