Conversation


@FNZ1961 FNZ1961 commented Jan 26, 2026

Summary

  • Add OpenAI-compatible /v1/embeddings endpoint that translates requests to Gemini embedContent API
  • Implement round-robin API key selection for load balancing across multiple Gemini API keys
  • Add usage tracking for embedding requests to Dashboard statistics
  • Fix async context race condition in logger_plugin that caused false failure reports

Changes

New Features

  • Embedding Endpoint: /v1/embeddings accepts OpenAI-format requests and returns OpenAI-format responses
  • Model Mapping: Automatically maps OpenAI embedding models (text-embedding-ada-002, text-embedding-3-small/large) to Gemini text-embedding-004
  • API Key Load Balancing: Round-robin selection across configured gemini-api-key entries
  • Usage Tracking: Embedding requests now appear in Dashboard model statistics
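The model mapping described above can be sketched as follows. This is an illustrative Go snippet, not code from the PR; the variable and function names (`embeddingModelMap`, `mapEmbeddingModel`) are mine, and the fallback-to-default behaviour for unknown models is an assumption the PR does not state.

```go
package main

import "fmt"

// embeddingModelMap sketches the OpenAI -> Gemini model name mapping the
// feature list describes; names here are illustrative, not from the PR.
var embeddingModelMap = map[string]string{
	"text-embedding-ada-002": "text-embedding-004",
	"text-embedding-3-small": "text-embedding-004",
	"text-embedding-3-large": "text-embedding-004",
}

// mapEmbeddingModel returns the Gemini model for a given OpenAI model name.
func mapEmbeddingModel(openAIModel string) string {
	if gemini, ok := embeddingModelMap[openAIModel]; ok {
		return gemini
	}
	// Assumed fallback: route unknown names to the single Gemini model.
	return "text-embedding-004"
}

func main() {
	fmt.Println(mapEmbeddingModel("text-embedding-ada-002")) // text-embedding-004
}
```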

Bug Fixes

  • Logger Plugin: Remove resolveSuccess(ctx) check that caused false failures due to async context race condition

Files Changed

  • sdk/api/handlers/openai/openai_embedding_handlers.go - Main embedding handler
  • sdk/api/handlers/openai/embedding_types.go - Request/response types
  • internal/config/sdk_config.go - Added EmbeddingAPIKeys field
  • internal/api/server.go - Populate EmbeddingAPIKeys from config
  • internal/usage/logger_plugin.go - Fix async race condition

Test Plan

  • Verify embedding requests return valid OpenAI-format responses
  • Verify round-robin distributes requests across multiple API keys
  • Verify Dashboard model statistics show successful embedding requests
  • Verify existing Chat/Completion functionality unchanged

FNZ1961 and others added 3 commits January 26, 2026 01:30
Add support for the OpenAI Embeddings API by bridging requests to
Google Gemini's embedContent API.

Changes:
- sdk/api/handlers/openai/embedding_types.go: Data structures for
  OpenAI and Gemini embedding request/response formats
- sdk/api/handlers/openai/openai_embedding_handlers.go: Handler
  implementation with concurrent batch processing
- internal/api/server.go: Register /v1/embeddings route

Features:
- OpenAI /v1/embeddings request/response format compatibility
- Automatic model mapping (text-embedding-ada-002 -> text-embedding-004)
- Batch input support with concurrent Gemini API calls
- Token usage estimation (Gemini doesn't return token counts)
- Support for Authorization header and x-api-key authentication

This enables OpenAI SDK clients (like graphiti-core, LangChain) to use
embedding functionality through CLIProxyAPI.

Note: Gemini text-embedding-004 returns 768-dimensional vectors.
Downstream applications may need to configure their embedding dimension
accordingly (e.g., set EMBEDDING_DIM=768 for graphiti-core).
- Add shared embeddingHTTPClient at package level with connection pooling
- Implement cancellable context to cancel all goroutines on first error
- Use atomic.AddInt64 for thread-safe token counting
- Rename extractAPIKey to extractEmbeddingAPIKey to avoid conflicts
- Rename callGeminiEmbedContent to callGeminiEmbedContentWithClient

This addresses the following review issues:
1. HTTP client reuse (High) - fixed with shared client
2. Error cancellation (High) - fixed with context.WithCancel
3. Architecture consistency - improved function naming

Co-Authored-By: Claude <[email protected]>
- Add round-robin API key selection for Embedding requests
- Add usage.PublishRecord for Dashboard statistics
- Include APIKey field in usage record for proper grouping
- Fix async context race condition in logger_plugin
- Add EmbeddingAPIKeys field to SDKConfig for runtime key access
- Copy GeminiKey to SDKConfig.EmbeddingAPIKeys on server init/update

This enables:
- Load balancing across multiple Gemini API keys
- Proper usage tracking in Dashboard
- Support for dummy client keys with server-side credentials

Co-Authored-By: Claude <[email protected]>
@gemini-code-assist

Summary of Changes

Hello @FNZ1961, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the API gateway by introducing an OpenAI-compatible /v1/embeddings endpoint, which seamlessly integrates with the Gemini embedContent API. It improves resilience and scalability through round-robin API key load balancing for Gemini requests and provides better operational visibility by incorporating embedding usage into dashboard statistics. Additionally, a critical bug in the logging mechanism related to asynchronous context handling has been addressed, ensuring more accurate failure reporting.

Highlights

  • OpenAI-compatible Embeddings Endpoint: A new /v1/embeddings endpoint has been added, designed to accept OpenAI-formatted requests and translate them to the Gemini embedContent API, returning responses in an OpenAI-compatible format.
  • API Key Load Balancing: The system now supports round-robin selection for Gemini API keys, enabling load balancing across multiple configured keys for embedding requests.
  • Usage Tracking for Embeddings: Embedding requests are now tracked and reported in the Dashboard statistics, providing visibility into their usage.
  • Logger Plugin Race Condition Fix: An asynchronous context race condition in the logger_plugin that caused false failure reports has been resolved by directly trusting the record.Failed status.
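The shape of the logger fix in the last highlight can be sketched as below. The types and function are simplified stand-ins, not the project's real `logger_plugin.go` code; the point is that the asynchronous consumer trusts `record.Failed` directly instead of re-reading request (gin) context state, which may already be recycled by the time the record is processed.

```go
package main

import "fmt"

// UsageRecord is a simplified stand-in for the project's usage record type.
type UsageRecord struct {
	Model  string
	Failed bool
}

// classify trusts record.Failed directly; re-checking the gin context here
// would race, since records are consumed after the request has finished.
func classify(rec UsageRecord) string {
	if rec.Failed {
		return "failure"
	}
	return "success"
}

func main() {
	fmt.Println(classify(UsageRecord{Model: "text-embedding-004", Failed: false})) // success
}
```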




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces a new OpenAI-compatible /v1/embeddings endpoint, translating requests to the Gemini embedContent API. It also includes API key load balancing and usage tracking. The fix for the async context race condition in logger_plugin.go is a welcome improvement. The overall implementation of the new embedding endpoint is well-structured, handling input parsing, API key selection, concurrent requests, and error handling effectively.

Comment on lines 236 to 244
// Copy Gemini API keys to SDKConfig.EmbeddingAPIKeys for embedding handler access
if len(cfg.GeminiKey) > 0 {
	keys := make([]string, 0, len(cfg.GeminiKey))
	for _, gk := range cfg.GeminiKey {
		if gk.APIKey != "" {
			keys = append(keys, gk.APIKey)
		}
	}
	cfg.SDKConfig.EmbeddingAPIKeys = keys

medium

The logic for copying Gemini API keys to SDKConfig.EmbeddingAPIKeys is duplicated in both the NewServer function (lines 236-244) and the UpdateClients function (lines 994-1005). To improve maintainability and avoid potential inconsistencies, this logic should be extracted into a dedicated helper function and called from both locations.

func (s *Server) populateEmbeddingAPIKeys(cfg *config.Config) {
	if len(cfg.GeminiKey) > 0 {
		keys := make([]string, 0, len(cfg.GeminiKey))
		for _, gk := range cfg.GeminiKey {
			if gk.APIKey != "" {
				keys = append(keys, gk.APIKey)
			}
		}
		cfg.SDKConfig.EmbeddingAPIKeys = keys
	} else {
		cfg.SDKConfig.EmbeddingAPIKeys = nil
	}
}

// In NewServer:
// ...
// envAdminPassword = strings.TrimSpace(envAdminPassword)
// envManagementSecret := envAdminPasswordSet && envAdminPassword != ""
// s.populateEmbeddingAPIKeys(cfg)
//
// // Create server instance
// s := &Server{
// ...

// In UpdateClients:
// ...
// s.oldConfigYaml, _ = yaml.Marshal(cfg)
// s.populateEmbeddingAPIKeys(cfg)
// s.handlers.UpdateClients(&cfg.SDKConfig)
// ...

Comment on lines 171 to 173
// 直接信任 record.Failed,不再依赖 gin context 状态
// 原因:usage records 是异步处理的,此时 gin context 可能已失效或被回收
// 这要求所有 Handler/Executor 必须正确设置 Failed 字段

medium

The comments on these lines are in Chinese. For consistency and maintainability within an English codebase, all comments should be in English.

Suggested change
-// 直接信任 record.Failed,不再依赖 gin context 状态
-// 原因:usage records 是异步处理的,此时 gin context 可能已失效或被回收
-// 这要求所有 Handler/Executor 必须正确设置 Failed 字段
+// Directly trust record.Failed, no longer relying on gin context status.
+// Reason: usage records are processed asynchronously, and the gin context may be invalid or reclaimed by then.
+// This requires all Handler/Executor implementations to correctly set the Failed field.

Content: GeminiContent{
	Parts: []GeminiPart{{Text: text}},
},
TaskType: "RETRIEVAL_DOCUMENT", // Optimized for document retrieval

medium

The TaskType for the Gemini embedContent API is currently hardcoded to "RETRIEVAL_DOCUMENT". While this is a common use case for embeddings, it might be beneficial to make this configurable or dynamically determined in the future if other task types become relevant or if there's a way to infer it from the OpenAI request.
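One way to act on this suggestion is a resolver that accepts a task type from config or the request and falls back to the current hardcoded value. The function below is a hypothetical sketch, not proposed PR code; the string constants are the task types documented for Gemini's embedContent API.

```go
package main

import "fmt"

// resolveTaskType is a hypothetical sketch of making taskType configurable:
// accept a requested value (e.g. from config) and fall back to the current
// hardcoded RETRIEVAL_DOCUMENT behaviour for anything unrecognised.
func resolveTaskType(requested string) string {
	switch requested {
	case "RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY",
		"CLASSIFICATION", "CLUSTERING":
		return requested
	default:
		return "RETRIEVAL_DOCUMENT" // current behaviour as the fallback
	}
}

func main() {
	fmt.Println(resolveTaskType(""))                // RETRIEVAL_DOCUMENT
	fmt.Println(resolveTaskType("RETRIEVAL_QUERY")) // RETRIEVAL_QUERY
}
```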

- Extract duplicated EmbeddingAPIKeys population logic into
  populateEmbeddingAPIKeys helper function (server.go)
- Translate Chinese comments to English for codebase consistency
  (logger_plugin.go)

Co-Authored-By: Claude <[email protected]>
@dinhkarate

I like this, I need this
