Add hybrid-rrf pipeline check and improve embedding error reporting#26936
Add hybrid-rrf pipeline check and improve embedding error reporting#26936
Conversation
…n status validation - Add isHybridSearchPipelineAvailable() to OpenSearchVectorService and SearchRepository to verify the hybrid-rrf search pipeline exists in OpenSearch - Update getEmbeddingsValidation in SystemRepository to also validate the pipeline when embeddings are enabled, failing with actionable message if missing - Retry vector service initialization during validation to surface the actual failure reason (embedding client vs OpenSearch vector service) instead of the generic "not configured properly" message Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR enhances semantic search health validation for OpenSearch-backed embeddings by checking for the required hybrid-rrf search pipeline and by improving initialization error reporting to better distinguish embedding client vs. vector service failures.
Changes:
- Add a
hybrid-rrfpipeline existence check (via GET/_search/pipeline/hybrid-rrf) and surface an actionable “reindex to create it” message when missing. - Update embeddings system validation to retry vector service initialization during health checks and produce more specific failure messages.
- Add unit tests for
OpenSearchVectorService.isHybridSearchPipelineAvailable()covering success, 404, and exception cases.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java | Extends embeddings health validation to retry init and validate hybrid pipeline availability with improved messaging. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java | Exposes isHybridSearchPipelineAvailable() by delegating to OpenSearchVectorService when applicable. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/vector/OpenSearchVectorService.java | Adds isHybridSearchPipelineAvailable() that checks the hybrid-rrf search pipeline via the OpenSearch generic client. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/vector/OpenSearchVectorServiceTest.java | Adds unit tests for the new pipeline-availability method. |
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java
Outdated
Show resolved
Hide resolved
...ta-service/src/main/java/org/openmetadata/service/search/vector/OpenSearchVectorService.java
Outdated
Show resolved
Hide resolved
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java
Outdated
Show resolved
Hide resolved
| if (searchRepository.getVectorIndexService() == null) { | ||
| return retryInitAndReportError( | ||
| searchRepository, embeddingsValidation, description, configMessage); | ||
| } | ||
|
|
||
| try { | ||
| StepValidation embeddingResult = | ||
| validateEmbeddingGeneration( | ||
| searchRepository.getEmbeddingClient(), | ||
| embeddingsValidation, | ||
| description, | ||
| configMessage); | ||
|
|
||
| if (Boolean.FALSE.equals(embeddingResult.getPassed())) { | ||
| return embeddingResult; | ||
| } | ||
|
|
||
| return validateHybridSearchPipeline( | ||
| searchRepository, embeddingResult, description, configMessage); | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
The embeddings health-check logic now includes additional branches (retrying initializeVectorSearchService(), differentiating embedding-client vs vector-service initialization, and validating hybrid-rrf pipeline presence). There don’t appear to be unit tests covering these new paths; adding tests for (1) pipeline exists, (2) pipeline missing (404), and (3) pipeline check failure (exception/401/5xx) would help prevent regressions in health-check reporting.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java
Outdated
Show resolved
Hide resolved
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java
Outdated
Show resolved
Hide resolved
- Fix bug: retry path now validates embedding generation before checking the hybrid pipeline (gitar) - Return Optional<String> from checkHybridSearchPipeline instead of boolean, differentiating 404 (missing) from 5xx/connectivity errors with specific messages (copilot) - Include initialization exception message in StepValidation so operators don't need to hunt logs (copilot) - Return Optional.empty() for non-OpenSearch implementations since hybrid pipeline is OpenSearch-specific (gitar) - Add test for 5xx response case in pipeline check Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
initializeVectorSearchService() catches exceptions internally and never rethrows, so the try-catch in retryInitAndReportError never captured the error. Store the exception message in vectorServiceInitError field and read it in the validation to include in the health check message. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java
Show resolved
Hide resolved
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java
Show resolved
Hide resolved
| @Getter private EmbeddingClient embeddingClient; | ||
| @Getter private VectorIndexService vectorIndexService; | ||
| @Getter private VectorEmbeddingHandler vectorEmbeddingHandler; | ||
| @Getter private String vectorServiceInitError; |
There was a problem hiding this comment.
vectorServiceInitError is written inside a synchronized method (initializeVectorSearchService) but read via an unsynchronized getter. Without volatile (or synchronized access), other threads may not observe the updated value, which can lead to stale/missing error details in validations/logs. Consider making this field volatile (and optionally clearing it on successful init to avoid stale errors).
| @Getter private String vectorServiceInitError; | |
| @Getter private volatile String vectorServiceInitError; |
| StepValidation embeddingsValidation, | ||
| String description, | ||
| String configMessage) { | ||
| searchRepository.initializeVectorSearchService(); |
There was a problem hiding this comment.
retryInitAndReportError() triggers initializeVectorSearchService() as part of the system validation endpoint. When initialization keeps failing (e.g., misconfigured embedding provider), repeated calls to /system/validate will keep retrying expensive init logic and may also perform side effects (e.g., pipeline creation/handler registration attempts), potentially spamming logs and impacting the embedding provider/OpenSearch. Consider throttling retries (e.g., retry once per process lifetime or with a backoff timestamp) and reporting the last captured init error instead of re-initializing on every validation call.
| searchRepository.initializeVectorSearchService(); | |
| // Avoid repeatedly triggering expensive vector search initialization from the | |
| // validation endpoint when we already know initialization has failed. | |
| String previousInitError = searchRepository.getVectorServiceInitError(); | |
| if (previousInitError == null) { | |
| // Either initialization has not been attempted yet or it has not recorded an error, | |
| // so we allow a single retry attempt. | |
| searchRepository.initializeVectorSearchService(); | |
| } else { | |
| LOG.debug( | |
| "Skipping vector search service initialization retry during system validation " | |
| + "because a previous attempt failed with error: {}", | |
| previousInitError); | |
| } |
| public Optional<String> checkHybridSearchPipeline() { | ||
| if (vectorIndexService instanceof OpenSearchVectorService openSearchVectorService) { | ||
| return openSearchVectorService.checkHybridSearchPipeline(); | ||
| } | ||
| return Optional.empty(); | ||
| } |
There was a problem hiding this comment.
PR description mentions adding isHybridSearchPipelineAvailable() on OpenSearchVectorService/SearchRepository, but the implementation adds checkHybridSearchPipeline() returning Optional<String> instead. To avoid confusion for reviewers and future maintainers, please either update the PR description (and any referenced docs/issues) or rename/adjust the method to match the intended API contract.
Cover the new validation branches: pipeline available, pipeline missing (404), pipeline 5xx, embedding client init failure with error message, vector service init failure with embedding client OK, retry with no recovery, and Elasticsearch not supported. Make getEmbeddingsValidation package-private for testability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents stale error state after a successful retry. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| StepValidation getEmbeddingsValidation(OpenMetadataApplicationConfig applicationConfig) { | ||
| StepValidation embeddingsValidation = new StepValidation(); |
There was a problem hiding this comment.
getEmbeddingsValidation was changed from private to package-private to enable tests. To avoid widening SystemRepository's surface area, consider keeping it private and testing via the public validateSystem() path, or annotate the method with @VisibleForTesting to document the intent.
| cfg.getNaturalLanguageSearch().getEmbeddingProvider(), | ||
| embeddingClient.getDimension()); | ||
| } catch (Exception e) { | ||
| this.vectorServiceInitError = e.getMessage(); |
There was a problem hiding this comment.
vectorServiceInitError is populated with e.getMessage(), which can be null for some exceptions, resulting in unhelpful validation output (e.g., "Error: null."). Consider storing e.toString() (or message + exception class) so the health check always has actionable details.
| this.vectorServiceInitError = e.getMessage(); | |
| this.vectorServiceInitError = e.toString(); |
| } catch (Exception e) { | ||
| LOG.error("Failed to check hybrid search pipeline '{}'", HYBRID_PIPELINE_NAME, e); | ||
| return Optional.of("Failed to check hybrid search pipeline: " + e.getMessage()); | ||
| } |
There was a problem hiding this comment.
The error returned on exception uses e.getMessage(), which can be null and produce an unhelpful message. Consider including e.toString() (or exception class + message) so the pipeline check failure is always informative.
| when(appConfig.getElasticSearchConfiguration()).thenReturn(esConfig); | ||
| when(esConfig.getNaturalLanguageSearch()).thenReturn(nlpConfig); | ||
| when(nlpConfig.getEmbeddingProvider()).thenReturn("bedrock"); | ||
| when(nlpConfig.getBedrock()).thenReturn(null); |
There was a problem hiding this comment.
The test sets nlpConfig.getBedrock() to null while embeddingProvider is "bedrock". This triggers an exception path in getEmbeddingConfigurationMessage() (and an error-level log) on every test run, which can add noise and reduce test signal. Prefer mocking a minimal Bedrock config object (or using a provider whose config is stubbed) to keep tests clean and representative.
| when(nlpConfig.getBedrock()).thenReturn(null); | |
| NaturalLanguageSearchConfiguration.BedrockConfiguration bedrockConfig = | |
| mock(NaturalLanguageSearchConfiguration.BedrockConfiguration.class); | |
| when(nlpConfig.getBedrock()).thenReturn(bedrockConfig); |
- Add volatile to vectorServiceInitError for thread-safe reads from the unsynchronized getter - Skip calling initializeVectorSearchService() during validation when a previous init error is already recorded, avoiding expensive retries and log spam on every /system/validate call Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review ✅ Approved 4 resolved / 4 findingsAdds hybrid-rrf pipeline check and improves embedding error reporting in status validation by storing vector service init errors and clearing them on successful retry, with unit tests covering the validation paths. All findings resolved including retry path validation and cross-backend compatibility. ✅ 4 resolved✅ Bug: Retry path skips embedding generation validation
✅ Quality: isHybridSearchPipelineAvailable returns false for non-OpenSearch
✅ Quality: Success message mentions hybrid pipeline for non-OpenSearch backends
✅ Bug: vectorServiceInitError not cleared on successful retry
OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|
🟡 Playwright Results — all passed (24 flaky)✅ 3443 passed · ❌ 0 failed · 🟡 24 flaky · ⏭️ 223 skipped
🟡 24 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |



Summary
hybrid-rrfsearch pipeline in the system health check when embeddings are enabled. If the pipeline is missing, the status reports an actionable error ("Run a reindex to create it"), differentiating 404 from 5xx/connectivity errors.checkHybridSearchPipeline()toOpenSearchVectorServiceandSearchRepository, returningOptional<String>(empty = OK, present = error detail) to check pipeline existence via a GET request.initializeVectorSearchService()when a previous init error is already recorded.Closes https://github.com/open-metadata/ai-platform/issues/473
Test plan
checkHybridSearchPipeline(200 OK, 404, 5xx, and exception cases)SystemRepositoryembeddings validation paths (7 tests covering all branches)mvn spotless:apply- confirmed formatting is clean