-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Streaming agg planning fallback enhancement #20510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the
📝 WalkthroughWalkthroughThis PR refines streaming aggregation optimization for Terms aggregations with ordinal-based string sources by adding specialized cost estimation logic that analyzes segment metadata to determine when streaming should be disabled based on cardinality and match-all query detection, along with corresponding integration and unit tests. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
- low cardinality string term - match all + not match segment with deleted docs Signed-off-by: bowenlan-amzn <[email protected]>
652c6e4 to
be5a04e
Compare
...er/src/test/java/org/opensearch/search/aggregations/FactoryStreamingCostEstimationTests.java
Outdated
Show resolved
Hide resolved
|
❌ Gradle check result for be5a04e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
when explicit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In
`@server/src/main/java/org/opensearch/search/aggregations/bucket/terms/TermsAggregatorFactory.java`:
- Around line 717-728: The streaming-cost logic in TermsAggregatorFactory
(estimateStreamingCost / the block handling valuesSource) ignores an explicit
executionHint, causing streamable costs despite user-specified hints; update the
method to short-circuit to StreamingCostMetrics.nonStreamable() when an
executionHint is present (e.g., executionHint != null &&
!executionHint.isEmpty() or matches known hints like "map"/"global_ordinals")
before the WithOrdinals / Numeric checks, so honor the execution_hint and avoid
returning streamable costs when the user requested a specific execution mode.
In
`@server/src/test/java/org/opensearch/search/aggregations/FactoryStreamingCostEstimationTests.java`:
- Around line 684-704: The BooleanQuery wrapper in createAggregatorFactory can
rewrite back to MatchAllDocsQuery; replace the nonMatchAllQuery construction so
it cannot be rewritten (e.g., use a TermQuery on a stable constant/indexed field
or a ConstantScoreQuery wrapping a TermQuery) instead of new
BooleanQuery.Builder().add(new MatchAllDocsQuery(), MUST).build(), or after
creating nonMatchAllQuery
assert/searchContext.getQueryShardContext().rewrite(...) does not produce a
MatchAllDocsQuery to ensure the query stays non-match-all; update references to
nonMatchAllQuery and the createAggregatorFactory method accordingly.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
plugins/arrow-flight-rpc/src/internalClusterTest/java/org/opensearch/streaming/aggregation/SubAggregationIT.javaserver/src/main/java/org/opensearch/search/aggregations/bucket/terms/TermsAggregatorFactory.javaserver/src/test/java/org/opensearch/search/aggregations/FactoryStreamingCostEstimationTests.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.
Applied to files:
server/src/main/java/org/opensearch/search/aggregations/bucket/terms/TermsAggregatorFactory.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (10)
plugins/arrow-flight-rpc/src/internalClusterTest/java/org/opensearch/streaming/aggregation/SubAggregationIT.java (2)
50-50: Import update looks fine.Static import aligns with the new query usage in the tests.
294-298: Verify thatexistsQuery("field1")does not rewrite toMatchAllDocsQueryfor fully populated fields.The test data shows all 90 documents contain
field1, making it fully populated. Lucene'sFieldExistsQuerycan optimize toMatchAllDocsQueryin such cases (similar to the_idfield behavior seen inIdFieldMapper). If the query rewrites toMatchAllDocsQuery, it may disable streaming and make this assertion brittle. Confirm the rewritten query remainsFieldExistsQuery(or equivalent non-match-all); if it collapses, consider using a query that cannot rewrite to match-all, such as a range query on the field or a term query with a known value.server/src/main/java/org/opensearch/search/aggregations/bucket/terms/TermsAggregatorFactory.java (3)
35-40: Imports for the new per‑segment logic look good.
69-70: No review comment for this minor import tweak.
731-793: Helper logic is clear and targeted.The low‑cardinality and clean‑segment checks are easy to follow and scoped to ordinals only.
server/src/test/java/org/opensearch/search/aggregations/FactoryStreamingCostEstimationTests.java (5)
18-24: No issues in the added imports.
43-45: No review comment for this minor static import tweak.
529-665: Great coverage of streaming fallback scenarios.These tests cover low cardinality, match‑all with clean/deleted segments, and non‑match‑all paths well.
709-719: Helper for explicit query injection looks good.
721-732: Helper for min‑bucket‑count control is clear.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| if (valuesSource instanceof WithOrdinals ordinalsValuesSource) { | ||
| if (shouldDisableStreamingForOrdinals(searchContext, ordinalsValuesSource)) { | ||
| return StreamingCostMetrics.nonStreamable(); | ||
| } | ||
| return new StreamingCostMetrics(true, segmentTopN); | ||
| } | ||
|
|
||
| if (valuesSource instanceof ValuesSource.Numeric) { | ||
| return new StreamingCostMetrics(true, segmentTopN); | ||
| } | ||
|
|
||
| return StreamingCostMetrics.nonStreamable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honor explicit execution_hint in streaming cost estimation.
estimateStreamingCost can still return streamable even when executionHint is explicitly set (e.g., map/global_ordinals), but the factory will then force a non‑streaming execution mode. This can make the planner pick streaming despite user intent. Consider short‑circuiting to non‑streamable when a hint is present.
🛠️ Proposed fix
public StreamingCostMetrics estimateStreamingCost(SearchContext searchContext) {
ValuesSource valuesSource = config.getValuesSource();
int segmentTopN = computeSegmentTopN(searchContext, bucketCountThresholds, order);
+
+ if (executionHint != null) {
+ return StreamingCostMetrics.nonStreamable();
+ }🤖 Prompt for AI Agents
In
`@server/src/main/java/org/opensearch/search/aggregations/bucket/terms/TermsAggregatorFactory.java`
around lines 717 - 728, The streaming-cost logic in TermsAggregatorFactory
(estimateStreamingCost / the block handling valuesSource) ignores an explicit
executionHint, causing streamable costs despite user-specified hints; update the
method to short-circuit to StreamingCostMetrics.nonStreamable() when an
executionHint is present (e.g., executionHint != null &&
!executionHint.isEmpty() or matches known hints like "map"/"global_ordinals")
before the WithOrdinals / Numeric checks, so honor the execution_hint and avoid
returning streamable costs when the user requested a specific execution mode.
...er/src/test/java/org/opensearch/search/aggregations/FactoryStreamingCostEstimationTests.java
Show resolved
Hide resolved
|
❌ Gradle check result for 297a70e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
297a70e to
bf5b7e6
Compare
|
❌ Gradle check result for bf5b7e6: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: bowenlan-amzn <[email protected]>
bf5b7e6 to
bcba3d5
Compare
|
❌ Gradle check result for bcba3d5: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Description
Streaming agg planning fallback when
query in the search context seems won't be null and if so it will be parsed to match all
OpenSearch/server/src/main/java/org/opensearch/search/DefaultSearchContext.java
Line 448 in 2d7e8d9
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.