-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Term query I/O concurrency optimization #20416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Term query I/O concurrency optimization #20416
Conversation
📝 WalkthroughWalkthroughA new functional interface Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
server/src/main/java/org/opensearch/common/util/ThrowingSupplier.java (1)
13-15: Clean functional interface for IO-throwing suppliers.The interface is well-designed and appropriately placed. Consider adding a brief Javadoc to clarify the intended use case (e.g., deferred I/O operations that may throw):
📝 Optional: Add Javadoc
+/** + * A supplier of results that may throw an {@link IOException}. + * Useful for deferred I/O operations where checked exceptions must propagate. + * + * @param <T> the type of results supplied by this supplier + */ @FunctionalInterface public interface ThrowingSupplier<T> { T get() throws IOException; }server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java (1)
472-488: Sentinel valuehitCount = 0may cause confusion.When
deferShortcutTotalHitCountis true,hitCountis set to0to ensure the shortcut path (else branch at line 483) is taken while deferring actual I/O. However,0is also a valid document count (e.g., for an empty index), making this sentinel usage non-obvious.Consider adding a clarifying comment or extracting a named constant:
📝 Suggested clarification
boolean deferShortcutTotalHitCount = deferShortcutTotalHitCount(hasFilterCollector, reader.hasDeletions(), query); if (deferShortcutTotalHitCount) { + // Use 0 as a placeholder to take the shortcut path; actual count computed lazily in supplier this.hitCount = 0; } else { this.hitCount = hasFilterCollector ? -1 : shortcutTotalHitCount(reader, query); }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
server/src/main/java/org/opensearch/common/util/ThrowingSupplier.javaserver/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.
Applied to files:
server/src/main/java/org/opensearch/common/util/ThrowingSupplier.javaserver/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: gradle-check
- GitHub Check: Mend Security Check
🔇 Additional comments (5)
server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java (5)
417-417: Type change toThrowingSupplierenables deferred I/O with proper exception propagation.The field type change from
Supplier<TotalHits>toThrowingSupplier<TotalHits>correctly allows the deferredshortcutTotalHitCountcomputation to propagateIOException.
633-633: Method signature correctly propagatesIOException.The
throws IOExceptiondeclaration is necessary sincetotalHitsSupplier.get()(line 645) can now throw. Callers inreduceWithandpostProcessalready handle this.
827-833: Deferral logic is correctly scoped toTermQuerywithout deletions.The method correctly limits deferral to
TermQuerycases where:
- No filter collector is present (implicit counts are valid)
- No deletions exist (required for accurate docFreq counts)
MatchAllDocsQueryis not deferred sincereader.numDocs()is in-memory.FieldExistsQueryis not deferred due to its more complex I/O pattern. This is appropriate.
835-852: Query unwrapping logic is correct and properly handles nested wrappers.The loop correctly removes scoring-only wrappers that don't affect document counts:
ConstantScoreQuery: Wraps query with constant scoreBoostQuery: Applies boost factorApproximateScoreQuery: OpenSearch-specific approximate scoring wrapperThe inline comment clearly explains the purpose, and the method properly unwraps nested wrapper combinations (e.g., ConstantScoreQuery wrapping BoostQuery).
520-533: Remove the CachedSupplier wrapping suggestion—it's impractical due to type incompatibility.The observation that
totalHitsSupplierlacksCachedSupplierwrapping is correct, but unliketopDocsSupplier(line 486), wrapping it would require changes toCachedSupplieritself.CachedSupplier<T>acceptsSupplier<T>, whileshortcutTotalHitCountSupplier()returnsThrowingSupplier<TotalHits>—these are incompatible types, and no caching variant forThrowingSupplierexists.Moreover,
totalHitsSupplier.get()is invoked exactly once per search execution: either vianewTopDocs()(lines 687–689, mutually exclusive paths) ornewTopDocs(TopDocs, float, Integer)(line 645). Repeated invocation is not a practical concern with the current execution flow.
| static boolean deferShortcutTotalHitCount(boolean hasFilterCollector, boolean hasDeletions, Query query) { | ||
| if (hasFilterCollector) { | ||
| return false; | ||
| } | ||
| query = removeWrappersFromQuery(query); | ||
| return (query.getClass() == TermQuery.class && hasDeletions == false); | ||
| } | ||
|
|
||
| static Query removeWrappersFromQuery(Query query) { | ||
| while (true) { | ||
| // remove wrappers that don't matter for counts | ||
| // this is necessary so that we don't only optimize match_all | ||
| // queries but also match_all queries that are nested in | ||
| // a constant_score query | ||
| if (query instanceof ConstantScoreQuery constantScoreQuery) { | ||
| query = constantScoreQuery.getQuery(); | ||
| } else if (query instanceof BoostQuery boostQuery) { | ||
| query = boostQuery.getQuery(); | ||
| } else if (query instanceof ApproximateScoreQuery approximateScoreQuery) { | ||
| query = approximateScoreQuery.getOriginalQuery(); | ||
| } else { | ||
| break; | ||
| } | ||
| } | ||
| return query; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Search for test files related to TopDocsCollectorContext
fd -type f -name "*TopDocsCollectorContext*" -e javaRepository: opensearch-project/OpenSearch
Length of output: 243
🏁 Script executed:
# Search for test files that might test this class (using various naming patterns)
fd -type f -path "*/test*" -name "*TopDocs*" -e javaRepository: opensearch-project/OpenSearch
Length of output: 243
🏁 Script executed:
# Look at the directory structure to find test locations
find . -type d -name "test*" | head -20Repository: opensearch-project/OpenSearch
Length of output: 1406
🏁 Script executed:
# Find TopDocsCollectorContext source file first
find . -path "*/server/src/main/java/*" -name "TopDocsCollectorContext.java"Repository: opensearch-project/OpenSearch
Length of output: 152
🏁 Script executed:
# Search for any test file mentioning TopDocsCollectorContext
find . -path "*/test/*" -name "*TopDocsCollectorContext*"Repository: opensearch-project/OpenSearch
Length of output: 55
🏁 Script executed:
# Look for test files in the server module
find ./server/src/test -name "*.java" | grep -i "topdocs"Repository: opensearch-project/OpenSearch
Length of output: 163
🏁 Script executed:
# Search for references to TopDocsCollectorContext in test files
grep -r "TopDocsCollectorContext" ./server/src/test --include="*.java"Repository: opensearch-project/OpenSearch
Length of output: 960
🏁 Script executed:
# Search for the specific method names mentioned in the review
grep -r "deferShortcutTotalHitCount\|removeWrappersFromQuery\|shortcutTotalHitCountSupplier" ./server/src/test --include="*.java"Repository: opensearch-project/OpenSearch
Length of output: 55
🏁 Script executed:
# Check if TopDocsCollectorContextTests.java exists anywhere
find . -name "TopDocsCollectorContextTests.java" -o -name "TopDocsCollectorContextTest.java"Repository: opensearch-project/OpenSearch
Length of output: 55
🏁 Script executed:
# Read the source file to understand the context and methods
cat -n ./server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java | head -100Repository: opensearch-project/OpenSearch
Length of output: 4854
🏁 Script executed:
# Get line count to understand file size
wc -l ./server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.javaRepository: opensearch-project/OpenSearch
Length of output: 156
🏁 Script executed:
# Look specifically at the methods mentioned in lines 827-852
sed -n '800,860p' ./server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.javaRepository: opensearch-project/OpenSearch
Length of output: 2835
🏁 Script executed:
# Search for where deferShortcutTotalHitCount is called
grep -n "deferShortcutTotalHitCount" ./server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.javaRepository: opensearch-project/OpenSearch
Length of output: 615
🏁 Script executed:
# Search for shortcutTotalHitCountSupplier
grep -n "shortcutTotalHitCountSupplier" ./server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.javaRepository: opensearch-project/OpenSearch
Length of output: 284
🏁 Script executed:
# Get context around the usage of these methods
sed -n '700,850p' ./server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.javaRepository: opensearch-project/OpenSearch
Length of output: 6920
🏁 Script executed:
# Check if there are any tests that specifically test the three methods
grep -r "shortcutTotalHitCount\|removeWrappersFromQuery" ./server/src/test --include="*.java" -lRepository: opensearch-project/OpenSearch
Length of output: 55
🏁 Script executed:
# Look at the shortcutTotalHitCountSupplier method definition
sed -n '520,540p' ./server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.javaRepository: opensearch-project/OpenSearch
Length of output: 987
Add unit tests for hit count optimization logic.
The helper methods deferShortcutTotalHitCount, removeWrappersFromQuery, and shortcutTotalHitCountSupplier handle critical search path optimization for hit counting but lack dedicated unit test coverage. Unit tests should verify:
deferShortcutTotalHitCountcorrectly returnstrueonly for unwrappedTermQuerywithout deletions and without filter collectorsremoveWrappersFromQueryproperly unwraps nestedConstantScoreQuery,BoostQuery, andApproximateScoreQuerycombinationsshortcutTotalHitCountSuppliercorrectly defers or computes hit counts based on the deferred flag
|
❌ Gradle check result for e93d902: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Description
[Describe what this change achieves]
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.