Skip to content

Conversation

@musabafzal
Copy link

@musabafzal musabafzal commented Jan 13, 2026

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • Chores
    • Enhanced internal exception handling in query processing to support deferred total hits computation and improved error propagation during supplier operations.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 13, 2026

📝 Walkthrough

Walkthrough

A new functional interface ThrowingSupplier<T> is introduced to support suppliers that propagate checked exceptions. TopDocsCollectorContext is refactored to use this interface for total hits suppliers, with helper methods added to manage deferred total hit count computation based on query and reader properties.

Changes

Cohort / File(s) Summary
New Functional Interface
server/src/main/java/org/opensearch/common/util/ThrowingSupplier.java
Single-method functional interface declaring T get() throws IOException to enable lambda usage for exception-throwing suppliers
Total Hits Supplier Refactoring
server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java
Replaces Supplier<TotalHits> with ThrowingSupplier<TotalHits> for totalHitsSupplier. Adds three helper methods: deferShortcutTotalHitCount() (determines deferral logic), removeWrappersFromQuery() (unwraps query wrappers for accurate counting), and shortcutTotalHitCountSupplier() (returns deferred/computed hit count supplier). Updates newTopDocs() method signatures to throw IOException. Integrates deferral logic into SimpleTopDocsCollectorContext initialization

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A supplier that throws with grace,
No checked exception finds its hiding place,
Through query wrappers we unwrap and soar,
Deferred totals, now we can explore!
Lambda magic, IOException flows—
Where ThrowingSupplier goes, clarity grows.

🚥 Pre-merge checks | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is entirely the default template with placeholder text; the 'Description' section is not filled in, the 'Related Issues' number is missing, and no actual change details, implementation rationale, or testing information is provided. Fill in the Description section with concrete details about what the PR achieves, provide the related issue number, and indicate testing status in the checklist.
Docstring Coverage ⚠️ Warning Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Term query I/O concurrency optimization' is specific and clearly describes a performance optimization, but the actual changes show implementation of a ThrowingSupplier interface and related changes to TopDocsCollectorContext, which are infrastructure changes supporting the optimization rather than the optimization itself. Clarify whether the title accurately reflects the primary change—are the new utility interfaces the main focus, or is this truly about term query optimization? Consider a more specific title if the utility layer is the focus.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
server/src/main/java/org/opensearch/common/util/ThrowingSupplier.java (1)

13-15: Clean functional interface for IO-throwing suppliers.

The interface is well-designed and appropriately placed. Consider adding a brief Javadoc to clarify the intended use case (e.g., deferred I/O operations that may throw):

📝 Optional: Add Javadoc
+/**
+ * A supplier of results that may throw an {@link IOException}.
+ * Useful for deferred I/O operations where checked exceptions must propagate.
+ *
+ * @param <T> the type of results supplied by this supplier
+ */
 @FunctionalInterface
 public interface ThrowingSupplier<T> {
     T get() throws IOException;
 }
server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java (1)

472-488: Sentinel value hitCount = 0 may cause confusion.

When deferShortcutTotalHitCount is true, hitCount is set to 0 to ensure the shortcut path (else branch at line 483) is taken while deferring actual I/O. However, 0 is also a valid document count (e.g., for an empty index), making this sentinel usage non-obvious.

Consider adding a clarifying comment or extracting a named constant:

📝 Suggested clarification
 boolean deferShortcutTotalHitCount = deferShortcutTotalHitCount(hasFilterCollector, reader.hasDeletions(), query);
 if (deferShortcutTotalHitCount) {
+    // Use 0 as a placeholder to take the shortcut path; actual count computed lazily in supplier
     this.hitCount = 0;
 } else {
     this.hitCount = hasFilterCollector ? -1 : shortcutTotalHitCount(reader, query);
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba420b3 and e93d902.

📒 Files selected for processing (2)
  • server/src/main/java/org/opensearch/common/util/ThrowingSupplier.java
  • server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.

Applied to files:

  • server/src/main/java/org/opensearch/common/util/ThrowingSupplier.java
  • server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: gradle-check
  • GitHub Check: Mend Security Check
🔇 Additional comments (5)
server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java (5)

417-417: Type change to ThrowingSupplier enables deferred I/O with proper exception propagation.

The field type change from Supplier<TotalHits> to ThrowingSupplier<TotalHits> correctly allows the deferred shortcutTotalHitCount computation to propagate IOException.


633-633: Method signature correctly propagates IOException.

The throws IOException declaration is necessary since totalHitsSupplier.get() (line 645) can now throw. Callers in reduceWith and postProcess already handle this.


827-833: Deferral logic is correctly scoped to TermQuery without deletions.

The method correctly limits deferral to TermQuery cases where:

  1. No filter collector is present (implicit counts are valid)
  2. No deletions exist (required for accurate docFreq counts)

MatchAllDocsQuery is not deferred since reader.numDocs() is in-memory. FieldExistsQuery is not deferred due to its more complex I/O pattern. This is appropriate.


835-852: Query unwrapping logic is correct and properly handles nested wrappers.

The loop correctly removes scoring-only wrappers that don't affect document counts:

  • ConstantScoreQuery: Wraps query with constant score
  • BoostQuery: Applies boost factor
  • ApproximateScoreQuery: OpenSearch-specific approximate scoring wrapper

The inline comment clearly explains the purpose, and the method properly unwraps nested wrapper combinations (e.g., ConstantScoreQuery wrapping BoostQuery).


520-533: Remove the CachedSupplier wrapping suggestion—it's impractical due to type incompatibility.

The observation that totalHitsSupplier lacks CachedSupplier wrapping is correct, but unlike topDocsSupplier (line 486), wrapping it would require changes to CachedSupplier itself. CachedSupplier<T> accepts Supplier<T>, while shortcutTotalHitCountSupplier() returns ThrowingSupplier<TotalHits>—these are incompatible types, and no caching variant for ThrowingSupplier exists.

Moreover, totalHitsSupplier.get() is invoked exactly once per search execution: either via newTopDocs() (lines 687–689, mutually exclusive paths) or newTopDocs(TopDocs, float, Integer) (line 645). Repeated invocation is not a practical concern with the current execution flow.

Comment on lines +827 to +852
static boolean deferShortcutTotalHitCount(boolean hasFilterCollector, boolean hasDeletions, Query query) {
if (hasFilterCollector) {
return false;
}
query = removeWrappersFromQuery(query);
return (query.getClass() == TermQuery.class && hasDeletions == false);
}

static Query removeWrappersFromQuery(Query query) {
while (true) {
// remove wrappers that don't matter for counts
// this is necessary so that we don't only optimize match_all
// queries but also match_all queries that are nested in
// a constant_score query
if (query instanceof ConstantScoreQuery constantScoreQuery) {
query = constantScoreQuery.getQuery();
} else if (query instanceof BoostQuery boostQuery) {
query = boostQuery.getQuery();
} else if (query instanceof ApproximateScoreQuery approximateScoreQuery) {
query = approximateScoreQuery.getOriginalQuery();
} else {
break;
}
}
return query;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Search for test files related to TopDocsCollectorContext
fd -type f -name "*TopDocsCollectorContext*" -e java

Repository: opensearch-project/OpenSearch

Length of output: 243


🏁 Script executed:

# Search for test files that might test this class (using various naming patterns)
fd -type f -path "*/test*" -name "*TopDocs*" -e java

Repository: opensearch-project/OpenSearch

Length of output: 243


🏁 Script executed:

# Look at the directory structure to find test locations
find . -type d -name "test*" | head -20

Repository: opensearch-project/OpenSearch

Length of output: 1406


🏁 Script executed:

# Find TopDocsCollectorContext source file first
find . -path "*/server/src/main/java/*" -name "TopDocsCollectorContext.java"

Repository: opensearch-project/OpenSearch

Length of output: 152


🏁 Script executed:

# Search for any test file mentioning TopDocsCollectorContext
find . -path "*/test/*" -name "*TopDocsCollectorContext*"

Repository: opensearch-project/OpenSearch

Length of output: 55


🏁 Script executed:

# Look for test files in the server module
find ./server/src/test -name "*.java" | grep -i "topdocs"

Repository: opensearch-project/OpenSearch

Length of output: 163


🏁 Script executed:

# Search for references to TopDocsCollectorContext in test files
grep -r "TopDocsCollectorContext" ./server/src/test --include="*.java"

Repository: opensearch-project/OpenSearch

Length of output: 960


🏁 Script executed:

# Search for the specific method names mentioned in the review
grep -r "deferShortcutTotalHitCount\|removeWrappersFromQuery\|shortcutTotalHitCountSupplier" ./server/src/test --include="*.java"

Repository: opensearch-project/OpenSearch

Length of output: 55


🏁 Script executed:

# Check if TopDocsCollectorContextTests.java exists anywhere
find . -name "TopDocsCollectorContextTests.java" -o -name "TopDocsCollectorContextTest.java"

Repository: opensearch-project/OpenSearch

Length of output: 55


🏁 Script executed:

# Read the source file to understand the context and methods
cat -n ./server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java | head -100

Repository: opensearch-project/OpenSearch

Length of output: 4854


🏁 Script executed:

# Get line count to understand file size
wc -l ./server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java

Repository: opensearch-project/OpenSearch

Length of output: 156


🏁 Script executed:

# Look specifically at the methods mentioned in lines 827-852
sed -n '800,860p' ./server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java

Repository: opensearch-project/OpenSearch

Length of output: 2835


🏁 Script executed:

# Search for where deferShortcutTotalHitCount is called
grep -n "deferShortcutTotalHitCount" ./server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java

Repository: opensearch-project/OpenSearch

Length of output: 615


🏁 Script executed:

# Search for shortcutTotalHitCountSupplier
grep -n "shortcutTotalHitCountSupplier" ./server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java

Repository: opensearch-project/OpenSearch

Length of output: 284


🏁 Script executed:

# Get context around the usage of these methods
sed -n '700,850p' ./server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java

Repository: opensearch-project/OpenSearch

Length of output: 6920


🏁 Script executed:

# Check if there are any tests that specifically test the three methods
grep -r "shortcutTotalHitCount\|removeWrappersFromQuery" ./server/src/test --include="*.java" -l

Repository: opensearch-project/OpenSearch

Length of output: 55


🏁 Script executed:

# Look at the shortcutTotalHitCountSupplier method definition
sed -n '520,540p' ./server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java

Repository: opensearch-project/OpenSearch

Length of output: 987


Add unit tests for hit count optimization logic.

The helper methods deferShortcutTotalHitCount, removeWrappersFromQuery, and shortcutTotalHitCountSupplier handle critical search path optimization for hit counting but lack dedicated unit test coverage. Unit tests should verify:

  1. deferShortcutTotalHitCount correctly returns true only for unwrapped TermQuery without deletions and without filter collectors
  2. removeWrappersFromQuery properly unwraps nested ConstantScoreQuery, BoostQuery, and ApproximateScoreQuery combinations
  3. shortcutTotalHitCountSupplier correctly defers or computes hit counts based on the deferred flag

@github-actions
Copy link
Contributor

❌ Gradle check result for e93d902: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant