Fix: Search ranking - usage/tier boosts should act as tiebreakers, not override text relevance by mohityadav766 · Pull Request #26941 · open-metadata/OpenMetadata

mohityadav766 · 2026-04-01T19:58:42Z

Fixes https://github.com/open-metadata/openmetadata-collate/issues/3468

Problem:
When searching the dataAsset composite index, entities with high usage (e.g., dashboards with 213K weekly views) were ranked far above entities with strong name matches. The root cause was that usage boosts were additive (boostMode: sum). The usage boost alone produced ~1847 raw points from sqrt(213214) × 4.0, completely overwhelming text relevance scores of ~30-120 points. This caused dashboards to dominate search results even when tables had "location" directly in their name.
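A minimal sketch of the additive arithmetic described above. It assumes, as the PR description does, that the usage boost is computed as factor × sqrt(weeklyViews) and added to the text score under boostMode sum. The class and method names are hypothetical. The pairing of a 30-point dashboard with a 120-point table is also hypothetical; it simply takes the two ends of the quoted text-score range.

```java
public class AdditiveBoostSketch {

  /** Usage boost as described in the PR: factor * sqrt(weeklyViews). */
  static double sqrtUsageBoost(double weeklyViews, double factor) {
    return factor * Math.sqrt(weeklyViews);
  }

  /** boostMode "sum": the function score is added to the text relevance score. */
  static double sumMode(double textScore, double functionScore) {
    return textScore + functionScore;
  }

  public static void main(String[] args) {
    double boost = sqrtUsageBoost(213_214, 4.0); // roughly 1847 points
    // Weak text match with heavy usage vs. strong name match with no usage.
    double dashboard = sumMode(30.0, boost);
    double table = sumMode(120.0, 0.0);
    System.out.printf("boost=%.1f dashboard=%.1f table=%.1f%n", boost, dashboard, table);
  }
}
```

Under sum, the usage term alone is more than fifteen times the top of the text-score range. As a result, any heavily used asset outranks a strong name match.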

Solution:
Changed the scoring model from additive to multiplicative so that text relevance is the primary ranking signal and usage/tier only act as proportional tiebreakers:

- Switched boostMode from sum to multiply: function scores now multiply the text relevance score instead of adding to it. The formula becomes: final_score = text_score × (baseline + tier + usage)
- Added a baseline weight function of 1.0: every document gets a minimum multiplier of 1.0, so assets with no tier/usage retain their full text score under multiplicative mode.
- Tightened boost values to a true tiebreaker range: the max combined multiplier is now ~×1.22 (a 22% boost) instead of the previous ~1800+ additive points:
  - Tier boosts: Tier1=0.05, Tier2=0.03, Tier3=0.01
  - Usage count factor: 0.002 (log1p modifier)
  - Percentile rank factor: 0.0005
  - Votes factor: 0.005 (log1p modifier)

Behavior after fix:

- A strong name match (text=100) always beats a weak match (text=50) regardless of usage: 100 × 1.0 = 100 vs 50 × 1.22 = 61
- Among equally matching assets, higher usage/tier breaks the tie: 80 × 1.07 ≈ 86 vs 80 × 1.0 = 80
- Tier1 assets get a slight edge over untiered assets with the same text match
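The formula above can be sketched as follows. This sketch assumes each boost has already been reduced to a plain number. That number is summed with the 1.0 baseline (score_mode sum) and then multiplied into the text score (boost_mode multiply). The sketch does not model how Elasticsearch or OpenSearch compute the individual functions. The 0.22 and 0.07 contributions are the example multipliers from the list above, and the class and method names are hypothetical.

```java
public class MultiplicativeBoostSketch {

  /** Baseline weight that every document receives (match_all, weight 1.0). */
  static final double BASELINE = 1.0;

  /** final_score = text_score * (baseline + sum of boost contributions). */
  static double finalScore(double textScore, double... boostContributions) {
    double multiplier = BASELINE;
    for (double contribution : boostContributions) {
      multiplier += contribution;
    }
    return textScore * multiplier;
  }

  public static void main(String[] args) {
    // Strong name match with no tier/usage vs. weak match with the maximum boost.
    System.out.printf("strong=%.2f weak=%.2f%n", finalScore(100.0), finalScore(50.0, 0.22));
    // Equal text match: the boosted asset wins the tie.
    System.out.printf("boosted=%.2f plain=%.2f%n", finalScore(80.0, 0.07), finalScore(80.0));
  }
}
```

Because the boost is proportional, closing a 2:1 gap in text relevance would require a multiplier of 2.0. That is far above the ~1.22 ceiling.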

gitar-bot · 2026-04-01T20:04:39Z

Code Review ✅ Approved

Search ranking fix ensures usage/tier boosts function as tiebreakers rather than overriding text relevance, improving result quality. No issues found.

Copilot

Pull request overview

Adjusts the search scoring model for the dataAsset composite search so that text relevance remains the primary ranking signal, while tier/usage/votes only provide small tie-breaker influence.

Changes:

Updated default searchSettings.json to drastically reduce tier/usage/vote boost magnitudes and switch boostMode from sum to multiply across asset configurations.
Updated both OpenSearch and Elasticsearch source builder factories to use BoostMode.MULTIPLY for the composite “all assets” configuration.
Added a baseline weight=1.0 function to prevent multiplicative scoring from collapsing scores to zero when no boost functions match.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
openmetadata-service/src/main/resources/json/data/settings/searchSettings.json	Changes boosts to “tiebreaker range” and switches per-asset `boostMode` to `multiply`.
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSourceBuilderFactory.java	Sets composite boost mode to multiply and adds baseline function score weight.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSourceBuilderFactory.java	Sets composite boost mode to multiply and adds baseline function score weight.

Copilot · 2026-04-01T20:05:04Z

...in/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSourceBuilderFactory.java

  private List<es.co.elastic.clients.elasticsearch._types.query_dsl.FunctionScore>
      collectBoostFunctionsV2(AssetTypeConfiguration assetConfig) {
    List<es.co.elastic.clients.elasticsearch._types.query_dsl.FunctionScore> functions =
        new ArrayList<>();

+    // Add baseline weight of 1.0 so that assets with no tier/usage retain their text score
+    // when boostMode is multiply. Without this, function_score could be 0 and zero out the
+    // text relevance score.
+    functions.add(ElasticQueryBuilder.weightFunction(ElasticQueryBuilder.matchAllQuery(), 1.0));
+


collectBoostFunctionsV2 now always adds a match_all weight(1.0) function, which means applyFunctionScoringV2 will always wrap the base query in a function_score query even when there are no configured term/field boosts. This is an observable behavior change (different query shape) and adds overhead; it also changes scoring semantics if a non-multiply boostMode is configured in the future (e.g., sum would add +1 to all docs). Consider only adding the baseline function when boostMode is multiply (and only when there is at least one other boost function to apply), otherwise return the base query unchanged.
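One way to act on this suggestion is sketched below with hypothetical stand-in types. BoostMode, BoostFunction and collectWithBaseline are not the real OpenMetadata or Elasticsearch client classes. The sketch returns no functions when nothing is configured, so the caller can keep the base query unchanged. It prepends the match_all weight of 1.0 only when the boost mode is multiply.

```java
import java.util.ArrayList;
import java.util.List;

public class BaselineFunctionSketch {

  /** Hypothetical stand-in for AssetTypeConfiguration.BoostMode. */
  enum BoostMode { SUM, MULTIPLY }

  /** Hypothetical stand-in for a function_score function: a filter label plus a weight. */
  record BoostFunction(String filter, double weight) {}

  static final BoostFunction BASELINE = new BoostFunction("match_all", 1.0);

  /**
   * Returns the functions to wrap in function_score. An empty result means the
   * caller should return the base query unchanged.
   */
  static List<BoostFunction> collectWithBaseline(List<BoostFunction> configured, BoostMode mode) {
    if (configured.isEmpty()) {
      return List.of(); // no term/field boosts: skip function_score entirely
    }
    List<BoostFunction> functions = new ArrayList<>();
    if (mode == BoostMode.MULTIPLY) {
      functions.add(BASELINE); // the 1.0 floor only matters when multiplying
    }
    functions.addAll(configured);
    return functions;
  }

  public static void main(String[] args) {
    List<BoostFunction> configured = List.of(new BoostFunction("tier = Tier1", 0.05));
    System.out.println(collectWithBaseline(configured, BoostMode.MULTIPLY));
    System.out.println(collectWithBaseline(configured, BoostMode.SUM));
    System.out.println(collectWithBaseline(List.of(), BoostMode.MULTIPLY));
  }
}
```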

Copilot · 2026-04-01T20:05:05Z

...src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSourceBuilderFactory.java

+    // Add baseline weight of 1.0 so that assets with no tier/usage retain their text score
+    // when boostMode is multiply. Without this, function_score could be 0 and zero out the
+    // text relevance score.
+    functions.add(
+        OpenSearchQueryBuilder.weightFunction(OpenSearchQueryBuilder.matchAllQuery(), 1.0));


collectBoostFunctionsV2 now always adds a match_all weight(1.0) function, which means applyFunctionScoringV2 will always wrap the base query in a function_score query even when there are no configured term/field boosts. This is an observable behavior change (different query shape) and adds overhead; it also changes scoring semantics if a non-multiply boostMode is configured in the future (e.g., sum would add +1 to all docs). Consider only adding the baseline function when boostMode is multiply (and only when there is at least one other boost function to apply), otherwise return the base query unchanged.

Copilot · 2026-04-01T20:05:05Z

openmetadata-service/src/main/resources/json/data/settings/searchSettings.json

      {
        "field": "usageSummary.weeklyStats.count",
-        "factor": 4.0,
-        "modifier": "sqrt",
+        "factor": 0.002,
+        "modifier": "log1p",
        "missing": 1
      },


The weekly usage boost uses missing: 1 with modifier: log1p. This produces a non-zero boost for documents that don't have usageSummary.weeklyStats.count, because Elasticsearch/OpenSearch define log1p as log10(1 + value), so a substituted value of 1 still yields a positive score. That contradicts the stated intent that assets with no usage retain their original text score under multiplicative scoring. If the goal is "no usage == no boost", set missing to 0 (the new baseline weight already prevents multiplying by 0).
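A small sketch of why `missing: 1` still boosts, assuming the setting is emitted as a native `field_value_factor` function. Elasticsearch documents that function as modifier(factor × value). It applies `factor` and `modifier` to the `missing` value as if that value had been read from the document, and it defines `log1p` as log10(1 + x). This PR does not show how OpenMetadata's query builder actually composes the function, so treat this as an approximation. The class and method names are hypothetical.

```java
public class MissingValueSketch {

  /** log1p as Elasticsearch/OpenSearch define it: common logarithm of (1 + x). */
  static double log1p10(double x) {
    return Math.log10(1.0 + x);
  }

  /**
   * field_value_factor with modifier log1p: log10(1 + factor * value), where a
   * document without the field uses `missing` as its value.
   */
  static double fieldValueFactor(Double fieldValue, double factor, double missing) {
    double value = (fieldValue != null) ? fieldValue : missing;
    return log1p10(factor * value);
  }

  public static void main(String[] args) {
    // Document with no usageSummary.weeklyStats.count:
    System.out.println("missing=1 -> " + fieldValueFactor(null, 0.002, 1.0)); // small but positive
    System.out.println("missing=0 -> " + fieldValueFactor(null, 0.002, 0.0)); // zero
  }
}
```

With missing: 0 the contribution is exactly zero, so the 1.0 baseline alone determines the multiplier for assets without usage data.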

Copilot · 2026-04-01T20:05:05Z

...in/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSourceBuilderFactory.java

    compositeConfig.setSearchFields(new ArrayList<>(uniqueFields.values()));
    compositeConfig.setTermBoosts(allTermBoosts);
    compositeConfig.setFieldValueBoosts(allFieldValueBoosts);
    compositeConfig.setScoreMode(AssetTypeConfiguration.ScoreMode.SUM);
-    compositeConfig.setBoostMode(AssetTypeConfiguration.BoostMode.SUM);
+    compositeConfig.setBoostMode(AssetTypeConfiguration.BoostMode.MULTIPLY);



There are existing unit tests for the search source builder factories, but none assert the new scoring semantics (boostMode multiply) or the presence/placement of the baseline weight function. Adding a targeted test that inspects the generated function_score query (boost_mode and first function == match_all weight 1.0) would help prevent regressions in ranking behavior across Elasticsearch/OpenSearch implementations.
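The check this comment suggests is sketched below. It works on a plain Map representation of the function_score request body rather than the Elasticsearch or OpenSearch client classes; the real factories would need their own serialization step, which is not shown. The example body and the field names inside it (such as tier.tagFQN) are illustrative. The keys function_score, functions, filter, match_all, weight, score_mode and boost_mode follow the Elasticsearch query DSL.

```java
import java.util.List;
import java.util.Map;

public class FunctionScoreShapeCheck {

  /** Illustrative request body with the expected shape (not produced by the real factories). */
  static Map<String, Object> exampleBody() {
    return Map.of(
        "function_score",
        Map.of(
            "query", Map.of("multi_match", Map.of("query", "location")),
            "functions",
            List.of(
                Map.of("filter", Map.of("match_all", Map.of()), "weight", 1.0),
                Map.of("filter", Map.of("term", Map.of("tier.tagFQN", "Tier.Tier1")), "weight", 0.05)),
            "score_mode", "sum",
            "boost_mode", "multiply"));
  }

  /** True if boost_mode is multiply and the first function is match_all with weight 1.0. */
  static boolean hasMultiplyWithBaseline(Map<String, Object> body) {
    Object fsObj = body.get("function_score");
    if (!(fsObj instanceof Map)) return false;
    Map<?, ?> fs = (Map<?, ?>) fsObj;
    if (!"multiply".equals(fs.get("boost_mode"))) return false;
    Object fnObj = fs.get("functions");
    if (!(fnObj instanceof List) || ((List<?>) fnObj).isEmpty()) return false;
    Object firstObj = ((List<?>) fnObj).get(0);
    if (!(firstObj instanceof Map)) return false;
    Map<?, ?> first = (Map<?, ?>) firstObj;
    Object filter = first.get("filter");
    return filter instanceof Map
        && ((Map<?, ?>) filter).containsKey("match_all")
        && Double.valueOf(1.0).equals(first.get("weight"));
  }

  public static void main(String[] args) {
    System.out.println(hasMultiplyWithBaseline(exampleBody()));
  }
}
```

In a real test, the body would come from serializing the query that applyFunctionScoringV2 builds for each engine. The same check could then run against both factories.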

Copilot · 2026-04-01T20:05:06Z

...src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSourceBuilderFactory.java

    compositeConfig.setTermBoosts(allTermBoosts);
    compositeConfig.setFieldValueBoosts(allFieldValueBoosts);
    compositeConfig.setScoreMode(AssetTypeConfiguration.ScoreMode.SUM);
-    compositeConfig.setBoostMode(AssetTypeConfiguration.BoostMode.SUM);
+    compositeConfig.setBoostMode(AssetTypeConfiguration.BoostMode.MULTIPLY);



There are existing unit tests for the search source builder factories, but none assert the new scoring semantics (boostMode multiply) or the presence/placement of the baseline weight function. Adding a targeted test that inspects the generated function_score query (boost_mode and first function == match_all weight 1.0) would help prevent regressions in ranking behavior across Elasticsearch/OpenSearch implementations.

sonarqubecloud · 2026-04-01T21:02:15Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2026-04-01T21:51:03Z

🔴 Playwright Results — 2 failure(s), 25 flaky

✅ 3440 passed · ❌ 2 failed · 🟡 25 flaky · ⏭️ 223 skipped

Shard	Passed	Failed	Flaky	Skipped
🔴 Shard 1	450	1	4	2
🟡 Shard 2	612	0	7	32
🟡 Shard 3	617	0	3	27
🟡 Shard 4	619	0	5	47
🟡 Shard 5	586	0	1	67
🔴 Shard 6	556	1	5	48

Genuine Failures (failed on all attempts)

❌ Pages/SearchSettings.spec.ts › Restore default search settings (shard 1)

Error: expect(received).toEqual(expected) // deep equality

- Expected  - 3
+ Received  + 3

@@ -12,20 +12,20 @@
        "script": "",
        "type": "terms",
      },
    ],
    "assetType": "table",
-   "boostMode": "sum",
+   "boostMode": "multiply",
    "fieldValueBoosts": Array [
      Object {
-       "factor": 3,
+       "factor": 0.002,
        "field": "usageSummary.monthlyStats.count",
        "missing": 0,
        "modifier": "log1p",
      },
      Object {
-       "factor": 1,
+       "factor": 0.0005,
        "field": "usageSummary.monthlyStats.percentileRank",
        "missing": 0,
        "modifier": "none",
      },
    ],

❌ Pages/Glossary.spec.ts › Add and Remove Assets (shard 6)

Test timeout of 180000ms exceeded.

🟡 25 flaky test(s) (passed on retry)

Features/CustomizeDetailPage.spec.ts › Table - customization should work (shard 1, 1 retry)
Flow/Metric.spec.ts › Verify Related Metrics Update (shard 1, 1 retry)
Flow/Tour.spec.ts › Tour should work from URL directly (shard 1, 1 retry)
Pages/AuditLogs.spec.ts › should handle audit logs access for non-admin users (shard 1, 1 retry)
Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
Features/DataProductRenameConsolidation.spec.ts › Multiple rename + update cycles - assets should be preserved (shard 2, 1 retry)
Features/DataQuality/BundleSuiteBulkOperations.spec.ts › Bulk selection operations (shard 2, 1 retry)
Features/DataQuality/ColumnLevelTests.spec.ts › Column Values Sum To Be Between (shard 2, 1 retry)
Features/DataQuality/DataQuality.spec.ts › Table test case (shard 2, 1 retry)
Features/DataQuality/DataQualityPermissions.spec.ts › User with TEST_CASE.VIEW_BASIC can view test case CONTENT details in UI (shard 2, 1 retry)
Features/Glossary/GlossaryAdvancedOperations.spec.ts › should remove individual reference from term (shard 2, 1 retry)
Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
Flow/ExploreDiscovery.spec.ts › Should display deleted assets when showDeleted is checked and deleted is not present in queryFilter (shard 3, 1 retry)
Flow/NotificationAlerts.spec.ts › Conversation source alert (shard 3, 1 retry)
Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
Pages/Domains.spec.ts › Rename domain with deeply nested subdomains (3+ levels) verifies FQN propagation (shard 4, 1 retry)
Pages/Entity.spec.ts › Delete Spreadsheet (shard 4, 1 retry)
Pages/Entity.spec.ts › Set & Update table-cp, hyperlink-cp, string, integer, markdown, number, duration, email, enum, sqlQuery, timestamp, entityReference, entityReferenceList, timeInterval, time-cp, date-cp, dateTime-cp Custom Property (shard 4, 1 retry)
Pages/Entity.spec.ts › Tier Add, Update and Remove (shard 4, 1 retry)
Pages/EntityDataConsumer.spec.ts › Tag Add, Update and Remove (shard 5, 1 retry)
Pages/ExploreTree.spec.ts › Verify Database and Database Schema available in explore tree (shard 6, 1 retry)
Pages/InputOutputPorts.spec.ts › Lineage section collapse/expand (shard 6, 1 retry)
Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)
Pages/Users.spec.ts › Check permissions for Data Steward (shard 6, 1 retry)
VersionPages/EntityVersionPages.spec.ts › Directory (shard 6, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Udpate Rankings

4ad61e1

Copilot AI review requested due to automatic review settings April 1, 2026 19:58

github-actions bot added the backend and safe to test labels Apr 1, 2026

Copilot started reviewing on behalf of mohityadav766 April 1, 2026 19:59

mohityadav766 self-assigned this Apr 1, 2026

mohityadav766 added this to Shipping Apr 1, 2026

mohityadav766 moved this to In Review / QA 👀 in Shipping Apr 1, 2026

Copilot AI reviewed Apr 1, 2026

mohityadav766 temporarily deployed to test April 1, 2026 20:09 — with GitHub Actions Inactive

mohityadav766 had a problem deploying to test April 1, 2026 20:09 — with GitHub Actions Failure

mohityadav766 temporarily deployed to test April 1, 2026 20:09 — with GitHub Actions Inactive

mohityadav766 had a problem deploying to test April 1, 2026 20:09 — with GitHub Actions Failure

mohityadav766 temporarily deployed to test April 1, 2026 20:09 — with GitHub Actions Inactive

mohityadav766 wants to merge 1 commit into main from fix-search-rankings
