[YSQL] Fix pagination over hash-sharded tables with yb_hash_code() = const #30651
kirs wants to merge 2 commits into yugabyte:master from
Conversation
I manually verified this on a VM against a built branch binary:

DROP TABLE IF EXISTS hk_pagination;
CREATE TABLE hk_pagination(
    h int,
    r1 int,
    r2 int,
    v text,
    PRIMARY KEY(h HASH, r1 ASC, r2 ASC)
);

INSERT INTO hk_pagination
SELECT i % 5, i, i * 10, 'val' || i
FROM generate_series(1, 50) AS i;

SELECT h, yb_hash_code(h), count(*)
FROM hk_pagination
GROUP BY h, yb_hash_code(h)
ORDER BY h;

Observed buckets: Using bucket

EXPLAIN (COSTS OFF)
SELECT *
FROM hk_pagination
WHERE yb_hash_code(h) = 4624
ORDER BY h, r1, r2
LIMIT 10;

EXPLAIN (COSTS OFF)
SELECT *
FROM hk_pagination
WHERE yb_hash_code(h) = 4624
AND (h, r1) > (1, 10)
ORDER BY h, r1, r2
LIMIT 10;

EXPLAIN (COSTS OFF)
SELECT *
FROM hk_pagination
WHERE yb_hash_code(h) = 4624
AND (h, r1, r2) > (1, 11, 110)
ORDER BY h, r1, r2
LIMIT 10;

Negative checks also look right:

EXPLAIN (COSTS OFF)
SELECT *
FROM hk_pagination
WHERE yb_hash_code(h) < 4624
ORDER BY h, r1, r2
LIMIT 10;

EXPLAIN (COSTS OFF)
SELECT *
FROM hk_pagination
WHERE yb_hash_code(h) = 4624
ORDER BY r1, r2
LIMIT 10;

EXPLAIN (COSTS OFF)
SELECT *
FROM hk_pagination
WHERE yb_hash_code(h) < 4624
AND (h, r1) > (1, 10)
LIMIT 10;

I also hit an execution-time crash on the initial version of the patch (before the follow-up After applying the small
 * IS NOT NULL on partition columns, and for primary-key hash
 * columns the null check is redundant anyway.
 */
is_not_null[att_idx] |= (subkey_index == 0 &&
Manual repro context for this guard: after allowing ROW-comparison pushdown on hash columns in the pinned-bucket case, planning was fine but actual execution crashed in pggate. The RC bind path still synthesizes IS NOT NULL on the first RC column; for queries like WHERE yb_hash_code(h) = 4624 AND (h, r1, r2) > (...) ORDER BY h, r1, r2, that first column is now the hash/partition key h. Pggate rejects BindColumnCondIsNotNull on partition columns (pg_dml_read.cc:765, CHECK(!col.is_partition())), so skipping the synthetic null-check here for hash columns fixes the crash while keeping the row-bound pushdown.
commit cd0d858 never explains why IS NOT NULL was not supported for hash. Would have been good to know that.
Do you see this as a requirement for this PR or as something that can be done as a follow-up? Sorry, I have very little context on cd0d858.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
		return true;
	}
	return false;
}
Executor hash-code check omits arity validation present in planner
Low Severity
yb_has_hash_code_equality_scan_key returns true for any BTEqualStrategyNumber hash-code scan key without validating that the yb_hash_code call's arity matches index->rd_index->indnkeyatts hash column count. The planner-side counterparts (yb_has_hash_code_equality_qual) both enforce an arity check (list_length(funcexpr->args) != index->nhashcolumns). This inconsistency means the executor guard is weaker than the planner guard, reducing defense-in-depth for multi-column hash keys.
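The guard Bugbot is describing can be modeled in a few lines. This is a hypothetical Python sketch of the check, not the actual C code: the function name `pins_single_bucket` and its arguments are ours; in the source the logic is spread across `yb_has_hash_code_equality_qual` (planner, with the arity check) and `yb_has_hash_code_equality_scan_key` (executor, without it).

```python
# Hypothetical model: a yb_hash_code(...) = const predicate only pins a
# single bucket when the call covers exactly the index's hash columns.
def pins_single_bucket(func_name, func_args, index_hash_columns):
    """Return True only if the call is yb_hash_code over all hash columns."""
    if func_name != "yb_hash_code":
        return False
    # Arity check the executor-side helper omits (mirrors the planner's
    # list_length(funcexpr->args) != index->nhashcolumns test).
    if len(func_args) != len(index_hash_columns):
        return False
    # Argument identity check: each argument must be the matching hash column.
    return list(func_args) == list(index_hash_columns)

# For PRIMARY KEY ((h1, h2) HASH, r ASC):
assert pins_single_bucket("yb_hash_code", ["h1", "h2"], ["h1", "h2"])
assert not pins_single_bucket("yb_hash_code", ["h1"], ["h1", "h2"])  # arity mismatch
assert not pins_single_bucket("yb_hash_code", ["r1", "r2"], ["h1", "h2"])  # wrong columns
```

Without the arity branch, a call over a single column of a two-column hash key would be accepted, which is the weakening Bugbot flags.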
mtakahar left a comment
Great work, and thank you so much for putting this together!
I've left a couple of inline comments: one correctness issue, and another following up on what the Cursor bot pointed out (code duplication).
if (list_length(funcexpr->args) != index->nhashcolumns)
	continue;
Since index->rel->baserestrictinfo contains all the predicates on the table, not only the ones relevant to the index, I think the yb_hash_code arguments need to match the expression of the expression-based index key item. Otherwise the code can mix up another index on yb_hash_code over a different column, if such an index exists.
Example:
CREATE TABLE hk_pagination(
h int,
r1 int,
r2 int,
v text,
PRIMARY KEY(h HASH, r1 ASC, r2 ASC)
);
INSERT INTO hk_pagination SELECT i % 5, i, i * 10, 'val' || i FROM generate_series(1, 20) i;
create index hk_pagination_r1_r2_idx on hk_pagination (r1 hash, r2 asc);
/*+ IndexScan(hk_pagination hk_pagination_r1_r2_idx) */
EXPLAIN (COSTS OFF) SELECT * FROM hk_pagination
WHERE yb_hash_code(h) = 49348
ORDER BY r1, r2
LIMIT 2;
QUERY PLAN
-----------------------------------------------------------------
Limit
-> Index Scan using hk_pagination_r1_r2_idx on hk_pagination
Filter: (yb_hash_code(h) = 49348)
(3 rows)
/*+ IndexScan(hk_pagination hk_pagination_r1_r2_idx) */
SELECT * FROM hk_pagination
WHERE yb_hash_code(h) = 49348
ORDER BY r1, r2
LIMIT 2;
h | r1 | r2 | v
---+----+-----+-------
2 | 12 | 120 | val12
2 | 7 | 70 | val7
(2 rows)
/*+ SeqScan(hk_pagination) */
EXPLAIN (COSTS OFF) SELECT * FROM hk_pagination
WHERE yb_hash_code(h) = 49348
ORDER BY r1, r2
LIMIT 2;
QUERY PLAN
-------------------------------------------------
Limit
-> Sort
Sort Key: r1, r2
-> Seq Scan on hk_pagination
Filter: (yb_hash_code(h) = 49348)
(5 rows)
/*+ SeqScan(hk_pagination) */
SELECT * FROM hk_pagination
WHERE yb_hash_code(h) = 49348
ORDER BY r1, r2
LIMIT 2;
h | r1 | r2 | v
---+----+----+------
2 | 2 | 20 | val2
2 | 7 | 70 | val7
(2 rows)
Based on a run off of commit 321541d, 4 test failures:
- TestPgRegressDml
- TestPgRegressIndex
- TestPgRegressPlanner
- TestPgRegressProc
bool has_hash_code_eq = yb_has_hash_code_equality_scan_key(ybScan);
int hash_cols_covered = 0;

for (int i = 0; i < index->rd_index->indnkeyatts; i++)
nit: really tiny optimization i++ to ++i but maybe it just matches the rest of the code.
if (opexpr->opno != Int4EqualOperator)
	continue;

if (list_length(opexpr->args) != 2)
nit: ideally this should be an assert in debug builds.
jasonyb left a comment
Don't want to expand the scope too much, but do you mind checking what happens when ybctid = const? That case is very similar to yb_hash_code() = const.
	break;			/* no good, volatile comparison value */

/*
 * The Var side can match any key column of the index.
note: it appears this planner code doesn't care about the order of the columns in the row compare, but I believe (could be wrong) the executor yb_scan.c code does care and does not support pushing down the filter in out-of-order cases (e.g. key (r1, r2) and row compare (r2, r1) op (a, b)).
 */
if (!indexcol_is_bool_constant_for_query(root, index, i) ||
-	yb_is_hash_column)
+	(yb_is_hash_column && !yb_hash_bucket_pinned))
note: I suspect this whole clause is unnecessary, but better to take that investigation separately
for (int i = 0; i < index->rd_index->indnkeyatts; i++)
{
	if (!(index->rd_indoption[i] & INDOPTION_HASH))
		continue;
could break instead, as it is not possible for there to be any more hash columns after the first range column; could also put this condition in the for loop
 * Set the first column in this RC to not null, except for
 * partition/hash columns. Pggate does not allow binding
 * IS NOT NULL on partition columns, and for primary-key hash
 * columns the null check is redundant anyway.
hash primary key cannot have null, but hash index key can
WHERE yb_hash_code(h1) = 500
ORDER BY h1, h2, r
LIMIT 10;
there is a lack of tests involving
- secondary indexes
- nulls
- multi-column hash + row compare
…const

Summary: When yb_hash_code(hash_cols) = constant pins a scan to a single hash bucket, DocDB stores rows in (hash_cols, range_cols) order within that bucket. Three independent code paths prevented exploiting this ordering for pagination:

1. pathkeys.c (build_index_pathkeys): Hash columns were never allowed to produce sort pathkeys, forcing a Sort node even when ORDER BY matched the index key order within a pinned bucket. Fix: when yb_hash_code equality is detected, pass is_hash_index=false so hash columns generate sort pathkeys.

2. indxpath.c (match_rowcompare_to_indexcol): ROW comparisons like (h, r) > (?, ?) were unconditionally rejected when the first column was a hash column. Fix: allow through when yb_hash_code equality pins the scan to a single bucket.

3. yb_scan.c (YbBindRowComparisonKeys): Two guards prevented binding ROW comparison keys to DocDB when any hash column was present in the index or the comparison subkeys. Fix: relax both guards when a yb_hash_code equality scan key is present.

Together these fixes enable efficient keyset pagination:

SELECT * FROM t
WHERE yb_hash_code(h) = 10
  AND (h, r) > (:last_h, :last_r)
ORDER BY h, r
LIMIT 1000;

Previously this required a full sort of the hash bucket on every page and scanned from the start of the bucket. Now it seeks directly to the cursor position.

Fixes: yugabyte#30524

Test Plan: New regression test yb.orig.hash_code_pagination covering:
- Sort elimination with yb_hash_code equality + ORDER BY
- ROW comparison pushdown as Index Cond
- Negative tests (range predicates, arity mismatches)
- Multi-column hash key support
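As an illustration of the stateless keyset-pagination pattern this commit message enables (not code from the patch), here is a Python sketch that builds the per-page SQL; the table `t` and columns `h`, `r` follow the example query above, and `page_query` is a name of our choosing:

```python
# Illustrative sketch: each page is an independent query, so no server-side
# cursor or session state is needed; only the last row of the previous page
# (the "cursor") is carried by the client.
def page_query(bucket, cursor=None, page_size=1000):
    """Build one page of a bucket-pinned keyset pagination scan."""
    where = [f"yb_hash_code(h) = {bucket}"]
    if cursor is not None:
        h, r = cursor  # last row of the previous page
        # The row comparison seeks directly past the cursor inside the bucket.
        where.append(f"(h, r) > ({h}, {r})")
    return ("SELECT * FROM t WHERE " + " AND ".join(where)
            + f" ORDER BY h, r LIMIT {page_size}")

first_page = page_query(10)
next_page = page_query(10, cursor=(3, 42))
```

With the fixes above, the second query becomes a single Index Cond seek rather than a sort of the whole bucket.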
@jasonyb thanks for the feedback! I checked and Do you have anything particular in mind there? My understanding is that since
I'm going to do another test run. Hope you saw the previous test results I posted and fixed those issues.
>>> Lint for src/postgres/src/backend/access/yb_access/yb_scan.c:
Error (S&RX) bad_variable_declaration_spacing
Variable declarations should align variable names to the 12 column mark
1684 */
1685 if (can_pushdown)
1686 {
>>> 1687 bool has_hash_code_eq = yb_has_hash_code_equality_scan_key(ybScan);<TAB><TAB>bool<TAB>has_hash_code_eq = yb_has_hash_code_equality_scan_key(ybScan);
1688
1689 for (int i = 0;
1690 i < index->rd_index->indnkeyatts &&
Error (S&RX) bad_variable_declaration_spacing
Variable declarations should align variable names to the 12 column mark
1693 {
1694 if (has_hash_code_eq)
1695 {
>>> 1696 bool found = false;<TAB><TAB><TAB><TAB>bool<TAB>found = false;
1697 for (int j = 0; j < subkey_count; j++)
1698 {
1699 if (subkeys[j]->sk_attno - 1 == i)
>>> Lint for src/postgres/src/backend/utils/misc/pg_yb_utils.c:
Error (S&RX) bad_variable_declaration_spacing
Variable declarations should align variable names to the 12 column mark
8922 col = 0;
8923 foreach(arg_lc, funcexpr->args)
8924 {
>>> 8925 Node *arg = (Node *) lfirst(arg_lc);<TAB><TAB><TAB>Node *arg = (Node *) lfirst(arg_lc);
8926
8927 if (IsA(arg, RelabelType))
8928 arg = (Node *) ((RelabelType *) arg)->arg;
LINT ERRORS Lint raised errors!
-- Negative: range predicate on yb_hash_code should still Sort (multiple buckets).
EXPLAIN (COSTS OFF) SELECT * FROM hk_pagination
WHERE yb_hash_code(h) < 1000
ORDER BY h, r1, r2
LIMIT 10;
note: this is not ready in the near future, but yb_hash_code(h) IN (0, 1, ..., 999) + yb_max_saop_merge_streams = 1024 + yb_enable_cbo + pg_hint_plan hints to force SAOP merge + yb_hash_code IN-clause Index Cond support would be able to avoid Sort node
-- Verify results for yb_hash_code equality + ORDER BY (no Sort).
SELECT * FROM hk_pagination
WHERE yb_hash_code(h) = 1000
ORDER BY h, r1, r2;
 h | r1 | r2 | v
---+----+----+---
(0 rows)

-- Verify ROW comparison pagination produces correct results.
-- First page:
SELECT * FROM hk_pagination
WHERE yb_hash_code(h) = 1000
AND (h, r1, r2) > (0, 0, 0)
ORDER BY h, r1, r2
LIMIT 5;
 h | r1 | r2 | v
---+----+----+---
(0 rows)
these are useless as no rows are returned. pick better numbers or load more data
-- The hash column is NOT in the ROW comparison, so pushdown must NOT
-- happen (EncodeRowKeyForBound needs all hash columns in the bound).
so what happens if you omit a range column, such as r1?
--
-- NULL handling tests.
-- NULLs hash to a different bucket, so yb_hash_code(col) = const already
What is "different bucket"? Isn't there at least one bucket they hash to? Wouldn't it be best to test using that one specific const rather than others? That same const should include rows with nulls and rows without nulls.
INSERT INTO hk_pagination VALUES (NULL, 100, 1000, 'null_h');
INSERT INTO hk_pagination VALUES (1, NULL, 1000, 'null_r1');
INSERT INTO hk_pagination VALUES (1, 100, NULL, 'null_r2');
-- Verify NULLs do not appear in a hash-code-pinned scan.
misleading comment if you read it as a statement of fact rather than as specific to this table; the primary culprit is the use of "a"
INSERT INTO hk_pagination VALUES (NULL, 100, 1000, 'null_h');
INSERT INTO hk_pagination VALUES (1, NULL, 1000, 'null_r1');
INSERT INTO hk_pagination VALUES (1, 100, NULL, 'null_r2');
shouldn't these fail because PKs cannot contain nulls? did you run the test before pushing?
 1 | 16 | 160 | val16
(4 rows)

-- Secondary index with NULLs: secondary hash indexes allow NULL values.
> secondary hash indexes allow NULL values

then why did you have inserts of NULLs into PKs in the above queries?
(4 rows)

-- Secondary index with NULLs: secondary hash indexes allow NULL values.
-- yb_hash_code(col) works when col is NULL (function is non-strict).
> yb_hash_code(col) works when col is NULL (function is non-strict).

I see no NULL involved in the following query
-- Secondary hash indexes allow NULL values, and yb_hash_code(NULL::type)
-- returns a valid hash code, so NULL rows live in a real bucket.
-- The IS NOT NULL guard is skipped for hash columns (pggate rejects it),
-- but SQL-layer recheck excludes NULLs because NULL comparisons yield NULL.
seems like you have a lot of null tests above that are disorganized. Can all tests be more sensibly organized as follows:
- pk_table (h1, h2, r1, r2, v), PRIMARY KEY ((h1, h2) HASH, r1 ASC, r2 ASC)
- ik_table (h1, h2, r1, r2, v), UNIQUE INDEX KEY ((h1, h2) HASH, r1 ASC, r2 ASC)
- Skip single-column tests and go straight to multi-column
- Incorporate NULLS at the very beginning of data load: pk_table and ik_table have the same data, but ik_table additionally has extra NULL key columns. Ensure that for yb_hash_code = const values you test, there are both NULL and non-NULL rows matching that constant.
- For each query you want to test, you want to test both pk_table and ik_table. No need for pg_hint_plan hints as both tables have only the choice between index/primary key scan and sequential scan. You also want to show both explain plan and results. I suggest you use
:explain1 and :run1 on both table names, like:
\getenv abs_srcdir PG_ABS_SRCDIR
\set filename :abs_srcdir '/yb_commands/explainrun.sql'
\i :filename
\set explain 'EXPLAIN (COSTS OFF)'
\set query 'SELECT * FROM :table_name
WHERE yb_hash_code(h1, h2) = <some_meaningful_constant>
ORDER BY h1, h2;
\set table_name 'pk_table'
:explain1run1
\set table_name 'ik_table'
:explain1run1
Since I see this pattern come up a few times, it may be worth augmenting the explainrun framework to generalize the "hints" system to "params": define \set explain1 ':explain :hint1 :query;' instead as \set explain1 ':explain :query;' and make the responsibility of query to have hint as param at the front of the query. I can take care of this, so you can in the meantime use the above slight duplication of back-and-forth set table_name.
Your test is broken. Are you running your tests?
Couple things:
@kirs, thanks for pointing out the issue and submitting the PR. I see this issue/PR addresses two orthogonal issues.
In the meantime, would you mind splitting your PR?
So, it looks like we convert pushable row conditions to document key bounds. We probably need to discuss whether we need to implement this optimization, and how best to implement it. As I mentioned above, that part is independent from the yb_hash_code + ORDER BY support; the two changes may be made separately.
@andrei-mart thanks for your feedback! Here's the main issue that I'm after: Even without Sort elimination is nice to have but only as long as we have a way to paginate over a table with Btw it's the same issue if I expand that into It sounds like I'm all up for splitting work into multiple PRs. Let me know what you think is best given the context above.
Ah, you want efficient pagination, I see. I suggested splitting the PR because the ORDER BY part works and can be merged soon, while the row condition does not seem to work as expected. And in general, it is better to make independent changes separately. So, for your use case, I think the condition

It is interesting that the planner has done most of the validations by the time the condition reaches YbBindRowComparisonKeys. If, for example, yb_hash_code takes arguments other than the hash columns, the index is not even used and the condition becomes a sequential filter. On the other hand, if the index has an expression key, a condition on it is also an expression and may be hard to distinguish from the yb_hash_code one.

Beware: it seems like the YBCPgDmlAddRow(Lower|Upper)Bound functions do not take a value for the hash code. I don't think we use them with hash keys, but despite the fact that it works with your change, these functions calculate the hash code out of the provided hash values. That's not correct.
@andrei-mart for PgBouncer and conn multiplexing reasons, we've been aiming to make this pagination as stateless as possible, so using a cursor is less desirable. And yes, Is there a reason you've suggested
These two are equivalent.
@andrei-mart thank you! I'd like to take a stab at that as a separate PR and then we could close this one. Just a few questions to make this fewer review rounds for you:
That's a good question. Anyway, that means EncodeRowKeyForBound is implemented incorrectly. Well, let's better say the hash case is not supported. Perhaps we should not use BuildDocKey to make a bound; a bound, while similar to a DocKey, has certain differences. Give me a couple of days, I think I can fix that for you, or provide more detailed guidance. For now, let's stick to this: the hash code should be first in col_values. If you need a working EncodeRowKeyForBound for testing, you can patch it locally and ignore the first value in the input.

For 2 and 3 I need to gather context. It's been a while since I touched the planner part. In my quick tests I saw ScanKeys coming into YbBindRowComparisonKeys and assumed all was good in the planner. Let me look into the details and get back to you.
Update: For the best pagination support I think your query should be like this: The nice difference from Sorry, I did not have time to look into your other two questions. Will do first thing Monday.
So, I've looked into 2 and 3. The Postgres logic basically starts from the match_clause_to_index function. Its purpose is to determine whether the given index is helpful for evaluating the given clause. Most of the function is a loop over all the index columns, where the match_clause_to_indexcol function is used to determine whether the given column of the given index is helpful for evaluating the clause. That's where the indexcol comes from. The idea is to find the first column in the index which is helpful, because subsequent columns may still be helpful, but not as helpful.

The match_clause_to_indexcol is basically a big switch which calls the respective handler function based on the clause type. In the case of a row compare clause, that is match_rowcompare_to_indexcol. We were not supposed to optimize row compares, so here is a blocker:

yugabyte-db/src/postgres/src/backend/optimizer/path/indxpath.c Lines 3663 to 3664 in 129cb53

Nonetheless, that block only works while the match_clause_to_index function evaluates the hash keys; when it comes to the range keys, if there are any, match_rowcompare_to_indexcol proceeds. Here comes the second bug: if yb_hash_code is the first expression in the row, match_index_to_operand is always successful due to the special handling for yb_hash_code. Honestly, I don't know why it is there. It is pretty old code, the people involved are not available to comment, and if they were, I don't expect them to remember details. My assumption is it worked for its purpose and did not break things. Even now, when we found a false positive, the query works correctly; it just prevents optimal execution.

The last bastion is where expand_indexqual_rowcompare is invoked if the opfamily matches. It may be pure coincidence that the current range column is an int4 and the opfamily matches. expand_indexqual_rowcompare expects the first expression to match the index. In the case where yb_hash_code is the first expression, this expectation is surprisingly true; just the indexcol is random. Fortunately it is not used in expand_indexqual_rowcompare's logic. That's why I saw seemingly valid scan keys in my test. Interestingly, in Postgres expand_indexqual_rowcompare considers the Row good while it refers to the index columns in the same direction, while in Yugabyte it is good while it refers to the index columns in the same direction and in the same order. Otherwise the bound would not be correct.

I'm not sure how exactly it should be fixed. I think it is not an option to make the hash code a column in the index so it can be referenced by indexcol; that is too fundamental. We could technically use -1, as the indexcol in struct IndexClause is an AttrNumber and technically allows negatives. I'm not sure how OK that is, but Postgres itself uses InvalidAttrNumber (0) for the first column. I think the check for the hash code should happen before the loop over the index columns, and only if the target index is a hash index. There we can also check when it is pointless to iterate over the index columns: e.g., if we have a row comparison, it is good if the first row expression is yb_hash_code and bad otherwise.

The good news: if the planner does its job properly, the YbBindRowComparisonKeys implementation does not need any validation. If the row comparison scan keys are there, they match the index's prefix.

Hopefully you are not overwhelmed yet. Please let us know if you are still committed to taking on this feature.
My original plan on the application side has been to spawn N routines, each of which gets its own chunk of the yb_hash_code range from 0 to 65536. So if a routine gets assigned hash codes 0 to 999, having an upper boundary is still very useful there.
😃 I'm thinking of starting with making the planner recognize
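The fan-out plan described above (N routines, each owning a contiguous slice of the 0–65536 hash-code space) can be sketched as follows; `hash_code_chunks` is a hypothetical helper on the application side, not anything in YugabyteDB:

```python
# Sketch: split the yb_hash_code space [0, 65536) into N contiguous,
# non-overlapping (lo, hi) half-open ranges, one per worker routine.
def hash_code_chunks(n_workers, space=65536):
    """Yield (lo, hi) inclusive-exclusive hash-code ranges covering the space."""
    base, extra = divmod(space, n_workers)
    lo = 0
    for i in range(n_workers):
        # Spread the remainder over the first `extra` workers.
        hi = lo + base + (1 if i < extra else 0)
        yield (lo, hi)
        lo = hi

chunks = list(hash_code_chunks(64))
assert chunks[0] == (0, 1024)
assert chunks[-1] == (64512, 65536)
assert sum(hi - lo for lo, hi in chunks) == 65536
```

Each worker would then scan its slice with `WHERE yb_hash_code(h) >= lo AND yb_hash_code(h) < hi`, which is where the upper boundary mentioned above matters.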
Welcome back @kirs! Yeah, you will be able to distribute the work between multiple routines.

So I've started looking at that ORDER BY yb_hash_code(h). I've taken the code matching a yb_hash_code expression to an index out of match_index_to_operand() into a separate function; it is supposed to be used in multiple places. Then I took the whole yb_hash_code handling out of the loop over the index columns in match_clause_to_index(), as it is not related to the columns. I'm taking a pause here to run the regression tests. I uploaded my WIP patch here: https://gist.github.com/andrei-mart/3836fc3ca0f3322be35537fd68a98da2 so you can take a look.

I think the yb_match_clause_to_index the patch adds is a good place to identify the "special" row condition. "Special" here should mean a row condition where the first element is a yb_hash_code matching the index. I can hand it over to you. But if we are both about to work on these items, we will need to synchronize somehow.
Note
Medium Risk
Planner and scan-binding logic for YB hash-sharded primary keys is relaxed based on detecting yb_hash_code(...) = const, which can change query plans and index bound pushdown behavior and may impact correctness if the pinning detection misfires.

Overview
Improves keyset pagination and ORDER BY planning on hash-sharded (LSM) primary keys when the scan is pinned to a single hash bucket via yb_hash_code(hash_cols) = const.
This teaches the optimizer to treat hash columns as providing valid ordering/pathkeys and allows ROW(...) range comparisons to match and be pushed down as index bounds in that pinned-bucket case, and updates YbBindRowComparisonKeys to permit binding row-comparison bounds involving hash columns while avoiding IS NOT NULL binds on hash/partition columns.
Adds a new regression test yb.orig.hash_code_pagination (and schedule entry) covering sort elimination, row-compare pushdown, multi-column hash keys, and negative cases.

Written by Cursor Bugbot for commit 321541d.