feat(pageserver): enable reldirv2 by default in regress tests#12758
Open
feat(pageserver): enable reldirv2 by default in regress tests#12758
Conversation
a5debfe to
dcf38bc
Compare
Signed-off-by: Alex Chi Z <chi@neon.tech>
dcf38bc to
f6b1891
Compare
9152 tests run: 8490 passed, 0 failed, 662 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
5713ff3 at 2025-07-30T16:22:16.761Z :recycle: |
Signed-off-by: Alex Chi Z <chi@neon.tech>
Signed-off-by: Alex Chi Z <chi@neon.tech>
8d8d989 to
d37cd4b
Compare
302884e to
00091a4
Compare
Signed-off-by: Alex Chi Z <chi@neon.tech>
Signed-off-by: Alex Chi Z <chi@neon.tech>
erikgrinaker
approved these changes
Jul 31, 2025
Contributor
erikgrinaker
left a comment
There was a problem hiding this comment.
I haven't paged this back in enough to give it a thorough review, but it seems okay at a high level.
| pub basebackup_cache_enabled: bool, | ||
|
|
||
| #[serde(skip_serializing_if = "std::ops::Not::not")] | ||
| pub rel_size_v1_access_disabled: bool, |
Contributor
There was a problem hiding this comment.
nit: can this get a doc comment please?
|
|
||
| #[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] | ||
| /// The state of the rel size migration. This is persisted in the index part. | ||
| /// compatibility. |
| current_status: RelSizeMigration, | ||
| // Whether we should initialize the v2 keyspace or not. | ||
| initialize: bool, | ||
| // Whether we should disable v1 access starting this LSNor not |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Part of LKB-197. This patch enables reldirv2 by default.
Using index_part to persist reldirv2 status is not reliable. In the current ingestion path when we initialize the relv2 keyspace, we will immediately set index_part to persist that we have enabled v2. However, if the pageserver is shut down or the tenant gets dropped for whatever reason, the disk persistent LSN might not catch up with the LSN where we enabled the relv2. Therefore, we will end up with a state that we have "v2 enabled" in index_part, but the v2 migration keys are not in the layer files.
This patch mainly addresses this issue by modifying the data path to look at the
migrated_atLSN to decide whether to scan the v2 keyspace or not. If the request is below themigrated_atLSN, only v1 is used.How this mechanism interacts with branches are not tested yet and will be in the upcoming patches. For example, if we create a branch below the migrated_at LSN of the current branch, we should reset it to legacy mode. Currently this is solely guarded by the
maybe_enable_rel_size_v2(see "Revert the status to legacy...") and we should also do this during timeline creation.This patch also adds partial support for v2-only mode. This is governed by
rel_size_v1_access_disabledflag. The code is here solely to make the tests pass, not 100% correct (for example, when the above situation happens during the switch from v1+v2 validation mode to v2-only mode, it might have some races). We do not plan to switch to v2-only mode in the near term. This needs more tests before we enable.There was a previous attempt #12708 to deal with this index_part <-> reality inconsistency by persisting the migration status in the keyspace. The pros of that approach is that we get very precise LSN where the migration is done and that LSN is consistent with when we do the migration. However, the downside is that it needs to query that key on every request, and it greatly slows down the tests (all pg_regress tests are timing out on ingestion). So in this patch, we take the approach to allow the inconsistency between index_part <-> real data and detect such inconsistency on the write path + in the future branch creation time.
Basically, the way to look into the patch is like:
<= migrated_lsninstead of<)timeline_offloadingtest case but we will have a separate one in the future.Summary of changes
rel_size_v1_access_disabledto fully disable v1 and transition the status toMigrated.collect_keyspacestill needs that.migrated_atLSN.test_image_consistent_lsn: this test was added as part of the hadron new features. It has some assumption on the layer size and the relv2 migration process breaks that assumption. So I forced v1 in that test case, and I'll do a full fix later.