Skip to content

Migrate issuecomment table to i64 ids#348

Open
Mark-Simulacrum wants to merge 1 commit intorust-lang:masterfrom
Mark-Simulacrum:bigint-comments
Open

Migrate issuecomment table to i64 ids#348
Mark-Simulacrum wants to merge 1 commit intorust-lang:masterfrom
Mark-Simulacrum:bigint-comments

Conversation

@Mark-Simulacrum
Copy link
Copy Markdown
Member

2 years after the stopgap fix in f9d84fa, this writes a one-way migration to move to bigint columns for all the issuecomment IDs. Presuming we like this approach, it would be a good idea to apply the same to other GitHub IDs (issues, users, etc.), but that can probably be done in follow up work (and maybe can be deferred again -- we should check how close we are to limits on each of those ID spaces), maybe we can move away from this database entirely before we hit those limits.

The migration has been tested on a data export from production postgres, locally it takes ~7 seconds to run. I'm not sure if it's safe to run it while rfcbot is actively running (in particular, what happens if the diesel schema in the running rfcbot disagrees with the database). We may want to try to stop the app, run the migration, and then start it again (downtime should be comparatively short though not zero). rfcbot isn't currently responding to any new comments I think so it's in practice nearly down anyway.

2 years after the stopgap fix in f9d84fa, this writes a
one-way migration to move to bigint columns for all the issuecomment
IDs. Presuming we like this approach, it would be a good idea to apply
the same to other GitHub IDs (issues, users, etc.).

The migration has been tested on a data export from production postgres,
locally it takes ~7 seconds to run. I'm not sure if it's safe to run it
while rfcbot is actively running (in particular, what happens if the
diesel schema in the running rfcbot disagrees with the database). We may
want to try to stop the app, run the migration, and then start it again
(downtime should be comparatively short though not zero). rfcbot isn't
currently responding to any new comments I think so it's in practice
nearly down anyway.
@Mark-Simulacrum
Copy link
Copy Markdown
Member Author

cc https://rust-lang.zulipchat.com/#narrow/channel/242791-t-infra/topic/rfcbot.20unresponsive/with/590342355

Checking other IDs:

  • githubuser - max is 276139429, 12.8% of i32 space
  • milestone - 0.7%
  • issue - 0.17%
  • pullrequest - 0.11%
  • issuecomment - 199% (since we used both sides of i32)

So indeed I think it's safe to defer the other IDs basically indefinitely.

Copy link
Copy Markdown

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Changing column data types before dropping foreign keys will temporarily create foreign key constraints with mismatched types. This is non-standard, but there's no documentation prohibiting it, and apparently it works fine:
https://stackoverflow.com/questions/32267778/setting-up-foreign-key-with-different-datatype/68491233#68491233


alter table rfc_feedback_request drop constraint rfc_feedback_request_fk_feedback_comment_fkey;

-- Then update all the values to be u64'd, not i32 storing u32 value.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Technically the representation is unsigned, but the range is 0..=i64::MAX:
https://www.postgresql.org/docs/current/datatype-numeric.html#DATATYPE-INT

(But the difference is unlikely to ever matter.)

Suggested change
-- Then update all the values to be u64'd, not i32 storing u32 value.
-- Then update all the values to be unsigned in an i64 field, not i32 storing u32 value.

@@ -0,0 +1,4 @@
-- TODO: is there a reasonable rollback procedure? I think we could undo the
-- data type changes, but it would likely mean deleting some comments from the
-- database that can no longer be represented. Is that a reasonable tradeoff?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For rollback deleting out of range comment IDs seems to be the only option.

But we could keep a copy of those comments (in the migrated but un-rolled-back database), and move them across once we have a successful future migration.

Comment thread src/github/client.rs
let url = format!(
"{}/repos/{}/issues/comments/{}",
BASE_URL, repo, comment_num as u32
BASE_URL, repo, comment_num as u64
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a precaution, would try_from() be better, just in case any negative numbers make it through?

Suggested change
BASE_URL, repo, comment_num as u64
BASE_URL, repo, u64::try_from(comment_num).expect("id is a non-negative i64")

Comment thread src/github/nag.rs
},
number = issue.number,
id = comment_id as u32,
id = comment_id as u64,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants