Conversation
Reviewer's GuideImplement numeric-aware sorting for CVE identifiers by introducing a normalized SQL sort key (id_sort_key) and updating the sort translator to route 'id' sorts through it, and add tests to verify correct ascending and descending ordering. Entity relationship diagram for CVE ID sorting keyerDiagram
VULNERABILITY {
id TEXT
id_sort_key TEXT
}
VULNERABILITY ||--o{ PAGINATED_RESULTS : contains
VULNERABILITY ||--o{ COLUMNS : uses
COLUMNS {
id_sort_key TEXT
}
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
a5150d6 to
07293c7
Compare
There was a problem hiding this comment.
Hey there - I've reviewed your changes - here's some feedback:
- Extract the CASE WHEN regex padding logic into a named constant or helper to improve readability and avoid inline SQL clutter.
- Consider persisting the computed id_sort_key as a computed (or materialized) column and indexing it to avoid expensive regex/substring processing on each query.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Extract the CASE WHEN regex padding logic into a named constant or helper to improve readability and avoid inline SQL clutter.
- Consider persisting the computed id_sort_key as a computed (or materialized) column and indexing it to avoid expensive regex/substring processing on each query.
## Individual Comments
### Comment 1
<location> `modules/fundamental/src/vulnerability/service/test.rs:545-540` </location>
<code_context>
+async fn vulnerability_numeric_sorting(ctx: &TrustifyContext) -> Result<(), anyhow::Error> {
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding tests for edge cases such as malformed or non-standard CVE IDs.
Including malformed CVE IDs in tests will help verify that the sort key logic handles unexpected formats correctly.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
The change looks good. I'm just not sure it is the right approach. Yes, it makes it more convenient for CVE IDs. However, there are a lot of OSV sources which use a similar format:
Now the user would see CVE IDs sorted differently than those. And that would be hard to explain and understand. If we can change this to a way that we split this into components and then sort each part as ASCII or numeric (if it's numeric only), I think this could work. |
Thank you. I wasn't aware of those. I think we could certainly generalize those patterns. Out of those three examples, MAL and PSF do seem to follow the same pattern as CVE. RUSTSEC, if I'm reading the spec correctly, always requires 4 digits in the sequence sections, thus For my own notes, the different sources are listed here. Interestingly, some sources follow a slightly different pattern: https://github.com/AlmaLinux/osv-database/tree/master/advisories/almalinux10 Let me explore a way to generalize this. Do you have any performance concerns with this approach? We could introduce a new column that stores the computed sort ID but maybe that's a premature performance improvement right now. |
I always have concerns. 😬 And especially for performance. But we do have scale tests, which can be triggered using |
|
Pushed a change to make the sorting handle different vulnerability IDs. It looks like I broke some tests, I'll have a look at those next, but wanted to share the hmm... creative solution. |
1328ffb to
5ae3648
Compare
The tests were broken. I just didn't have the expected locale set on my local system. If we want the approach of using expressions at query time, I believe the changes here achieve that. It would be great for someone with access to approve running the workflows and maybe run |
I think the concern mentioned in #2020 (comment) is valid and should be addressed. We also do need some performance tests. |
|
This is a change that I think will be very beneficial to RHTPA. But I think we need to revisit the mechanics of the solution to ensure it is performant and develop some scale tests for it. So I've created this ticket TC-3602 to bring this work to a successful conclusion. |
|
@lcarva - Luiz do you think you'll be able to find some time to address the PR feedback? |
5ae3648 to
0b3c66f
Compare
|
Changes have been pushed to address the feedback and are now ready for review 🙏 |
0b3c66f to
80f6499
Compare
ctron
left a comment
There was a problem hiding this comment.
Thanks for following up! Graphql was removed, maybe the PR needs to be rebased. However, I suspect a rebase conflict with the migration once the group PR is (finally) merged. So maybe wait until that.
I like the idea of that materialized column. That might increase storage, but should sort the performance issue.
I guess the test could benefit a bit from DRY-ing it up.
entity/src/vulnerability.rs
Outdated
| /// Generated column for sorting vulnerability IDs with proper numeric ordering | ||
| /// This is a STORED generated column in the database and should not be set during insert/update | ||
| /// Nullable to support LEFT JOIN queries where the vulnerability may not exist | ||
| #[cfg_attr(feature = "async-graphql", graphql(skip))] |
There was a problem hiding this comment.
Graphql has been removed in a most recent PR, so this should be dropped as the feature would no longer exist, that would cause an error.
| ) STORED", | ||
| ) | ||
| .await | ||
| .map(|_| ())?; |
There was a problem hiding this comment.
It looks like a no-op. I checked the other migration files and this pattern is not used there either. Removed.
|
|
||
| #[derive(DeriveIden)] | ||
| #[allow(dead_code)] | ||
| pub enum Vulnerability { |
There was a problem hiding this comment.
I guess this is unused. I'm not sure if it makes sense keeping the pattern of using "sea_orm_migration" here. I'd prefer to. But I also understanding that it might be in the way. In any case, if it's not used, it should probably be removed.
|
|
||
| #[test_context(TrustifyContext)] | ||
| #[test(tokio::test)] | ||
| async fn vulnerability_numeric_sorting(ctx: &TrustifyContext) -> Result<(), anyhow::Error> { |
There was a problem hiding this comment.
It's good to have that. I'm not happy about the repetition.
I think we could rewrite this as:
for id in [
"ID1",
"ID2",
] {
ctx.graph.ingest_vulnerability(id, (), &ctx.db).await?;
}| .await?; | ||
| assert_eq!(15, vulns.items.len()); | ||
| // Alphabetical by prefix, then numeric within each prefix | ||
| assert_eq!(vulns.items[0].head.identifier, "ABC-xxxx-yyyy"); |
There was a problem hiding this comment.
Same here, we could:
let ids : Vec<_> = vulns.items.iter().map(|vuln|vuln.head.identifier).collect();And then:
const EXPECTED : &[&str] = &[…];
assert_eq!(EXPECTED, ids);Maybe even use EXPECTED with shuffle. To have a consistent set.
Pro side of comparing a complete set, you would get a full diff of the result. Not only the first error.
Vulnerability IDs (e.g., CVE-2024-12345) contain numeric segments that should be sorted numerically rather than lexicographically. This change adds a generated database column that stores a normalized sort key where numeric segments are zero-padded to 19 digits (the max defined in the CVE ID spec), enabling proper numeric ordering. Implementation uses a PostgreSQL STORED generated column with an index, which provides: - Automatic computation and maintenance of sort keys for all rows - Efficient indexed sorting without runtime overhead - Single source of truth for the normalization logic fixes guacsec#1811 Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Luiz Carvalho <lucarval@redhat.com>
80f6499 to
28d8be6
Compare
|
@ctron, I believe I addressed all the comments. |
CVE records follow a specific format where the last segment represents a numerical sequence. To properly sort CVE records, we must treat this sequence segment differently than the rest of the record ID.
fixes #1811
Summary by Sourcery
Implement proper numeric sorting for CVE identifiers by introducing a normalized sort key and updating the sorting translator to use it, ensuring correct ascending and descending order across different ID prefixes.
Enhancements:
id_sort_keySQL expression to pad the numeric segment of CVE IDs for accurate numeric sorting.idsort operations to use the newid_sort_keywhen sorting vulnerabilities.Tests:
vulnerability_numeric_sortingintegration test to verify correct ascending and descending ordering for CVE, GHSA, and custom IDs.