fix: sort CVE records correctly by lcarva · Pull Request #2020 · guacsec/trustify

lcarva · 2025-10-08T14:18:01Z

CVE records follow a specific format where the last segment represents a numerical sequence. To properly sort CVE records, we must treat this sequence segment differently than the rest of the record ID.
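
To illustrate the problem, here is a minimal Rust sketch that compares a plain string sort with a sort that parses the year and sequence as numbers. The `cve_key` helper and the sample IDs are hypothetical and not taken from the PR.

```rust
/// Hypothetical helper: split "CVE-<year>-<sequence>" into numeric parts.
/// Returns None for anything that does not match that shape.
fn cve_key(id: &str) -> Option<(u32, u64)> {
    let rest = id.strip_prefix("CVE-")?;
    let (year, seq) = rest.split_once('-')?;
    Some((year.parse().ok()?, seq.parse().ok()?))
}

fn main() {
    let mut ids = vec!["CVE-2024-10000", "CVE-2024-9999", "CVE-2024-123456"];

    // Plain string comparison works character by character, so '1' < '9'
    // places the larger sequence numbers first.
    ids.sort();
    println!("lexicographic: {ids:?}");

    // Parsing the segments as integers restores the expected order.
    ids.sort_by_key(|id| cve_key(id));
    println!("numeric:       {ids:?}");
}
```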

fixes #1811

Summary by Sourcery

Implement proper numeric sorting for CVE identifiers by introducing a normalized sort key and updating the sorting translator to use it, ensuring correct ascending and descending order across different ID prefixes.

Enhancements:

Introduce id_sort_key SQL expression to pad the numeric segment of CVE IDs for accurate numeric sorting.
Translate id sort operations to use the new id_sort_key when sorting vulnerabilities.
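
The padding idea can be sketched outside SQL as well. The following Rust function is a hypothetical analogue of the `id_sort_key` expression, not the code in this PR: it pads only the trailing sequence of well-formed CVE IDs and falls back to the raw ID for everything else.

```rust
/// Width used for zero-padding; 19 digits is the figure quoted in this PR.
const PAD_WIDTH: usize = 19;

/// Hypothetical Rust analogue of the `id_sort_key` expression: pad the
/// trailing sequence of a CVE ID, and fall back to the raw ID otherwise.
fn id_sort_key(id: &str) -> String {
    if let Some(rest) = id.strip_prefix("CVE-") {
        if let Some((year, seq)) = rest.split_once('-') {
            let year_ok = year.len() == 4 && year.bytes().all(|b| b.is_ascii_digit());
            let seq_ok = !seq.is_empty()
                && seq.len() <= PAD_WIDTH
                && seq.bytes().all(|b| b.is_ascii_digit());
            if year_ok && seq_ok {
                return format!("CVE-{year}-{seq:0>width$}", width = PAD_WIDTH);
            }
        }
    }
    // Non-CVE or malformed IDs keep their plain alphabetical position.
    id.to_string()
}

fn main() {
    let mut ids = vec!["GHSA-xxxx-yyyy", "CVE-2023-12345", "CVE-2023-1234", "ABC-1"];
    ids.sort_by_key(|id| id_sort_key(id));
    println!("{ids:?}");
}
```

Because the key is a plain string, the same comparison used for any other text column orders the padded IDs numerically.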

Tests:

Add vulnerability_numeric_sorting integration test to verify correct ascending and descending ordering for CVE, GHSA, and custom IDs.

sourcery-ai · 2025-10-08T14:18:09Z

Reviewer's Guide

Implement numeric-aware sorting for CVE identifiers by introducing a normalized SQL sort key (id_sort_key) and updating the sort translator to route 'id' sorts through it, and add tests to verify correct ascending and descending ordering.

Entity relationship diagram for CVE ID sorting key

erDiagram
    VULNERABILITY {
        id TEXT
        id_sort_key TEXT
    }
    VULNERABILITY ||--o{ PAGINATED_RESULTS : contains
    VULNERABILITY ||--o{ COLUMNS : uses
    COLUMNS {
        id_sort_key TEXT
    }

File-Level Changes

Change	Details	Files
Introduce id_sort_key expression and translate 'id' sorts to numeric-aware key	Add a CASE expression (id_sort_key) that pads the trailing CVE sequence to 19 digits; extend filtering_with translator to map sort('id') to id_sort_key:asc/desc; retain alphabetical sorting for non-CVE prefixes by falling back to raw id	`modules/fundamental/src/vulnerability/service/mod.rs`
Add vulnerability_numeric_sorting test for mixed identifier ordering	Ingest a set of CVE, GHSA and ABC identifiers with varying numeric lengths; verify ascending id sort returns ABC < CVE-2023-1234 < ... < GHSA; verify descending id sort correctly reverses that order	`modules/fundamental/src/vulnerability/service/test.rs`

Assessment against linked issues

Issue	Objective	Addressed	Explanation
#1811	Ensure that GET /api/v2/vulnerability?sort=id:desc returns vulnerabilities sorted by identifier such that CVE records are ordered numerically by their trailing sequence number, not lexicographically.	✅
#1811	Add or update tests to verify correct numeric sorting of vulnerability identifiers, including CVE records and other formats.	✅

Possibly linked issues

GET /api/v2/vulnerability sorted by identifier returns unexpected results #1811: The PR introduces a numeric-aware sort key for CVE identifiers in the database query, fixing the incorrect ordering reported in the issue.

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

Extract the CASE WHEN regex padding logic into a named constant or helper to improve readability and avoid inline SQL clutter.
Consider persisting the computed id_sort_key as a computed (or materialized) column and indexing it to avoid expensive regex/substring processing on each query.

## Individual Comments

### Comment 1
<location> `modules/fundamental/src/vulnerability/service/test.rs:545-540` </location>
<code_context>
+async fn vulnerability_numeric_sorting(ctx: &TrustifyContext) -> Result<(), anyhow::Error> {
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding tests for edge cases such as malformed or non-standard CVE IDs.

Including malformed CVE IDs in tests will help verify that the sort key logic handles unexpected formats correctly.
</issue_to_address>

modules/fundamental/src/vulnerability/service/test.rs

ctron · 2025-10-09T08:32:18Z

The change looks good. I'm just not sure it is the right approach.

Yes, it makes it more convenient for CVE IDs. However, there are a lot of OSV sources which use a similar format:

https://osv.dev/vulnerability/RUSTSEC-2025-0072

https://osv.dev/vulnerability/MAL-2025-47815

https://osv.dev/vulnerability/PSF-2025-12

Now the user would see CVE IDs sorted differently than those. And that would be hard to explain and understand.

If we can change this so that we split the ID into components and then sort each part as ASCII or numerically (if it's numeric only), I think this could work.
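
A minimal Rust sketch of that component-wise idea follows. It assumes `-` is the only separator, and the `cmp_segment` and `cmp_ids` names are hypothetical. One rule is added beyond the suggestion above: an all-digit segment always sorts before a non-numeric one, which keeps the comparison a consistent total order.

```rust
use std::cmp::Ordering;

/// Compare two segments: numerically when both are all digits, as ASCII when
/// neither is. A numeric segment sorts before a non-numeric one (an added
/// assumption that keeps the ordering transitive).
fn cmp_segment(a: &str, b: &str) -> Ordering {
    let is_num = |s: &str| !s.is_empty() && s.bytes().all(|c| c.is_ascii_digit());
    match (is_num(a), is_num(b)) {
        (true, true) => {
            // Strip leading zeros; a longer digit string is then the larger
            // number. This avoids integer overflow on very long sequences.
            let (a, b) = (a.trim_start_matches('0'), b.trim_start_matches('0'));
            a.len().cmp(&b.len()).then_with(|| a.cmp(b))
        }
        (true, false) => Ordering::Less,
        (false, true) => Ordering::Greater,
        (false, false) => a.cmp(b),
    }
}

/// Hypothetical comparator: split both IDs on '-' and compare segment by segment.
fn cmp_ids(a: &str, b: &str) -> Ordering {
    let (mut xs, mut ys) = (a.split('-'), b.split('-'));
    loop {
        match (xs.next(), ys.next()) {
            (Some(x), Some(y)) => match cmp_segment(x, y) {
                Ordering::Equal => continue,
                other => return other,
            },
            // All segments tie (e.g. "0072" vs "72"): fall back to raw text.
            (None, None) => return a.cmp(b),
            (None, Some(_)) => return Ordering::Less,
            (Some(_), None) => return Ordering::Greater,
        }
    }
}

fn main() {
    let mut ids = vec![
        "RUSTSEC-2025-0072", "MAL-2025-47815", "PSF-2025-12",
        "MAL-2025-9", "CVE-2024-10000", "CVE-2024-9999",
    ];
    ids.sort_by(|a, b| cmp_ids(a, b));
    println!("{ids:?}");
}
```

A comparator like this works for in-memory sorting. Sorting inside the database would need the same rule expressed as a sort key or expression instead.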

lcarva · 2025-10-09T17:45:19Z

> The change looks good. I'm just not sure it is the right approach.
>
> Yes, it makes it more convenient for CVE IDs. However, there are a lot of OSV sources which use a similar format:
>
> https://osv.dev/vulnerability/RUSTSEC-2025-0072
>
> https://osv.dev/vulnerability/MAL-2025-47815
>
> https://osv.dev/vulnerability/PSF-2025-12

Thank you. I wasn't aware of those. I think we could certainly generalize those patterns.

Out of those three examples, MAL and PSF do seem to follow the same pattern as CVE. RUSTSEC, if I'm reading the spec correctly, always requires 4 digits in the sequence sections, thus 0072 in the example above. Not sure what happens when there are more than 9,999 RUSTSEC records in a single year.

For my own notes, the different sources are listed here. Interestingly, some sources follow a slightly different pattern: https://github.com/AlmaLinux/osv-database/tree/master/advisories/almalinux10

Let me explore a way to generalize this.

Do you have any performance concerns with this approach? We could introduce a new column that stores the computed sort ID but maybe that's a premature performance improvement right now.

ctron · 2025-10-10T05:49:58Z

> Do you have any performance concerns with this approach? We could introduce a new column that stores the computed sort ID but maybe that's a premature performance improvement right now.

I always have concerns. 😬 And especially for performance. But we do have scale tests, which can be triggered using /scale-test on a PR. Assuming we capture this use case with them (maybe we need to extend) we should be sure enough that we don't impact performance. Or we understand what the impact is and can make a decision.

modules/fundamental/src/vulnerability/service/mod.rs

lcarva · 2025-10-17T21:05:49Z

Pushed a change to make the sorting handle different vulnerability IDs. It looks like I broke some tests, I'll have a look at those next, but wanted to share the hmm... creative solution.

lcarva · 2025-10-31T18:34:11Z

> Pushed a change to make the sorting handle different vulnerability IDs. It looks like I broke some tests, I'll have a look at those next, but wanted to share the hmm... creative solution.

The tests were failing only because I didn't have the expected locale set on my local system. `LC_ALL=C cargo test` does the job. I expect it will pass here as well.

If we want the approach of using expressions at query time, I believe the changes here achieve that. It would be great for someone with access to approve running the workflows and maybe run /scale-test.

PhilipCattanach · 2026-01-09T15:58:22Z

@Strum355 @ctron
Gents, this PR has been open for 93 days.
Can we bring it to a successful conclusion please?

ctron · 2026-02-09T09:52:52Z

> @Strum355 @ctron Gents, this PR has been open for 93 days. Can we bring it to a successful conclusion please?

I think the concern mentioned in #2020 (comment) is valid and should be addressed. We also do need some performance tests.

PhilipCattanach · 2026-02-09T10:27:26Z

This is a change that I think will be very beneficial to RHTPA. But I think we need to revisit the mechanics of the solution to ensure it is performant and develop some scale tests for it. So I've created this ticket TC-3602 to bring this work to a successful conclusion.

PhilipCattanach · 2026-02-18T09:18:17Z

@lcarva - Luiz do you think you'll be able to find some time to address the PR feedback?

lcarva · 2026-02-18T20:21:43Z

Changes have been pushed to address the feedback and are now ready for review 🙏

ctron

Thanks for following up! Graphql was removed, maybe the PR needs to be rebased. However, I suspect a rebase conflict with the migration once the group PR is (finally) merged. So maybe wait until that.

I like the idea of that materialized column. That might increase storage, but should sort the performance issue.

I guess the test could benefit a bit from DRY-ing it up.

ctron · 2026-02-19T09:26:03Z

entity/src/vulnerability.rs

+    /// Generated column for sorting vulnerability IDs with proper numeric ordering
+    /// This is a STORED generated column in the database and should not be set during insert/update
+    /// Nullable to support LEFT JOIN queries where the vulnerability may not exist
+    #[cfg_attr(feature = "async-graphql", graphql(skip))]


GraphQL was removed in a more recent PR, so this attribute should be dropped: the feature no longer exists, and that would cause an error.

ctron · 2026-02-19T09:27:12Z

migration/src/m0002050_vulnerability_id_sort_index.rs

+                 ) STORED",
+            )
+            .await
+            .map(|_| ())?;


Not sure why we need these?

It looks like a no-op. I checked the other migration files and this pattern is not used there either. Removed.

ctron · 2026-02-19T09:28:36Z

migration/src/m0002050_vulnerability_id_sort_index.rs

+
+#[derive(DeriveIden)]
+#[allow(dead_code)]
+pub enum Vulnerability {


I guess this is unused. I'm not sure if it makes sense to keep the pattern of using "sea_orm_migration" here. I'd prefer to. But I also understand that it might be in the way. In any case, if it's not used, it should probably be removed.

ctron · 2026-02-19T09:31:16Z

modules/fundamental/src/vulnerability/service/test.rs


+#[test_context(TrustifyContext)]
+#[test(tokio::test)]
+async fn vulnerability_numeric_sorting(ctx: &TrustifyContext) -> Result<(), anyhow::Error> {


It's good to have that. I'm not happy about the repetition.

I think we could rewrite this as:

for id in ["ID1", "ID2"] {
    ctx.graph.ingest_vulnerability(id, (), &ctx.db).await?;
}

ctron · 2026-02-19T09:34:34Z

modules/fundamental/src/vulnerability/service/test.rs

+        .await?;
+    assert_eq!(15, vulns.items.len());
+    // Alphabetical by prefix, then numeric within each prefix
+    assert_eq!(vulns.items[0].head.identifier, "ABC-xxxx-yyyy");


Same here, we could:

let ids: Vec<_> = vulns.items.iter().map(|vuln| vuln.head.identifier.as_str()).collect();

And then:

const EXPECTED: &[&str] = &[…];
assert_eq!(EXPECTED, ids);

Maybe even ingest a shuffled copy of EXPECTED, to have a consistent set.

The upside of comparing a complete set is that you get a full diff of the result, not only the first error.
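
A self-contained sketch of this testing pattern is shown below. The `list_sorted` function is a hypothetical in-memory stand-in for the service call, and the IDs are illustrative rather than the PR's actual fixture. The point is only the shape of the assertion: one `EXPECTED` list drives both the input and the comparison.

```rust
/// Expected order. The IDs are illustrative, not the PR's actual fixture.
const EXPECTED: &[&str] = &[
    "ABC-xxxx-yyyy",
    "CVE-2023-1234",
    "CVE-2023-12345",
    "GHSA-xxxx-yyyy",
];

/// Hypothetical in-memory stand-in for the service call under test: returns
/// the IDs sorted with the trailing CVE sequence compared numerically.
fn list_sorted(ingested: &[&str]) -> Vec<String> {
    let mut ids: Vec<String> = ingested.iter().map(|id| id.to_string()).collect();
    ids.sort_by_key(|id| match id.rsplit_once('-') {
        Some((head, seq)) if head.starts_with("CVE-") && seq.parse::<u64>().is_ok() => {
            format!("{head}-{seq:0>19}")
        }
        _ => id.clone(),
    });
    ids
}

fn main() {
    // Feed the IDs in a scrambled (here: reversed) order so the assertion
    // actually exercises the sort.
    let ingested: Vec<&str> = EXPECTED.iter().rev().copied().collect();
    let result = list_sorted(&ingested);

    // Borrow as &str so the whole list compares against EXPECTED at once;
    // on failure assert_eq! prints both lists in full.
    let ids: Vec<&str> = result.iter().map(|id| id.as_str()).collect();
    assert_eq!(EXPECTED, ids);
}
```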

Vulnerability IDs (e.g., CVE-2024-12345) contain numeric segments that should be sorted numerically rather than lexicographically. This change adds a generated database column that stores a normalized sort key where numeric segments are zero-padded to 19 digits (the max defined in the CVE ID spec), enabling proper numeric ordering.

Implementation uses a PostgreSQL STORED generated column with an index, which provides:
- Automatic computation and maintenance of sort keys for all rows
- Efficient indexed sorting without runtime overhead
- Single source of truth for the normalization logic

fixes guacsec#1811

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Luiz Carvalho <lucarval@redhat.com>
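
The normalization described in the commit message can be approximated in Rust as follows. This is a sketch under assumptions, not the SQL expression from the migration: it assumes `-` is the only separator, and `normalized_sort_key` is a hypothetical name.

```rust
/// Hypothetical normalization: split on '-' and left-pad every all-digit
/// segment with zeros to 19 characters; other segments are kept verbatim.
fn normalized_sort_key(id: &str) -> String {
    id.split('-')
        .map(|seg| {
            let numeric = !seg.is_empty() && seg.bytes().all(|b| b.is_ascii_digit());
            if numeric {
                format!("{seg:0>19}")
            } else {
                seg.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join("-")
}

fn main() {
    let mut ids = vec!["MAL-2025-47815", "MAL-2025-9", "CVE-2024-12345", "CVE-2024-999"];
    // Rust compares strings byte by byte; a database applies its own
    // collation, so results there may differ for non-digit characters.
    ids.sort_by_key(|id| normalized_sort_key(id));
    println!("{ids:?}");
}
```

Unlike a CVE-only key, this pads every numeric segment, so other `PREFIX-YEAR-SEQUENCE` identifiers are ordered by the same rule.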

lcarva · 2026-02-19T21:04:28Z

@ctron, I believe I addressed all the comments.

lcarva force-pushed the fix-cve-ordering branch from a5150d6 to 07293c7 on October 8, 2025 14:18

sourcery-ai bot reviewed Oct 8, 2025

ctron mentioned this pull request Oct 13, 2025

Implement more generic vendor packages recommendations #2024

Open

Strum355 reviewed Oct 13, 2025

lcarva force-pushed the fix-cve-ordering branch from 1328ffb to 5ae3648 on October 31, 2025 18:33

ptomanRH added this to Trustify Jan 20, 2026

lcarva force-pushed the fix-cve-ordering branch from 5ae3648 to 0b3c66f on February 18, 2026 20:20

lcarva force-pushed the fix-cve-ordering branch from 0b3c66f to 80f6499 on February 19, 2026 01:56

ctron requested changes Feb 19, 2026

lcarva force-pushed the fix-cve-ordering branch from 80f6499 to 28d8be6 on February 19, 2026 21:02
