
Conversation

@robrat commented Jul 16, 2025

Fetch all security alerts of a repository. Fixes #542

Also slightly adjusted the find check for the security nodes, because I had entries like the following in the result list (note the missing = in front of the version in vulnerableRequirements):

{
  "vulnerableManifestFilename": "yarn.lock",
  "vulnerableManifestPath": "cypress/yarn.lock",
  "vulnerableRequirements": "4.4.0",
  "state": "OPEN",
  "securityVulnerability": {
    "package": {
      "name": "terser"
    }
  },
  "securityAdvisory": {
    "cvss": {
      "score": 7.5
    },
    "ghsaId": "GHSA-4wf5-vphf-c2xc"
  }
}

@robrat requested a review from a team as a code owner July 16, 2025 10:55
@robrat force-pushed the feat/fetch-all-vulnerability-alerts branch from 12b46a5 to 425e6d2 on July 29, 2025 09:20
@robrat force-pushed the feat/fetch-all-vulnerability-alerts branch from 425e6d2 to 1ee296f on July 29, 2025 10:34
@truggeri (Contributor) left a comment:

Overall I think there are some good changes in here, but I'd like to better define the behavior for multi-page fetches.


const nodes = alerts?.repository?.vulnerabilityAlerts?.nodes
const found = nodes.find(a => (version === '' || a.vulnerableRequirements === `= ${version}`) &&
if (result.repository.vulnerabilityAlerts.pageInfo.hasNextPage) {
Contributor (@truggeri):

I would keep the optional chaining as it existed before. Even if we've typed the API response, these runtime guards will gracefully handle any bad data.

Author (@robrat):

The optional chaining above is incomplete. If one of the result properties is undefined, then nodes will be undefined, and invoking find on undefined results in a runtime error. Since such problems can arise anywhere the result is accessed, a better approach would be a type guard.
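
For illustration, such a type guard could look roughly like this (a sketch only; apart from the alert type, which is a minimal stand-in for the PR's RepositoryVulnerabilityAlert, every name here is an assumption):

// Sketch of a runtime type guard for the GraphQL result.
// RepositoryVulnerabilityAlert below is a minimal stand-in for the PR's type.
type RepositoryVulnerabilityAlert = { vulnerableRequirements: string; state: string };

interface VulnerabilityAlertsPage {
  repository: {
    vulnerabilityAlerts: {
      nodes: RepositoryVulnerabilityAlert[];
      pageInfo: { hasNextPage: boolean; endCursor?: string };
    };
  };
}

function isVulnerabilityAlertsPage(result: unknown): result is VulnerabilityAlertsPage {
  const alerts = (result as VulnerabilityAlertsPage)?.repository?.vulnerabilityAlerts;
  return Array.isArray(alerts?.nodes) && typeof alerts?.pageInfo?.hasNextPage === 'boolean';
}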

Comment on lines 120 to 121
const nextPageNodes = await fetchVulnerabilityAlerts(client, repoOwner, repoName, result.repository.vulnerabilityAlerts.pageInfo.endCursor);
return [...result.repository.vulnerabilityAlerts.nodes, ...nextPageNodes ];
Contributor (@truggeri):

This technique will fetch up to two pages, but not three. The initial issue asked what happens if there are 101 alerts; I could ask what happens if there are 201? This solution is too specific to two pages.

A fix for this would be a loop that fetches while there is a next page, but it would also need a maximum number of pages to fetch to avoid a "runaway" query.

One final note: I believe there are options more performant than spreading into an array. Using concat or a mutable push would be more performant on large arrays. Not a big deal, but we are talking about potentially "large" data being joined.
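
For illustration, the loop-based approach could be sketched like this (fetchPage, Page, and MAX_PAGES are assumed names, not the PR's code):

// Sketch only: fetch pages in a loop, capped at MAX_PAGES to avoid a runaway query.
const MAX_PAGES = 10; // assumed guard value

type Page<T> = { nodes: T[]; pageInfo: { hasNextPage: boolean; endCursor?: string } };

async function fetchAllAlerts<T>(fetchPage: (cursor?: string) => Promise<Page<T>>): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  for (let page = 0; page < MAX_PAGES; page++) {
    const { nodes, pageInfo } = await fetchPage(cursor);
    all.push(...nodes); // mutable push instead of spreading into a fresh array
    if (!pageInfo.hasNextPage) return all;
    cursor = pageInfo.endCursor;
  }
  return all; // cap reached; the result may be truncated
}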

Author (@robrat):

The function calls itself recursively and therefore fetches more than 200 entries. In one of our big repos we have more than 500 entries, and the function fetches them all.

Contributor (@truggeri):

Okay, I see that now, sorry for not catching it on first review. The concern this raises for me is not having an upper limit to the number of recursive calls. We're not being very defensive about the depth of the call stack where each call is making a network request. Not that I think there'll be an issue in typical use cases, but it will run away and fail if anything goes wrong.

Perhaps setting a max page fetch constant and skipping the recursive call if that limit has been reached would provide safety against this potential issue.

Author (@robrat), Jan 7, 2026:

Well, I understand your concern, but then we would be back at the beginning. The current implementation on the main branch fetches only one page / 100 items, and the interesting part for my use case was mostly after the first page. So there should be an option to fetch everything.
Maybe it also makes sense to allow some filtering. I'm only interested in open issues, but the function needs to fetch more than 600 items for my repo, and afterwards I'm throwing away more than 550 elements ...

Btw: Array creation via the spread operator and concat should behave similarly. Both create a new array, and both need to copy the elements into it. push will resize the array, but this is usually done by doubling the capacity, so fewer memory copy operations are needed. However, the time for the queries here is orders of magnitude higher than the array operations. (I will change it after the query issue has been clarified.)
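
For reference, the three merge variants under discussion produce the same result and differ only in allocation behavior:

const current = [1, 2], next = [3, 4];
const merged1 = [...current, ...next]; // spread: allocates a new array, copies both
const merged2 = current.concat(next);  // concat: likewise allocates a new array
current.push(...next);                 // push: grows current in place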

Contributor (@truggeri):

Totally agree on time complexity, my brain just looks out for using spread in these cases. Not a big deal.

I don't disagree with the need for the feature; I'm only calling out that with this implementation there's a chance of an unintentional bug. Because the condition for further recursive iterations comes from an API response, this code can't really "guard" against an infinite loop. So if someone using this functionality either a) has a massive number of pages, more than is reasonable to fetch, or b) is somehow receiving malicious responses (like a man-in-the-middle attack), they will have a bad experience.

How about we merge this as is, which gets the feature out to those of you that need it, then I can follow up with an upper limit to guard against this edge case?

Author (@robrat), Jan 9, 2026:

I added some commits that check the results after each fetch, allowing an early return, avoiding overfetching, and eliminating the merging of arrays. I also added a guard to limit the fetches. What do you think of it?
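
Roughly, the shape described here looks like this (a sketch; fetchPage and isMatchingAlert are assumed names, and the fetchDepth check mirrors the fragment quoted further down):

// Sketch only: inspect each page as it arrives, return early on a match,
// and stop once fetchDepth results have been seen.
async function findAlert<T>(
  fetchPage: (cursor?: string) => Promise<{ nodes: T[]; pageInfo: { hasNextPage: boolean; endCursor?: string } }>,
  isMatchingAlert: (alert: T) => boolean,
  fetchDepth: number
): Promise<T | undefined> {
  let fetchedResults = 0;
  let cursor: string | undefined;
  while (true) {
    const { nodes, pageInfo } = await fetchPage(cursor);
    const found = nodes.find(isMatchingAlert); // early return, no page merging needed
    if (found) return found;
    fetchedResults += nodes.length;
    if (fetchDepth > 0 && fetchedResults >= fetchDepth) return undefined; // guard against runaway paging
    if (!pageInfo.hasNextPage) return undefined;
    cursor = pageInfo.endCursor;
  }
}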

Comment on lines 92 to 94
async function fetchVulnerabilityAlerts(client: InstanceType<typeof GitHub>, repoOwner: string, repoName: string, endCursor?: string): Promise<RepositoryVulnerabilityAlert[]> {
  core.debug(`Fetching vulnerability alerts for cursor ${endCursor ?? 'start'}`);
  const result: RepositoryVulnerabilityAlertsResult = await client.graphql(`
Contributor (@truggeri):

I think this is a great abstraction function. I would recommend two further enhancements.

  1. Pass an optional parameter for the number of records to fetch, defaulting to 100. No one is asking for this right now, but it's a clear config value that's currently hidden in the GraphQL query string.
  2. Extract the endCursor check into a variable before the const result line to make it even clearer to readers.

const alertsPreamble = `first: ${fetchSize}${endCursor ? `, after: "${endCursor}"` : ''}`;
const result = //....
  vulnerabilityAlerts(${alertsPreamble}) {

const nodes = await fetchVulnerabilityAlerts(client, context.repo.owner, context.repo.repo);
core.debug(`Fetched ${nodes.length} vulnerability alerts`);

const found = nodes.find(a => (version === '' || a.vulnerableRequirements === `${version}` || a.vulnerableRequirements === `= ${version}`) &&
Contributor (@truggeri):

Can you provide more detail about the ${version} condition? It's not clear to me why this is necessary, other than that you saw it once. I wonder if this means we should do more strict parsing in general.

Author (@robrat):

I checked our alerts, and indeed there was one entry that uses a different pattern than a plain number or a leading equals sign:

{
  "vulnerableManifestFilename": "package.json",
  "vulnerableManifestPath": "some/path/to/package.json",
  "vulnerableRequirements": "^2.4.2",
  "state": "FIXED",
  "securityVulnerability": {
    "package": {
      "name": "dompurify"
    }
  },
  "securityAdvisory": {
    "cvss": {
      "score": 4.5
    },
    "ghsaId": "GHSA-vhxf-7vqr-mrjg"
  }
}

I don't know what other prefixes could possibly appear in vulnerableRequirements. Is it documented somewhere?
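
One way to sidestep the prefix question would be to normalize the requirement before comparing; a sketch (the operator list is a guess, not taken from any GitHub documentation, and it deliberately treats a range operator like ^ as if it named an exact version):

// Strip a leading comparator plus whitespace before comparing.
// The operator set is an assumption.
function normalizeRequirement(req: string): string {
  return req.replace(/^\s*(>=|<=|=|\^|~|>|<)?\s*/, '');
}

normalizeRequirement('= 4.4.0'); // -> '4.4.0'
normalizeRequirement('^2.4.2'); // -> '2.4.2'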

Comment on lines 295 to 296
.post('/graphql', query).reply(200, responsePage1)
.post('/graphql', query2).reply(200, response)
Contributor (@truggeri):

This is a minor comment, but the variable naming here is inconsistent. We have query and query2, which is too ambiguous, and they map to the responses responsePage1 and response. I would prefer

firstQuery -> firstResponse
secondQuery -> secondResponse

or

query1 -> response1
query2 -> response2

Author (@robrat):

A simple renaming does not work because query can return response, responseWithManifestFileAtRoot, or responseWithoutEqInFrontOfVulnerableRequirements depending on the test. I "inserted" responsePage1 before the default response is returned in order to keep the diff small, but I agree that just looking at this is a bit confusing. I will think about an improvement.

Contributor (@truggeri):

Ah, okay, I hadn't considered the diff size as part of the decision making. In that case, perhaps make responsePage1 something more specific to your intent, like responseWithoutCoffeeScript, or even introduce an otherwise unnecessary variable just for the naming (const responsePage2 = response). These are small suggestions, not worth holding the PR over.

Author (@robrat):

Ok, I changed the test by reassigning query and response. I also added another response "page" to illustrate that the method loops via recursion. Additionally, I used template strings plus two replaceAll() calls to make the query readable in the test file.
But I'm not sure whether I should touch the inconsistent indentation, e.g. the children of securityAdvisory and pageInfo use only one space of indent while other lines use two. A proper solution would be extracting the query into a template function and then also using this function in the tests.
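
For readers following along, the paged mock pattern described here looks roughly like this with nock (the fixture names and contents are illustrative, not the test's actual ones):

import nock from 'nock';

// Placeholder fixtures standing in for the test's real query and response objects.
const firstPageQuery = { query: '...' }, secondPageQuery = { query: '...' };
const firstPageResponse = {};  // would carry pageInfo.hasNextPage: true
const secondPageResponse = {}; // would carry pageInfo.hasNextPage: false

nock('https://api.github.com')
  .post('/graphql', firstPageQuery).reply(200, firstPageResponse)
  .post('/graphql', secondPageQuery).reply(200, secondPageResponse);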

@truggeri previously approved these changes Jan 7, 2026
@truggeri (Contributor) left a comment:

These changes work for me. I'd like to circle back to adding an upper limit to the number of pages requested and adding a log message, but I don't think that should hold this change from going out.

}

fetchedResults += nodes.length;
if (fetchDepth > 0 && fetchedResults >= fetchDepth) {
Contributor (@truggeri):

[nit] I generally don't like having this kind of thing nested within other logic, but it isn't a huge deal here.


Linked issue: Does not work reliably if there are more than 100 vulnerability alerts for the repo
