fetch all vulnerability alerts #633
Conversation
Force-pushed from 12b46a5 to 425e6d2
Force-pushed from 425e6d2 to 1ee296f
truggeri left a comment
Overall I think there are some good changes in here, but I'd like to better define the behavior for multipage fetches.
src/dependabot/verified_commits.ts (outdated)

```ts
const nodes = alerts?.repository?.vulnerabilityAlerts?.nodes
const found = nodes.find(a => (version === '' || a.vulnerableRequirements === `= ${version}`) &&
if (result.repository.vulnerabilityAlerts.pageInfo.hasNextPage) {
```
I would keep the optional chaining as it existed before. Even if we've typed the API response, these runtime guards will gracefully handle any bad data.
The optional chaining above is incomplete. If one of the result properties is undefined, then nodes will be undefined, and invoking find on undefined results in a runtime error. Since such problems can arise anywhere the result is accessed, a better approach would be a type guard.
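For illustration, a type guard along these lines could centralize the runtime checks (a minimal sketch; the type and function names are assumptions, not the PR's actual code):

```ts
type AlertNode = { vulnerableRequirements: string }

interface AlertsPage {
  repository: {
    vulnerabilityAlerts: {
      nodes: AlertNode[]
      pageInfo: { hasNextPage: boolean, endCursor?: string }
    }
  }
}

// Narrows an untyped GraphQL response to the shape the rest of the code relies on.
function isAlertsPage (result: unknown): result is AlertsPage {
  const alerts = (result as AlertsPage)?.repository?.vulnerabilityAlerts
  return Array.isArray(alerts?.nodes) && typeof alerts?.pageInfo?.hasNextPage === 'boolean'
}
```

With a guard like this, callers can check the response once and then access it without scattering optional chaining everywhere.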
src/dependabot/verified_commits.ts (outdated)

```ts
const nextPageNodes = await fetchVulnerabilityAlerts(client, repoOwner, repoName, result.repository.vulnerabilityAlerts.pageInfo.endCursor);
return [...result.repository.vulnerabilityAlerts.nodes, ...nextPageNodes];
```
This technique will fetch up to two pages, but not three. The initial issue asked what happens when there are 101 alerts; I could just as well ask what happens when there are 201. This solution is too specific to two pages.
A fix for this would be a loop that fetches while there is a next page, but it would also need a maximum number of pages to fetch to avoid a "runaway" query; a sketch follows below.
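A minimal sketch of that loop, assuming a fetchPage helper that wraps the GraphQL call; MAX_PAGES is a hypothetical safety limit, not something from this PR:

```ts
const MAX_PAGES = 10

interface Page<T> {
  nodes: T[]
  pageInfo: { hasNextPage: boolean, endCursor?: string }
}

async function fetchAllPages<T> (fetchPage: (cursor?: string) => Promise<Page<T>>): Promise<T[]> {
  const nodes: T[] = []
  let cursor: string | undefined
  for (let i = 0; i < MAX_PAGES; i++) {
    const page = await fetchPage(cursor) // one network request per iteration
    nodes.push(...page.nodes)            // mutable push, no per-page array copy
    if (!page.pageInfo.hasNextPage) break
    cursor = page.pageInfo.endCursor
  }
  return nodes
}
```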
One final note: I believe there are options more performant than spreading into an array. Using concat or a mutable push would be faster on large arrays. Not a big deal, but we are talking about potentially "large" data being joined.
The function calls itself recursively and therefore also fetches more than 200 entries. In one of our big repos we have more than 500 entries, and the function fetches all of them.
Okay, I see that now, sorry for not catching it on first review. The concern this raises for me is not having an upper limit to the number of recursive calls. We're not being very defensive about the depth of the call stack where each call is making a network request. Not that I think there'll be an issue in typical use cases, but it will run away and fail if anything goes wrong.
Perhaps setting a max page fetch constant and skipping the recursive call if that limit has been reached would provide safety against this potential issue.
Well, I understand your concern, but then we would be back at the beginning. The current implementation on the main branch fetches only one page (100 items), and the interesting part for my use case was mostly after the first page. So there should be an option to fetch everything.
Maybe it also makes sense to allow some filtering. I'm only interested in open alerts, but the function needs to fetch more than 600 items for my repo, after which I'm throwing away more than 550 of them; a server-side filter could look like the sketch below.
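For illustration, the GraphQL connection appears to accept a states argument, which would push the filter to the server (a sketch only; verify against the current schema before relying on it):

```ts
// Hypothetical query requesting only OPEN alerts instead of filtering client-side.
const openAlertsQuery = `
  query ($owner: String!, $repo: String!) {
    repository(owner: $owner, name: $repo) {
      vulnerabilityAlerts(first: 100, states: OPEN) {
        nodes { vulnerableRequirements state }
      }
    }
  }`
```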
Btw: array creation via the spread operator and concat should behave similarly. Both create a new array, and both need to copy the elements into it. push will resize the array, but this is usually done by doubling the capacity, so fewer memory copies are needed. However, here the time for the queries is magnitudes higher than for the array operations. (I will change it after the query issue has been clarified.)
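A tiny illustration of those three options (not from the PR):

```ts
const pageA = ['a1', 'a2']
const pageB = ['b1', 'b2', 'b3']

const viaSpread = [...pageA, ...pageB] // allocates a new array and copies both inputs
const viaConcat = pageA.concat(pageB)  // likewise allocates a new array
const viaPush: string[] = []
viaPush.push(...pageA)                 // grows in place; capacity doubles as needed
viaPush.push(...pageB)
```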
Totally agree on time complexity; my brain just watches out for spread in these cases. Not a big deal.
I don't disagree with the need for the feature; I'm only calling out that with this implementation there's a chance of an unintentional bug. Because the condition for further recursive iterations comes from an API response, this code can't really "guard" against an infinite loop. So if someone using this functionality either a) has a massive number of pages, more than is reasonable to fetch, or b) is somehow receiving malicious responses (like a man in the middle), they will have a bad experience.
How about we merge this as is, which gets the feature out to those of you that need it, then I can follow up with an upper limit to guard against this edge case?
I added some commits to check the results after each fetch, which allows returning early, avoids overfetching, and eliminates the merging of arrays. I also added a guard to limit the fetches. What do you think of it?
src/dependabot/verified_commits.ts (outdated)

```ts
async function fetchVulnerabilityAlerts(client: InstanceType<typeof GitHub>, repoOwner: string, repoName: string, endCursor?: string): Promise<RepositoryVulnerabilityAlert[]> {
  core.debug(`Fetching vulnerability alerts for cursor ${endCursor ?? 'start'}`);
  const result: RepositoryVulnerabilityAlertsResult = await client.graphql(`
```
I think this is a great abstraction function. I would recommend two further enhancements.
- Pass an optional parameter for the number of records to fetch, defaulting to 100. No one is asking for this right now, but it's a clear config value that's currently hidden in the GraphQL query string.
- Make the endCursor check a variable before the `const result` line to make it even more clear to readers.
```ts
const alertsPreamble = `first: ${fetchSize}${endCursor ? `, after: "${endCursor}"` : ''}`;
const result = await client.graphql(`
  ...
  vulnerabilityAlerts(${alertsPreamble}) {
  ...
`);
```
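Taken together, the suggested signature might look roughly like this (a sketch; fetchSize is the new optional parameter, and the alert type is a stand-in for the PR's actual one):

```ts
import { GitHub } from '@actions/github/lib/utils'

type RepositoryVulnerabilityAlert = { vulnerableRequirements: string }

async function fetchVulnerabilityAlerts (
  client: InstanceType<typeof GitHub>,
  repoOwner: string,
  repoName: string,
  endCursor?: string,
  fetchSize = 100 // hypothetical default; currently hard-coded in the query string
): Promise<RepositoryVulnerabilityAlert[]> {
  const alertsPreamble = `first: ${fetchSize}${endCursor ? `, after: "${endCursor}"` : ''}`
  // ...build and run the query using alertsPreamble, as sketched above...
  return []
}
```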
src/dependabot/verified_commits.ts (outdated)

```ts
const nodes = await fetchVulnerabilityAlerts(client, context.repo.owner, context.repo.repo);
core.debug(`Fetched ${nodes.length} vulnerability alerts`);

const found = nodes.find(a => (version === '' || a.vulnerableRequirements === `${version}` || a.vulnerableRequirements === `= ${version}`) &&
```
Can you provide more detail about the `${version}` condition? It's not clear to me why this is necessary, other than that you saw it one time. I wonder if this means we should do more strict parsing in general.
I checked our alerts, and indeed there was one entry which uses a different pattern than a plain number or one starting with an equals sign:

```json
{
  "vulnerableManifestFilename": "package.json",
  "vulnerableManifestPath": "some/path/to/package.json",
  "vulnerableRequirements": "^2.4.2",
  "state": "FIXED",
  "securityVulnerability": {
    "package": {
      "name": "dompurify"
    }
  },
  "securityAdvisory": {
    "cvss": {
      "score": 4.5
    },
    "ghsaId": "GHSA-vhxf-7vqr-mrjg"
  }
}
```

I don't know what other prefixes could possibly appear in vulnerableRequirements. Is it documented somewhere?
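One way to sidestep the unknown-prefix question would be normalizing the requirement before comparing; a hedged sketch, not the PR's code:

```ts
function matchesVersion (vulnerableRequirements: string, version: string): boolean {
  if (version === '') return true
  // Exact pins may or may not carry a leading "= "; range prefixes such as "^" or "~"
  // would need real semver range parsing to be handled correctly.
  return vulnerableRequirements.replace(/^=\s*/, '') === version
}
```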
```ts
.post('/graphql', query).reply(200, responsePage1)
.post('/graphql', query2).reply(200, response)
```
This is a minor comment, but the variable naming here is inconsistent. We have query and query2, which is too ambiguous, and they map to the responses responsePage1 and response. I would prefer:

firstQuery -> firstResponse
secondQuery -> secondResponse

or

query1 -> response1
query2 -> response2
A simple renaming does not work because query returns response, responseWithManifestFileAtRoot, or responseWithoutEqInFrontOfVulnerableRequirements, depending on the test. I "inserted" responsePage1 before the default response is returned in order to keep the diff small, but I agree that just looking at this is a bit confusing. I will think about an improvement.
Ah, okay, I hadn't considered the diff size as part of the decision making. In that case, perhaps make responsePage1 something more specific to your intent, like responseWithoutCoffeeScript, or even introduce an otherwise unnecessary variable just for naming (const responsePage2 = response). These are small suggestions, not worth holding the PR over.
Ok, I changed the test by reassigning query and response. I also added another response "page" to illustrate that the method is looping via recursion. Additionally, I used template strings plus two replaceAll() calls to make the query readable in the test file.
But I'm not sure if I should touch the inconsistent indentation, e.g. the children of securityAdvisory and pageInfo use only one space of indentation while other lines use two. A proper solution would be extracting the query to a template function and then also using that function in the tests; a sketch of that idea follows below.
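A sketch of that extraction (the name and the exact field selection are hypothetical):

```ts
// Renders the alerts query for a given cursor, so production code and tests
// build the exact same string with consistent indentation.
export function buildAlertsQuery (endCursor?: string): string {
  const after = endCursor ? `, after: "${endCursor}"` : ''
  return `query ($owner: String!, $repo: String!) {
    repository(owner: $owner, name: $repo) {
      vulnerabilityAlerts(first: 100${after}) {
        nodes { vulnerableRequirements state }
        pageInfo { hasNextPage endCursor }
      }
    }
  }`
}
```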
truggeri left a comment
These changes work for me. I'd like to circle back to add an upper limit to the number of pages requested and add a log message, but I don't think that should hold this change from going out.
```ts
fetchedResults += nodes.length;
if (fetchDepth > 0 && fetchedResults >= fetchDepth) {
```
[nit] I generally don't like having this kind of thing nested within other logic, but this isn't a huge deal here.
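For example, the check could be hoisted into a small predicate so the call site stays flat (a sketch, assuming the fetchDepth semantics from the diff above, where fetchDepth <= 0 means "no limit"):

```ts
function reachedFetchDepth (fetchedResults: number, fetchDepth: number): boolean {
  return fetchDepth > 0 && fetchedResults >= fetchDepth
}

// Call site: if (reachedFetchDepth(fetchedResults, fetchDepth)) return nodes
```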
Fetch all security alerts of a repository. Fixes #542
Also slightly adjusted the find check for the security nodes because I had e.g. something like this in the result list (note the missing `=` in front of the version of vulnerableRequirements):

```json
{
  "vulnerableManifestFilename": "yarn.lock",
  "vulnerableManifestPath": "cypress/yarn.lock",
  "vulnerableRequirements": "4.4.0",
  "state": "OPEN",
  "securityVulnerability": {
    "package": {
      "name": "terser"
    }
  },
  "securityAdvisory": {
    "cvss": {
      "score": 7.5
    },
    "ghsaId": "GHSA-4wf5-vphf-c2xc"
  }
}
```