Skip to content

Conversation

@kashifkhan0771
Copy link
Contributor

@kashifkhan0771 kashifkhan0771 commented Dec 18, 2025

This PR refactors simplifiedGitlabEnumeration to use the legacy pagination API with keyset pagination instead of Scan2, which is currently in an experimental state. This change is intended as an experiment and does not guarantee a fix for the issue.

Additionally, it introduces retry logic to handle 500-level errors and adds logging to track the number of projects being enumerated.

Related issue on Gitlab: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7516

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

This commit rewrite simplifiedGitlabEnumeration to use legacy pagination API call with keyset pagination instead of Scan2 which is currently in experimental state. Note
that this doesn't promise to fix this problem it's just a test to check. It also adds a retry logic in case any 500 error occurs. I added some logs as well to keep track
of no of projects being enumerated.
@kashifkhan0771 kashifkhan0771 requested a review from a team December 18, 2025 09:21
@kashifkhan0771 kashifkhan0771 requested a review from a team as a code owner December 18, 2025 09:21
@kashifkhan0771 kashifkhan0771 changed the title Backoff from Scan2 which is experimental to legacy pagination API call Gitlab Source: Backoff from Scan2 which is experimental to legacy pagination API call Dec 18, 2025
Copy link
Contributor

@martinlocklear martinlocklear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at this! Some notes:

  • Definitely need test coverage for this before merging in.
  • I notice in our go.mod we're overriding the version of the library we're using. Is it possible that some of the issues that we've been seeing could be fixed by newer versions of this library?


projectQueryOptions := &gitlab.ListProjectsOptions{
ListOptions: listOpts,
Membership: gitlab.Ptr(true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we have membership set to true. (I'm not saying we shouldn't, just that I don't understand)

From the docs:

Limit by projects that the current user is a member of.

If this flag is potentially causing problems, why add in this limitation? Why not just scan everything that's visible to the current token, member or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for adding this is that, if we don’t set membership to true for GitLab Cloud, the scan will enumerate all public GitLab projects. This is different from GitHub, where we explicitly restrict what can be scanned with the following condition:

if len(*githubScanOrgs) == 0 && len(*githubScanRepos) == 0 {
    return scanMetrics, fmt.Errorf("invalid config: you must specify at least one organization or repository")
}

This ensures that GitHub scans are limited to explicitly specified organizations or repositories. In GitLab, however, we don’t currently have an org option. As a result, a scan can run with just a token and attempt to enumerate everything that token has access to.

To prevent this on GitLab Cloud, we set membership=true so that only projects the user is a member of are enumerated. For self-managed GitLab instances, we disable this behavior, allowing the scan to enumerate all projects the token has access to on that instance.

I see two possible approaches:

  1. Add support for organization/group-based scanning in GitLab (similar to GitHub). With that restriction in place, we could remove the membership flag.
  2. Keep the current behavior to avoid unintentionally scanning all public projects on GitLab Cloud.

The second option ensures we don’t accidentally scan the entire set of public GitLab Cloud projects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Ok, that makes sense to me - I trust your judgement on the best path forward here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some testing and found that, by default, GitLab keeps the membership flag set to false. To take advantage of it, I reversed some conditions to avoid setting the membership flag for non-GitLab Cloud instances. This could help if the issue was indeed related to the membership flag. I also set the simple flag to true; according to the docs, this returns only the essential fields for each project instead of the full, large JSON objects.

apiClient, err := gitlab.NewOAuthClient(
s.token,
gitlab.WithBaseURL(s.url),
gitlab.WithCustomRetryWaitMinMax(time.Second, 5*time.Second),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like these are used in quite a few places. It'll be better to create constants for min and max.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.Second is already a constant from time and I don't like the idea of making 5 only a constant here. Maybe I am wrong but usually we use time like this n*time.Second.

// Fetch a page of projects from the GitLab API using the current query options.
projects, resp, err := apiClient.Projects.ListProjects(projectQueryOptions, requestOptions...)
if err != nil {
err = fmt.Errorf("received error on listing projects, you might not have permissions to do that: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally optional thing:

Suggested change
err = fmt.Errorf("received error on listing projects, you might not have permissions to do that: %w", err)
err = fmt.Errorf("received an error while listing projects; this may be due to insufficient permissions: %w", err)

Copy link
Contributor

@martinlocklear martinlocklear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need to add unit testing before we merge this.


projectQueryOptions := &gitlab.ListProjectsOptions{
ListOptions: listOpts,
Membership: gitlab.Ptr(true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Ok, that makes sense to me - I trust your judgement on the best path forward here.

Reversed the gitlab cloud logic to add membership flag, so that we use the default false for non gitlab.com instances.
This can help if the issue really was membership flag as mentioned in some gitlab issues.
Also added simple flag in list projects to get only minimal fields in response instead of big json response for each project.

Added test case as well.
@kashifkhan0771
Copy link
Contributor Author

I think we still need to add unit testing before we merge this.

Added the test case

Copy link
Contributor

@camgunz camgunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! If you want you could add a couple more checks in the test, like I'm pretty sure some other sources check chunk count and so on. But up to you--I know you've done a lot of local testing w/ this.

@camgunz
Copy link
Contributor

camgunz commented Jan 8, 2026

@martinlocklear

I notice in our go.mod we're overriding the version of the library we're using. Is it possible that some of the issues that we've been seeing could be fixed by newer versions of this library?

I just had a look, and while the changelog doesn't go as far back as our pinned version, I didn't see any bug fix that would've affected us. I made INS-249 to do this some day.

@kashifkhan0771
Copy link
Contributor Author

I notice in our go.mod we're overriding the version of the library we're using. Is it possible that some of the issues that we've been seeing could be fixed by newer versions of this library?

Somehow I missed this comment. The reason we did that was the latest version at that time was v1.134.0 but it deprecated existing Auth methods, also the Scan2 API was experimental so we pinned to version to avoid any new updates which accidentally break the API.

As Charlie said there isn't any bug fix in later versions, so I would prefer to update that in a separate PR.

// use simplified gitlab enumeration
feature.UseSimplifiedGitlabEnumeration.Store(true)

ctx, cancel := context.WithTimeout(context.Background(), time.Second*30)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use the testing context, I think it's t.Context()

}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why even are we building this test table and then building a test runner? Let's just write a test. If we want to make our test setup and breakdown reusable, then we can convert this to a test suite and use SetupSuite or (better) SetupTest for that purpose.

}
}

func TestSource_Chunks_SimplifiedGitlabEnumeration(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we (in comments) describe what the purpose of this test is? Like, what would it tell us if it was failing, and what does it give us confidence in when it's passing.

@martinlocklear
Copy link
Contributor

Some non-blocking comments about the new test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants