Link checker selectiveness: run only on changed files on small PRs #2708
Merged
holly-cummins merged 1 commit on Jun 12, 2026
gsmet
approved these changes
Jun 12, 2026
Compare the built site against the gh-pages branch to find changed HTML files. When 15 or fewer pages changed and no build infrastructure files were modified, run the link crawler only on those pages (depth-1 check). This avoids a full-site crawl on small PRs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

# Conflicts:
#	src/test/java/io/quarkusio/LinkCrawlerTest.java
c2215d2 to ff5c236
This is the promised follow-on to #2685 to reduce the build-time impact. It only checks changed files for internal dead links. This does introduce a gap: if a file is moved, the files that link to it aren't checked, so the build stays green even though those links are now dead. However, the external link checker (#2697) would pick that up on its scheduled runs and raise a defect. I think the trade-off is worth it for the improved build speed.
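The gap can be illustrated with a small in-memory model. This is only a sketch, not the crawler from this PR: the class name, the method, and the example paths `/guides/old/` and `/guides/new/` are all made up for illustration, and real pages would be parsed from HTML rather than passed in as a map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of a depth-1 internal link check; not the PR's crawler. */
final class DepthOneGapSketch {

    /** Returns "source -> target" for each link on a changed page whose target does not exist. */
    static List<String> brokenLinks(List<String> changedPages,
                                    Map<String, List<String>> outboundLinks,
                                    Set<String> existingPages) {
        List<String> broken = new ArrayList<>();
        for (String page : changedPages) {
            // Depth-1: only the outbound links of the changed page itself are verified.
            for (String target : outboundLinks.getOrDefault(page, List.of())) {
                if (!existingPages.contains(target)) {
                    broken.add(page + " -> " + target);
                }
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // /guides/old/ was moved to /guides/new/; /blog/ still links to the old location.
        Set<String> existing = Set.of("/blog/", "/guides/new/");
        Map<String, List<String>> links = Map.of(
                "/blog/", List.of("/guides/old/"),
                "/guides/new/", List.of("/blog/"));

        // Only the moved page counts as changed, so the stale link on /blog/ is never visited.
        System.out.println(brokenLinks(List.of("/guides/new/"), links, existing));
        // prints []

        // A full crawl (every page treated as changed) would catch it.
        System.out.println(brokenLinks(List.of("/blog/", "/guides/new/"), links, existing));
        // prints [/blog/ -> /guides/old/]
    }
}
```

The first call stands in for a small PR and comes back clean. The second stands in for a full crawl, which is the kind of run that would surface the stale link.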
If someone has a PR which is lagging way behind `main`, they'll also get a slow build, but I think we can live with that.

Here's the logic:

- Compares `_site/` against the `gh-pages` branch (the deployed site) using `rsync` to find changed HTML files (using a git diff would pick up adoc changes instead, and we'd have to map from them to the html)
- If 15 or fewer pages changed and no build infrastructure files (`build.yml`, `pom.xml`, `src/test/java/`) were modified, runs the link crawler only on the changed pages with a depth-1 check (verifies outbound links from those pages)
- Otherwise, runs the full-site crawl

This PR itself always triggers the 'full' scenario, so CI here can only exercise the 'full' path, but I've tried my best to test the others locally. I'll also keep an eye on PRs that go in after this merges to try to validate that the scope detection is working as expected.
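The scope decision described above could be sketched roughly as follows. This is an illustration under assumptions rather than the code in this PR: the class, enum, and method names are hypothetical, the path-matching rules are a guess at how the infrastructure files might be recognised, and the list of changed pages is assumed to come from the `rsync` comparison.

```java
import java.util.List;

/** Hypothetical sketch of the crawl-scope decision; names are illustrative, not from the PR. */
final class CrawlScopeSketch {

    enum Scope { FULL, CHANGED_ONLY }

    // Threshold from the PR description: 15 or fewer changed pages counts as a small PR.
    static final int MAX_CHANGED_PAGES = 15;

    // Build infrastructure paths named in the PR description.
    static final List<String> INFRA_MARKERS = List.of("build.yml", "pom.xml", "src/test/java/");

    /** Assumed matching rule: directory markers match by prefix, file markers by file name. */
    static boolean matches(String path, String marker) {
        if (marker.endsWith("/")) {
            return path.startsWith(marker);
        }
        return path.equals(marker) || path.endsWith("/" + marker);
    }

    static boolean touchesInfrastructure(List<String> modifiedSourceFiles) {
        return modifiedSourceFiles.stream()
                .anyMatch(path -> INFRA_MARKERS.stream().anyMatch(marker -> matches(path, marker)));
    }

    /**
     * Chooses the crawl scope. An empty list of changed pages falls under CHANGED_ONLY here,
     * which is an assumption; the PR description does not say how that case is handled.
     */
    static Scope decide(List<String> changedHtmlPages, List<String> modifiedSourceFiles) {
        if (touchesInfrastructure(modifiedSourceFiles)) {
            return Scope.FULL;
        }
        if (changedHtmlPages.size() > MAX_CHANGED_PAGES) {
            return Scope.FULL;
        }
        return Scope.CHANGED_ONLY;
    }
}
```

With this sketch, a PR that changes two pages and one `.adoc` file would get `CHANGED_ONLY`, while any PR that touches a file under `src/test/java/` would get `FULL` regardless of how few pages changed.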
Attempts at testing it
/blog/, /about/): crawled 217 pages (depth-1) vs thousands for a full crawl, 0 broken links