Check new files for the correct copyright header in th CI#10012
Check new files for the correct copyright header in th CI#10012daniellehrner wants to merge 2 commits intobesu-eth:mainfrom
Conversation
Signed-off-by: daniellehrner <daniel.lehrner@consensys.net>
There was a problem hiding this comment.
Pull request overview
Adds a CI gate in the pre-review workflow to prevent newly added source files from using legacy Besu/ConsenSys copyright headers.
Changes:
- Introduce a new
copyright-headersjob to scan newly added files for legacy header patterns. - Wire the new job into
compileandtestjob dependencies so CI blocks on failures.
| if head -5 "$file" | grep -qE 'Copyright.*(ConsenSys AG|Hyperledger Besu|[0-9]{4} ConsenSys)'; then | ||
| echo "::error file=$file::Old-style copyright header found. New files must use: ' * Copyright contributors to Besu.'" | ||
| FAILED=true | ||
| fi | ||
| fi | ||
| done <<< "$NEW_FILES" | ||
|
|
||
| if [ "$FAILED" = "true" ]; then | ||
| echo "" | ||
| echo "One or more new files contain an old-style copyright header." | ||
| echo "Replace the copyright line with:" |
There was a problem hiding this comment.
This check only fails when it finds an old-style header, but it does not enforce that new files actually contain the required new header line. As written, a newly added file with no header (or a different/incorrect header) will still pass as long as it doesn’t match the legacy patterns. Consider explicitly validating presence of the required string (e.g., Copyright contributors to Besu. within the header block) and failing if it’s missing, in addition to failing on legacy patterns.
| if head -5 "$file" | grep -qE 'Copyright.*(ConsenSys AG|Hyperledger Besu|[0-9]{4} ConsenSys)'; then | |
| echo "::error file=$file::Old-style copyright header found. New files must use: ' * Copyright contributors to Besu.'" | |
| FAILED=true | |
| fi | |
| fi | |
| done <<< "$NEW_FILES" | |
| if [ "$FAILED" = "true" ]; then | |
| echo "" | |
| echo "One or more new files contain an old-style copyright header." | |
| echo "Replace the copyright line with:" | |
| HEADER="$(head -5 "$file")" | |
| if echo "$HEADER" | grep -qE 'Copyright.*(ConsenSys AG|Hyperledger Besu|[0-9]{4} ConsenSys)'; then | |
| echo "::error file=$file::Old-style copyright header found. New files must use: ' * Copyright contributors to Besu.'" | |
| FAILED=true | |
| elif ! echo "$HEADER" | grep -q 'Copyright contributors to Besu\.'; then | |
| echo "::error file=$file::Missing or incorrect copyright header. New files must include: ' * Copyright contributors to Besu.'" | |
| FAILED=true | |
| fi | |
| fi | |
| done <<< "$NEW_FILES" | |
| if [ "$FAILED" = "true" ]; then | |
| echo "" | |
| echo "One or more new files have incorrect copyright headers." | |
| echo "New files must include the line:" |
There was a problem hiding this comment.
claude agrees:
The check logic is inverted relative to the description:
The job name says "Check new files for the correct copyright header" but the implementation only detects and rejects old-style headers — it does not verify the new-style header is present. A file with no copyright
header at all would pass. If this is intentional (relying on spotless-checkLicense to enforce header presence), that should be noted in the PR description. If not, the check should additionally verify:
if ! head -5 "$file" | grep -q 'Copyright contributors to Besu'; then
echo "::error file=$file::Missing required copyright header..."
FAILED=true
fi
| FAILED=false | ||
| while IFS= read -r file; do | ||
| if [ -f "$file" ] && echo "$file" | grep -qE '\.(java|kt|groovy|sol)$'; then | ||
| if head -5 "$file" | grep -qE 'Copyright.*(ConsenSys AG|Hyperledger Besu|[0-9]{4} ConsenSys)'; then |
There was a problem hiding this comment.
Scanning only the first 5 lines is brittle: some files may have a multi-line comment opener, a shebang, formatting directives, or other boilerplate that pushes the copyright line beyond line 5. Expanding the scan window (e.g., first 20–50 lines) or stopping after the first comment block is parsed would reduce false negatives.
| if head -5 "$file" | grep -qE 'Copyright.*(ConsenSys AG|Hyperledger Besu|[0-9]{4} ConsenSys)'; then | |
| if head -50 "$file" | grep -qE 'Copyright.*(ConsenSys AG|Hyperledger Besu|[0-9]{4} ConsenSys)'; then |
| while IFS= read -r file; do | ||
| if [ -f "$file" ] && echo "$file" | grep -qE '\.(java|kt|groovy|sol)$'; then | ||
| if head -5 "$file" | grep -qE 'Copyright.*(ConsenSys AG|Hyperledger Besu|[0-9]{4} ConsenSys)'; then | ||
| echo "::error file=$file::Old-style copyright header found. New files must use: ' * Copyright contributors to Besu.'" |
There was a problem hiding this comment.
The message says an 'old-style copyright header' was found, but the detection regex is broader (it matches Hyperledger Besu and other patterns anywhere in the first lines). To reduce confusion during CI failures, consider making the error more specific (e.g., 'Detected legacy header text (ConsenSys/Hyperledger Besu)') and printing the exact required line without extra quoting/spacing ambiguity.
macfarla
left a comment
There was a problem hiding this comment.
not sure how much of an issue this would be in practice but:
--depth=1 for base branch fetch may produce incorrect merge base for stale PRs:
git fetch origin "$BASE_BRANCH" --depth=1
git diff --name-only --diff-filter=A "origin/${BASE_BRANCH}...HEAD"
The three-dot ... syntax computes the diff from the common ancestor. With only 1 commit of the base branch fetched, the merge base might not be available if the PR is stale (base branch has advanced significantly).
In that case, git diff may show more files than were actually added in the PR, causing false positives.
Consider increasing to --depth=50 or using --shallow-since tied to the PR creation date. Alternatively, for correctness, compare against GITHUB_BASE_SHA if available:
git diff --name-only --diff-filter=A "${{ github.event.pull_request.base.sha }}...HEAD"
This uses the exact base SHA for PR events, which is already fetched with fetch-depth: 0.
PR description
Fixed Issue(s)
Thanks for sending a pull request! Have you done the following?
doc-change-requiredlabel to this PR if updates are required.Locally, you can run these tests to catch failures early:
./gradlew spotlessApply./gradlew build./gradlew acceptanceTest./gradlew integrationTest./gradlew ethereum:referenceTests:referenceTests