Skip to content

Check new files for the correct copyright header in th CI#10012

Open
daniellehrner wants to merge 2 commits intobesu-eth:mainfrom
daniellehrner:feat/check_copyright_new_files
Open

Check new files for the correct copyright header in th CI#10012
daniellehrner wants to merge 2 commits intobesu-eth:mainfrom
daniellehrner:feat/check_copyright_new_files

Conversation

@daniellehrner
Copy link
Contributor

PR description

Fixed Issue(s)

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • spotless: ./gradlew spotlessApply
  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests
  • hive tests: Engine or other RPCs modified?

Signed-off-by: daniellehrner <daniel.lehrner@consensys.net>
Copilot AI review requested due to automatic review settings March 10, 2026 14:07
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a CI gate in the pre-review workflow to prevent newly added source files from using legacy Besu/ConsenSys copyright headers.

Changes:

  • Introduce a new copyright-headers job to scan newly added files for legacy header patterns.
  • Wire the new job into compile and test job dependencies so CI blocks on failures.

Comment on lines +64 to +74
if head -5 "$file" | grep -qE 'Copyright.*(ConsenSys AG|Hyperledger Besu|[0-9]{4} ConsenSys)'; then
echo "::error file=$file::Old-style copyright header found. New files must use: ' * Copyright contributors to Besu.'"
FAILED=true
fi
fi
done <<< "$NEW_FILES"

if [ "$FAILED" = "true" ]; then
echo ""
echo "One or more new files contain an old-style copyright header."
echo "Replace the copyright line with:"
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check only fails when it finds an old-style header, but it does not enforce that new files actually contain the required new header line. As written, a newly added file with no header (or a different/incorrect header) will still pass as long as it doesn’t match the legacy patterns. Consider explicitly validating presence of the required string (e.g., Copyright contributors to Besu. within the header block) and failing if it’s missing, in addition to failing on legacy patterns.

Suggested change
if head -5 "$file" | grep -qE 'Copyright.*(ConsenSys AG|Hyperledger Besu|[0-9]{4} ConsenSys)'; then
echo "::error file=$file::Old-style copyright header found. New files must use: ' * Copyright contributors to Besu.'"
FAILED=true
fi
fi
done <<< "$NEW_FILES"
if [ "$FAILED" = "true" ]; then
echo ""
echo "One or more new files contain an old-style copyright header."
echo "Replace the copyright line with:"
HEADER="$(head -5 "$file")"
if echo "$HEADER" | grep -qE 'Copyright.*(ConsenSys AG|Hyperledger Besu|[0-9]{4} ConsenSys)'; then
echo "::error file=$file::Old-style copyright header found. New files must use: ' * Copyright contributors to Besu.'"
FAILED=true
elif ! echo "$HEADER" | grep -q 'Copyright contributors to Besu\.'; then
echo "::error file=$file::Missing or incorrect copyright header. New files must include: ' * Copyright contributors to Besu.'"
FAILED=true
fi
fi
done <<< "$NEW_FILES"
if [ "$FAILED" = "true" ]; then
echo ""
echo "One or more new files have incorrect copyright headers."
echo "New files must include the line:"

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

claude agrees:
The check logic is inverted relative to the description:

The job name says "Check new files for the correct copyright header" but the implementation only detects and rejects old-style headers — it does not verify the new-style header is present. A file with no copyright
header at all would pass. If this is intentional (relying on spotless-checkLicense to enforce header presence), that should be noted in the PR description. If not, the check should additionally verify:

if ! head -5 "$file" | grep -q 'Copyright contributors to Besu'; then
echo "::error file=$file::Missing required copyright header..."
FAILED=true
fi

FAILED=false
while IFS= read -r file; do
if [ -f "$file" ] && echo "$file" | grep -qE '\.(java|kt|groovy|sol)$'; then
if head -5 "$file" | grep -qE 'Copyright.*(ConsenSys AG|Hyperledger Besu|[0-9]{4} ConsenSys)'; then
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scanning only the first 5 lines is brittle: some files may have a multi-line comment opener, a shebang, formatting directives, or other boilerplate that pushes the copyright line beyond line 5. Expanding the scan window (e.g., first 20–50 lines) or stopping after the first comment block is parsed would reduce false negatives.

Suggested change
if head -5 "$file" | grep -qE 'Copyright.*(ConsenSys AG|Hyperledger Besu|[0-9]{4} ConsenSys)'; then
if head -50 "$file" | grep -qE 'Copyright.*(ConsenSys AG|Hyperledger Besu|[0-9]{4} ConsenSys)'; then

Copilot uses AI. Check for mistakes.
while IFS= read -r file; do
if [ -f "$file" ] && echo "$file" | grep -qE '\.(java|kt|groovy|sol)$'; then
if head -5 "$file" | grep -qE 'Copyright.*(ConsenSys AG|Hyperledger Besu|[0-9]{4} ConsenSys)'; then
echo "::error file=$file::Old-style copyright header found. New files must use: ' * Copyright contributors to Besu.'"
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message says an 'old-style copyright header' was found, but the detection regex is broader (it matches Hyperledger Besu and other patterns anywhere in the first lines). To reduce confusion during CI failures, consider making the error more specific (e.g., 'Detected legacy header text (ConsenSys/Hyperledger Besu)') and printing the exact required line without extra quoting/spacing ambiguity.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@macfarla macfarla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how much of an issue this would be in practice but:

--depth=1 for base branch fetch may produce incorrect merge base for stale PRs:

git fetch origin "$BASE_BRANCH" --depth=1
git diff --name-only --diff-filter=A "origin/${BASE_BRANCH}...HEAD"

The three-dot ... syntax computes the diff from the common ancestor. With only 1 commit of the base branch fetched, the merge base might not be available if the PR is stale (base branch has advanced significantly).
In that case, git diff may show more files than were actually added in the PR, causing false positives.

Consider increasing to --depth=50 or using --shallow-since tied to the PR creation date. Alternatively, for correctness, compare against GITHUB_BASE_SHA if available:
git diff --name-only --diff-filter=A "${{ github.event.pull_request.base.sha }}...HEAD"
This uses the exact base SHA for PR events, which is already fetched with fetch-depth: 0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants