fix: report accurate line numbers for chunked file scanning (#1876) #4615

GLEF1X · 2025-12-23T06:23:30Z

Description:

Problem

Files are processed in ~13KB chunks by default (10KB of actual data plus a 3KB peek into the next chunk). To infer the line the engine would take the line from chunk's metadata SourceMetadata.Line and would consider it absolute, that is it's relative to the entire file. The problem is it was never set anywhere. Since every chunk had line equals to 1 in metadata, all secret scan results appeared at line 1 + FragmentLineOffset() which resulted in line being relative to the chunk not the file itself.

Solution

Track cumulative line numbers in handleNonArchiveContent() by counting newlines as chunks are processed. Each DataOrErr now carries the correct starting line, which flows through to SourceMetadata.Filesystem.Line, giving FragmentFirstLineAndLink() the correct fragStart for the final calculation.

Affected Sources

Directly fixed:

Filesystem - now reports accurate line numbers. I have tested it using the test file provided in the issue wget https://gist.githubusercontent.com/det/1526b4c16d0e07ac023d75c912a68658/raw/c3061c14a811205a65cbdcf0065bd3c11d88bfcb/test.txt

Not affected (no Line field in proto):

S3, GCS, Jenkins, stdin - use handlers but their metadata protos lack a Line field. I think it makes sense for S3 and GCS to report line numbers so it could be a good future change.

[THIS PART WAS EDITED AFTER REALIZING GIT-BASED SOURCES ARE PARTIALLY AFFECTED]
Partially affected (own line tracking):

Git, GitHub, GitLab - regular text diffs use git-based scanning with built-in line tracking (unaffected). However, binary archives (tar.gz, zip) go through HandleBinary() → handlers.HandleFile() and benefit from this fix.
others like BitBucket use their own implementations so not affected

Checklist:

Tests passing (make test-community)?
Lint passing (make lint this requires golangci-lint)?

…ecurity#1876) Files processed through handlers are split into ~10KB chunks, but line numbers weren't tracked across boundaries, causing secrets to always report line 0. - Add LineNumber field to DataOrErr to propagate line context - Add ContentSize() to ChunkResult to avoid double-counting peek overlap - Clone metadata per chunk via populateChunkLineNumber() to prevent shared state

CLAassistant · 2025-12-23T06:23:36Z

All committers have signed the CLA.

rosecodym

Thank you for this PR - it is beautifully precise, detailed, and documented. In addition to the few inline comments I left, I have this question: Does this change generated by the default handler to be 1-indexed instead of 0-indexed?

rosecodym · 2026-01-05T16:42:39Z

pkg/sources/chunker.go

 }

+// NewChunkResult creates a ChunkResult with the given data and content size.
+// This is primarily used for testing to create mock chunk results.


Does it have to be primarily used for this? It looks like we could use this constructor in the actual program logic, too. We're (slowly, gradually) trying to reduce the number of identical things done multiple different ways, so I think that if there's only one place in the program logic that ChunkResults are created, it would be a good idea to use this constructor there too.

I agree, it should be more idiomatic especially since it's a little weird to use public constructor just for testing. I committed a change

rosecodym · 2026-01-05T16:45:53Z

pkg/handlers/handlers.go

+	case *source_metadatapb.MetaData_Github:
+		m.Github.Line = lineNumber
+	case *source_metadatapb.MetaData_Gitlab:
+		m.Gitlab.Line = lineNumber
+	case *source_metadatapb.MetaData_Git:
+		m.Git.Line = lineNumber


Your PR says that these sources are "not affected" by this change. Are they included here just for completeness?

(To be clear, I would support this - although I think a comment warning about the oddity would be a good idea.)

Thanks for calling this out! You're right, PR description was a bit inaccurate. I've added it for completeness because some git logic used it. I retested the binary file handling logic for git-based sources (local git, GitHub, GitLab) and confirmed this issue affects them as well.

Initially, I wasn't sure if

trufflehog/pkg/gitparse/gitparse.go

Lines 474 to 489 in 50aa469

case isBinaryLine(latestState, line):

latestState = BinaryFileLine

path, ok := pathFromBinaryLine(line)

if !ok {

err = fmt.Errorf(`expected line to match 'Binary files a/fileA and b/fileB differ', got "%s"`, line)

ctx.Logger().Error(err, "Failed to parse BinaryFileLine")

latestState = ParseFailure

continue

}

// Don't do anything if the file is deleted. (pathA has file path, pathB is /dev/null)

if path != "" {

currentDiff.PathB = path

currentDiff.IsBinary = true

}

was legacy code or actively used, and I couldn't reproduce the issue at the time. I've since been able to reproduce it using this test repo: https://github.com/GLEF1X/test-trufflehog-again

(Note: the commits are unverified — I used a random username for testing purposes.)

I took the same file from the original issue, packed it into a tarball, and committed it. Results:

Before fix

... Found unverified result 🐷🔑❓ Detector Type: Github Decoder Type: PLAIN Raw result: ghs_012345678901234567890123456789012345 Version: 2 Rotation_guide: https://howtorotate.com/docs/tutorials/github/ Commit: 0f189b9a88e42452d0384a661f33da16b718b2e5 Email: T <[email protected]> File: content/secret.txt Line: 557 <- uses git chunker so it's correctly infrred Link: https://github.com/GLEF1X/test-trufflehog-again/blob/0f189b9a88e42452d0384a661f33da16b718b2e5/content/secret.txt#L557 Repository: https://github.com/GLEF1X/test-trufflehog-again.git Timestamp: 2026-01-07 01:32:05 +0000 ... Found unverified result 🐷🔑❓ Detector Type: Github Decoder Type: PLAIN Raw result: ghs_012345678901234567890123456789012345 Rotation_guide: https://howtorotate.com/docs/tutorials/github/ Version: 2 Commit: 0f189b9a88e42452d0384a661f33da16b718b2e5 Email: T <[email protected]> File: secret.tar.gz Line: 288 <- uses HandleBinary(thus HandleFile which had a bug with chunk line numbers) Link: https://github.com/GLEF1X/test-trufflehog-again/blob/0f189b9a88e42452d0384a661f33da16b718b2e5/secret.tar.gz#L288 Repository: https://github.com/GLEF1X/test-trufflehog-again.git Repository_local_path: /var/folders/hw/r5j4bcyd3472ccwjz0klh5840000gn/T/trufflehog-40414-225149517 Timestamp: 2026-01-07 01:32:05 +0000 ...

After fix

... Found unverified result 🐷🔑❓ Detector Type: Github Decoder Type: PLAIN Raw result: ghs_012345678901234567890123456789012345 Rotation_guide: https://howtorotate.com/docs/tutorials/github/ Version: 2 Commit: 0f189b9a88e42452d0384a661f33da16b718b2e5 Email: T <[email protected]> File: content/secret.txt Line: 557 Link: https://github.com/GLEF1X/test-trufflehog-again/blob/0f189b9a88e42452d0384a661f33da16b718b2e5/content/secret.txt#L557 Repository: https://github.com/GLEF1X/test-trufflehog-again.git Timestamp: 2026-01-07 01:32:05 +0000 ... Found unverified result 🐷🔑❓ Detector Type: Github Decoder Type: PLAIN Raw result: ghs_012345678901234567890123456789012345 Rotation_guide: https://howtorotate.com/docs/tutorials/github/ Version: 2 Commit: 0f189b9a88e42452d0384a661f33da16b718b2e5 Email: T <[email protected]> File: secret.tar.gz Line: 557 <- same line number Link: https://github.com/GLEF1X/test-trufflehog-again/blob/0f189b9a88e42452d0384a661f33da16b718b2e5/secret.tar.gz#L557 Repository: https://github.com/GLEF1X/test-trufflehog-again.git Repository_local_path: /var/folders/hw/r5j4bcyd3472ccwjz0klh5840000gn/T/trufflehog-40569-2815386483 Timestamp: 2026-01-07 01:32:05 +0000 ...

This applies to binary files in git, github and gitlab

trufflehog/pkg/sources/git/git.go

Lines 799 to 808 in 50aa469

if err := HandleBinary(ctx, gitDir, reporter, chunkSkel, commitHash, fileName, s.skipArchives); err != nil {

logger.Error(

err,

"error handling binary file",

"commit", commitHash,

"path", fileName,

)

}

continue

}

GLEF1X · 2026-01-07T02:20:55Z

@rosecodym I believe that it was always 1-indexed. The engine normalizes to 1

trufflehog/pkg/engine/engine.go

Lines 1347 to 1350 in 50aa469

    
           // Ensure we maintain 1-based line indexing if fragmentStart is not set or is 0. 
        
           if *fragmentStart == 0 { 
        
           	*fragmentStart = 1 
        
           }

rosecodym

Beautiful work! I flagged one last thing (that I should have caught the first time, sorry), but everything else looks great.

rosecodym · 2026-01-08T15:01:44Z

pkg/sources/filesystem/filesystem_test.go

+					assert.Truef(
+						t,
+						proto.Equal(chunk.SourceMetadata, tt.wantSourceMetadata),
+						"Source.Chunks() %s metadata mismatch: got %v, want %v",
+						tt.name, chunk.SourceMetadata, tt.wantSourceMetadata,
+					)


Oh, sorry I didn't catch this the first time around, but can you use cmp.Diff here? That would generate a more useful diff when there's a mismatch.

GLEF1X requested a review from a team December 23, 2025 06:23

GLEF1X requested review from a team as code owners December 23, 2025 06:23

kashifkhan0771 requested review from camgunz and rosecodym January 5, 2026 06:47

rosecodym reviewed Jan 5, 2026

View reviewed changes

refactor(chunker): use constructor for ChunkResult

6e548e1

Merge branch 'main' into fix/secrets-wrong-line-report

f2f6dd8

rosecodym reviewed Jan 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: report accurate line numbers for chunked file scanning (#1876) #4615

fix: report accurate line numbers for chunked file scanning (#1876) #4615

GLEF1X commented Dec 23, 2025 •

edited

Loading

Uh oh!

CLAassistant commented Dec 23, 2025 •

edited

Loading

Uh oh!

rosecodym left a comment

Uh oh!

rosecodym Jan 5, 2026

Uh oh!

GLEF1X Jan 7, 2026 •

edited

Loading

Uh oh!

rosecodym Jan 5, 2026

Uh oh!

GLEF1X Jan 7, 2026 •

edited

Loading

Uh oh!

GLEF1X commented Jan 7, 2026 •

edited

Loading

Uh oh!

rosecodym left a comment

Uh oh!

rosecodym Jan 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	case isBinaryLine(latestState, line):
	latestState = BinaryFileLine

	path, ok := pathFromBinaryLine(line)
	if !ok {
	err = fmt.Errorf(`expected line to match 'Binary files a/fileA and b/fileB differ', got "%s"`, line)
	ctx.Logger().Error(err, "Failed to parse BinaryFileLine")
	latestState = ParseFailure
	continue
	}

	// Don't do anything if the file is deleted. (pathA has file path, pathB is /dev/null)
	if path != "" {
	currentDiff.PathB = path
	currentDiff.IsBinary = true
	}

	if err := HandleBinary(ctx, gitDir, reporter, chunkSkel, commitHash, fileName, s.skipArchives); err != nil {
	logger.Error(
	err,
	"error handling binary file",
	"commit", commitHash,
	"path", fileName,
	)
	}
	continue
	}

fix: report accurate line numbers for chunked file scanning (#1876) #4615

Are you sure you want to change the base?

fix: report accurate line numbers for chunked file scanning (#1876) #4615

Conversation

GLEF1X commented Dec 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description:

Problem

Solution

Affected Sources

Checklist:

Uh oh!

CLAassistant commented Dec 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rosecodym left a comment

Choose a reason for hiding this comment

Uh oh!

rosecodym Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

GLEF1X Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rosecodym Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

GLEF1X Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Before fix

After fix

Uh oh!

GLEF1X commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rosecodym left a comment

Choose a reason for hiding this comment

Uh oh!

rosecodym Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

GLEF1X commented Dec 23, 2025 •

edited

Loading

CLAassistant commented Dec 23, 2025 •

edited

Loading

GLEF1X Jan 7, 2026 •

edited

Loading

GLEF1X Jan 7, 2026 •

edited

Loading

GLEF1X commented Jan 7, 2026 •

edited

Loading