Skip to content

Comments

fix: Issue 684 - Inefficiency report double-counting reclaimable bytes#685

Open
huntharo wants to merge 2 commits intowagoodman:mainfrom
huntharo:issue-684/ineff-double-report
Open

fix: Issue 684 - Inefficiency report double-counting reclaimable bytes#685
huntharo wants to merge 2 commits intowagoodman:mainfrom
huntharo:issue-684/ineff-double-report

Conversation

@huntharo
Copy link

@huntharo huntharo commented Feb 22, 2026

Summary

This PR fixes an over-reporting bug in image inefficiency analysis where potential savings were calculated as the sum of all copies of duplicated files instead of the sum of only removable copies.

Before this change, duplicate paths could inflate:

  • Potential wasted space in the UI
  • WastedBytes in analysis/CI/export output

The efficiency percentage was already based on a one-copy-retained model, so bytes and percent could disagree.

Problem

For a duplicated path with N copies:

  • Previous behavior effectively counted N copies as reclaimable (via CumulativeSize)
  • Correct behavior should count only N - 1 copies as reclaimable (one copy must remain)

Equivalent framing per path:

  • Previous: reclaimable = sum(all discovered bytes)
  • Correct: reclaimable = sum(all discovered bytes) - one retained copy

In the current model, retained copy is represented by the smallest discovered size for that path.

Root cause

EfficiencyData already tracks both:

  • CumulativeSize: total discovered bytes across layers
  • minDiscoveredSize: smallest discovered copy size for that path

But downstream consumers summed CumulativeSize directly:

  • dive/image/analysis.go (WastedBytes)
  • cmd/dive/cli/internal/ui/v1/view/image_details.go (Potential wasted space)

This double-counted reclaimable size.

Solution

  1. Added EfficiencyData.WastedSize():
  • CumulativeSize - minDiscoveredSize
  • clamped at >= 0
  1. Switched consumers to use reclaimable bytes:
  • Analyze(...) now sums file.WastedSize()
  • UI wasted-space line in ImageDetails.Render now sums data.WastedSize()
  1. Updated affected fixtures/expectations:
  • docker archive analysis expected wasted bytes/percent
  • export snapshot inefficientBytes

Tests added (first commit)

New regression coverage

  • dive/image/analysis_regression_test.go
    • single copy => 0 waste
    • 2 copies => remove 1 copy
    • 10 copies => remove 9 copies
    • 100 copies => remove 99 copies
    • varied sizes => remove all but one retained copy
  • cmd/dive/cli/internal/ui/v1/view/image_details_regression_test.go
    • verifies rendered Potential wasted space uses removable bytes
  • dive/filetree/efficiency_test.go
    • edge-case coverage with many copies and varied hash/size inputs

Pre-fix baseline (before implementation)

  • go test ./dive/filetree -run 'TestEfficiency'
  • go test ./dive/image -run 'TestAnalyzeWastedBytesCountsOnlyRemovableDuplicates'
  • go test ./cmd/dive/cli/internal/ui/v1/view -run 'TestRenderImageDetailsBodyPotentialWastedSpaceUsesRemovableBytes'
image

Post-fix verification

  • go test ./dive/filetree ./dive/image/... ./cmd/dive/cli/internal/ui/v1/view ./cmd/dive/cli/internal/command/ci ./cmd/dive/cli/internal/command/export
image

Behavioral impact

For duplicate-heavy layers (e.g., chown-style copy duplication), wasted-byte totals now align with the efficiency ratio model (retain one copy, count only removable copies).

Commits

  1. 91fda99test: add regressions for duplicate wasted-space accounting
  2. 51c0446fix: report reclaimable bytes for inefficiencies

Fixes #684

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Double counting of potential space savings in inefficiency report

1 participant