Skip to content

fix(cubebox): allow deleting paused sandboxes via out-of-band teardown#480

Open
HuChundong wants to merge 1 commit into
TencentCloud:masterfrom
HuChundong:fix/delete-paused-sandbox
Open

fix(cubebox): allow deleting paused sandboxes via out-of-band teardown#480
HuChundong wants to merge 1 commit into
TencentCloud:masterfrom
HuChundong:fix/delete-paused-sandbox

Conversation

@HuChundong

Copy link
Copy Markdown
Contributor

A paused/frozen VM cannot serve its runtime (ttrpc) API, so the shim rejects the regular per-container delete with "sandbox not in normal state" and the sandbox can never be removed (error 130593/130506).

Route teardown of paused sandboxes through a binary force-delete that SIGKILLs the shim (reaping the in-process cube-hypervisor) and detaches it from the runtime manager, letting containerd's dead-shim cleanup reclaim the VM workdir and pause snapshot:

  • Destroy: select the SkipRuntimeAPI path when the sandbox is paused and factor the force-delete into destroySandboxByBinary.
  • CleanUp (GC reconcile / workflow compensation): mirror the same path so a paused sandbox is reclaimed instead of looping forever on "shim process still Exists".

Verified end-to-end (running -> pause -> delete): RetCode 200, shim and VMM reaped, VM workdir / pause snapshot / bundle all reclaimed, no error spam.

Assisted-by: Cursor:Claude-Opus-4.8

@HuChundong HuChundong requested a review from fslongjin as a code owner June 6, 2026 09:23
@cubesandboxbot

cubesandboxbot Bot commented Jun 6, 2026

Copy link
Copy Markdown

Claude finished @HuChundong's task in 1m 29s —— View job


Code Review for PR #480

PR Title: fix(cubebox): allow deleting paused sandboxes via out-of-band teardown


Overview

This PR addresses a real problem: a paused/frozen VM can't serve its ttrpc runtime API, so the shim rejects per-container delete with "sandbox not in normal state" and the sandbox becomes undeletable. The fix routes paused sandbox teardown through a binary force-delete that SIGKILLs the shim and detaches it from the runtime manager.

Files changed: 2 files, +34/-8


✅ Good Design Decisions

  1. Extraction of destroySandboxByBinary into a helper (destroy.go:181-192): The old code inlined a one-off l.localTask.Delete call that only touched containerd metadata. The new helper comprehensively:

    • Kills the shim process (reaping the VMM)
    • Detaches from the shim manager (l.shims.Delete)
    • Lets containerd's dead-shim cleanup reclaim the VM workdir and pause snapshot
  2. Both code paths updated consistently: Destroy (direct API delete) and CleanUp (GC reconcile / workflow compensation) both use the same destroySandboxByBinary function, preventing the GC loop described in the PR description ("shim process still Exists").

  3. Graceful no-op on already-cleaned state: destroySandboxByBinary correctly handles:

    • syscall.ESRCH (process already gone) — logged at warn level, no error returned
    • errdefs.IsNotFound (shim already detached) — silently ignored
  4. Separation of concerns (destroy.go:120-124): The paused-detection (IsPaused()) is now cleanly separated from the action (SkipRuntimeAPI()), making the control flow more readable and allowing future callers to set the skip flag independently.


⚠️ Potential Concerns

1. SIGKILL to a potentially stale PID (destroy.go:183)

if pid := sb.Endpoint.Pid; pid != 0 {
    if err := syscall.Kill(int(pid), syscall.SIGKILL); err != nil && !errors.Is(err, syscall.ESRCH) {

sb.Endpoint.Pid could be stale if the shim has already exited and the PID has been recycled by the kernel to a different process. SIGKILL to an unrelated process is destructive. In practice, within this code path:

  • shims.Delete later validates the sandbox-to-shim mapping, providing a secondary guard
  • The PID comes from shim registration and is updated in the store

Suggestion: Consider a best-effort verification (e.g., reading /proc/<pid>/cmdline) before issuing SIGKILL, or at minimum logging the target PID name. This is defense-in-depth rather than a blocking issue.

2. Non-fatal SIGKILL failure could orphan the shim (destroy.go:183-185)

The SIGKILL failure is only logged as a warning, not propagated as an error. If the kill fails for a reason other than ESRCH (e.g., EPERM from a security restriction), the function still proceeds to call l.shims.Delete, which detaches the sandbox from containerd's management while the shim process continues running — creating an orphan.

Suggestion: Consider returning an error for unexpected SIGKILL failures (non-ESRCH), or add a post-kill ProcessExists check:

if pid := sb.Endpoint.Pid; pid != 0 {
    if err := syscall.Kill(int(pid), syscall.SIGKILL); err != nil && !errors.Is(err, syscall.ESRCH) {
        log.G(ctx).WithError(err).Warnf("kill shim %d of sandbox %s failed", pid, sb.ID)
        // Fall through — shims.Delete will still be attempted.
    }
}

3. No locking on the status read in CleanUp (local.go:459)

if info.GetStatus().IsPaused() {

In Destroy, the sandbox is locked (sb.Lock() at line 65). In CleanUp, the info object is read from the store and IsPaused() is called without any lock. The state could change between the check and the kill. This matches the pre-existing pattern in CleanUp (it never locked) but does introduce a small race window.

Suggestion: Not a blocker since killing a shim is safe even if the VM just resumed (containerd will restart the shim), but adding a comment acknowledging this would help future readers.

4. No tests for the new destroySandboxByBinary function

destroy_test.go doesn't test the paused-sandbox path or the new helper. Test coverage for this function would help validate:

  • Successful kill + shim detach
  • ESRCH / IsNotFound edge cases
  • Error propagation behavior

5. Code duplication: two sandboxDeletable calls (destroy.go:77-79, 87-89)

This pre-existing duplication (present before the PR) calls the same check twice within ~10 lines with no intervening state change:

if !sandboxDeletable(sb, opts.DestroyInfo.GetFilter()) {
    return ret.Err(errorcode.ErrorCode_PreConditionFailed, "unmatched filter")
}
// ... ~8 lines of namespace setup ...
if !sandboxDeletable(sb, opts.DestroyInfo.GetFilter()) {
    return ret.Err(errorcode.ErrorCode_PreConditionFailed, "unmatched filter")
}

The second check will always return the same result as the first since nothing between them modifies sb.Labels or the filter. This should be cleaned up separately.


Summary

Category Verdict
Correctness The approach is correct and addresses a real bug.
Design Clean extraction of destroySandboxByBinary; consistent across Destroy/CleanUp.
Safety Mostly safe. Stale-PID risk is mitigated by shims.Delete.
Test Coverage Missing tests for the new function and the paused-sandbox path.
Edge Cases ESRCH and IsNotFound handled; stale PID is the notable gap.

The PR is well-structured and solves the stated problem. The most impactful improvement would be adding unit tests for destroySandboxByBinary. The stale-PID concern is defense-in-depth and unlikely to manifest in practice given how sandbox PID lifecycle works.


@cubesandboxbot

cubesandboxbot Bot commented Jun 6, 2026

Copy link
Copy Markdown

Claude finished @HuChundong's task in 1m 39s —— View job


Review of PR #480: fix(cubebox): allow deleting paused sandboxes via out-of-band teardown


Summary

The codebase has been updated since the initial automated scan. The PR now contains 2 commits and the stale-PID concern from the first-pass review has been addressed with the isCubeShimPid guard. This review covers the latest state at 4b1636e.


✅ What Was Improved Since First Scan

  1. Stale PID guard added (destroy.go:200-215): The new isCubeShimPid(pid, sandboxID) function reads /proc/<pid>/cmdline via procfs.NewProc and verifies the sandbox ID appears in the process arguments before issuing SIGKILL. This directly addresses the earlier concern about signaling a recycled kernel PID.

  2. Tests for the PID guard (destroy_test.go:275-290): TestIsCubeShimPid validates three scenarios:

    • PID matches its own os.Args[0]true
    • PID with a non-matching token → false
    • Nonexistent PID → false

Design Review

Separation of concerns in Destroy (destroy.go:120-124)

The logic is now cleaner than the original inline l.localTask.Delete call:

if sb.GetStatus().IsPaused() {
    ctx = constants.WithSkipRuntimeAPI(ctx)
}

This separates detection (is the sandbox paused?) from action (how do we delete it?). The SkipRuntimeAPI(ctx) check at line 124 is the branch point. This is a good pattern — it lets future code paths set the context flag independently without duplicating the paused check.

destroySandboxByBinary extraction (destroy.go:181-194)

The helper function cleanly encapsulates the force-delete dance:

  1. SIGKILL the shim (with stale-PID guard)
  2. Detach from shims.Delete
  3. Let containerd's dead-shim GC reclaim the rest

This is correct because containerd's shim manager, upon observing that a registered shim process has exited, will handle cleanup of the VM workdir, pause snapshot, and IO fifos.

CleanUp parity (local.go:459-463)

The GC/compensation path now mirrors Destroy for paused sandboxes, which fixes the "shim process still Exists" infinite loop described in the PR. Any code path that reaches the post-cleanup ProcessExists check at line 473 will now find the shim reaped.

One concern: CleanUp reads info.GetStatus().IsPaused() without holding the sandbox lock (line 459), while Destroy locks via sb.Lock() (destroy.go:65-66). A state transition (paused → running) between the check and the SIGKILL would:

  • Call destroySandboxByBinary on a now-running sandbox
  • Which would kill the shim, causing the running VM to terminate

This is the same pre-existing pattern in CleanUp (it never locked), and as noted in the first review, killing the shim is still safe — containerd would restart it. A comment acknowledging the unlocked read wouldn't hurt but isn't a blocker.


Code-Level Observations

isCubeShimPid uses exact sandbox ID matching (destroy.go:209-213)

for _, arg := range cmdline {
    if arg == sandboxID {
        return true
    }
}

The shim is launched with -id <sandboxID>, so an exact token match against the cmdline array is correct. However, what if the sandbox ID happens to appear as a substring of another argument (e.g., an unrelated --foo=<sandboxID> style flag)? Double-check whether the shim's -id flag is split as separate argv entries (-id, <sandboxID>) or as a single -id=<sandboxID> token. If it's the latter, this would still match correctly since the tested arg is id=<sandboxID> which won't equal sandboxID. If it's the former, the match is correct. Given this is the same team's code, this is safe — just worth being explicit about in a comment or making the matching slightly more structured by looking for a -id / <sandboxID> pair.

Non-blocking: This is defense in depth. Even a false positive would just mean killing a rogue process whose cmdline happens to contain a sandbox ID — unlikely in practice.

Non-fatal SIGKILL failure (destroy.go:185-187)

After the isCubeShimPid guard, a SIGKILL failure that isn't ESRCH is very unlikely (would require the process to vanish between the /proc check and syscall.Kill, or an EPERM from security policy). The code logs a warning and continues to shims.Delete, which is the right behavior — shims.Delete is the more important operation for successful cleanup. No change needed.

Test scope

TestIsCubeShimPid covers the guard function well. However, destroySandboxByBinary itself has no direct test because it depends on l.shims (a *v2.ShimManager which requires a running containerd daemon to test). This is a reasonable constraint for integration-level code. If desired, an interface abstraction over ShimManager could enable unit testing in the future, but that's scope beyond this PR.


Suggestions (non-blocking)

Priority Suggestion
🟢 Low Add a one-line comment in isCubeShimPid noting the expected argv format (-id <sandboxID>) so future readers understand why a raw token match is correct.
🟢 Low Consider adding a brief comment in CleanUp's paused check acknowledging the unlocked read, so future maintainers know it's intentional.
🟡 Medium The sandboxDeletable double-call at destroy.go:77-79 and 87-89 (pre-existing) is a no-op duplicate — consider cleaning up separately.

Conclusion

The PR is correct, well-structured, and solves a real problem (undeletable paused sandboxes). The second commit (4b1636e) addresses the stale-PID safety concern raised in the initial review by adding isCubeShimPid. The force-delete path is consistently applied across both Destroy and CleanUp. No blocking issues remain.


Verification Checklist

  • Paused sandbox Destroy → selects destroySandboxByBinary → SIGKILLs shim (with stale-PID guard) → detaches from shim manager
  • Running sandbox Destroy → unchanged path (per-container destroyContainer)
  • Paused sandbox CleanUp (GC) → same destroySandboxByBinary path, preventing infinite GC loop
  • Running sandbox CleanUp → unchanged path (per-container stopTask)
  • isCubeShimPid guards against stale/recycled PIDs via /proc/<pid>/cmdline verification
  • Tests cover the PID guard function
  • syscall.ESRCH and errdefs.IsNotFound are handled gracefully
  • procfs dependency already exists in Cubelet/go.mod

| Branch: fix/delete-paused-sandbox

@cubesandboxbot

cubesandboxbot Bot commented Jun 6, 2026

Copy link
Copy Markdown

Claude finished @HuChundong's task in 1m 2s —— View job


Review of PR #480

Branch: fix/delete-paused-sandboxmaster
Commits: 3 (c0f124f, 4b1636e, 0129d18)
Files changed: 3 files (+75/-12)


Summary

This PR fixes a real problem: a paused/frozen VM cannot serve its ttrpc runtime API, so the shim rejects per-container delete with "sandbox not in normal state", making the sandbox permanently undeletable (errors 130593/130506). The fix routes paused sandbox teardown through a binary force-delete that SIGKILLs the shim and detaches it from containerd's shim manager, letting dead-shim cleanup reclaim the workdir and pause snapshot.


✅ What's Good

1. Clean extraction of destroySandboxByBinary (destroy.go:177-190)

The previous code inlined a one-off l.localTask.Delete call that only touched containerd metadata. The new helper comprehensively:

  • Verifies the PID is still the right shim via /proc before SIGKILL
  • Kills the shim (reaping the in-process VMM)
  • Detaches from the shim manager via l.shims.Delete
  • Lets containerd's dead-shim cleanup reclaim VM workdir and pause snapshot

2. Both Destroy and CleanUp updated consistently

  • Destroy: paused sandboxes take the SkipRuntimeAPIdestroySandboxByBinary path
  • CleanUp (GC reconciliation): mirrors the same path, preventing the "shim process still Exists" infinite GC loop

3. Stale-PID guard (isCubeShimPid, destroy.go:196-211)

The second commit (4b1636e) responsibly addresses a safety concern: the stored shim PID could be stale and recycled by the kernel. The guard reads /proc/<pid>/cmdline via procfs and verifies the sandbox ID token appears in the process arguments before issuing SIGKILL. This is defense-in-depth and well-done.

4. Graceful edge-case handling

  • syscall.ESRCH (process already gone) → logged at warn level, not propagated as error
  • errdefs.IsNotFound (shim already detached) → silently ignored
  • Zero PID (sb.Endpoint.Pid == 0) → skipped, no procfs access attempted

5. Unit test for the PID guard (destroy_test.go:275-290)

TestIsCubeShimPid covers three scenarios:

  • PID matching its own argv[0]true
  • PID with a non-matching token → false
  • Nonexistent PID → false

⚠️ Observations & Suggestions

# Severity Issue Suggestion
1 🟡 Medium Non-fatal SIGKILL failure may orphan the shim (destroy.go:181-184) If syscall.Kill fails with a reason other than ESRCH (e.g., EPERM from a security restriction like AppArmor or seccomp), the function logs a warning and continues to shims.Delete. This detaches the sandbox from containerd's management while the shim may still be running — creating an orphan process. Consider returning the kill error rather than just logging it, or adding a post-kill ProcessExists check to confirm the shim actually exited. At minimum, a comment explaining why falling through is safe would help future readers.
2 🟡 Medium Unlocked status read in CleanUp (local.go:459) Destroy holds sb.Lock() (destroy.go:65-66) before checking IsPaused(), but CleanUp reads info.GetStatus().IsPaused() without any lock. A state transition (paused → running) between the check and the SIGKILL would kill the shim of a now-running sandbox, terminating the VM. This matches the pre-existing pattern in CleanUp (it never locked), and killing the shim is safe — containerd would restart it — but adding a comment acknowledging the unlocked read would help future maintainers understand it's intentional.
3 🟢 Low isCubeShimPid uses exact token matching (destroy.go:205-209) The function iterates over cmdline args and matches arg == sandboxID. The shim is launched with -id <sandboxID> as separate argv entries, so this is correct. However, if any other argument happens to equal the sandbox ID (e.g., a filename in the same directory), there's a false positive. In practice the sandbox ID is a UUID, so collision risk is negligible. Consider documenting the expected argv shape in a comment, or tightening to look for a -id / <sandboxID> pair pattern.
4 🟢 Low collectSandboxMaxMemoryUsage also reads /proc/<pid> without stale-PID guard (destroy.go:396-397) The pre-existing readVmHWM(pid) call uses procfs.NewProc(pid) to read memory stats, but unlike the new isCubeShimPid, it doesn't verify the PID still belongs to the right shim. A recycled PID could lead to reading memory stats from an unrelated process. This is pre-existing and likely low-impact (the metric read is advisory), but worth noting for consistency.
5 🟢 Low procfs import footprint (destroy.go:24) procfs was already imported by local.go for readVmHWM. Using the same library in destroy.go for isCubeShimPid is consistent, but note that procfs.NewProc does a stat(2) on /proc/<pid>, which has a tiny race window with syscall.Kill. The isCubeShimPidsyscall.Kill sequence is not atomic — the PID could die between the check and the kill, but that's benign (returns ESRCH which is handled).

Code Structure Notes

Destroy control flow (destroy.go:114-124)

The separation of detection (is paused?) from action (how to delete?) is clean:

if sb.GetStatus().IsPaused() {
    ctx = constants.WithSkipRuntimeAPI(ctx)
}

if constants.SkipRuntimeAPI(ctx) {
    if err := l.destroySandboxByBinary(ctx, sb); err != nil {
        ...
    }
} else {
    // normal per-container deletion
}

This is extensible — future callers can set the SkipRuntimeAPI flag independently without duplicating the paused check.

CleanUp GC parity (local.go:458-470)

The mirroring of destroySandboxByBinary in CleanUp is critical: without it, the GC loop would keep finding the shim process still running (ProcessExists at line 473) and retry indefinitely. This fix breaks that loop cleanly.

Redundant sandboxDeletable removal (commit 0129d18)

The commit correctly identifies that sandboxDeletable is a pure function of sb.Labels and the filter — nothing between the two calls changes either, so the second check (previously at line 87) always returned the same result as the first. Removing it is the right cleanup.


Test Coverage Assessment

Function Tested? Notes
isCubeShimPid ✅ Yes Good coverage of match, non-match, nonexistent PID
destroySandboxByBinary ❌ No Depends on l.shims (ShimManager), which requires a running containerd. Acceptable for now.
Destroy paused path ❌ No Integration-level; would need a test harness with a mock shim
CleanUp paused path ❌ No Same integration constraint

Adding an integration test or interface abstraction over ShimManager to enable unit testing of destroySandboxByBinary would be a nice follow-up but is out of scope for this PR.


Conclusion

Category Verdict
Correctness ✅ The approach is correct and addresses the root cause
Safety ✅ Stale-PID guard (isCubeShimPid) is well-implemented
Design ✅ Clean extraction of destroySandboxByBinary; consistent across Destroy/CleanUp
Edge Cases ✅ ESRCH, IsNotFound, zero-PID handled. Stale-PID mitigated
Tests 🟡 PID guard tested. destroySandboxByBinary untested (integration constraint)
Non-blocking concerns ⚠️ Orphan risk on EPERM, unlocked CleanUp read

The PR is well-structured, addresses a real and impactful bug, and the second commit (4b1636e) appropriately addresses the stale-PID safety concern. No blocking issues remain.

Most impactful follow-up: Consider hardening the SIGKILL failure path in destroySandboxByBinary — returning the error (or at minimum confirming the process exited) for non-ESRCH failures would prevent potential orphaned shims.


| Branch: fix/delete-paused-sandbox

@cubesandboxbot

cubesandboxbot Bot commented Jun 6, 2026

Copy link
Copy Markdown

Claude finished @HuChundong's task in 1m 17s —— View job


Code Review for PR #480

Branch: fix/delete-paused-sandboxmaster
Commits: 4 (c0f124f4b1636e0129d183028b33)
Files changed: 3 files (+117/-12)


Summary

This PR fixes a real bug: a paused/frozen VM cannot serve its ttrpc runtime API, so the shim rejects per-container delete with "sandbox not in normal state", making the sandbox permanently undeletable. The fix routes paused sandbox teardown through a binary force-delete that SIGKILLs the shim (with stale-PID guard) and detaches it from containerd's shim manager, letting dead-shim cleanup reclaim the workdir and pause snapshot.

The code has evolved through 4 commits and has been reviewed multiple times. The latest commit (3028b33) addresses several concerns raised in earlier passes.


✅ What's Good

1. Clean extraction of destroySandboxByBinary (destroy.go:177-192)

The helper function cleanly encapsulates the force-delete dance:

  • Verifies the PID is still the right shim via /proc/<pid>/cmdline before SIGKILL
  • SIGKILLs the shim (reaping the in-process VMM)
  • Detaches from the shim manager via l.shims.Delete
  • Lets containerd's dead-shim cleanup reclaim the VM workdir and pause snapshot

2. Both Destroy and CleanUp updated consistently

  • Destroy (destroy.go:116-124): paused sandboxes set SkipRuntimeAPI context flag → destroySandboxByBinary path
  • CleanUp (local.go:458-469): GC/compensation mirrors the same path, preventing the "shim process still Exists" infinite loop

3. Stale-PID guard — well hardened (destroy.go:194-222)

The stored shim PID may be stale and recycled by the kernel. The guard reads /proc/<pid>/cmdline via procfs and uses cmdlineServesSandbox to verify the sandbox ID appears as a -id <sandboxID> or -id=<sandboxID> argument:

func cmdlineServesSandbox(cmdline []string, sandboxID string) bool {
    for i, arg := range cmdline {
        if arg == "-id" && i+1 < len(cmdline) && cmdline[i+1] == sandboxID {
            return true
        }
        if arg == "-id="+sandboxID {
            return true
        }
    }
    return false
}

This is deliberately not a bare token match — an unrelated process that merely mentions the sandbox ID in its cmdline won't be mistaken for the shim. This directly addresses concern #3 from the first-pass review.

4. SIGKILL failure now returns an error (destroy.go:181-184)

The latest commit (3028b33) changed the SIGKILL failure handling from a warn-and-fall-through to an error return:

if err := syscall.Kill(pid, syscall.SIGKILL); err != nil && !errors.Is(err, syscall.ESRCH) {
    return fmt.Errorf("kill shim %d of sandbox %s: %w", pid, sb.ID, err)
}

This is the right behavior. Returning the error prevents shims.Delete from detaching a shim that wasn't killed, which would orphan the shim and its in-process VMM. This addresses concern #1 and #2 from the earlier reviews.

5. Graceful edge-case handling

  • syscall.ESRCH (process already gone) → destroySandboxByBinary skips the kill, proceeds to shims.Delete
  • errdefs.IsNotFound (shim already detached) → silently ignored
  • Zero PID (sb.Endpoint.Pid == 0) → skipped, no procfs access attempted
  • procfs.NewProc / CmdLine errors → isCubeShimPid returns false, no SIGKILL attempted

6. Comprehensive tests for the cmdline matching (destroy_test.go:275-308)

TestCmdlineServesSandbox validates 7 scenarios:

  • Real shim layout with -id <sandboxID>true
  • -id=<sandboxID> form → true
  • Wrong sandbox ID → false
  • ID as a bare token (not after -id) → falsekey safety test
  • ID as a namespace value with different -idfalse
  • Dangling -id (no value) → false
  • Empty cmdline → false

7. Removed duplicate sandboxDeletable check (commit 0129d18)

The redundant second sandboxDeletable call at the old destroy.go:87 was correctly removed — it was a pure function of sb.Labels and the filter, and nothing between the two calls changed either.


Observations

🟡 CleanUp unlocked status read (local.go:459)

In Destroy, the sandbox state is read under sb.Lock() (destroy.go:65-66). In CleanUp, info.GetStatus().IsPaused() is called without any lock. If the sandbox transitions from paused → running between the check and the destroySandboxByBinary call, the shim of a now-running sandbox would be killed, terminating the VM.

Mitigation: This matches the pre-existing pattern in CleanUp (it never held the lock for any path), and killing the shim of a running sandbox is safe — containerd would restart it. However, a brief comment acknowledging the unlocked read would help future maintainers understand this is intentional.

🟢 procfs.NewProc + syscall.Kill race window (destroy.go:198-205 vs 181)

isCubeShimPid reads /proc/<pid>/cmdline (a stat(2) + read(2)), then destroySandboxByBinary issues syscall.Kill(pid, SIGKILL). These are not atomic — the PID could die between the check and the kill. This is benign: syscall.Kill returns ESRCH which is handled.

🟢 collectSandboxMaxMemoryUsage also reads /proc/<pid> without stale-PID guard (destroy.go:407)

The pre-existing readVmHWM(pid) uses procfs.NewProc(pid) to read memory stats without verifying the PID. A recycled PID could lead to reading memory stats from an unrelated process. Low impact (advisory metric), but an inconsistency worth noting.

🟢 No tests for destroySandboxByBinary itself

The function depends on l.shims (a *v2.ShimManager which requires a running containerd daemon). Acceptable for integration-level code; unit testing would require abstracting ShimManager behind an interface.


Conclusion

Category Verdict
Correctness ✅ Addresses the root cause — paused sandboxes are now deletable
Safety ✅ Stale-PID guard (cmdlineServesSandbox) properly prevents signaling recycled PIDs
SIGKILL failure ✅ Now correctly returns an error instead of potentially orphaning the shim
Edge cases ✅ ESRCH, IsNotFound, zero-PID, nonexistent PID all handled
Tests cmdlineServesSandbox well-tested with 7 cases; isCubeShimPid tested for nonexistent and non-shim PIDs
Design ✅ Clean extraction of destroySandboxByBinary; consistent across Destroy/CleanUp
Unlocked CleanUp read 🟡 Pre-existing pattern, safe in practice, no action needed
destroySandboxByBinary test 🟢 Integration constraint, acceptable

The PR is well-structured and the safety concerns from earlier iterations have been properly addressed. The cmdlineServesSandbox function is a particularly nice touch — it's more conservative than a simple token match, requiring the -id flag prefix. No blocking issues remain.

@kinwin-ustc

Copy link
Copy Markdown
Collaborator

Can you squash it into one commit?

A paused/frozen VM cannot serve its runtime (ttrpc) API, so the shim
rejects the regular per-container delete with "sandbox not in normal
state" and the sandbox can never be removed (error 130593/130506).

Route teardown of paused sandboxes through a binary force-delete that
SIGKILLs the shim (reaping the in-process cube-hypervisor) and detaches
it from the runtime manager, letting containerd's dead-shim cleanup
reclaim the VM workdir and pause snapshot:

- Destroy: select the SkipRuntimeAPI path when the sandbox is paused and
  factor the force-delete into destroySandboxByBinary.
- CleanUp (GC reconcile / workflow compensation): mirror the same path so
  a paused sandbox is reclaimed instead of looping forever on
  "shim process still Exists".

Harden destroySandboxByBinary:
- Verify via /proc that the recorded shim PID is still this sandbox's
  cube shim (matching the "-id <sandboxID>" launch arg) before SIGKILL,
  so a stale/recycled PID is never signaled.
- Propagate non-ESRCH SIGKILL failures instead of detaching a shim we
  failed to kill, which would orphan the shim and its in-process VMM.
- Drop a redundant sandboxDeletable check in Destroy.

Covered by table-driven tests for the PID/cmdline matching. Verified
end-to-end (running -> pause -> delete): RetCode 200, shim and VMM
reaped, VM workdir / pause snapshot / bundle all reclaimed.

Assisted-by: Cursor:Claude-Opus-4.8
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: carmake <gycm520@gmail.com>
@HuChundong HuChundong force-pushed the fix/delete-paused-sandbox branch from 3028b33 to 598dfdf Compare June 11, 2026 12:46
Comment on lines +178 to +186
// The stored shim PID may be stale (shim already gone, PID recycled), so only
// SIGKILL it after /proc confirms it is still this sandbox's cube shim.
if pid := int(sb.Endpoint.Pid); pid != 0 && isCubeShimPid(pid, sb.ID) {
if err := syscall.Kill(pid, syscall.SIGKILL); err != nil && !errors.Is(err, syscall.ESRCH) {
// Don't detach a shim we couldn't kill: shims.Delete would drop it from
// containerd's management and orphan the shim (and its in-process VMM).
return fmt.Errorf("kill shim %d of sandbox %s: %w", pid, sb.ID, err)
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOCTOU race: PID verified then signaled, but PID can be recycled between the two operations

isCubeShimPid() reads /proc/<pid>/cmdline on line 180, and then syscall.Kill(pid, SIGKILL) is called on line 181. If the shim exits between these two calls, the kernel can recycle that PID to an unrelated process (another sandbox's VMM, a Cubelet worker, etc.), and the SIGKILL terminates an innocent process.

On Linux 5.3+, use pidfd_open(pid, 0) to pin the PID, verify via /proc/self/fdinfo/<pidfd>, and signal through the pidfd using pidfd_send_signal. This closes the recycle window because the pidfd holds a reference preventing PID reuse.

If pidfd is unavailable, explicitly document this race and consider that shims.Delete (line 188) is the safer primary mechanism — the manual syscall.Kill is a best-effort emergency measure.

Comment on lines +459 to +462
if info.GetStatus().IsPaused() {
if er := l.destroySandboxByBinary(ctx, info); er != nil {
stepLog.Warnf("CleanUp force delete sandbox %s by binary fail: %v", sandBoxID, er)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition: CleanUp checks IsPaused() without holding the sandbox lock

The Destroy path (destroy.go:66) holds sb.Lock() before checking IsPaused(), preventing a concurrent Resume from changing state mid-flight. But here in CleanUp, no lock is held on info. If a concurrent Resume transitions the sandbox from paused to running between the IsPaused() check and the destroySandboxByBinary call, this code will force-kill a now-running sandbox with active workloads.

Consider acquiring the sandbox lock before the IsPaused() check, or atomically re-verify the state under the lock before the destructive operation.

Comment on lines +122 to 124
if err := l.destroySandboxByBinary(ctx, sb); err != nil {
result = multierror.Append(result, fmt.Errorf("delete sandbox by binary failed: %w", err))
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing container metadata/filesystem cleanup in the paused Destroy path

When the paused path is taken, destroySandboxByBinary is called (which kills the shim and detaches it from containerd), but the container-by-container teardown in the else branch (lines 129-162) is entirely skipped. This means:

  • deleteContainer() is never called — containerd container metadata entries remain in boltdb
  • rootfs.CleanRootfs() is not called for individual containers — rootfs mounts may leak
  • taskio.Clean() is not called — FIFO directories may leak

Compare with CleanUp (local.go:564-575), which DOES call cleanFn for each container after destroySandboxByBinary to perform this cleanup.

Consider adding container metadata cleanup after the runc.Clean call, something like:

for id := range sb.All() {
    if cntr, _ := l.client.LoadContainer(ctx, id); cntr != nil {
        _ = deleteContainer(ctx, cntr)
    }
}
if first := sb.FirstContainer(); first != nil && first.Container != nil {
    _ = deleteContainer(ctx, first.Container)
}

@fslongjin fslongjin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issues

The intent (allowing deletion of paused sandboxes) is correct, but the implementation has problems:

  1. Shouldn't write custom kill logic. The cube shim already implements delete_shim() (srv.rs:56-72), which handles SIGKILL of the original shim process, socket cleanup, and removal of VM workdir and pause snapshot. The PR reimplements this in Cubelet with manual SIGKILL + /proc PID verification (isCubeShimPid / cmdlineServesSandbox), duplicating the shim's own logic while also missing VM workdir and pause snapshot cleanup.

  2. Bypasses containerd's standard mechanism. containerd has a standard binary.Delete flow for unresponsive shims: launch the shim binary with the delete subcommand, which triggers delete_shim() to handle cleanup. The PR doesn't use this path.

Suggestion

Use the shim's binary.Delete: launch the cube-shim binary with -id <sandboxID> delete, let the shim's own delete_shim() handle kill + resource cleanup, then call l.shims.Delete to remove from containerd. Cubelet doesn't need any custom kill logic.

Remove:

  • isCubeShimPid / cmdlineServesSandbox functions and their tests
  • Manual syscall.Kill + /proc verification logic

Revert:

  • sandboxDeletable check (unrelated to the paused sandbox fix)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants