Skip to content

fix(executions): Fixing Process Exit Code#7437

Merged
xoscar merged 3 commits intomainfrom
bug/TKC-4008/finvi-parallel-workers-look-like-they-are-running-in-the-ui-even
May 6, 2026
Merged

fix(executions): Fixing Process Exit Code#7437
xoscar merged 3 commits intomainfrom
bug/TKC-4008/finvi-parallel-workers-look-like-they-are-running-in-the-ui-even

Conversation

@xoscar
Copy link
Copy Markdown
Contributor

@xoscar xoscar commented Mar 30, 2026

Fixing Process Exit Code

The execution code was sending SIGKILL to the child processes, causing the cleanup code to never be executed. The side effects of this caused parallel runs to stay in progress forever.

Checklist (choose whats happened)

  • breaking change! (describe)
  • tested locally
  • tested on cluster
  • added new dependencies
  • updated the docs
  • added a test

Fixes

  • Fixes exit code for child processes

@xoscar xoscar self-assigned this Mar 30, 2026
@xoscar xoscar marked this pull request as ready for review March 30, 2026 15:49
@xoscar xoscar requested a review from a team as a code owner March 30, 2026 15:49
@xoscar xoscar requested review from Copilot and mgazza March 30, 2026 15:49
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 30, 2026

Greptile Summary

This PR fixes parallel test runs getting stuck by changing the context-cancellation signal from the default SIGKILL to SIGTERM (with a 10-second SIGKILL fallback via WaitDelay), allowing child-process cleanup/deferred code to run. getProcessStatus is updated to treat "signal: terminated" the same as "signal: killed" so the abort-propagation logic fires correctly in both cases.

Confidence Score: 5/5

Safe to merge — targeted fix with no logic errors or regressions introduced.

Both changes are straightforward and correct: the Cancel closure is only invoked after cmd.Start() succeeds (so cmd.Process is never nil), WaitDelay provides a sensible hard-kill fallback, and the getProcessStatus update keeps abort-detection consistent with the new signal. No new failure modes introduced.

No files require special attention.

Important Files Changed

Filename Overview
cmd/testworkflow-init/orchestration/executions.go Replaces default SIGKILL on context cancellation with SIGTERM + 10s WaitDelay, and updates getProcessStatus to treat SIGTERM the same as SIGKILL for abort detection.

Sequence Diagram

sequenceDiagram
    participant Ctx as Context
    participant Cmd as exec.Cmd (CreateWithContext)
    participant Proc as Child Process
    participant Run as execution.Run()

    Ctx->>Cmd: ctx.Done() fires
    Note over Cmd: Cancel() called (was: os.Kill / SIGKILL)
    Cmd->>Proc: SIGTERM (new behaviour)
    Proc->>Proc: cleanup / deferred code runs
    alt exits within 10s (WaitDelay)
        Proc-->>Run: Wait() returns "signal: terminated"
        Run->>Run: getProcessStatus → aborted=true
        Run->>Run: group.Abort() if not already aborted
    else still running after 10s
        Cmd->>Proc: SIGKILL (WaitDelay fallback)
        Proc-->>Run: Wait() returns "signal: killed"
        Run->>Run: getProcessStatus → aborted=true
    end
Loading

Reviews (2): Last reviewed commit: "cleanup" | Re-trigger Greptile

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts how testworkflow-init cancels child process executions so that child processes receive a graceful termination signal (allowing cleanup hooks to run) instead of being hard-killed, addressing stuck “in progress” parallel runs.

Changes:

  • Override exec.Cmd.Cancel to send SIGTERM on context cancellation.
  • Configure a WaitDelay to give processes time to exit cleanly before escalation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cmd/testworkflow-init/orchestration/executions.go
Comment thread cmd/testworkflow-init/orchestration/executions.go
Comment thread cmd/testworkflow-init/orchestration/executions.go
@xoscar xoscar force-pushed the bug/TKC-4008/finvi-parallel-workers-look-like-they-are-running-in-the-ui-even branch from 8723cc5 to bf4367c Compare March 30, 2026 15:57
@xoscar xoscar changed the title TKC-4008 bug(executions): Fixing Process Exit Code fix(executions): Fixing Process Exit Code Mar 30, 2026
@mgazza
Copy link
Copy Markdown
Contributor

mgazza commented Apr 1, 2026

Code review

Found 1 issue:

  1. getProcessStatus only recognizes "signal: killed" (SIGKILL) as an abort indicator. With Cancel now sending SIGTERM, a child process that does not trap SIGTERM exits with "signal: terminated", which falls into the exitCode < 0 branch in Run() and prints the misleading "The process has been corrupted: signal: terminated." message. The fix is to extend the check to also match "signal: terminated" — e.g. details == "signal: killed" || details == "signal: terminated". (Copilot and Greptile flagged the same issue independently.)

details := e.String()
return details == "signal: killed", details, e.ExitCode()
}

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@xoscar
Copy link
Copy Markdown
Contributor Author

xoscar commented Apr 30, 2026

@greptile

@xoscar
Copy link
Copy Markdown
Contributor Author

xoscar commented Apr 30, 2026

Thanks @mgazza, do you mind taking a look again?

@xoscar xoscar merged commit 0874af6 into main May 6, 2026
13 checks passed
@xoscar xoscar deleted the bug/TKC-4008/finvi-parallel-workers-look-like-they-are-running-in-the-ui-even branch May 6, 2026 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants