Fix flaky CanRunOnIdleTask by polling instead of sleeping#2314
Conversation
`CanRunOnIdleTask` (and its twin `CanRunOnIdleInProfileTask`) were flaky on the net462 (Windows PowerShell 5.1) CI leg — the former was just caught failing on PR #2298's Windows job. The root cause is that `PsesInternalHost.OnPowerShellIdle` calls `Events.GenerateEvent(PSEngineEvent.OnIdle, ...)`, which only *enqueues* the event. For a subscriber registered with `-Action {...}`, PowerShell doesn't run the action scriptblock inline; it becomes a pending action that the engine dispatches asynchronously on the pipeline thread, around subsequent pipeline invocations. So the action's execution was never synchronized with the test's `$handled` read, and the fixed `Thread.Sleep(2000)` was just a timing guess — sometimes too short on the slower WinPS leg, leaving `$global:handled` still `$false` at the assertion. The key realization is that each *additional* pipeline execution gives the engine another chance to drain the pending action, so re-reading the handler variable in a loop both waits for *and* drives completion. I replaced the sleep with a shared `WaitForHandledAsync` helper that polls the variable (~200ms apart, ~15s ceiling) until it reports `$true`, returning the last observed value on timeout so the assertion still fails loudly. This keeps the tests' intent intact and isn't merely a longer sleep. I validated both tests on net8.0 (green across repeated runs, ~0.4s each vs. the old fixed 2s); net462 can't run on macOS, but the mechanism is identical across targets and the 15s ceiling self-terminates on success, so it's strictly safer on the slow leg without slowing the fast one. Drafted by Copilot (Claude Opus 4.8). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes flaky CanRunOnIdleTask and CanRunOnIdleInProfileTask tests that intermittently failed on the net462 (Windows PowerShell 5.1) CI leg. The root cause was a hard-coded Thread.Sleep(2000) that was insufficiently long for the asynchronous event dispatch on slower runners. The fix replaces both sleeps with a shared polling helper that actively drives event processing while waiting.
Changes:
- Adds a new
OnIdleTestHelpers.WaitForHandledAsyncstatic helper that polls a PowerShell handler variable viaExecutePSCommandAsync<bool>in a loop (200 ms between polls, 15 s timeout ceiling), returning the last observed value on timeout so assertions still fail loudly. - Replaces the
Thread.Sleep(2000)+ manualExecutePSCommandAsyncread in bothCanRunOnIdleTaskandCanRunOnIdleInProfileTaskwith calls to the new helper, also removing the// TODO: Why is this racy?comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Small follow-up to review feedback: both call sites only ever asserted the handler variable became `$true`, so the `IReadOnlyList<bool>` return was needless ceremony. `AssertHandledAsync` now owns the assertion — it returns once the variable reports `$true` and otherwise fails via `Assert.Fail` when the ~15s poll window elapses, which reads as "the OnIdle handler never ran." `Assert.Fail` is fine here — we're on xUnit 2.9.3 and already use it in the E2E tests. No behavior change to what's being verified; the call sites just shrink to a single `await OnIdleTestHelpers.AssertHandledAsync(...)`. Still green on net8.0. Drafted by Copilot (Claude Opus 4.8). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Follow-up to review feedback: the helper short-circuited with a bare `return` on success and only asserted (`Assert.Fail`) on timeout, so the happy path had no explicit assertion. Restructure the loop to poll until the handler variable is `$true` or the ~15s window elapses, then assert the outcome once with `Assert.True(handled, ...)`. Same behavior, but the success and timeout paths now share a single, self-describing assertion. Still green on net8.0. Drafted by Copilot (Claude Opus 4.8). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
That seems reasonable to me! |
JustinGrote
left a comment
There was a problem hiding this comment.
Ah the classic "fix a race condition by introducing a timeout". Good catch copilot!
|
@JustinGrote and a polling loop. |
|
@andyleejordan I mean I'll take a polling loop over a blind timeout any day, at least it's confirming its OK to proceed. |
Summary
CanRunOnIdleTask(and its twinCanRunOnIdleInProfileTask) inPsesInternalHostTests.cswere intermittently failing on the net462 (WindowsPowerShell 5.1) CI leg.
CanRunOnIdleTaskwas just caught failing on PR #2298'sWindows job (
Assert.Collection() … Assert.False/Assert.Trueat line ~200). Bothtests carried a
// TODO: Why is this racy?above a hard-codedThread.Sleep(2000).Root cause
PsesInternalHost.OnPowerShellIdlecalls_mainRunspaceEngineIntrinsics.Events.GenerateEvent(PSEngineEvent.OnIdle, ...),which only enqueues the OnIdle event. For a subscriber registered with
-Action { ... }(exactly what these tests register), PowerShell does not runthe action scriptblock inline at
GenerateEventtime — it becomes a pendingaction that the engine's event manager dispatches asynchronously, on the
pipeline thread, at the next process-pending-actions point (around subsequent
pipeline invocations).
OnPowerShellIdlethen runs a tiny artificial pipeline(
…param() 0) to nudge event processing, but the action's actual execution isnever synchronized with that pipeline returning or with the test's later
$handledread.So the fixed
Thread.Sleep(2000)was only a timing guess. On a fast runner theaction finishes first; on the slower WinPS leg 2 seconds is sometimes not enough,
leaving
$global:handledstill$falseat the assertion — hence the intermittentfailure.
The key realization: each additional pipeline execution gives the engine
another chance to drain the pending action, so re-reading the handler variable in
a loop both waits for and drives completion. That makes polling a real
deterministic fix rather than just a longer sleep.
Change
WaitForHandledAsync(psesHost, variableName)helper that polls thehandler variable via
ExecutePSCommandAsync<bool>until it reports$true(~200 ms between polls, ~15 s ceiling via
CancellationTokenSource). On timeoutit returns the last observed value so the existing
Assert.Collection(handled, Assert.True)still fails loudly.CanRunOnIdleTask($handled) andCanRunOnIdleInProfileTask(
$handledInProfile), replacing bothThread.Sleep(2000)calls and removing the// TODO: Why is this racy?comments. Registration, the pre-assert(
Assert.False), and theOnPowerShellIdledelegate call are unchanged.No production code or test-profile script changes — test file only.
Validation
Invoke-Build TestPS74 -TestFilter 'FullyQualifiedName~CanRunOnIdle'): green across repeated runs, ~0.4 s each vs.the old fixed 2 s.
targets — only latency differs. The 15 s ceiling is ~7.5× the old 2 s window and
self-terminates on success, so it's strictly safer on the slow leg without
slowing the fast one.