fix(testing): run prefect_test_harness teardown on exception-path exits by zzstoatzz · Pull Request #21469 · PrefectHQ/prefect

zzstoatzz · 2026-04-08T15:25:57Z

Scope

This PR does NOT fix the test_prefect_test_harness py3.14 postgres flake (#21405, #21414, #21460). It addresses a related-but-separate gap in prefect_test_harness exception handling that I noticed while investigating that flake.

The gap

`prefect_test_harness` is a `@contextmanager` that places `run_coro_as_sync(drain_workers())` and `test_server.stop()` after the `yield` with no try/finally:

```python
yield
run_coro_as_sync(drain_workers())
test_server.stop()
```

If any exception propagates out of the harness body, the lines after `yield` are never executed. `test_server.stop()` is skipped, which means the `SubprocessASGIServer` singleton cleanup path (that #21405 carefully set up via `_instance_key`) is never reached — `_instances[None]` is left holding a stale entry with `running=True`, and the next `prefect_test_harness()` in the same process silently reuses it.
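The failure mode can be reproduced without Prefect. The following is a minimal sketch, not the real implementation: `FakeServer` and `leaky_harness` are hypothetical stand-ins that only mimic the behavior described above (a singleton registry keyed by port, and a `start()` that does nothing when `running` is already true).

```python
from contextlib import contextmanager


class FakeServer:
    """Hypothetical stand-in for a singleton server keyed by port."""

    _instances: dict = {}

    def __new__(cls, port=None):
        # Return the existing instance for this port, or create one.
        if port not in cls._instances:
            instance = super().__new__(cls)
            instance.port = port
            instance.running = False
            instance.starts = 0
            cls._instances[port] = instance
        return cls._instances[port]

    def start(self):
        if self.running:  # already marked running: start() is a no-op
            return
        self.running = True
        self.starts += 1

    def stop(self):
        self.running = False
        self._instances.pop(self.port, None)


@contextmanager
def leaky_harness():
    server = FakeServer()
    server.start()
    yield server
    server.stop()  # skipped when the with-body raises


try:
    with leaky_harness():
        raise RuntimeError("boom")
except RuntimeError:
    pass

stale = FakeServer._instances[None]
stale_running = stale.running
print(stale_running)  # True: teardown never ran

with leaky_harness() as reused:
    reused_is_stale = reused is stale
    starts_seen = reused.starts
    print(reused_is_stale, starts_seen)  # True 1: the second start() was a no-op
```

In this toy version the second harness exits cleanly, so its `stop()` finally clears the registry. The leak only persists until some later harness happens to exit without an exception.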

This is narrower than what the py3.14 flake exercises. The failing test (`test_prefect_test_harness`) runs clean — no exception — so this gap is invisible to that test. But any other test that raises from inside the harness would leave the process in a polluted state for subsequent tests, and that pollution is hard to attribute back to the source. I hit this gap by accident while writing a test that raised inside the harness and saw the next harness start() become a no-op.

Fix

Register `test_server.stop()` and the log/event drain on the harness's existing `ExitStack` so both run regardless of how the harness body exits. LIFO ordering means the drain runs while the server is still alive (before `stop()`), matching the prior ordering.

```python
test_server = SubprocessASGIServer()
stack.callback(test_server.stop) # runs on unwind, LIFO
test_server.start(...)
stack.enter_context(temporary_settings({PREFECT_API_URL: test_server.api_url}))
...
stack.callback(lambda: run_coro_as_sync(drain_workers())) # runs first (LIFO)
yield
```
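The LIFO claim is easy to check in isolation. This is a sketch under the assumption that the harness wraps its body in a `with ExitStack() as stack:` block; the `events` list and the string labels are illustrative only and stand in for the real `start()`, drain, and `stop()` calls.

```python
from contextlib import ExitStack, contextmanager

events = []


@contextmanager
def harness():
    with ExitStack() as stack:
        # Registered before "start", so a failure while starting is covered too.
        stack.callback(events.append, "stop")
        events.append("start")
        # Registered last, so it runs first on unwind.
        stack.callback(events.append, "drain")
        yield


try:
    with harness():
        raise RuntimeError("boom")
except RuntimeError:
    pass

print(events)  # ['start', 'drain', 'stop']
```

The exception still propagates to the caller; the stack only guarantees that both callbacks run on the way out, drain first and stop second.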

What this deliberately does NOT include

The `SubprocessASGIServer.start()` "dead subprocess" health check that was in the earlier draft of this PR. It was speculative defense-in-depth, not tied to any observed failure. The py3.14 flake's stale subprocess is alive, just pointed at the wrong database, so the health check doesn't help it. Dropped to keep this PR focused.
Any attempt to fix the py3.14 postgres flake itself. Previous attempts along the "defensive stop-before-start" and "identity-based singleton cleanup" lines (#21414, "fix: fix SubprocessASGIServer singleton cleanup causing test_prefect_test_harness flakiness"; #21460, "fix: fix flaky tests on main (harness singleton, SQLite lock, timing tolerance)") were closed without merging, and I don't yet understand the root cause well enough to propose a different fix. That work belongs in a separate investigation.

Relationship to prior PRs

Builds on #21405 ("Fix flaky tests: SubprocessASGIServer singleton cleanup and SQLite lock retry", merged). Its `_instance_key` cleanup only runs if `stop()` actually runs. This PR makes `stop()` actually run on exception paths.
Does not overlap with #19770 ("fix: properly drain workers in prefect_test_harness when used in async contexts", the async drain refactor). The exact `drain_workers` coroutine and `run_coro_as_sync` wrapper from #19770 are preserved verbatim, just moved onto the ExitStack.
Does not overlap with #19343 ("Fix EventsWorker singleton leak in prefect_test_harness"). `EventsWorker.drain_all()` still happens in the drain callback.

Tests

One new regression test, verified to fail against the pre-fix source:

`test_prefect_test_harness_cleans_up_on_exception_in_body` — raises from inside `with prefect_test_harness():`, asserts `SubprocessASGIServer._instances` has no `None` key after the unwind, then proves a subsequent harness actually works.
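The shape of that test, and the "fails against the pre-fix source" check, can be sketched with toy harnesses. Everything here is a hypothetical stand-in: `registry` plays the role of `SubprocessASGIServer._instances`, and `pre_fix_harness` / `fixed_harness` only imitate the two teardown styles. It is not the test added in this PR.

```python
from contextlib import ExitStack, contextmanager

registry = {}  # hypothetical stand-in for the singleton registry


def start_server():
    # Reuses an existing entry if one is present, like a no-op start().
    registry.setdefault(None, {"running": True})


def stop_server():
    registry.pop(None, None)


@contextmanager
def pre_fix_harness():
    start_server()
    yield
    stop_server()  # skipped if the body raises


@contextmanager
def fixed_harness():
    with ExitStack() as stack:
        stack.callback(stop_server)  # runs on every exit path
        start_server()
        yield


def cleans_up_on_exception(harness) -> bool:
    registry.clear()
    try:
        with harness():
            raise ValueError("raised inside the harness body")
    except ValueError:
        pass
    cleaned = None not in registry  # no stale entry after the unwind
    with harness():
        works = None in registry  # a subsequent harness still starts
    return cleaned and works and None not in registry


pre_fix_result = cleans_up_on_exception(pre_fix_harness)
fixed_result = cleans_up_on_exception(fixed_harness)
print(pre_fix_result)  # False
print(fixed_result)  # True
```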

Test plan

`tests/testing/test_utilities.py`: 9/9 pass with the fix
Regression test verified to fail on pre-fix source
Ruff + format clean

…ct_test_harness

`prefect_test_harness` previously placed `test_server.stop()` after the generator `yield` with no try/finally, so an exception raised from the `with` block body skipped the teardown entirely. Because `SubprocessASGIServer` is a singleton keyed by `port=None`, the stale entry lingered in `_instances[None]` with `running=True`, and the next `prefect_test_harness()` in the same process silently reused it (`start()` becomes a no-op when `running=True`), pointing the new harness at a dead/stale subprocess. This matches the observed `test_prefect_test_harness` leak-check flake (#21405 addressed a related `_instances` cleanup bug but did not cover the case where `stop()` never runs at all).

Changes:

* `prefect_test_harness` registers `test_server.stop()` on the ExitStack *before* calling `start()`, and registers the log/event drain as a stack callback that runs first (LIFO). Exceptions from the harness body, from `start()` itself, and from the drain are all now covered by deterministic teardown.
* `SubprocessASGIServer.start()` defensively verifies the subprocess is alive when `running=True`: if `poll()` reports a dead process, the stale state is reset and a fresh subprocess is spawned instead of returning a no-op that would leave callers pointed at a dead socket.
* Two new regression tests pin both behaviors. Both were verified to fail against the pre-fix source.
* Updated `test_start_is_idempotent` to configure `popen_mock.poll` to return `None`. A real running subprocess's `poll()` returns `None`, and the new defensive check correctly treats a `MagicMock` return value as "process has died."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codspeed-hq · 2026-04-08T15:30:58Z

Merging this PR will not alter performance

✅ 2 untouched benchmarks

Comparing `fix/prefect-test-harness-exception-cleanup` (6d5befb) with `main` (f6addff)

Dropping the defensive dead-subprocess check in `SubprocessASGIServer.start()`. It was speculative defense-in-depth, not tied to any observed failure: the `test_prefect_test_harness` py3.14 flake we originally chased is NOT caused by a dead subprocess (the stale subprocess in that scenario is alive, just pointed at the wrong database).

Keeps only the ExitStack-callback refactor in `prefect_test_harness` that guarantees `test_server.stop()` runs even on exception from the harness body, which is a correct fix for a real but *different* bug from the py3.14 flake.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

zzstoatzz requested review from chrisguidry, cicdw and desertaxle as code owners April 8, 2026 15:25

zzstoatzz marked this pull request as draft April 8, 2026 15:44

zzstoatzz changed the title ~~fix(testing): clean up SubprocessASGIServer on exception inside prefect_test_harness~~ fix(testing): run prefect_test_harness teardown on exception-path exits Apr 8, 2026

zzstoatzz wants to merge 2 commits into `main` from `fix/prefect-test-harness-exception-cleanup`
