fix(testing): run prefect_test_harness teardown on exception-path exits #21469
Draft
…ct_test_harness

`prefect_test_harness` previously placed `test_server.stop()` after the generator `yield` with no try/finally, so an exception raised from the `with` block body skipped the teardown entirely. Because `SubprocessASGIServer` is a singleton keyed by `port=None`, the stale entry lingered in `_instances[None]` with `running=True`, and the next `prefect_test_harness()` in the same process silently reused it (`start()` becomes a no-op when `running=True`), pointing the new harness at a dead/stale subprocess.

This matches the observed `test_prefect_test_harness` leak-check flake (#21405 addressed a related `_instances` cleanup bug but did not cover the case where `stop()` never runs at all).

Changes:

* `prefect_test_harness` registers `test_server.stop()` on the ExitStack *before* calling `start()`, and registers the log/event drain as a stack callback that runs first (LIFO). Exceptions from the harness body, from `start()` itself, and from the drain are all now covered by deterministic teardown.
* `SubprocessASGIServer.start()` defensively verifies the subprocess is alive when `running=True`: if `poll()` reports a dead process, the stale state is reset and a fresh subprocess is spawned instead of returning a no-op that would leave callers pointed at a dead socket.
* Two new regression tests pin both behaviors. Both were verified to fail against the pre-fix source.
* Updated `test_start_is_idempotent` to configure `popen_mock.poll` to return `None`: a real running subprocess's `poll()` returns `None`, and the new defensive check correctly treats a `MagicMock` return value as "process has died."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dropping the defensive dead-subprocess check in SubprocessASGIServer.start(). It was speculative defense-in-depth, not tied to any observed failure: the test_prefect_test_harness py3.14 flake we originally chased is NOT caused by a dead subprocess (the stale subprocess in that scenario is alive, just pointed at the wrong database).

Keeps only the ExitStack-callback refactor in prefect_test_harness that guarantees test_server.stop() runs even on exception from the harness body. That is a correct fix for a real but *different* bug from the py3.14 flake.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scope
This PR does NOT fix the `test_prefect_test_harness` py3.14 postgres flake (#21405, #21414, #21460). It addresses a related-but-separate gap in `prefect_test_harness` exception handling that I noticed while investigating that flake.
The gap
`prefect_test_harness` is a `@contextmanager` that places `test_server.stop()` and `run_coro_as_sync(drain_workers())` after the `yield` with no `try`/`finally`:
```python
yield
run_coro_as_sync(drain_workers())
test_server.stop()
```
If any exception propagates out of the harness body, the lines after `yield` are never executed. `test_server.stop()` is skipped, which means the `SubprocessASGIServer` singleton cleanup path (that #21405 carefully set up via `_instance_key`) is never reached — `_instances[None]` is left holding a stale entry with `running=True`, and the next `prefect_test_harness()` in the same process silently reuses it.
This is narrower than what the py3.14 flake exercises. The failing test (`test_prefect_test_harness`) runs clean — no exception — so this gap is invisible to that test. But any other test that raises from inside the harness would leave the process in a polluted state for subsequent tests, and that pollution is hard to attribute back to the source. I hit this gap by accident while writing a test that raised inside the harness and saw the next harness start() become a no-op.
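The failure mode can be reproduced in isolation with a toy model. The sketch below is not Prefect code: `ToyServer` and `leaky_harness` are hypothetical stand-ins that assume only what is described above (a singleton cached under a `None` key, a `start()` that is a no-op when `running` is true, and teardown placed after a bare `yield`).

```python
from contextlib import contextmanager


class ToyServer:
    """Hypothetical stand-in for a singleton server keyed by port."""

    _instances = {}

    def __new__(cls, port=None):
        # Return the cached instance for this port, creating it on first use.
        if port not in cls._instances:
            instance = super().__new__(cls)
            instance.port = port
            instance.running = False
            instance.starts = 0
            cls._instances[port] = instance
        return cls._instances[port]

    def start(self):
        if self.running:
            return  # no-op: the caller silently reuses the old state
        self.running = True
        self.starts += 1

    def stop(self):
        self.running = False
        type(self)._instances.pop(self.port, None)


@contextmanager
def leaky_harness():
    server = ToyServer()
    server.start()
    yield server
    server.stop()  # never reached if the body raises


# First use: the body raises, so stop() is skipped.
try:
    with leaky_harness():
        raise RuntimeError("boom")
except RuntimeError:
    pass

stale = ToyServer._instances[None]
stale_was_running = stale.running  # True: the entry was never cleaned up

# Second use: the stale instance is reused and start() does nothing.
with leaky_harness() as reused:
    reused_is_stale = reused is stale  # True
    start_count = reused.starts        # 1, not 2
```

Wrapping the `yield` in `try`/`finally` would also close this gap in the toy version; the PR instead uses the `ExitStack` that the harness already has.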
Fix
Register `test_server.stop()` and the log/event drain on the harness's existing `ExitStack` so both run regardless of how the harness body exits. LIFO ordering means the drain runs while the server is still alive (before `stop()`), matching the prior ordering.
```python
test_server = SubprocessASGIServer()
stack.callback(test_server.stop) # runs on unwind, LIFO
test_server.start(...)
stack.enter_context(temporary_settings({PREFECT_API_URL: test_server.api_url}))
...
stack.callback(lambda: run_coro_as_sync(drain_workers())) # runs first (LIFO)
yield
```
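The ordering claim can be checked with a small standalone sketch. It assumes nothing about Prefect internals: `ToyServer`, `drain`, and `harness` are hypothetical names that only record the order in which calls happen when the body raises.

```python
from contextlib import ExitStack, contextmanager

events = []


class ToyServer:
    """Hypothetical stand-in; records calls instead of spawning a process."""

    def start(self):
        events.append("start")

    def stop(self):
        events.append("stop")


def drain():
    events.append("drain")


@contextmanager
def harness():
    with ExitStack() as stack:
        server = ToyServer()
        stack.callback(server.stop)  # registered first, so it runs last
        server.start()
        stack.callback(drain)  # registered last, so it runs first
        yield server


try:
    with harness():
        raise RuntimeError("boom")
except RuntimeError:
    events.append("propagated")

print(events)  # ['start', 'drain', 'stop', 'propagated']
```

Because `stop` is registered before `start()` is called, an exception raised by `start()` itself would also unwind through `stop`. That is why the callback is registered first rather than after the server is up.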
What this deliberately does NOT include
Relationship to prior PRs
Tests
One new regression test, verified to fail against the pre-fix source:
Test plan