feat(aevatar): auto-free the port when aevatar web hits an address-in-use bind#82

Merged
eanzhao merged 1 commit into dev from
feat/aevatar-web-port-auto-kill
Apr 18, 2026

Conversation

Owner

@eanzhao eanzhao commented Apr 18, 2026

Summary

Follow-up to #80 — the port-auto-kill commit missed the merge window. Without this, a stale listener on 6688 (usually the previous aexon aevatar web session) forces the user to hunt down and kill the leftover by hand.

What it does

On an AddressAlreadyInUse bind failure, AevatarWebHost now:

  1. Walks the exception chain to confirm it's really a bind error (SocketException.SocketErrorCode == AddressAlreadyInUse) — nothing else is caught.
  2. Probes the OS for listening PIDs on the port:
    • macOS / Linux: lsof -ti :<port> -sTCP:LISTEN
    • Windows: netstat -ano -p tcp
  3. Kills each holder (skipping the aexon process itself) and waits up to 2s per process.
  4. Retries StartOnceAsync exactly once. A second failure propagates the real error.
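
The control flow in steps 1 and 4 can be sketched as follows. This is an illustrative Python analogue, not the C# code in `AevatarWebHost`: `errno.EADDRINUSE` stands in for `SocketError.AddressAlreadyInUse`, and `is_address_in_use`, `start_with_one_retry`, `start_once`, and `free_port` are hypothetical names introduced only for this example.

```python
import errno


def is_address_in_use(exc):
    """Walk the exception chain looking for an OSError with EADDRINUSE."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against cyclic chains
        if isinstance(exc, OSError) and exc.errno == errno.EADDRINUSE:
            return True
        exc = exc.__cause__ or exc.__context__
    return False


def start_with_one_retry(start_once, free_port):
    """Call start_once; on an address-in-use failure, free the port and retry once.

    start_once and free_port are hypothetical callables supplied by the caller.
    Any other exception, and any failure of the retry, propagates unchanged.
    """
    try:
        return start_once()
    except Exception as exc:
        if not is_address_in_use(exc):
            raise
    # Only reached when the first attempt failed with address-in-use.
    free_port()
    return start_once()
```

Running the retry outside the `except` block keeps a second failure from being chained onto the first, so the caller sees only the error from the final attempt.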

Test plan

  • dotnet build src/Aexon.Cli/Aexon.Cli.csproj — clean.
  • Live smoke test: a Python listener squats an arbitrary port, then aexon aevatar web --port <squatted> identifies the pid, kills it (log shows → killing pid N (Python)), and the web UI boots normally with /api/health responding on the retry.
  • Related test suite (Aevatar|Storage|BuiltinCommands filters, 132 tests) — all pass.
  • Reviewer: run aexon aevatar web twice in quick succession (the second would previously fail with Failed to bind to address http://127.0.0.1:6688) and confirm the second run auto-frees and comes up.
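
To reproduce the smoke test by hand, a squatter along these lines should be enough. It is a minimal sketch, not the script used for the PR; `squat` and `port_is_taken` are hypothetical helper names.

```python
import errno
import socket


def squat(port=0, host="127.0.0.1"):
    """Bind and listen on host:port; port=0 lets the OS pick a free port.

    Returns the listening socket and the port actually bound.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen(1)
    return sock, sock.getsockname()[1]


def port_is_taken(port, host="127.0.0.1"):
    """Return True if a fresh bind to host:port fails with address-in-use."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        return exc.errno == errno.EADDRINUSE
    finally:
        probe.close()
    return False
```

In an interactive Python session, `sock, port = squat()` holds the port for as long as the session stays open; running `aexon aevatar web --port <port>` from another terminal should then trigger the auto-free path and terminate that Python process.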

…in-use bind

Previously a stale listener on 6688 (or whichever port was requested) meant
every subsequent `aexon aevatar web` failed with a bind error and the user
had to hunt down and kill the leftover process by hand. This was
particularly annoying because the most common leftover is the *previous*
aexon session — the user was already committed to running the web UI on
that port.

On an `AddressAlreadyInUse` socket error, the host now:

  1. Walks the exception chain to confirm it's really a bind failure
     (`SocketException.SocketErrorCode == AddressAlreadyInUse`).
  2. Probes the OS for the PID(s) holding the port —
     `lsof -ti :<port> -sTCP:LISTEN` on macOS/Linux,
     `netstat -ano -p tcp` on Windows.
  3. Kills each holder (skipping itself, just in case) and waits up to 2s
     per process for a clean exit.
  4. Retries `StartOnceAsync` exactly once. A second failure propagates so
     the user still gets a real error message for anything we can't recover
     from (stuck TIME_WAIT on a non-loopback bind, permission denied, etc.).
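
For illustration, the PID extraction in step 2 and the self-skip in step 3 might look like the Python sketch below. The function names are hypothetical and the parsing rules are assumptions about typical `lsof -t` and English-locale `netstat -ano` output, not a transcription of the C# implementation.

```python
import os


def pids_from_lsof(output):
    """Parse `lsof -t` output, which prints one PID per line."""
    return {int(tok) for tok in output.split() if tok.isdigit()}


def pids_from_netstat(output, port):
    """Parse `netstat -ano -p tcp` output for listeners on the given port.

    Assumes the five-column layout: Proto, Local Address, Foreign Address,
    State, PID. The state text is localized on non-English Windows, so the
    "LISTENING" match is an assumption that holds for English output only.
    """
    pids = set()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] != "TCP" or fields[3] != "LISTENING":
            continue
        # Compare the whole port component so 16688 does not match 6688.
        if fields[1].rsplit(":", 1)[-1] == str(port) and fields[4].isdigit():
            pids.add(int(fields[4]))
    return pids


def holders_to_kill(pids, self_pid=None):
    """Drop our own PID from the candidate set before killing anything."""
    own = os.getpid() if self_pid is None else self_pid
    return sorted(pid for pid in pids if pid != own)
```

Filtering on the state column matters on Windows because `netstat -ano` also lists established connections that share the local port; `lsof` is already restricted to listeners by `-sTCP:LISTEN`.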

Smoke-tested: a Python squatter on an arbitrary port gets identified
(pid + process name logged), killed, and the web UI boots on the same
port with `/api/health` responding normally on the retry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
eanzhao merged commit eae3b3f into dev on Apr 18, 2026
3 checks passed