fix(ssh): tear down tunnel relay on socket hangup instead of busy-spinning (#1769) by datlechin · Pull Request #1772 · TableProApp/TablePro

datlechin · 2026-06-26T03:30:19Z

Problem

TablePro pins ~100% of a CPU core while idle when an SSH tunnel's socket drops (laptop sleep, network change, server-side idle timeout). The tunnel's relay loop keeps running at full tilt forever instead of tearing down.

Root cause

Both SSH relay loops polled their fds for POLLIN only and never inspected POLLHUP/POLLERR/POLLNVAL:

LibSSH2Tunnel.runRelay (client socket <-> SSH channel)
LibSSH2TunnelFactory.startChannelRelay (jump-host socketpair <-> SSH channel)

When the transport (or, for jump hosts, the inter-hop UNIX socketpair) peer closes, that fd sits permanently readable-at-EOF. poll() returns immediately every iteration and libssh2_channel_read returns EAGAIN forever (libssh2 never surfaces a clean channel EOF when only the transport died), so none of the existing branches fire and the loop spins. For multi-hop tunnels the keepalive also can't detect the death, because it writes into a still-locally-open socketpair, so nothing ever tore the dead tunnel down.

Fix

The two relay loops were near-duplicate code carrying the same bug, so they are replaced with one tested engine:

RelayPollState.swift — pure relayFDState(_:) classifier: POLLERR/POLLNVAL are fatal; POLLHUP drains buffered data then stops (POLLHUP can coexist with readable data, so the last bytes are not dropped); POLLIN is readable.
SSHChannelRelay.swift — one bidirectional relay engine (pure Swift/Darwin, no libssh2). It terminates on hangup/error instead of spinning, returns why it stopped, and also closes a secondary spin in the write path (a transport hangup during an EAGAIN write-wait now bails). Channel I/O sits behind an SSHChannelIO protocol so the whole loop is unit-testable.
LibSSH2ChannelIO.swift — the libssh2-backed SSHChannelIO, preserving the existing EAGAIN/EOF semantics and serializing through sessionQueue.

Death propagation reuses the existing markDead -> onDeath -> recoverDeadTunnel machinery, no new callback types:

runRelay calls markDead() on transport hangup (gated on isRunning, so a deliberate close doesn't trigger recovery). A client-side hangup just ends that one relay.
startChannelRelay now shutdowns its socketpair end before closing it on exit, so a dead hop's EOF cascades down the chain to runRelay, which marks the tunnel dead. This also covers the multi-hop keepalive gap.

Tests

RelayPollStateTests — 8 cases, including the exact POLLIN | POLLHUP -> drainThenStop misclassification that caused the spin, and fatal-wins cases.
SSHChannelRelayTests — runs the engine over real socketpairs with a scripted channel and asserts a closed peer terminates as .transportHangup / .localClosed (the spin would fail these by timing out), plus data-forwarding both directions.

Limitation

Recovery flows through runRelay, so a transport that dies before any client has connected relies on the existing 30s health-monitor backstop to reconnect. The CPU spin itself is fixed unconditionally.

Verification

swiftlint lint --strict clean on changed files. The user builds and runs the test suite in Xcode.

Credit to @drekinov for the detailed ps/sample/lsof/fs_usage investigation in #1769 that pinned the spinning fd and the missing POLLHUP/POLLERR handling.

…nning (#1769)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 71806a44e2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-26T03:33:14Z

+            if transportState == .drainThenStop { return .transportHangup }
+            if localState == .drainThenStop { return .localClosed }


Drain all queued bytes before stopping on hangup

When a peer sends more than one relay buffer and then closes, poll can report POLLIN | POLLHUP; this loop pumps only a single 32 KiB chunk and then these lines return, so any remaining bytes already queued on the socket/channel are discarded. This can truncate large requests or result chunks that arrive with a close; keep draining until recv/channel read reaches EOF or would-block before terminating on the hangup.

Useful? React with 👍 / 👎.

… hangup

fix(ssh): tear down tunnel relay on socket hangup instead of busy-spi…

71806a4

…nning (#1769)

chatgpt-codex-connector Bot reviewed Jun 26, 2026

View reviewed changes

refactor(ssh): detect transport EOF directly and propagate write-path…

442bf4f

… hangup

datlechin merged commit d23e45e into main Jun 27, 2026
3 checks passed

datlechin deleted the fix/ssh-relay-busy-spin-1769 branch June 27, 2026 04:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

fix(ssh): tear down tunnel relay on socket hangup instead of busy-spinning (#1769)#1772

fix(ssh): tear down tunnel relay on socket hangup instead of busy-spinning (#1769)#1772
datlechin merged 2 commits into
mainfrom
fix/ssh-relay-busy-spin-1769

datlechin commented Jun 26, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		if transportState == .drainThenStop { return .transportHangup }
		if localState == .drainThenStop { return .localClosed }

Uh oh!

Uh oh!

Conversation

datlechin commented Jun 26, 2026

Problem

Root cause

Fix

Tests

Limitation

Verification

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant