Use Web Worker heartbeat to keep connection alive during blocking dialogs #5784
evnchn wants to merge 3 commits into zauberzeug:main
…logs

When blocking browser dialogs (alert, confirm, print) freeze the main JS thread, Socket.IO cannot respond to server pings, causing disconnection and page reload. This adds a Web Worker that sends periodic HTTP heartbeat requests on a separate thread, keeping the server-side client alive.

Also fixes stale next_message_id on reconnection by updating Socket.IO query params in the disconnect handler.

Closes zauberzeug#2410

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
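To illustrate why a stale next_message_id breaks reconnection, here is a toy model in Python. It assumes the server keeps only a bounded history of sent messages and can only rewind to an ID that is still in that history, which is how the PR description characterizes the failure. The class `OutboxSketch` and its `max_history` parameter are hypothetical and are not NiceGUI's actual implementation; only the names `next_message_id` and `try_rewind` come from the PR.

```python
from collections import deque


class OutboxSketch:
    """Toy message history that keeps only the most recent `max_history` messages (hypothetical)."""

    def __init__(self, max_history: int) -> None:
        self.history = deque(maxlen=max_history)  # older entries are pruned automatically
        self.next_message_id = 0

    def send(self, payload: str) -> None:
        """Record a message under the next free ID."""
        self.history.append((self.next_message_id, payload))
        self.next_message_id += 1

    def try_rewind(self, target_message_id: int) -> bool:
        """Return True if replay can start at target_message_id."""
        if target_message_id == self.next_message_id:
            return True  # the client is already up to date
        return any(message_id == target_message_id for message_id, _ in self.history)


outbox = OutboxSketch(max_history=3)
for i in range(10):
    outbox.send(f'update {i}')

stale_id = 0    # the ID baked into the query parameters at page load
current_id = 8  # the ID the client actually expects next after receiving messages 0..7

rewind_with_stale_id = outbox.try_rewind(stale_id)
rewind_with_current_id = outbox.try_rewind(current_id)
print(rewind_with_stale_id, rewind_with_current_id)
```

In this model the stale ID has long been pruned, so the rewind fails, while the up-to-date ID is still in the history. Sending the current ID on reconnect is what the disconnect-handler change is meant to achieve.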
Testing method:
What would happen if we moved the entire Socket.IO management into a Web Worker? When the main thread is blocked, what happens to the messages though...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
I can confirm this behaviour :d We had a simple JavaScript rename dialog, and just waiting a minute or so disconnected the client. Our solution was simple: use a real async ui.dialog() and await it. Is it possible to add an async
@phifuh On your
*: Overall, I am not inclined to add another JS execution means beside
@falkoschindler If memory serves, none of the files touched here is actually changed in what's merged in 3.10, so this is a rather good candidate for 3.10 inclusion. I'd imagine it's good to "change all files, but each just a little" over "10 PRs stabbed at a particular file" in one release because it gives us easier bisection for free.
falkoschindler left a comment
Thanks for this PR — the Web Worker approach is a clever way to work around the main thread freeze.
I haven't done a detailed manual review yet because I had Claude look at it first, and it flagged a number of issues that seem worth addressing before I dig deeper. Posting them below so you can take a pass at them.
It would also be great if the PR description included a short "how to test manually" section (e.g., open a page, trigger alert(), wait past the reconnect timeout, dismiss, and verify state is preserved).
- Heartbeat cancels delete tasks permanently: client may never be cleaned up

  `_handle_heartbeat()` cancels all pending delete tasks but does not reschedule them. If the browser is truly gone (e.g., crashed with the worker still running briefly, or a misbehaving client), the delete task is cancelled and never recreated. The client instance will leak in `Client.instances` indefinitely, because no mechanism re-creates the delete task after a heartbeat.

  The heartbeat should reset the timer rather than cancel it outright. Consider replacing the cancel-only approach with a "reschedule" pattern: cancel the current delete task and create a new one that sleeps for another `reconnect_timeout` period. This way, if heartbeats stop, cleanup still happens.

  ```python
  def _handle_heartbeat(self) -> None:
      """Reset pending delete tasks to keep the client alive during blocking browser dialogs."""
      for document_id in list(self._delete_tasks):
          self._reschedule_delete_task(document_id)
  ```
- Heartbeat worker never stops on normal page unload

  The worker is started on page load but there is no `beforeunload` or `pagehide` listener to send a `stop` message to the worker. While the worker is garbage-collected when the page is destroyed in most browsers, this is not guaranteed (especially with bfcache). Adding a cleanup listener would be defensive and cheap:

  ```javascript
  window.addEventListener("pagehide", () => {
    if (window.heartbeatWorker) {
      window.heartbeatWorker.postMessage({ type: "stop" });
      window.heartbeatWorker.terminate();
    }
  });
  ```
- On Air compatibility not verified

  With On Air, the browser talks to a relay server. The heartbeat POST goes to `window.location.origin` (the relay), which needs to proxy it back to the NiceGUI server via the `_handle_http` handler in `air.py`. This should work since the relay proxies arbitrary HTTP requests, but it hasn't been tested. Please manually verify heartbeat behavior with `on_air=True` and confirm it works.

- Heartbeat URL doesn't account for reverse proxy / non-origin deployments

  The heartbeat URL is built as `window.location.origin + options.prefix + ...`, but `window.location.origin` may not be correct behind a reverse proxy with path rewriting. Other NiceGUI network calls go through Socket.IO (which uses the configured path), so they don't have this issue. Consider using a relative URL instead:

  ```javascript
  url: `${options.prefix}/_nicegui/heartbeat`,
  ```

  `fetch()` with a relative URL will use the page's origin automatically and works correctly behind proxies.

- Heartbeat interval may be too aggressive for large deployments

  With `reconnect_timeout=3.0`, the heartbeat fires every 1.5 seconds. For apps with many concurrent clients, this adds significant HTTP overhead. The minimum of 0.5s seems very low. Consider a higher floor (e.g., 2-3s minimum) and document the tradeoff.

- Tests use long `time.sleep()` calls (15s)

  `test_connection_survives_alert_with_high_reconnect_timeout` sleeps for 15 seconds in a single test. This will significantly slow down the test suite. Consider reducing `reconnect_timeout` and `ping_interval`/`ping_timeout` for this test to keep the sleep under 5s, or mark the test as `@pytest.mark.slow` if such a marker exists.

- `window.heartbeatWorker` pollutes the global namespace

  Consider using a local variable inside `createApp` instead of attaching to `window`, unless it needs to be accessed externally (e.g., for debugging). If it does, document why.

- `.catch(() => {})` silently swallows all fetch errors in the worker

  This is understandable (heartbeat failures are expected during shutdown), but a `console.debug` would help with debugging connectivity issues during development.

- Test uses `counter.__setitem__` lambda hack

  In `test_connection_survives_alert_with_high_reconnect_timeout`, this pattern is unnecessarily obscure:

  ```python
  lambda: label.set_text(str(counter.__setitem__('value', counter['value'] + 1) or counter['value']))
  ```

  A simple helper function or `nonlocal` variable would be clearer.
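As a rough illustration of the last point, the counter logic could be written with a closure and `nonlocal` instead of the `__setitem__` lambda. This is only a sketch: `FakeLabel` is a hypothetical stand-in for the real label element so the snippet runs without NiceGUI, and `make_incrementer` is an illustrative name, not something from the PR.

```python
class FakeLabel:
    """Hypothetical stand-in for a UI label; it only records the last text it was given."""

    def __init__(self) -> None:
        self.text = ''

    def set_text(self, text: str) -> None:
        self.text = text


def make_incrementer(label):
    """Return a click handler that bumps a counter and shows the new value on the label."""
    count = 0

    def increment() -> None:
        nonlocal count  # rebind the enclosing variable instead of mutating a dict
        count += 1
        label.set_text(str(count))

    return increment


label = FakeLabel()
on_click = make_incrementer(label)
on_click()
on_click()
print(label.text)
```

The handler reads as "increment, then display", which is the intent the one-line lambda obscures.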
…elative URL, raise interval floor

- Heartbeat reschedules delete tasks instead of permanently canceling (prevents client leak)
- Add pagehide listener to stop/terminate worker on navigation
- Use relative heartbeat URL (works behind reverse proxies)
- Raise minimum heartbeat interval from 0.5s to 2s
- Reduce test sleep times (15s -> 5s)
- Replace counter.__setitem__ hack with simple list+helper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
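The "reschedule instead of cancel" behaviour from this commit can be sketched with plain asyncio. This is a minimal model under stated assumptions, not the code in the PR: `ClientSketch`, `schedule_delete`, and the `deleted` flag are hypothetical names, and the timeouts are shortened so the demo finishes quickly.

```python
import asyncio


class ClientSketch:
    """Toy stand-in for a server-side client (hypothetical, not NiceGUI's Client class)."""

    def __init__(self, reconnect_timeout: float) -> None:
        self.reconnect_timeout = reconnect_timeout
        self.deleted = False
        self._delete_tasks = {}  # document_id -> asyncio.Task

    def schedule_delete(self, document_id: str) -> None:
        """Start the delete countdown, cancelling and replacing any countdown already running."""
        old_task = self._delete_tasks.pop(document_id, None)
        if old_task is not None:
            old_task.cancel()
        self._delete_tasks[document_id] = asyncio.create_task(self._delete_later(document_id))

    async def _delete_later(self, document_id: str) -> None:
        await asyncio.sleep(self.reconnect_timeout)  # a cancelled task stops here
        self._delete_tasks.pop(document_id, None)
        self.deleted = True

    def handle_heartbeat(self) -> None:
        """Reset every pending countdown instead of cancelling it for good."""
        for document_id in list(self._delete_tasks):
            self.schedule_delete(document_id)


async def demo():
    client = ClientSketch(reconnect_timeout=0.3)
    client.schedule_delete('doc-1')  # the socket just disconnected
    for _ in range(3):               # heartbeats arrive faster than the timeout
        await asyncio.sleep(0.1)
        client.handle_heartbeat()
    alive_while_heartbeats_arrive = not client.deleted
    await asyncio.sleep(0.6)         # heartbeats stop, so the countdown expires
    return alive_while_heartbeats_arrive, client.deleted


result = asyncio.run(demo())
print(result)
```

Because each heartbeat starts a fresh countdown, the client survives while heartbeats keep arriving and is still cleaned up once they stop, which is the leak the review comment was concerned about.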
Motivation
When blocking browser dialogs (`alert()`, `confirm()`, `print()`) freeze the main JavaScript thread, Socket.IO cannot respond to server pings. This causes the server to consider the client disconnected, eventually deleting the client and forcing a full page reload, losing all user state.

Addresses #2410
Implementation
- `nicegui-heartbeat.js`: A lightweight Web Worker runs on a separate thread, sending periodic HTTP POST requests to a new `/_nicegui/heartbeat` endpoint. Since Web Workers are unaffected by main thread blocking, heartbeats continue even during blocking dialogs.
- `nicegui.py`: Receives heartbeat POSTs and calls `client._handle_heartbeat()`, which cancels any pending delete tasks to keep the client alive.
- `next_message_id` fix (`nicegui.js`): On disconnect, Socket.IO reconnects using the original query parameters from page load. After many messages, the old `next_message_id` has been pruned from the outbox history, causing `try_rewind` to fail and trigger a bare `window.location.reload()`. Fixed by updating `options.query.next_message_id = window.nextMessageId` in the disconnect handler.
- `reconnect_timeout` passed to client JS (`client.py`, `index.html`): The heartbeat interval is derived from `reconnect_timeout * 0.5` (minimum 0.5s) so the worker adapts to the configured timeout.

Progress