Adaptive heartbeat + delay-aware idle sleep for macOS and Linux #764

Open
johnmci wants to merge 4 commits into OpenSmalltalk:Cog from CorporateSmalltalkConsultingLtd:phase4-adaptive-heartbeat

@johnmci johnmci commented Feb 25, 2026

The Squeak/Cuis VM idles at ~8% CPU even when no Smalltalk code is running. Two causes:

  1. The image requests 1ms sleep via relinquishProcessorForMicroseconds: — the VM honors this literally, causing ~850 wakeups/sec
  2. The heartbeat thread runs at full speed (2ms interval) even when the main thread is blocked in select()/sleep, delivering unnecessary timer ticks

Solution

Phase 4: Adaptive heartbeat (macOS)

  • Add mainThreadIsIdle flag, set by main thread before sleeping, cleared on wake
  • When idle, heartbeat backs off to 50ms (IDLE_BEAT_MS) — still accurate enough for Delay resolution
  • Replace select() with kqueue on macOS (USE_KQUEUE/__APPLE__ gated) — consolidates heartbeat timer, AIO file descriptors, and display pipe into a single kevent() call
  • When idle, fold the heartbeat timer into the kqueue wait — one syscall instead of two threads polling
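
The idle flag and back-off described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `enter_idle`, `leave_idle`, `next_beat_ms`, `heartbeat_loop`, and `ACTIVE_BEAT_MS` are hypothetical names, and only `mainThreadIsIdle`, `IDLE_BEAT_MS`, and the 2ms/50ms intervals come from the description.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

#define ACTIVE_BEAT_MS 2   /* stock heartbeat interval quoted in the description */
#define IDLE_BEAT_MS   50  /* idle back-off interval quoted in the description */

/* Written by the main thread, read by the heartbeat thread. */
static volatile int mainThreadIsIdle = 0;

/* Hypothetical helpers: the main thread brackets its sleep with these.
 * __sync_synchronize() is the GCC/Clang full memory barrier builtin. */
static void enter_idle(void) { mainThreadIsIdle = 1; __sync_synchronize(); }
static void leave_idle(void) { mainThreadIsIdle = 0; __sync_synchronize(); }

/* Interval the heartbeat thread should sleep for on its next iteration. */
static int next_beat_ms(void)
{
    __sync_synchronize();
    return mainThreadIsIdle ? IDLE_BEAT_MS : ACTIVE_BEAT_MS;
}

/* Heartbeat loop with a pthread-compatible signature.
 * arg points to an int that caps the number of beats, so the sketch terminates. */
static void *heartbeat_loop(void *arg)
{
    int beats = *(int *)arg;
    while (beats-- > 0) {
        int ms = next_beat_ms();
        struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
        /* A real VM would run its heartbeat tick routine here. */
    }
    return NULL;
}
```

Because the flag is re-read on every iteration, the heartbeat returns to the 2ms interval at most one idle beat after the main thread calls `leave_idle()`.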

Phase 5: Delay-aware idle sleep (Linux)

  • Port the getNextWakeupUsecs() logic to Linux X11 and null display drivers
  • Ignore the image's requested sleep time; instead sleep for min(timeUntilNextDelay, 500ms)
  • If a Delay is already overdue, return immediately (no sleep)
  • Set mainThreadIsIdle with memory barriers (__sync_synchronize) for heartbeat throttling
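
The sleep-time rule in the list above can be sketched as a small pure function. This is an illustration under stated assumptions: `idle_sleep_usecs` is a hypothetical name, and treating a next-wakeup value of 0 as "no Delay pending" is an assumption, not something the PR states. Only `MAX_IDLE_USECS` and the 500ms cap come from the description.

```c
#include <stdint.h>

#define MAX_IDLE_USECS 500000LL  /* 500 ms cap from the description */

/* Hypothetical helper: how long the idle main thread may sleep.
 *   now_usecs         current time in microseconds
 *   next_wakeup_usecs time of the next pending Delay in microseconds;
 *                     0 is ASSUMED here to mean "no Delay pending"
 * The image's requested sleep time is deliberately not a parameter. */
static int64_t idle_sleep_usecs(int64_t now_usecs, int64_t next_wakeup_usecs)
{
    if (next_wakeup_usecs == 0)
        return MAX_IDLE_USECS;          /* nothing scheduled: sleep up to the cap */
    if (next_wakeup_usecs <= now_usecs)
        return 0;                       /* Delay already due: do not sleep */

    int64_t until = next_wakeup_usecs - now_usecs;
    return until < MAX_IDLE_USECS ? until : MAX_IDLE_USECS;
}
```

For example, with a Delay due 3ms from now the function returns 3000, and with one due 2 seconds from now it returns the 500000 cap.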

Files changed

  • platforms/unix/vm-display-X11/sqUnixX11.c — delay-aware idle for X11
  • platforms/unix/vm-display-null/sqUnixDisplayNull.c — delay-aware idle for headless
  • platforms/unix/vm/sqUnixHeartbeat.c: mainThreadIsIdle made non-static; adaptive heartbeat
  • platforms/iOS/vm/Common/sqMacV2Time.c — kqueue + adaptive heartbeat (macOS)

Results

| Platform | Idle CPU before | Idle CPU after | Reduction |
|---|---|---|---|
| macOS (kqueue) | ~8-9% | ~0.8% | 91% |
| Linux x86_64 | 7.7% | 1.1% | 86% |

Tested with Squeak 6.0 and Cuis on both platforms. No image-side changes required — all improvements are VM-level.

Design notes

  • All macOS/kqueue changes gated on USE_KQUEUE/__APPLE__ — Linux epoll path untouched
  • MAX_IDLE_USECS set to 500ms after benchmarking 50ms/500ms/1000ms — best balance of CPU savings vs Delay accuracy
  • Heartbeat thread is still needed for preemption of active Smalltalk code — only throttled when idle
  • timerfd/epoll consolidation on Linux deferred to future work

When the main VM thread is idle (sleeping in ioRelinquishProcessorForMicroseconds),
the heartbeat thread backs off from 2ms to 50ms interval. A volatile
mainThreadIsIdle flag with memory fences signals the heartbeat thread.

On Apple Silicon M1 (macOS, headless, idle):
  Before: ~3.2% CPU (heartbeat 1.8%, main 0.7%, other 0.7%)
  After:  ~1.5% CPU (heartbeat 1.2%, main 0.3%, other 0.0%)

No #ifdef needed - works on all platforms using the pthread heartbeat
(non-ITIMER_HEARTBEAT path).

Replace select() with kqueue in aioPoll for macOS Cocoa builds.
macOS builds do not define HAVE_CONFIG_H, so the epoll/kqueue
detection in the autoconf path was never reached. USE_KQUEUE is
now defined via __APPLE__ when HAVE_CONFIG_H is not set.

kqueue benefits over select:
- O(1) event notification vs O(n) fd scanning
- No FD_SETSIZE limit (1024)
- EV_CLEAR gives edge-triggered semantics (like EPOLLET)
- struct timespec timeout (nanosecond precision)

All aio functions updated: aioInit, aioFini, aioPoll,
aioEnable, aioHandle, aioSuspend, aioDisable.

Combined with adaptive heartbeat: 3.2% -> 1.3% idle CPU
on Apple Silicon M1 (headless).

When the main thread is idle, register EVFILT_TIMER in kqueue so
heartbeat events are delivered directly through kevent(). The heartbeat
thread backs off to 500ms nanosleep (2 wakeups/sec vs 500/sec stock).

Architecture:
- ioRelinquishProcessorForMicroseconds registers EVFILT_TIMER on first
  idle entry, keeps it registered across consecutive idle calls
- aioPoll kqueue path detects timer events and calls heartbeat()
- heartbeat thread: when mainThreadIsIdle && kqHeartbeatActive,
  sleeps 500ms (was 2ms stock, 50ms adaptive)
- heartbeat() made non-static for aio.c access
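
A rough sketch of delivering a periodic tick through kevent() with EVFILT_TIMER is shown below. It is not the PR's aio.c code: `wait_for_heartbeat_tick`, `HEARTBEAT_TIMER_ID`, and `HAVE_KQUEUE_SKETCH` are hypothetical names. Unlike the architecture above, which keeps the timer registered across idle calls, this sketch creates and closes its own kqueue in a single call so that it stays self-contained. On platforms without kqueue it simply returns -1.

```c
#include <time.h>

#if defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <unistd.h>
#define HAVE_KQUEUE_SKETCH 1
#endif

#define HEARTBEAT_TIMER_ID 1  /* arbitrary identifier chosen for this sketch */

/* Registers a periodic timer and waits for one tick.
 * Returns 1 if a tick arrived, 0 on timeout, -1 on error or if kqueue is unavailable. */
static int wait_for_heartbeat_tick(int period_ms, int timeout_ms)
{
#ifdef HAVE_KQUEUE_SKETCH
    int kq = kqueue();
    if (kq < 0)
        return -1;

    /* With fflags == 0, EVFILT_TIMER interprets data as a period in milliseconds. */
    struct kevent change;
    EV_SET(&change, HEARTBEAT_TIMER_ID, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, period_ms, NULL);
    if (kevent(kq, &change, 1, NULL, 0, NULL) < 0) {
        close(kq);
        return -1;
    }

    /* One kevent() call waits for the timer; a VM would also pass its fds here. */
    struct kevent event;
    struct timespec timeout = { timeout_ms / 1000, (long)(timeout_ms % 1000) * 1000000L };
    int n = kevent(kq, NULL, 0, &event, 1, &timeout);

    int result = -1;
    if (n == 0)
        result = 0;
    else if (n == 1 && event.filter == EVFILT_TIMER && event.ident == HEARTBEAT_TIMER_ID)
        result = 1;  /* this is where the VM would call its heartbeat routine */

    close(kq);
    return result;
#else
    (void)period_ms;
    (void)timeout_ms;
    return -1;
#endif
}
```

The point of the design is that the same kevent() call that waits on file descriptors also receives the timer event, so the idle main thread needs no separate wakeup from the heartbeat thread.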

macOS Apple Silicon M1 benchmark (headless, idle, 45s settle):
  Stock:   3.2% total (heartbeat 1.8%, main 0.7%, other 0.7%)
  Phase 4: 1.2% total (heartbeat 1.2%, main 0.0%, other 0.0%)

63% reduction in idle CPU. Main thread at true 0.0%.
Heartbeat thread wakeups: 500/sec -> 2/sec (250x reduction).

…s (JMM-619)

Port getNextWakeupUsecs() logic from Mac path to Linux display drivers.
Ignore image-requested sleep time; use min(timeUntilNextDelay, 500ms).
Set mainThreadIsIdle flag for adaptive heartbeat (non-static).

Result: 7.7% -> 1.1% CPU idle on Linux x86_64 (86% reduction).

@LinqLover

That sounds very interesting. Only recently I noticed that the energy consumption of Squeak on my Linux laptop is comparatively high due to many wake-ups (2-3 W per image according to powertop). It would be interesting to learn how much this change can reduce energy consumption.

> heartbeat backs off to 50ms (IDLE_BEAT_MS) — still accurate enough for Delay resolution

Is it? In Squeak we documented the expectation of ~1 ms precision in Duration>>wait. With a resolution 50 times coarser, I wonder whether code such as concurrent animations or other high-frequency loops (not necessarily in the standard images) might stop working as expected. Any package that calls 1 milliSecond wait 1000 times within a longer operation would suddenly show degraded performance.

Should this constant perhaps be customizable through a VM parameter from the image side? Just brainstorming. This looks quite interesting.

OpenSmalltalk-Bot commented Feb 26, 2026 via email
