Adaptive heartbeat + delay-aware idle sleep for macOS and Linux #764
johnmci wants to merge 4 commits into OpenSmalltalk:Cog
When the main VM thread is idle (sleeping in ioRelinquishProcessorForMicroseconds), the heartbeat thread backs off from a 2ms to a 50ms interval. A volatile mainThreadIsIdle flag with memory fences signals the heartbeat thread.

On Apple Silicon M1 (macOS, headless, idle):
Before: ~3.2% CPU (heartbeat 1.8%, main 0.7%, other 0.7%)
After: ~1.5% CPU (heartbeat 1.2%, main 0.3%, other 0.0%)

No #ifdef needed - works on all platforms using the pthread heartbeat (non-ITIMER_HEARTBEAT path).
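A minimal sketch of the signalling this commit describes, not the PR's actual code: the main thread publishes an idle flag around its sleep, and the heartbeat loop picks its interval from that flag. The names STOCK_BEAT_MS, enterIdle, leaveIdle and currentBeatMs are hypothetical; mainThreadIsIdle, IDLE_BEAT_MS and the 2ms/50ms values come from this PR. __sync_synchronize is the GCC/Clang full-barrier builtin.

```c
#include <assert.h>

#define STOCK_BEAT_MS 2   /* hypothetical name; 2ms is the stock interval */
#define IDLE_BEAT_MS  50  /* backed-off interval while the main thread sleeps */

/* Written by the main thread, read by the heartbeat thread. */
static volatile int mainThreadIsIdle = 0;

/* Main thread: call just before sleeping (hypothetical helper). */
static void enterIdle(void)
{
    mainThreadIsIdle = 1;
    __sync_synchronize();   /* full barrier: order the store before the sleep */
}

/* Main thread: call right after waking up (hypothetical helper). */
static void leaveIdle(void)
{
    mainThreadIsIdle = 0;
    __sync_synchronize();   /* full barrier: order the store before resuming work */
}

/* Heartbeat thread: choose the next sleep interval in milliseconds. */
static int currentBeatMs(void)
{
    __sync_synchronize();   /* full barrier before reading the flag */
    int ms = mainThreadIsIdle ? IDLE_BEAT_MS : STOCK_BEAT_MS;
    assert(ms > 0);
    return ms;
}
```

In this sketch the heartbeat loop would call currentBeatMs() once per iteration, so a change of the flag takes effect after at most one already-started sleep.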
Replace select() with kqueue in aioPoll for macOS Cocoa builds.

macOS builds do not define HAVE_CONFIG_H, so the epoll/kqueue detection in the autoconf path was never reached. USE_KQUEUE is now defined via __APPLE__ when HAVE_CONFIG_H is not set.

kqueue benefits over select:
- O(1) event notification vs O(n) fd scanning
- No FD_SETSIZE limit (1024)
- EV_CLEAR gives edge-triggered semantics (like EPOLLET)
- struct timespec timeout (nanosecond precision)

All aio functions updated: aioInit, aioFini, aioPoll, aioEnable, aioHandle, aioSuspend, aioDisable.

Combined with adaptive heartbeat: 3.2% -> 1.3% idle CPU on Apple Silicon M1 (headless).
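The gating and the difference between the two wait primitives can be sketched as follows. This is an illustration under assumptions, not the aio.c code: waitReadable is a hypothetical helper, and it creates and closes a kqueue on every call only for brevity, whereas a poll loop would normally keep one kqueue open. The preprocessor condition mirrors the one described in the commit message.

```c
#include <assert.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Mirrors the gating described above: macOS builds have no autoconf header. */
#if !defined(HAVE_CONFIG_H) && defined(__APPLE__)
# define USE_KQUEUE 1
#endif

#ifdef USE_KQUEUE
# include <sys/event.h>
#else
# include <sys/select.h>
#endif

/* Hypothetical helper: wait up to timeoutUsecs for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int waitReadable(int fd, long timeoutUsecs)
{
    assert(fd >= 0 && timeoutUsecs >= 0);
#ifdef USE_KQUEUE
    int kq = kqueue();
    if (kq < 0) return -1;
    struct kevent change, event;
    /* EV_CLEAR resets the event state once it has been retrieved. */
    EV_SET(&change, fd, EVFILT_READ, EV_ADD | EV_CLEAR, 0, 0, NULL);
    struct timespec ts = { .tv_sec  = timeoutUsecs / 1000000,
                           .tv_nsec = (timeoutUsecs % 1000000) * 1000 };
    /* Register the filter and wait for at most one event in a single call. */
    int n = kevent(kq, &change, 1, &event, 1, &ts);
    close(kq);
    if (n > 0 && (event.flags & EV_ERROR)) return -1;  /* registration failed */
    return n < 0 ? -1 : (n > 0);
#else
    fd_set readFds;
    FD_ZERO(&readFds);
    FD_SET(fd, &readFds);   /* fd must be below FD_SETSIZE */
    struct timeval tv = { .tv_sec  = timeoutUsecs / 1000000,
                          .tv_usec = timeoutUsecs % 1000000 };
    int n = select(fd + 1, &readFds, NULL, NULL, &tv);
    return n < 0 ? -1 : (n > 0);
#endif
}
```

On platforms other than macOS the sketch falls back to select(), which is where the FD_SETSIZE restriction noted in the comment applies.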
When the main thread is idle, register EVFILT_TIMER in kqueue so heartbeat events are delivered directly through kevent(). The heartbeat thread backs off to 500ms nanosleep (2 wakeups/sec vs 500/sec stock).

Architecture:
- ioRelinquishProcessorForMicroseconds registers EVFILT_TIMER on first idle entry, keeps it registered across consecutive idle calls
- aioPoll kqueue path detects timer events and calls heartbeat()
- heartbeat thread: when mainThreadIsIdle && kqHeartbeatActive, sleeps 500ms (was 2ms stock, 50ms adaptive)
- heartbeat() made non-static for aio.c access

macOS Apple Silicon M1 benchmark (headless, idle, 45s settle):
Stock: 3.2% total (heartbeat 1.8%, main 0.7%, other 0.7%)
Phase 4: 1.2% total (heartbeat 1.2%, main 0.0%, other 0.0%)

63% reduction in idle CPU. Main thread at true 0.0%. Heartbeat thread wakeups: 500/sec -> 2/sec (250x reduction).
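The architecture above can be sketched in two parts: registering a periodic EVFILT_TIMER on an existing kqueue (compiled only on macOS here), and the heartbeat thread's choice among the three sleep intervals. This is an illustrative sketch, not the PR's code. HEARTBEAT_TIMER_ID, registerHeartbeatTimer, heartbeatSleepMs and wakeupsPerSecond are hypothetical names, and the timer period is left as a parameter because the commit message does not state it.

```c
#include <assert.h>

#if defined(__APPLE__)
# include <sys/types.h>
# include <sys/event.h>
# include <sys/time.h>

#define HEARTBEAT_TIMER_ID 1  /* hypothetical ident for the timer event */

/* Register a periodic timer on an existing kqueue. For EVFILT_TIMER the
 * data field is the period, in milliseconds when no NOTE_* unit is given.
 * Returns 0 on success, -1 on error. */
static int registerHeartbeatTimer(int kq, int periodMs)
{
    struct kevent change;
    EV_SET(&change, HEARTBEAT_TIMER_ID, EVFILT_TIMER,
           EV_ADD | EV_ENABLE, 0, periodMs, NULL);
    return kevent(kq, &change, 1, NULL, 0, NULL);
}
#endif

/* Heartbeat thread: pick the sleep interval (ms) from the two flags. */
static int heartbeatSleepMs(int mainThreadIsIdle, int kqHeartbeatActive)
{
    if (mainThreadIsIdle && kqHeartbeatActive) return 500; /* kqueue delivers beats */
    if (mainThreadIsIdle) return 50;                       /* adaptive back-off */
    return 2;                                              /* stock interval */
}

/* Wakeups per second implied by a given sleep interval. */
static int wakeupsPerSecond(int sleepMs)
{
    assert(sleepMs > 0);
    return 1000 / sleepMs;
}
```

With these values, wakeupsPerSecond(2) is 500 and wakeupsPerSecond(500) is 2, which is the 250x reduction quoted in the commit message.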
…s (JMM-619)

Port getNextWakeupUsecs() logic from Mac path to Linux display drivers. Ignore image-requested sleep time; use min(timeUntilNextDelay, 500ms). Set mainThreadIsIdle flag for adaptive heartbeat (non-static).

Result: 7.7% -> 1.1% CPU idle on Linux x86_64 (86% reduction).
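The min(timeUntilNextDelay, 500ms) rule could look like the sketch below. It is an assumption-laden illustration, not the driver code: idleSleepUsecs is a hypothetical helper, and it assumes the value obtained from getNextWakeupUsecs() is an absolute time on the same microsecond clock as "now", with 0 meaning that no Delay is pending.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_IDLE_USECS 500000ULL  /* 500ms cap from this PR */

/* Assumption: nextWakeupUsecs is an absolute time on the same clock as
 * nowUsecs, and 0 means that no Delay is pending. */
static uint64_t idleSleepUsecs(uint64_t nowUsecs, uint64_t nextWakeupUsecs)
{
    if (nextWakeupUsecs == 0) return MAX_IDLE_USECS;  /* nothing scheduled */
    if (nextWakeupUsecs <= nowUsecs) return 0;        /* already due */
    uint64_t untilNextDelay = nextWakeupUsecs - nowUsecs;
    uint64_t sleepUsecs = untilNextDelay < MAX_IDLE_USECS ? untilNextDelay
                                                          : MAX_IDLE_USECS;
    assert(sleepUsecs <= MAX_IDLE_USECS);
    return sleepUsecs;
}
```

Under these assumptions, a Delay due in 1 ms yields a sleep of at most 1000 microseconds, while an idle image with no pending Delay sleeps for the full 500ms.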
That sounds very interesting. Only recently I noticed that the energy consumption of Squeak on my Linux laptop is comparably very high due to many wake-ups (2-3 W per image according to powertop). It would be interesting to learn how much this change can reduce energy consumption.
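> heartbeat backs off to 50ms (IDLE_BEAT_MS) - still accurate enough for Delay resolution
Is it? In Squeak we documented the expectation of ~1 ms precision in Duration>>wait. With 50 times as much, I wonder whether any code such as concurrent animations or other high-frequency loops (not necessarily in the standard images) might stop working as expected. Any package that uses 1 milliSecond wait 1000 times within a longer operation would suddenly display degraded performance.
Should this constant perhaps be customizable through a VM parameter from the image side? Just brainstorming. This looks quite interesting.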
When you use Delay, the logic determines the next time the VM has to wake up. The VM learns this by calling getNextWakeupUsecs(), so that is the maximum amount of time the Squeak VM thread can wait.
So a delay of 1 ms would set a max sleep time of 1 ms.
....
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk
…-------- Original Message --------
On Wednesday, 02/25/26 at 17:48 Christoph Thiede via Vm-dev ***@***.***> wrote:
LinqLover left a comment [(OpenSmalltalk/opensmalltalk-vm#764)](#764 (comment))
That sounds very interesting. Only recently I noticed that the energy consumption of Squeak on my Linux laptop is comparably very high due to many wake-ups (2-3 W per image according to powertop). It would be interesting to learn how much this change can reduce energy consumption.
> heartbeat backs off to 50ms (IDLE_BEAT_MS) — still accurate enough for Delay resolution
Is it? In Squeak we documented the expectation of ~1 ms precision in Duration>>wait. With 50 times as much, I wonder whether any code such as concurrent animations or other high-frequency loops (not necessarily in the standard images) might stop working as expected. Any package that uses 1 milliSecond wait 1000 times within a longer operation would suddenly display degraded performance.
Should this constant perhaps be customizable through a VM parameter from the image side? Just brainstorming. This looks quite interesting.
The Squeak/Cuis VM idles at ~8% CPU even when no Smalltalk code is running. Two causes:
- relinquishProcessorForMicroseconds: is honored literally by the VM, causing ~850 wakeups/sec
- The heartbeat thread keeps running during select()/sleep, delivering unnecessary timer ticks

Solution

Phase 4: Adaptive heartbeat (macOS)
- mainThreadIsIdle flag, set by main thread before sleeping, cleared on wake
- Heartbeat backs off to 50ms (IDLE_BEAT_MS): still accurate enough for Delay resolution
- Replace select() with kqueue on macOS (USE_KQUEUE/__APPLE__ gated): consolidates heartbeat timer, AIO file descriptors, and display pipe into a single kevent() call

Phase 5: Delay-aware idle sleep (Linux)
- Port getNextWakeupUsecs() logic to Linux X11 and null display drivers
- Ignore image-requested sleep time; use min(timeUntilNextDelay, 500ms)
- Set mainThreadIsIdle with memory barriers (__sync_synchronize) for heartbeat throttling

Files changed
- platforms/unix/vm-display-X11/sqUnixX11.c: delay-aware idle for X11
- platforms/unix/vm-display-null/sqUnixDisplayNull.c: delay-aware idle for headless
- platforms/unix/vm/sqUnixHeartbeat.c: mainThreadIsIdle made non-static; adaptive heartbeat
- platforms/iOS/vm/Common/sqMacV2Time.c: kqueue + adaptive heartbeat (macOS)

Results
Tested with Squeak 6.0 and Cuis on both platforms. No image-side changes required — all improvements are VM-level.
Design notes
- kqueue is gated on USE_KQUEUE/__APPLE__; the Linux epoll path is untouched
- MAX_IDLE_USECS set to 500ms after benchmarking 50ms/500ms/1000ms: best balance of CPU savings vs Delay accuracy