Regression 4.0.10 to 4.0.11: -pthread + -sWASM_WORKERS aborts on trivial allocation/deallocation workload #26619

@slowriot

There is a regression between 4.0.10 and 4.0.11. The reproduction code below will crash reliably when built with 4.0.11 upwards, but not with 4.0.10.

Code to reproduce in C++:

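#include <cstdint>
#include <cstdlib>  // EXIT_SUCCESS
#include <emscripten.h>
#include <emscripten/wasm_worker.h>

auto main() -> int {
  // Worker: allocate and immediately free a single byte, 70 million times.
  emscripten_wasm_worker_post_function_v(emscripten_malloc_wasm_worker(1024 * 1024), []{
    for(unsigned i{}; i < 70'000'000; ++i) {
      delete new uint8_t{0};
    }
  });
  // Main thread: one allocation (never freed) per main loop iteration.
  emscripten_set_main_loop([]{new uint8_t{0};}, 0, false);
  return EXIT_SUCCESS;
}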

Code to reproduce in C:

#include <emscripten.h>
#include <emscripten/wasm_worker.h>
#include <stdlib.h>

static void worker_loop(void) {
  for(unsigned i = 0; i < 70000000u; ++i) {
    free(malloc(1));
  }
}

static void main_loop(void) {
  malloc(1);
}

int main(void) {
  emscripten_wasm_worker_post_function_v(emscripten_malloc_wasm_worker(1024 * 1024), worker_loop);
  emscripten_set_main_loop(main_loop, 0, false);
  return EXIT_SUCCESS;
}

Building and running:

em++ main.cpp -pthread -sWASM_WORKERS -o build/repro.html
emcc main.c -pthread -sWASM_WORKERS -o build/repro-c.html
emrun build/repro.html
emrun build/repro-c.html

All of the following are needed to trigger the crash:

  • At least one allocation in the main thread (doesn't matter if it's freed or not). Without this allocation, no crash.
  • A large (not fixed) number of allocations and frees on the worker thread. Larger allocations seem to reduce the number needed to trip the bug, but no specific limits have been found.

Browser console output

The browser-visible failure is typically in one of the following shapes, changing from run to run:

On the main thread:

Uncaught RuntimeError: Aborted(native code called abort())
abort http://localhost:6931/repro.js:919
__abort_js http://localhost:6931/repro.js:1843
callUserCallback http://localhost:6931/repro.js:1542
runIter http://localhost:6931/repro.js:2190
MainLoop_runner http://localhost:6931/repro.js:2297
repro.js:919:11
abort http://localhost:6931/repro.js:919
__abort_js http://localhost:6931/repro.js:1843
abort http://localhost:6931/repro.wasm:15306
emscripten_builtin_malloc http://localhost:6931/repro.wasm:28994
repro.wasm.operator_new_impl(unsigned long) http://localhost:6931/repro.wasm:37895
repro.wasm.operator new(unsigned long) http://localhost:6931/repro.wasm:37857
repro.wasm.main::$_1::operator()() const http://localhost:6931/repro.wasm:2640
repro.wasm.main::$_1::__invoke() http://localhost:6931/repro.wasm:2398
callUserCallback http://localhost:6931/repro.js:1542
runIter http://localhost:6931/repro.js:2190
MainLoop_runner http://localhost:6931/repro.js:2297

On the worker:

RuntimeError: Aborted(native code called abort()) repro.js:919:11
abort http://localhost:6931/repro.js:919
__abort_js http://localhost:6931/repro.js:1843
abort http://localhost:6931/repro.wasm:15306
emscripten_builtin_malloc http://localhost:6931/repro.wasm:28994
repro.wasm.operator_new_impl(unsigned long) http://localhost:6931/repro.wasm:37895
repro.wasm.operator new(unsigned long) http://localhost:6931/repro.wasm:37857
repro.wasm.main::$_0::operator()() const http://localhost:6931/repro.wasm:2513
repro.wasm.main::$_0::__invoke() http://localhost:6931/repro.wasm:2323
_wasmWorkerRunPostMessage http://localhost:6931/repro.js:1568
callUserCallback http://localhost:6931/repro.js:1542
_wasmWorkerRunPostMessage http://localhost:6931/repro.js:1568
(Async: EventListener.handleEvent)
_wasmWorkerInitializeRuntime http://localhost:6931/repro.js:1616
initRuntime http://localhost:6931/repro.js:787
run http://localhost:6931/repro.js:3063
removeRunDependency http://localhost:6931/repro.js:889
receiveInstance http://localhost:6931/repro.js:1061
wasmModuleReceived http://localhost:6931/repro.js:1107
startWasmWorker http://localhost:6931/repro.js:643
onmessage http://localhost:6931/repro.js:680
(Async: EventHandlerNonNull)
http://localhost:6931/repro.js:673

Note that the programs above produce either or both of these traces, varying randomly from run to run.

One important point is that the abort site is not stable. Depending on the exact repro shape and timing, the failure can surface during either allocation or deallocation. In the larger original program this showed up variously on malloc, free, operator new, operator delete, and other ordinary STL/string/container allocation sites. In other words, it does not look like one specific call is “the bug”; it looks more like some allocator/runtime corruption or invalid state is reached first, and the crash is only detected later at a normal heap operation.

Other observed characteristics:

  • the crash is not instantaneous; it happens after a little runtime/allocator churn
  • it requires -pthread even though no pthread features are used in the example
  • it does not require any real application logic, WebGPU, GUI, or shared data structures
  • increasing allocation size reduces the amount of iteration needed before the abort, which suggests the issue scales with allocator churn / bytes processed rather than with one specific code path

So the failure pattern is: ordinary heap traffic on the main thread plus repeated heap traffic on a wasm worker, under -pthread + -sWASM_WORKERS, eventually leads to a browser abort in 4.0.11, even though the same code remains stable on 4.0.10.
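
One way to sanity-check that the workload itself is benign is a native control build that runs the same allocation pattern on a plain pthread instead of a wasm worker. The sketch below is an assumption-laden illustration, not part of the original report: the names churn_job, churn, churn_thread and run_native_control are hypothetical, and the allocation size is made a parameter so the "larger allocations trip the bug sooner" observation can be explored with the same helper.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical job description; none of these names come from the report. */
struct churn_job {
  unsigned long iterations; /* number of malloc/free pairs to perform */
  size_t size;              /* bytes per allocation */
  unsigned long completed;  /* filled in by the worker thread */
};

/* Allocate and immediately free `size` bytes, `iterations` times.
   Returns how many allocations succeeded. */
unsigned long churn(unsigned long iterations, size_t size) {
  assert(size > 0);
  unsigned long ok = 0;
  for (unsigned long i = 0; i < iterations; ++i) {
    unsigned char *p = malloc(size);
    if (p != NULL) {
      p[0] = (unsigned char)i; /* touch the block before freeing it */
      ++ok;
      free(p);
    }
  }
  return ok;
}

/* pthread entry point: runs the churn described by the job. */
static void *churn_thread(void *arg) {
  struct churn_job *job = arg;
  job->completed = churn(job->iterations, job->size);
  return NULL;
}

/* Run the worker-side churn on a pthread while the calling thread performs
   one small allocation of its own (mirroring main_loop in the C repro).
   Returns 0 if every allocation succeeded, -1 otherwise. */
int run_native_control(unsigned long iterations, size_t size) {
  struct churn_job job = { iterations, size, 0 };
  pthread_t worker;
  if (pthread_create(&worker, NULL, churn_thread, &job) != 0) {
    return -1;
  }
  void *main_alloc = malloc(1); /* main-thread heap traffic */
  int main_ok = (main_alloc != NULL);
  int joined = (pthread_join(worker, NULL) == 0);
  free(main_alloc);
  return (joined && main_ok && job.completed == iterations) ? 0 : -1;
}
```

Calling run_native_control(70000000UL, 1) from a small main() mirrors the iteration count and allocation size of the C repro. A clean native run does not prove anything about the Emscripten allocator, but it helps confirm that the access pattern involves no shared data and no undefined behaviour of its own.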

Impact and discovery

This is currently affecting real-life applications that use WebGPU. With Emscripten's internal WebGPU support dropped in favour of --use-port=emdawnwebgpu, -pthread becomes a requirement when the port is used together with WASM_WORKERS (because emdawnwebgpu requires mutex features). So unfortunately this is not just an obscure corner case: code that worked on 4.0.10 now crashes on 4.0.11 upward.
