Add TSan nightly CI workflow #2402

Open
cuihtlauac wants to merge 2 commits into mirage:eio from cuihtlauac:tsan-ci

@cuihtlauac

Summary

Adds a nightly GitHub Actions workflow that runs the Irmin test suite under ThreadSanitizer (TSan) to hunt for data races on the eio branch.

  • .github/workflows/tsan.yml: nightly cron (0 3 * * *) + workflow_dispatch + opt-in via a tsan label on PRs. Bare ubuntu-latest runner, 75-min timeout, opam cache keyed on the TSan compiler spec. Runs dune build @runtest @tsan-stress under TSan with halt_on_error=0 exitcode=66, so a single run keeps going after the first report, surfaces every race TSan observes, and exits with status 66 if any were found. Uploads tsan-report.* + tsan-run.log as an artifact and writes a finding count to the job summary.
  • test/irmin-pack/tsan_suppressions.txt: minimal runtime/FFI noise filter (caml_modify, caml_alloc_shr, called_from_lib:index). No Irmin-module entries by design.
  • Existing tests become env-var scalable: IRMIN_STM_ITER (default 500), IRMIN_STM_PACK_ITER (default 100), IRMIN_MULTICORE_DOMAINS (default 2), IRMIN_MULTICORE_ITER (default 1). All backward-compatible; dune runtest in regular CI is unchanged.
  • test/irmin-pack/test_tsan_stress/ ships as an empty dispatcher on a separate @tsan-stress alias so the workflow has a stable target. The five per-hotspot stress scenarios (dict refill, irmin_mem cache, watch globals, Irmin_fs_unix pool, append_only_file buffer) land in a follow-up PR.
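
The reporting behaviour in the first bullet could be sketched roughly as follows. This is a minimal sketch, not the content of tsan.yml: the use of log_path and suppressions in TSAN_OPTIONS is an assumption inferred from the tsan-report.* artifact name and the suppressions file, and the three helper names are hypothetical. Only halt_on_error=0 and exitcode=66 come from the description above.

```shell
#!/bin/sh
# Sketch only: log_path, the suppressions wiring, and all helper names are
# assumptions; halt_on_error=0 and exitcode=66 come from the PR description.

# Build a TSAN_OPTIONS value for a given report prefix and suppressions file.
# With log_path=PREFIX, TSan writes one PREFIX.<pid> file per process.
tsan_options() {
  printf 'halt_on_error=0 exitcode=66 log_path=%s suppressions=%s' "$1" "$2"
}

# Count findings across all report files in a directory. Each TSan report
# starts with a "WARNING: ThreadSanitizer:" banner line. Prints 0 when no
# report files exist.
count_tsan_findings() {
  cat "$1"/tsan-report.* 2>/dev/null | grep -c 'WARNING: ThreadSanitizer:' || true
}

# Append the count to a summary file; on GitHub Actions this would be the
# file named by $GITHUB_STEP_SUMMARY.
write_tsan_summary() {
  printf '### TSan findings: %s\n' "$1" >> "$2"
}
```

Under these assumptions, a workflow step might export `TSAN_OPTIONS="$(tsan_options tsan-report test/irmin-pack/tsan_suppressions.txt)"`, run `dune build @runtest @tsan-stress`, and then call `write_tsan_summary "$(count_tsan_findings .)" "$GITHUB_STEP_SUMMARY"`.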

The setup recipe is the one validated by the TSan PoC: sudo sysctl vm.mmap_rnd_bits=28, the libunwind-dev system package, the compiler spec ocaml-variants.5.3.0+options,ocaml-option-tsan, and an explicit opam install dune. All four Irmin pin-depends build cleanly under TSan (validated here).
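
For reproducing the recipe outside CI, the steps might look like the sketch below. The switch name `tsan`, the `run` wrapper, and the `DRY_RUN` toggle are illustrative assumptions and do not come from the workflow, which may well use a setup action instead of calling opam directly.

```shell
#!/bin/sh
# Sketch of the setup recipe. The switch name "tsan" and the DRY_RUN toggle
# are illustrative assumptions, not taken from the workflow.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_tsan_switch() {
  # Lower ASLR entropy; TSan can fail to map its shadow memory on kernels
  # that default to a higher vm.mmap_rnd_bits value.
  run sudo sysctl vm.mmap_rnd_bits=28
  # libunwind headers, needed to build the TSan-enabled OCaml compiler.
  run sudo apt-get install -y libunwind-dev
  # OCaml 5.3.0 with the TSan option package, using the spec from the recipe.
  run opam switch create tsan --packages=ocaml-variants.5.3.0+options,ocaml-option-tsan
  # dune is installed explicitly, as the recipe states.
  run opam install -y dune
}
```

Setting `DRY_RUN=1` before calling `setup_tsan_switch` prints the four commands without running them, which is a cheap way to review the recipe on a machine where sudo is unavailable.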

No src/ changes — this adds detection only. Expected first-run outcome: a nonzero number of races reported, matching the race patterns fixed in #2397. That output is the baseline for follow-up fix PRs.

Context: part of the Irmin 4.0.0 release roadmap, #2401 Phase 6.

Test plan

  • Add tsan label to this PR → workflow fires → setup + switch build + cache + dune build steps all succeed.
  • Artifact tsan-reports-<run_id> uploaded on completion, step summary shows ### TSan findings: <N>.
  • Merge → first scheduled nightly run produces a baseline report.
  • Verify default dune runtest behavior on non-TSan CI jobs is unchanged (env vars absent → existing defaults).
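
The fallback rule behind the last check (variable absent, default applies) can be mirrored in shell as below. This only illustrates the rule with the defaults listed in the summary; the real lookup lives in the OCaml tests, the helper name is hypothetical, and how those tests treat a variable that is set but empty is not stated here (the shell `:-` form falls back in that case too).

```shell
#!/bin/sh
# Mirrors the "absent means default" rule only; the real defaults live in
# the OCaml tests. The helper name is hypothetical.
effective_test_scale() {
  echo "IRMIN_STM_ITER=${IRMIN_STM_ITER:-500}"
  echo "IRMIN_STM_PACK_ITER=${IRMIN_STM_PACK_ITER:-100}"
  echo "IRMIN_MULTICORE_DOMAINS=${IRMIN_MULTICORE_DOMAINS:-2}"
  echo "IRMIN_MULTICORE_ITER=${IRMIN_MULTICORE_ITER:-1}"
}
```

A TSan job would raise the scale by exporting larger values before invoking dune, while regular CI leaves all four variables unset and keeps the existing iteration counts.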

🤖 Generated with Claude Code

New .github/workflows/tsan.yml runs the test suite under
ThreadSanitizer on a nightly cron (plus manual and opt-in `tsan`
label on PRs). Runs the default suite alongside a new @tsan-stress
alias, with halt_on_error=0 so a single run surfaces every race
TSan observes. Reports are uploaded as a workflow artifact.

The existing multicore and QCheck-STM tests become scalable via
env vars: IRMIN_STM_ITER, IRMIN_STM_PACK_ITER, IRMIN_MULTICORE_DOMAINS,
IRMIN_MULTICORE_ITER. Defaults match prior behaviour, so normal
`dune runtest` is unchanged.

The @tsan-stress alias (test/irmin-pack/test_tsan_stress/) ships as
an empty dispatcher; per-hotspot scenarios (dict refill, irmin_mem
cache, watch globals, fs pool, append_only_file buffer) land in a
follow-up PR.

This adds detection only; no src/ changes. Expect the first nightly
run to surface several known races from mirage#2397 — that output is the
baseline for follow-up fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dune @fmt / ocaml-ci's lint-fmt rejected two short match/if
bindings that fit on a single line. No semantic change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>