base: make heavy tests pass under sanitizers; fix latch_test data race #206

Merged: chen3feng merged 2 commits into master from base-tests-under-sanitizers on Jun 27, 2026

@chen3feng (Collaborator)

Running flare/base/... under --sanitizer=thread surfaced one test-side data race and three tests too heavy to finish under instrumentation. (The DPC library race is fixed separately in #205.)

Changes

latch_test — real data race (fixed for all builds). TEST(Latch, Torture) runs RunTest on 10 threads, each ending with std::cout << ...; std::cout isn't safe for concurrent formatted output, so the threads race its buffer (TSan: race in __pad_and_output). Serialized with a mutex.
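The mutex fix can be sketched roughly as follows. This is a minimal illustration, not the code from `latch_test`: `RunWorkers`, `CountLines`, and `out_lock` are hypothetical names, and the stream is passed in as a parameter so the same pattern works for `std::cout` or any other `std::ostream`.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper, not flare's code: each worker writes one line to `out`
// while holding `out_lock`, so only one thread formats into the stream at a
// time.
void RunWorkers(int thread_count, std::ostream& out) {
  assert(thread_count > 0);
  std::mutex out_lock;  // guards every write to `out`
  std::vector<std::thread> workers;
  workers.reserve(thread_count);
  for (int i = 0; i < thread_count; ++i) {
    workers.emplace_back([i, &out, &out_lock] {
      // The contended work under test would run here, outside the lock.
      std::lock_guard<std::mutex> guard(out_lock);
      out << "worker " << i << " done\n";
    });
  }
  for (auto& t : workers) t.join();
}

// Counts the newline-terminated lines in `text`.
int CountLines(const std::string& text) {
  int lines = 0;
  for (char c : text) lines += (c == '\n');
  return lines;
}
```

Calling `RunWorkers(10, std::cout)` prints ten whole lines in some arbitrary order. Only the output statement sits inside the lock, so the work being tortured still runs fully concurrently.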

spinlock_test: gated scaling. The test runs 100 threads × 100k contended lock cycles. Under a sanitizer each contended acquire spins an unbounded, instrumented number of times, so the run takes many minutes (it was effectively unrunnable, at 360s+). In a non-sanitized build the brute-force volume is itself the race detector, so it's kept at full strength there and scaled down to 8 × 2k only under ASan/TSan, following flare's existing scheduling_group_test pattern (FLARE_INTERNAL_USE_ASAN/TSAN).
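A rough sketch of this kind of gating is shown below. It makes several assumptions: the sanitizer is detected with the compiler-provided `__SANITIZE_*` macros and `__has_feature` rather than flare's `FLARE_INTERNAL_USE_ASAN/TSAN` (which would play that role in the real test), and `SKETCH_UNDER_SANITIZER`, `ToySpinlock`, and `RunContended` are hypothetical stand-ins for illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: detect ASan/TSan from compiler-provided signals. The real test
// would use flare's FLARE_INTERNAL_USE_ASAN/TSAN macros instead.
#if defined(__SANITIZE_THREAD__) || defined(__SANITIZE_ADDRESS__)
#define SKETCH_UNDER_SANITIZER 1
#elif defined(__has_feature)
#if __has_feature(thread_sanitizer) || __has_feature(address_sanitizer)
#define SKETCH_UNDER_SANITIZER 1
#endif
#endif
#ifndef SKETCH_UNDER_SANITIZER
#define SKETCH_UNDER_SANITIZER 0
#endif

// Full brute-force volume in normal builds, reduced volume under sanitizers.
#if SKETCH_UNDER_SANITIZER
constexpr std::size_t kThreads = 8;
constexpr std::size_t kLoopsPerThread = 2000;
#else
constexpr std::size_t kThreads = 100;
constexpr std::size_t kLoopsPerThread = 100000;
#endif

// Minimal test-and-set spinlock standing in for the lock under test.
class ToySpinlock {
 public:
  void lock() {
    while (flag_.test_and_set(std::memory_order_acquire)) {
    }
  }
  void unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Every thread increments a plain counter under the lock; a lost update
// would show that mutual exclusion is broken.
std::size_t RunContended(std::size_t threads, std::size_t loops) {
  assert(threads > 0);
  ToySpinlock lock;
  std::size_t counter = 0;
  std::vector<std::thread> workers;
  workers.reserve(threads);
  for (std::size_t i = 0; i < threads; ++i) {
    workers.emplace_back([&lock, &counter, loops] {
      for (std::size_t j = 0; j < loops; ++j) {
        lock.lock();
        ++counter;
        lock.unlock();
      }
    });
  }
  for (auto& t : workers) t.join();
  return counter;
}
```

A test body would then call `RunContended(kThreads, kLoopsPerThread)` and compare the result with `kThreads * kLoopsPerThread`. Selecting the constants at compile time keeps the test logic identical in both builds; only the volume differs.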

compression_test / view_test — unconditional shrink. Their 10 MiB workloads (LargeSize, VariantSize's ~9.5k-size sweep, the noncontiguous-buffer searches) are slow to instrument and time out under parallel sanitizer runs. Unlike spinlock these are large-data tests, not brute-force race hunts, so they're simply shrunk everywhere to sizes that still exercise the multi-block / cross-block paths (512 KiB, ~1 MiB, a coarser size sweep). The 2 GiB extreme stays covered by DISABLED_HugeSize.

Verification

  • flare/base/... --sanitizer=thread: all green (93/93), including parallel execution
  • normal release build of all touched tests: pass (full-size workloads)

Running `flare/base/...` under `--sanitizer=thread` surfaced one test-side data
race and three tests too heavy to finish under instrumentation:

- latch_test: `TEST(Latch, Torture)` runs `RunTest` on 10 threads, each ending
  with `std::cout << ...`; concurrent formatted output races the stream buffer.
  Serialize that line with a mutex.

- spinlock_test: 100 threads x 100k contended lock cycles -- under a sanitizer
  each contended acquire spins an unbounded, instrumented number of times, so
  the run takes many minutes (well past the test timeout). The brute-force
  volume is the race detector in a non-sanitized build, so keep it there and
  scale down (8 threads x 2k) only under ASan/TSan, following the existing
  `scheduling_group_test` pattern (FLARE_INTERNAL_USE_ASAN/TSAN).

- compression_test / view_test: their 10 MiB workloads (LargeSize, VariantSize's
  ~9.5k-size sweep, the noncontiguous-buffer searches) are slow to instrument
  and time out under parallel sanitizer runs. These are large-data, not
  brute-force-race, tests, so just shrink them unconditionally to sizes that
  still exercise the multi-block / cross-block paths (512 KiB, ~1 MiB, a coarser
  size sweep). The 2 GiB extreme stays covered by DISABLED_HugeSize.

Verified: `flare/base/... --sanitizer=thread` is all green; the normal build is
unaffected (full-size workloads still run there, except the unconditionally
shrunk compression/view).

spinlock_test.cc now includes flare/base/internal/annotation.h (for the
ASan/TSan workload guard). The blade BUILD already has the dependency, but the
Bazel build needs it too.

@chen3feng chen3feng merged commit 6504d13 into master Jun 27, 2026
10 checks passed
@chen3feng chen3feng deleted the base-tests-under-sanitizers branch June 27, 2026 04:53