Illegal memory access in SDF texture handling on multi-GPU system #2114

@eric-heiden

Summary

The first-ever weekly GPU test run (#23106881379, 2026-03-15T08:40Z) failed with 27 errors and 1 failure.

The dominant issue is a Warp CUDA error 700 that poisons the CUDA context, cascading into 27 downstream test errors. There is also a separate, unrelated simulation failure.

Root Cause: Warp CUDA error 700

An illegal memory access, reported from warp/native/warp.cu, corrupts the CUDA context. Every subsequent test in the same runner process then fails with CUDA context errors.

Warp CUDA error 700: an illegal memory access was encountered
(wp_cuda_context_synchronize, wp_cuda_context_set_stream, wp_memcpy_d2h)

Errors (27) — CUDA context cascade

| Test group | Count | Devices |
|---|---|---|
| test_sdf_texture.* | 17 | cuda_0 + cuda_1 |
| test_viewer_world_offsets.* | 10 | cuda_0 + cuda_1 |

These are not independent failures: apart from whichever test first triggers the illegal access, they are downstream victims of the CUDA 700 context corruption.

Failure (1) — separate issue

test_robot.example_robot_panda_hydro_cuda_0

AssertionError: World 0: Object is not in the cup.
Object pos=(0.324, -0.585, 0.005), cup pos=(0.130, -0.500, 0.150)

The panda hydroelastic robot example simulation diverged: the object ends up far from the cup. This is not related to CUDA error 700; it is a separate physics/simulation problem. With no earlier weekly run to compare against, it is not yet confirmed to be a regression.
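To put a number on "far from the cup", the offset between the two positions in the assertion message can be computed directly. This is only arithmetic on the reported values; the units and the meaning of each axis are not stated in the log, so none are assumed here.

```python
import math

# Positions copied from the assertion message above.
object_pos = (0.324, -0.585, 0.005)
cup_pos = (0.130, -0.500, 0.150)

# Per-axis offset of the object relative to the cup.
offset = tuple(o - c for o, c in zip(object_pos, cup_pos))

# Straight-line (Euclidean) distance between the two points.
distance = math.dist(object_pos, cup_pos)

print("offset:  ", ", ".join(f"{v:.3f}" for v in offset))  # 0.194, -0.085, -0.145
print(f"distance: {distance:.3f}")                          # 0.257
```

The object is about 0.257 away from the cup position in the log's units, with the largest component (0.194) along the first axis and 0.145 below the cup position along the third.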

Steps to Investigate

  • Identify which test triggers the initial CUDA 700 error (likely the first test_sdf_texture test)
  • Check if test_sdf_texture allocates/accesses GPU memory out of bounds
  • Run test_sdf_texture in isolation on a GPU runner to reproduce
  • Investigate example_robot_panda_hydro simulation divergence separately
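The first three steps could be scripted roughly as follows. This is a minimal sketch, not the project's actual test harness: it assumes the tests can be addressed by unittest-style dotted IDs (the IDs in the usage note are hypothetical) and that the error text from the log above appears in the captured output. Each test runs in its own interpreter, so it gets a fresh CUDA context and cannot be a victim of an earlier test. `CUDA_LAUNCH_BLOCKING=1` is a standard CUDA environment variable that makes kernel launches synchronous, which tends to move the error report closer to the launch that caused it.

```python
import os
import subprocess
import sys

# Error text taken from the log above.
CUDA_700_SIGNATURE = "Warp CUDA error 700"


def build_env(base_env=None):
    """Copy the environment and request synchronous kernel launches."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_isolated(test_id, timeout=600):
    """Run one test in a fresh interpreter (and so a fresh CUDA context).

    Assumes unittest-style test IDs; swap the command for the runner
    the project actually uses.
    """
    cmd = [sys.executable, "-m", "unittest", test_id]
    proc = subprocess.run(
        cmd, env=build_env(), capture_output=True, text=True, timeout=timeout
    )
    return proc.returncode, proc.stdout + proc.stderr


def find_triggers(test_ids, runner=run_isolated):
    """Return the tests that report CUDA error 700 when run on their own."""
    triggers = []
    for test_id in test_ids:
        _, output = runner(test_id)
        if CUDA_700_SIGNATURE in output:
            triggers.append(test_id)
    return triggers
```

Usage would look like `find_triggers(["tests.test_sdf_texture", "tests.test_viewer_world_offsets"])`, with the module paths adjusted to the repository layout. Any test returned here reproduces the error without help from a previously corrupted context, which makes it a candidate for a follow-up run under NVIDIA's `compute-sanitizer` tool.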

Notes

  • This was the first-ever weekly GPU run, so there is no historical baseline to compare against.
  • The CUDA 700 cascade is the critical issue to resolve first — it may mask other real failures.
