## Summary
The first-ever weekly GPU test run (#23106881379, 2026-03-15T08:40Z) failed with 27 errors and 1 failure.
The dominant issue is a Warp CUDA error 700 that poisons the CUDA context, cascading into 27 downstream test errors. There is also a separate, unrelated simulation failure.
## Root Cause: Warp CUDA error 700

An illegal memory access in `warp/native/warp.cu` corrupts the CUDA context. All subsequent tests in the same runner process then error out with CUDA context errors:

```
Warp CUDA error 700: an illegal memory access was encountered
(wp_cuda_context_synchronize, wp_cuda_context_set_stream, wp_memcpy_d2h)
```
## Errors (27) — CUDA context cascade

| Test group | Count | Devices |
|---|---|---|
| `test_sdf_texture.*` | 17 | cuda_0 + cuda_1 |
| `test_viewer_world_offsets.*` | 10 | cuda_0 + cuda_1 |
These are not independent failures — they are downstream victims of the CUDA 700 context corruption.
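When triaging the raw log, one quick way to separate the root cause from its victims is to mark every CUDA error that appears after the first error 700 in a runner process as a likely downstream casualty. A minimal sketch — the `triage` helper and the log tuples are hypothetical shapes for illustration, not the actual CI log format:

```python
def triage(results):
    """results: list of (test_name, error_message_or_None) in run order.

    Returns (root_cause, victims): the first test that hit CUDA error 700,
    and every later CUDA error in the same process (poisoned-context victims).
    """
    root = None
    victims = []
    for name, err in results:
        if err is None:
            continue
        if root is None and "700" in err:
            root = name  # first test to hit the illegal memory access
        elif root is not None and "CUDA" in err:
            victims.append(name)  # same poisoned context, not independent
    return root, victims

# Toy log mimicking the run order seen in the report:
log = [
    ("test_ok", None),
    ("test_sdf_texture.case_a_cuda_0",
     "Warp CUDA error 700: an illegal memory access was encountered"),
    ("test_sdf_texture.case_b_cuda_0", "CUDA context error"),
    ("test_viewer_world_offsets.case_cuda_0", "CUDA context error"),
]
root, victims = triage(log)
print(root)          # test_sdf_texture.case_a_cuda_0
print(len(victims))  # 2
```

This only distinguishes root cause from cascade within one process; if the runner restarts between groups, each process needs its own pass.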
## Failure (1) — separate issue

`test_robot.example_robot_panda_hydro_cuda_0`

```
AssertionError: World 0: Object is not in the cup.
Object pos=(0.324, -0.585, 0.005), cup pos=(0.130, -0.500, 0.150)
```
The panda hydroelastic robot example simulation diverged — the object ends up far from the cup. This is not related to CUDA error 700; it is a physics/simulation regression.
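For scale, the logged positions put the object roughly a quarter meter from the cup center — well outside any plausible containment tolerance. A quick check from the values in the assertion message:

```python
import math

obj = (0.324, -0.585, 0.005)  # logged object position
cup = (0.130, -0.500, 0.150)  # logged cup position

dist = math.dist(obj, cup)    # Euclidean distance, Python 3.8+
print(round(dist, 3))         # 0.257
```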
## Steps to Investigate

- Identify which test triggers the initial CUDA 700 error (likely the first `test_sdf_texture` test)
- Check if `test_sdf_texture` allocates/accesses GPU memory out of bounds
- Run `test_sdf_texture` in isolation on a GPU runner to reproduce
- Investigate the `example_robot_panda_hydro` simulation divergence separately
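For the isolation step, running each suspect module in its own process guarantees a fresh CUDA context, so a sticky error 700 in one module cannot bleed into the next. A sketch — the module names are taken from the failure list but the invocation paths are assumptions about the repo layout, and the demo command is a no-op so the snippet runs anywhere:

```python
import subprocess
import sys

def run_isolated(args):
    """Run `python <args...>` in a fresh child process.

    A new process gets a new CUDA context, so a prior sticky
    error 700 in this runner cannot leak into the child's run.
    """
    proc = subprocess.run(
        [sys.executable, *args],
        capture_output=True,
        text=True,
    )
    return proc.returncode

# Hypothetical invocations; adjust module paths to the actual layout.
suspects = [
    ["-m", "unittest", "test_sdf_texture"],
    ["-m", "unittest", "test_viewer_world_offsets"],
]

# Demo with a no-op so the sketch is runnable without the repo or a GPU:
print(run_isolated(["-c", "pass"]))  # 0
```

Comparing per-process results against the batched run should confirm which module is the true trigger and which failures were cascade artifacts.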
## Notes
- This was the first-ever weekly GPU run, so there is no historical baseline to compare against.
- The CUDA 700 cascade is the critical issue to resolve first — it may mask other real failures.