Refactoring all test scripts to pytests and fixing CUDA graphs compatibility #13
akshaysubr merged 4 commits into main
Conversation
- Convert all test scripts to pytest format with fixtures and assertions
- Add conftest.py with shared fixtures (device, nside, dtype, tolerances)
- Parameterize all SHT tests with both float32 and float64 dtypes
- Add tolerance helpers: get_impl_tol(), get_roundtrip_tol(), get_bluestein_tol()
- Add pytest configuration to pyproject.toml
- Move profiling scripts to benchmarks/ and examples/ directories
- Remove obsolete test_sht_two_stream_overlap_profiling.py

Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>
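The tolerance helpers named in the commit message are not shown here; the following is a hypothetical sketch of how dtype-dependent helpers of that shape might look. The function names come from the commit message, but the threshold values and the lookup structure are assumptions, not the actual conftest.py contents.

```python
# Hypothetical sketch of dtype-dependent tolerance helpers; the actual
# thresholds in cuHPX's conftest.py may differ.
TOLERANCES = {
    "float32": {"impl": 1e-4, "roundtrip": 1e-3, "bluestein": 1e-3},
    "float64": {"impl": 1e-10, "roundtrip": 1e-8, "bluestein": 1e-8},
}

def get_impl_tol(dtype: str) -> float:
    """Tolerance for comparing two implementations of the same transform."""
    return TOLERANCES[dtype]["impl"]

def get_roundtrip_tol(dtype: str) -> float:
    """Tolerance for the SHT -> iSHT roundtrip error."""
    return TOLERANCES[dtype]["roundtrip"]

def get_bluestein_tol(dtype: str) -> float:
    """Tolerance for the Bluestein FFT path, which accumulates more error."""
    return TOLERANCES[dtype]["bluestein"]
```

Centralizing tolerances in one table keeps the float32/float64 parameterization consistent across all test files.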
Pull request overview
This PR modernizes the cuHPX test suite and adds critical CUDA graph support for production workloads.
Changes:
- Refactored all test scripts from interactive input-based scripts to pytest-based unit tests with proper fixtures and parameterization
- Added CUDA graph compatibility by fixing stream handling and pre-allocating buffers in SHTCUDA/iSHTCUDA operations
- Fixed segmentation fault caused by dangling CUDA stream references in HealpixFFT/HealpixIFFT classes
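The dangling-stream fix described above follows a common pattern: rather than caching a raw stream handle at construction time (which can outlive the stream it points to), the operation looks up the currently active stream on every call. This is a plain-Python sketch of the two patterns, not the actual cuHPX code; `StreamRegistry` is a stand-in for the CUDA runtime's notion of the current stream.

```python
class StreamRegistry:
    """Minimal stand-in for a runtime that tracks the current stream."""
    def __init__(self):
        self.current = "default"

    def set_current(self, name):
        self.current = name

class CachedStreamFFT:
    """Buggy pattern: stores the stream handle seen at construction.
    If that stream is later destroyed or replaced, the reference dangles."""
    def __init__(self, registry):
        self.stream = registry.current  # captured once, may go stale

    def run(self):
        return self.stream

class PerCallStreamFFT:
    """Fixed pattern: queries the current stream on every invocation, so
    launches always land on whatever stream the caller has made active
    (including a capture stream during CUDA graph capture)."""
    def __init__(self, registry):
        self.registry = registry

    def run(self):
        return self.registry.current  # resolved at call time

registry = StreamRegistry()
cached = CachedStreamFFT(registry)
per_call = PerCallStreamFFT(registry)
registry.set_current("capture_stream")  # e.g. entering graph capture
```

After the stream changes, `cached.run()` still returns the stale `"default"` handle while `per_call.run()` follows the active stream, which is the behavior CUDA graph capture requires.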
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_sht_two_stream_overlap_profiling.py | Removed old profiling test (replaced by benchmark script) |
| tests/test_sht_cuda_stream.py | Converted to pytest with CUDA stream validation tests |
| tests/test_sht_cuda_batch.py | Converted to pytest with batched operation tests |
| tests/test_sht_cuda.py | Converted to pytest with CUDA vs PyTorch comparison tests |
| tests/test_sht_bluestein.py | Converted to pytest with Bluestein algorithm tests |
| tests/test_regridding.py | Converted to pytest with grid regridding tests |
| tests/test_harmonic_transform.py | Converted to pytest with SHT/iSHT roundtrip tests |
| tests/test_grad.py | Converted to pytest with gradient flow tests |
| tests/test_differentiability.py | Converted to pytest with gradient-based optimization tests |
| tests/test_data_remapping.py | Converted to pytest with ring/nest remapping tests |
| tests/test_cuda_graphs.py | New file with comprehensive CUDA graph capture tests |
| tests/test_batch_remap.py | Converted to pytest with batched remapping tests |
| tests/conftest.py | New pytest configuration with shared fixtures and tolerance helpers |
| src/harmonic_transform/hpx_fft.h | Fixed stream handling and added pre-allocated buffers for CUDA graphs |
| src/harmonic_transform/hpx_fft.cpp | Updated to use pre-allocated buffers and in-place operations |
| pyproject.toml | Added pytest configuration and test dependencies |
| examples/wmap_optimization.py | New example demonstrating differentiable SHT on WMAP data |
| cuhpx/hpx_sht.py | Fixed stream handling and pre-computed weights for CUDA graph compatibility |
| benchmarks/stream_overlap_profiling.py | New benchmark for profiling stream overlap |
| benchmarks/sht_profiling.py | New benchmark for profiling SHT operations |
Comments suppressed due to low confidence (1)
src/harmonic_transform/hpx_fft.h:1
- The condition `n_ < n` only triggers reconfiguration when the new batch size is larger. If `n` becomes smaller (e.g., from 8 to 4), no reconfiguration occurs, and the pre-allocated buffers sized for `n_ = 8` are reused. While the code uses slicing to handle this (line 77), the comment on line 133 is misleading: it should clarify that buffers are sized for the maximum batch size seen, not just 'when batch grows larger'.
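The reviewer's point can be illustrated with a small sketch. This is plain Python modeling the behavior, not the actual hpx_fft.h code: buffers grow only when the batch size exceeds the current capacity, and smaller batches reuse a slice of the existing allocation, so the buffers effectively track the maximum batch size seen.

```python
class PreallocatedOp:
    """Models the buffer-sizing behavior described in the review comment;
    the real code allocates CUDA buffers in C++."""
    def __init__(self):
        self.n_ = 0        # capacity: largest batch size seen so far
        self.buffer = []

    def configure(self, n):
        if self.n_ < n:    # reconfigure ONLY when the batch grows
            self.n_ = n
            self.buffer = [0.0] * n

    def run(self, n):
        self.configure(n)
        return self.buffer[:n]   # smaller batches use a slice

op = PreallocatedOp()
op.run(8)        # grows capacity to 8
out = op.run(4)  # no reallocation; slices the existing buffer
```

After these two calls the capacity stays at 8 even though the last batch was 4, which is why the comment should describe the buffers as sized for the maximum batch seen.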
I ran the pytest suite on my end and observed 3 test failures. The failures occur in precision checks and seem related to numerical tolerance thresholds being set too tightly. Maybe we can try the following to fix?
@xiaopoc Thanks for the review and independent testing. What GPU, CUDA version, etc. were you testing with?
xiaopoc left a comment
All tests passed. The PR looks good to me, so I approved it.
This PR includes three major improvements to cuHPX: