All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
1.0.0 - 2025-04-16
This is the first stable release of GPU SpMV, featuring complete CSR and ELL format support, four optimized CUDA kernels with automatic selection, and production-ready engineering quality.
- CSR (Compressed Sparse Row) sparse matrix format with full operations
- ELL (ELLPACK) sparse matrix format with column-major GPU-optimized storage
- Four CUDA Kernels: Scalar CSR, Vector CSR, Merge Path, ELL Kernel
- Automatic kernel selection based on matrix statistics (avg_nnz, skewness)
- Texture cache support with `SpMVExecutionContext` for object reuse
- RAII resource management: `CudaBuffer<T>`, `CudaTimer`, `ScopedTexture`
- Semantic error codes: `SpMVError` enum with descriptive error messages
- Bandwidth metrics calculation with GPU peak bandwidth detection
- Comprehensive benchmarking framework with warmup runs and statistical analysis
- GPU vs CPU performance comparison with speedup metrics
- JSON export for benchmark results
- PageRank algorithm with GPU-accelerated iterative computation
- Configurable damping factor and convergence tolerance
- Top-K node ranking extraction
- CMake Presets for easy Debug/Release builds
- CPU-only configuration option for development environments
- Cross-platform support (Windows/Linux)
- Complete Google Test test suite with property-based testing
- GitHub Actions CI/CD with format checking
- Doxygen-compatible documentation
- Full documentation site at https://lessup.github.io/gpu-spmv/
- Bilingual README (English and Chinese)
- API reference, performance guide, and code examples
- Architecture documentation and design decision records
- Integer overflow protection in size calculations
- Memory bounds checking in matrix operations
- ELL column-major storage for fully coalesced memory access
- Warp-level shuffle reduction avoiding shared memory bank conflicts
- Merge Path algorithm for perfect load balancing on irregular matrices
- Automatic texture cache for large input vectors (>10000 elements)
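
The column-major ELL storage listed above can be illustrated with a small host-side reference. This is a minimal sketch, not the library's implementation: the `EllMatrix` struct, the `-1` padding marker, and the functions `ell_offset`, `ell_spmv_reference`, and `make_example_ell` are all hypothetical names chosen for illustration. The point is the index formula `k * rows + row`: slot `k` of every row sits contiguously in memory, so on a GPU where one thread handles one row, neighbouring threads read neighbouring addresses.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side ELL container (illustration only, not the library's type).
struct EllMatrix {
    std::size_t rows = 0;
    std::size_t max_nnz_per_row = 0;  // padded row width
    std::vector<int> col_idx;         // rows * max_nnz_per_row entries, -1 marks padding
    std::vector<float> values;        // same layout as col_idx
};

// Column-major offset: slot k of all rows is stored contiguously.
inline std::size_t ell_offset(const EllMatrix& m, std::size_t row, std::size_t k) {
    return k * m.rows + row;
}

// CPU reference for y = A * x over the column-major ELL layout.
std::vector<float> ell_spmv_reference(const EllMatrix& m, const std::vector<float>& x) {
    std::vector<float> y(m.rows, 0.0f);
    for (std::size_t row = 0; row < m.rows; ++row) {
        float sum = 0.0f;
        for (std::size_t k = 0; k < m.max_nnz_per_row; ++k) {
            const std::size_t i = ell_offset(m, row, k);
            const int col = m.col_idx[i];
            if (col >= 0) {  // skip padding slots
                sum += m.values[i] * x[static_cast<std::size_t>(col)];
            }
        }
        y[row] = sum;
    }
    return y;
}

// Example matrix:
//   [1 0 2]
//   [0 3 0]
//   [4 5 6]
EllMatrix make_example_ell() {
    EllMatrix m;
    m.rows = 3;
    m.max_nnz_per_row = 3;
    // Each group of three is one slot across rows 0, 1, 2.
    m.col_idx = { 0, 1, 0,    2, -1, 1,    -1, -1, 2 };
    m.values  = { 1.0f, 3.0f, 4.0f,    2.0f, 0.0f, 5.0f,    0.0f, 0.0f, 6.0f };
    return m;
}
```

Calling `ell_spmv_reference(make_example_ell(), {1.0f, 2.0f, 3.0f})` multiplies the example matrix by the vector (1, 2, 3). A row-major layout (`row * max_nnz_per_row + k`) would give the same result on the CPU, but adjacent GPU threads would then read addresses `max_nnz_per_row` elements apart.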
0.1.0 - 2025-03-01
- Basic project structure
- Initial CSR matrix implementation
- Simple SpMV GPU kernel
- CMake build configuration
| Version | Date | Status | Highlights |
|---|---|---|---|
| 1.0.0 | 2025-04-16 | Stable | First stable release with complete feature set |
| 0.1.0 | 2025-03-01 | Archived | Initial prototype |
No breaking changes from pre-release versions. The API is now stable.
- Use named constants instead of magic numbers:

  ```cpp
  // Before
  config.block_size = 256;
  config.use_texture = (cols > 10000);

  // After (recommended)
  config.block_size = spmv::DEFAULT_BLOCK_SIZE;
  config.use_texture = (cols > spmv::TEXTURE_CACHE_THRESHOLD_COLS);
  ```

- Use `SpMVExecutionContext` for texture object reuse:

  ```cpp
  // Before: Texture created/destroyed each call
  for (int i = 0; i < iterations; i++) {
      spmv_csr(csr, d_x, d_y, &config, cols);
  }

  // After: Reuse texture across calls
  SpMVExecutionContext context;
  for (int i = 0; i < iterations; i++) {
      spmv_csr(csr, d_x, d_y, &config, cols, &context);
  }
  ```

- Check error codes consistently:

  ```cpp
  SpMVResult result = spmv_csr(csr, d_x, d_y, &config, cols);
  if (result.error_code != static_cast<int>(SpMVError::SUCCESS)) {
      std::cerr << "Error: "
                << spmv_error_string(static_cast<SpMVError>(result.error_code))
                << std::endl;
  }
  ```
- COO (Coordinate) format support
- Hybrid CSR/ELL format
- Multi-GPU support
- Batched SpMV operations
- Double precision support
- BFloat16 precision support
- Automatic format selection tuning
- Integration with cuSPARSE for comparison
- Python bindings