[QDP] Add zero-copy amplitude encoding from float32 GPU tensors by viiccwen · Pull Request #999 · apache/mahout

viiccwen · 2026-01-31T08:11:21Z

Purpose of PR

This PR adds encode_from_gpu_ptr_f32 and encode_from_gpu_ptr_f32_with_stream to QdpEngine, enabling zero-copy amplitude encoding from float32 GPU pointers. It relies on the existing GpuStateVector Float32 support and the launch_amplitude_encode_f32 / launch_l2_norm_f32 kernels.
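For readers unfamiliar with amplitude encoding, here is a minimal CPU reference of what the two kernels compute together: an L2-norm reduction followed by a scale by the inverse norm. This is a sketch, not the PR's implementation. The function name `amplitude_encode_f32`, the zero-padding of short inputs, and the error messages are assumptions for illustration; only the empty-input and "input length > state size" checks are taken from the tests listed in this PR.

```python
import math
from array import array


def amplitude_encode_f32(data, num_qubits):
    """CPU reference (hypothetical helper): L2-normalize `data` into a state vector."""
    state_len = 1 << num_qubits  # a state over n qubits has 2**n amplitudes
    if len(data) == 0:
        raise ValueError("input is empty")
    if len(data) > state_len:
        raise ValueError(
            f"input length {len(data)} exceeds state size {state_len}"
        )

    # Round the inputs to float32 first, mirroring an f32 device buffer.
    values = array("f", data)

    # Step 1: L2-norm reduction, then its reciprocal (the "inverse norm").
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0 or not math.isfinite(norm):
        raise ValueError("input has zero or non-finite norm")
    inv_norm = 1.0 / norm

    # Step 2: scale every element by the inverse norm.
    state = array("f", (v * inv_norm for v in values))

    # Assumption: inputs shorter than the state are zero-padded.
    state.extend([0.0] * (state_len - len(values)))
    return state
```

For example, `amplitude_encode_f32([3.0, 4.0], 2)` returns four amplitudes, approximately `[0.6, 0.8, 0.0, 0.0]`, whose squares sum to 1 within float32 precision.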

Summary

- New APIs (Linux/CUDA):
  - QdpEngine::encode_from_gpu_ptr_f32(input_d, input_len, num_qubits): amplitude encoding from a GPU f32 pointer using the default stream.
  - QdpEngine::encode_from_gpu_ptr_f32_with_stream(..., stream): same, with an explicit CUDA stream (null = default).
- GPU pointer validation: All encode_from_gpu_ptr and encode_batch_from_gpu_ptr paths now validate the input pointer via cudaPointerGetAttributes (non-null, device/managed memory, same device as engine). Early checks for empty input / sample size are added where missing.
- Amplitude encoder: calculate_inv_norm_gpu_f32 is refactored to call a new calculate_inv_norm_gpu_f32_with_stream; the stream-aware variant is used for f32 GPU encoding and is synchronized before host copy.
- Python binding: get_torch_cuda_stream_ptr returns null when PyTorch reports stream pointer 0 (default stream) instead of raising, so default-stream usage is supported.
- Tests: New Rust tests for encode_from_gpu_ptr_f32 and encode_from_gpu_ptr_f32_with_stream (default and non-default stream, f32/f64 engine, empty input, input length > state size). Existing GPU pointer tests updated to use valid GPU pointers so they still hit the intended error paths after the new validation. Basis tests use Float64 engine. Test helper create_test_data_f32 added in tests/common.
- Python tests: Redundant in-function import torch removed; test_encode_cuda_tensor_unsupported_encoding parametrized with iqp / iqp-z and error message match updated to the current supported list.
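The pointer validation and default-stream handling described above can be modelled without a GPU. The sketch below only mirrors the three stated rules (non-null, device or managed memory, same device as the engine) and the "stream pointer 0 means default stream" mapping. All names here (`MemoryType`, `PointerAttributes`, `validate_gpu_pointer`, `normalize_stream_ptr`) and the error messages are hypothetical; the real code queries cudaPointerGetAttributes and returns the project's own error type.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MemoryType(Enum):
    """Memory kinds a CUDA pointer query can report."""
    UNREGISTERED = "unregistered"
    HOST = "host"
    DEVICE = "device"
    MANAGED = "managed"


@dataclass
class PointerAttributes:
    """Hypothetical stand-in for the fields read from cudaPointerGetAttributes."""
    memory_type: MemoryType
    device: int


def validate_gpu_pointer(ptr: int, attrs: PointerAttributes, engine_device: int) -> None:
    """Raise ValueError unless the pointer passes all three checks."""
    if ptr == 0:
        raise ValueError("input pointer is null")
    if attrs.memory_type not in (MemoryType.DEVICE, MemoryType.MANAGED):
        raise ValueError(
            f"pointer refers to {attrs.memory_type.value} memory, "
            "expected device or managed memory"
        )
    if attrs.device != engine_device:
        raise ValueError(
            f"pointer lives on device {attrs.device}, "
            f"engine uses device {engine_device}"
        )


def normalize_stream_ptr(raw: int) -> Optional[int]:
    """Treat a raw stream handle of 0 as 'default stream' (None) instead of an error."""
    return None if raw == 0 else raw
```

Checking for null before looking at the attributes keeps the error message specific, which is what a test asserting on a "null" substring would rely on.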

## BEFORE: f64 path (old API — user had to convert f32→f64 to use this)
## AFTER: f32 path (new API — zero-copy from PyTorch f32 CUDA tensors)

Encode-from-GPU-pointer benchmark: 16 qubits, 200 iterations, state_len=65536
  BEFORE (encode_from_gpu_ptr f64): 0.0838 ms/encode, 11933.2 encodes/s
  AFTER  (encode_from_gpu_ptr_f32): 0.0632 ms/encode, 15820.1 encodes/s
  Speedup (f32 vs f64 path): 1.33x
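
As a sanity check on the figures above, throughput is the reciprocal of latency and the speedup is the ratio of the two latencies. The helper names below are illustrative only and are not part of the benchmark script.

```python
def encodes_per_second(ms_per_encode: float) -> float:
    """Convert a per-encode latency in milliseconds to encodes per second."""
    return 1000.0 / ms_per_encode


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of the old latency to the new latency."""
    return before_ms / after_ms


before_ms = 0.0838  # encode_from_gpu_ptr (f64), from the benchmark output
after_ms = 0.0632   # encode_from_gpu_ptr_f32, from the benchmark output

print(f"before:  {encodes_per_second(before_ms):.1f} encodes/s")
print(f"after:   {encodes_per_second(after_ms):.1f} encodes/s")
print(f"speedup: {speedup(before_ms, after_ms):.2f}x")
```

Recomputing from the rounded latencies gives a slightly different AFTER throughput than the reported 15820.1 encodes/s, which is expected because the printed latency is rounded to four decimal places; the 1.33x speedup is reproduced either way.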

Related Issues or PRs

Closes #996; also a follow-up to #995.

Changes Made

Breaking Changes

Yes
No

Checklist

Added or updated unit tests for all changes
Added or updated documentation for all changes
Successfully built and ran all unit tests or manual tests locally
PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
Code follows ASF guidelines

guan404ming · 2026-02-01T03:43:16Z

The design looks good to me. Could you attach before-and-after benchmarks from your local machine to show that the improvement works? Thanks!

viiccwen · 2026-02-01T15:42:05Z

@guan404ming got it!

## BEFORE: f64 path (old API — user had to convert f32→f64 to use this)
## AFTER: f32 path (new API — zero-copy from PyTorch f32 CUDA tensors)

Encode-from-GPU-pointer benchmark: 16 qubits, 200 iterations, state_len=65536
  BEFORE (encode_from_gpu_ptr f64): 0.0838 ms/encode, 11933.2 encodes/s
  AFTER  (encode_from_gpu_ptr_f32): 0.0632 ms/encode, 15820.1 encodes/s
  Speedup (f32 vs f64 path): 1.33x

guan404ming · 2026-02-01T17:50:40Z

Looks nice, please help resolve the conflicts. Thanks!

viiccwen · 2026-02-02T03:37:53Z

cc @guan404ming @rich7420
updated PR description

guan404ming · 2026-02-02T04:27:43Z

The change looks good.
Sorry, but the conflicts have come back.

rich7420 · 2026-02-03T03:34:20Z

@viiccwen thanks for the update!
I think we could add a test, e.g. test_encode_from_gpu_ptr_f32_null_pointer or something similar, that calls encode_from_gpu_ptr_f32(std::ptr::null(), len, qubits) and asserts Err(MahoutError::InvalidInput(msg)) with msg.contains("null").
Otherwise, overall looks good.

rich7420

LGTM

viiccwen · 2026-02-04T07:39:33Z

Thx all!

rich7420 added this to the Qumat 0.6.0 milestone Jan 31, 2026

viiccwen force-pushed the feature/encode-from-gpu-ptr-f32 branch from b6dec66 to eaad56a Compare January 31, 2026 16:17

viiccwen force-pushed the feature/encode-from-gpu-ptr-f32 branch from 9bad00b to eaad56a Compare February 1, 2026 15:54

viiccwen force-pushed the feature/encode-from-gpu-ptr-f32 branch from eaad56a to 746635e Compare February 2, 2026 03:27

viiccwen added 5 commits February 2, 2026 11:30

c256724 feat: add float32 GPU pointer encoding and inverse norm calculation with stream support

f0d5f13 refactor: streamline GPU state vector encoding to support precision conversion for both Float32 and Float64

5d48d81 test: add test file for GPU pointer encoding with Float32 precision

194cf68 refactor: improve GPU pointer validation and update documentation for encoding methods

07fcb2f test: update unsupported encoding test to reflect changes in CUDA tensor encoding methods

viiccwen force-pushed the feature/encode-from-gpu-ptr-f32 branch from 746635e to 07fcb2f Compare February 2, 2026 16:12

9732910 test: add unit test for handling null pointer in GPU pointer encoding for Float32

rich7420 approved these changes Feb 3, 2026


guan404ming approved these changes Feb 4, 2026


guan404ming merged commit 42da30d into apache:main Feb 4, 2026
6 checks passed

guan404ming merged 6 commits into apache:main from viiccwen:feature/encode-from-gpu-ptr-f32
