[QDP] [Feature] Multi-GPU Data-Parallel Encoding for Scalable Quantum State Preparation #1001

@400Ping

Description

Summary

This issue proposes adding multi-GPU, data-parallel encoding to QDP so that users can scale quantum state preparation across multiple GPUs. Currently, QdpEngine only supports a single device_id, which limits throughput for large batches and high qubit counts (e.g., 20+ qubits).

Motivation

Encoding throughput is currently capped by a single GPU: QdpEngine takes one device_id, so large batches and high qubit counts (20+ qubits) all serialize on that one device. Data-parallel encoding across multiple GPUs lifts that ceiling while keeping the per-batch encoding path unchanged.

Proposed Design

  • Batch routing: Distribute batches across GPUs (e.g., round-robin or workload-aware); a toy sketch of both policies follows this list.
  • Result aggregation: Merge outputs from each GPU into a single DLPack tensor (or keep a distributed representation for downstream use).
  • Stream management: Each GPU uses its own CUDA stream to avoid synchronization bottlenecks.
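
As a point of comparison between the two routing policies, here is a toy Rust sketch of round-robin versus workload-aware assignment of batches to device ids. The function names and the cost model (batch element count as a proxy for work) are illustrative assumptions, not existing QDP APIs.

```rust
// Illustrative routing policies only; none of these names exist in QDP today.

/// Round-robin: batch i goes to GPU (i % num_gpus).
fn route_round_robin(num_batches: usize, num_gpus: usize) -> Vec<usize> {
    (0..num_batches).map(|i| i % num_gpus).collect()
}

/// Workload-aware: send each batch to the GPU with the least accumulated
/// work so far, using the batch's element count as the cost estimate.
fn route_by_load(batch_sizes: &[usize], num_gpus: usize) -> Vec<usize> {
    let mut load = vec![0usize; num_gpus];
    batch_sizes
        .iter()
        .map(|&size| {
            // Pick the currently least-loaded GPU (ties go to the lowest id).
            let gpu = (0..num_gpus).min_by_key(|&g| load[g]).expect("at least one GPU");
            load[gpu] += size;
            gpu
        })
        .collect()
}

fn main() {
    // Two large and two small batches spread over 2 GPUs.
    let sizes = [1 << 20, 1 << 10, 1 << 20, 1 << 10];
    println!("round-robin:    {:?}", route_round_robin(sizes.len(), 2)); // [0, 1, 0, 1]
    println!("workload-aware: {:?}", route_by_load(&sizes, 2));          // [0, 1, 1, 0]
}
```

Workload-aware routing only pays off when batch sizes vary; for uniformly sized batches, round-robin keeps the first version simple, which matches the non-goals below.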

Scope

qdp-core (Rust)

  • Add a multi-GPU engine abstraction (e.g., QdpEnginePool) to manage multiple QdpEngine instances.
  • Implement encode_batch_distributed to either split one large batch across GPUs or assign whole batches to different GPUs.
  • Use rayon or std::thread for CPU-side coordination (a rough sketch of the pool and the distributed encode call follows this list).
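
Below is a rough sketch of what the pool abstraction and the distributed encode call could look like, using std::thread scoped threads for CPU-side coordination. QdpEnginePool and encode_batch_distributed are the names proposed above; DeviceEngine stands in for the real QdpEngine, whose constructor and encode signature are assumptions here, and the placeholder encode just copies its input instead of launching GPU work.

```rust
use std::thread;

// Sketch only: `DeviceEngine` stands in for the real QdpEngine, whose
// constructor and encode signature are assumed here, not confirmed.
struct DeviceEngine {
    device_id: usize,
}

impl DeviceEngine {
    fn new(device_id: usize) -> Self {
        Self { device_id }
    }

    // Placeholder for the real single-GPU encoding call: just copies input.
    fn encode_batch(&self, batch: &[f64]) -> Vec<f64> {
        let _ = self.device_id;
        batch.to_vec()
    }
}

// Hypothetical pool that owns one engine per visible GPU.
struct QdpEnginePool {
    engines: Vec<DeviceEngine>,
}

impl QdpEnginePool {
    fn new(device_ids: &[usize]) -> Self {
        let engines = device_ids.iter().map(|&id| DeviceEngine::new(id)).collect();
        Self { engines }
    }

    // Route batches round-robin, encode each group on its own CPU thread
    // (each thread would drive its own device/stream), and return results
    // in the original batch order.
    fn encode_batch_distributed(&self, batches: &[Vec<f64>]) -> Vec<Vec<f64>> {
        let n = self.engines.len();
        let mut results: Vec<Option<Vec<f64>>> = vec![None; batches.len()];

        thread::scope(|s| {
            let handles: Vec<_> = self
                .engines
                .iter()
                .enumerate()
                .map(|(g, engine)| {
                    // Indices of the batches assigned to this engine.
                    let mine: Vec<usize> = (g..batches.len()).step_by(n).collect();
                    s.spawn(move || {
                        mine.into_iter()
                            .map(|i| (i, engine.encode_batch(&batches[i])))
                            .collect::<Vec<_>>()
                    })
                })
                .collect();

            // Merge per-thread outputs back into their original positions.
            for h in handles {
                for (i, out) in h.join().unwrap() {
                    results[i] = Some(out);
                }
            }
        });

        results.into_iter().map(|r| r.expect("every batch encoded")).collect()
    }
}

fn main() {
    let pool = QdpEnginePool::new(&[0, 1]);
    let batches = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.5, 0.5]];
    println!("{:?}", pool.encode_batch_distributed(&batches));
}
```

In a real implementation each thread would own the CUDA context and stream for its device, and the aggregation step would build the single DLPack tensor (or the distributed representation) described in the design section above.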

qdp-python

  • Expose the multi-GPU path through the Python bindings, e.g., a pool constructor that accepts a list of device ids and a distributed encode method mirroring encode_batch_distributed.

Non-Goals (out of scope)

  • Multi-GPU model parallelism or tensor parallelism within a single encoding operation.
  • Automatic GPU selection or load balancing in the first version (can be added later).
