[QDP] [Feature] Multi-GPU Data-Parallel Encoding for Scalable Quantum State Preparation #1001

@400Ping

Description

Summary

This issue proposes adding multi-GPU, data-parallel encoding to QDP so that users can scale quantum state preparation across multiple GPUs. Currently, QdpEngine only supports a single device_id, which limits throughput for large batches and high qubit counts (e.g., 20+ qubits).

Motivation

Encoding throughput is currently capped by a single GPU: QdpEngine takes one device_id, so large batches and high qubit counts (20+ qubits) all serialize on that one device. Data-parallel encoding across multiple GPUs lifts that ceiling while keeping the per-batch encoding path unchanged.

Proposed Design

  • Batch routing: Distribute batches across GPUs (e.g., round-robin or workload-aware); a toy sketch of both policies follows this list.
  • Result aggregation: Merge outputs from each GPU into a single DLPack tensor (or keep a distributed representation for downstream use).
  • Stream management: Each GPU uses its own CUDA stream to avoid synchronization bottlenecks.
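
As a point of comparison between the two routing policies, here is a toy Rust sketch of round-robin versus workload-aware assignment of batches to device ids. The function names and the cost model (batch element count as a proxy for work) are illustrative assumptions, not existing QDP APIs.

```rust
// Illustrative routing policies only; none of these names exist in QDP today.

/// Round-robin: batch i goes to GPU (i % num_gpus).
fn route_round_robin(num_batches: usize, num_gpus: usize) -> Vec<usize> {
    (0..num_batches).map(|i| i % num_gpus).collect()
}

/// Workload-aware: send each batch to the GPU with the least accumulated
/// work so far, using the batch's element count as the cost estimate.
fn route_by_load(batch_sizes: &[usize], num_gpus: usize) -> Vec<usize> {
    let mut load = vec![0usize; num_gpus];
    batch_sizes
        .iter()
        .map(|&size| {
            // Pick the currently least-loaded GPU (ties go to the lowest id).
            let gpu = (0..num_gpus).min_by_key(|&g| load[g]).expect("at least one GPU");
            load[gpu] += size;
            gpu
        })
        .collect()
}

fn main() {
    // Two large and two small batches spread over 2 GPUs.
    let sizes = [1 << 20, 1 << 10, 1 << 20, 1 << 10];
    println!("round-robin:    {:?}", route_round_robin(sizes.len(), 2)); // [0, 1, 0, 1]
    println!("workload-aware: {:?}", route_by_load(&sizes, 2));          // [0, 1, 1, 0]
}
```

Workload-aware routing only pays off when batch sizes vary; for uniformly sized batches, round-robin keeps the first version simple, which matches the non-goals below.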

Scope

qdp-core (Rust)

  • Add a multi-GPU engine abstraction (e.g., QdpEnginePool) to manage multiple QdpEngine instances.
  • Implement encode_batch_distributed to either split one large batch across GPUs or assign whole batches to different GPUs.
  • Use rayon or std::thread for CPU-side coordination (a rough sketch of the pool and the distributed encode call follows this list).
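
Below is a rough sketch of what the pool abstraction and the distributed encode call could look like, using std::thread scoped threads for CPU-side coordination. QdpEnginePool and encode_batch_distributed are the names proposed above; DeviceEngine stands in for the real QdpEngine, whose constructor and encode signature are assumptions here, and the placeholder encode just copies its input instead of launching GPU work.

```rust
use std::thread;

// Sketch only: `DeviceEngine` stands in for the real QdpEngine, whose
// constructor and encode signature are assumed here, not confirmed.
struct DeviceEngine {
    device_id: usize,
}

impl DeviceEngine {
    fn new(device_id: usize) -> Self {
        Self { device_id }
    }

    // Placeholder for the real single-GPU encoding call: just copies input.
    fn encode_batch(&self, batch: &[f64]) -> Vec<f64> {
        let _ = self.device_id;
        batch.to_vec()
    }
}

// Hypothetical pool that owns one engine per visible GPU.
struct QdpEnginePool {
    engines: Vec<DeviceEngine>,
}

impl QdpEnginePool {
    fn new(device_ids: &[usize]) -> Self {
        let engines = device_ids.iter().map(|&id| DeviceEngine::new(id)).collect();
        Self { engines }
    }

    // Route batches round-robin, encode each group on its own CPU thread
    // (each thread would drive its own device/stream), and return results
    // in the original batch order.
    fn encode_batch_distributed(&self, batches: &[Vec<f64>]) -> Vec<Vec<f64>> {
        let n = self.engines.len();
        let mut results: Vec<Option<Vec<f64>>> = vec![None; batches.len()];

        thread::scope(|s| {
            let handles: Vec<_> = self
                .engines
                .iter()
                .enumerate()
                .map(|(g, engine)| {
                    // Indices of the batches assigned to this engine.
                    let mine: Vec<usize> = (g..batches.len()).step_by(n).collect();
                    s.spawn(move || {
                        mine.into_iter()
                            .map(|i| (i, engine.encode_batch(&batches[i])))
                            .collect::<Vec<_>>()
                    })
                })
                .collect();

            // Merge per-thread outputs back into their original positions.
            for h in handles {
                for (i, out) in h.join().unwrap() {
                    results[i] = Some(out);
                }
            }
        });

        results.into_iter().map(|r| r.expect("every batch encoded")).collect()
    }
}

fn main() {
    let pool = QdpEnginePool::new(&[0, 1]);
    let batches = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.5, 0.5]];
    println!("{:?}", pool.encode_batch_distributed(&batches));
}
```

In a real implementation each thread would own the CUDA context and stream for its device, and the aggregation step would build the single DLPack tensor (or the distributed representation) described in the design section above.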

qdp-python

  • Expose the multi-GPU path through the Python bindings, e.g., a pool constructor that accepts a list of device ids and a distributed encode method mirroring encode_batch_distributed.

Non-Goals (out of scope)

  • Multi-GPU model parallelism or tensor parallelism within a single encoding operation.
  • Automatic GPU selection or load balancing in the first version (can be added later).
