
[QDP] Streaming & Large-Data Support for All Encodings — Roadmap #993

@400Ping


Summary

This roadmap defines the implementation path for streaming and large-data support across all QDP encodings: adding IQP / IQP-Z to the Parquet streaming pipeline, introducing additional input formats (e.g. chunked NumPy, HDF5), and completing documentation and baselines so that "encode from file" is a first-class workflow. It is scoped to be comparable in impact to Roadmap #969, but focuses on feature coverage and the input ecosystem rather than pipeline performance.

Motivation

  • Gap: encode_from_parquet() currently supports only amplitude, angle, and basis. IQP and IQP-Z have kernels and in-memory encode() / encode_batch(), but no streaming path from Parquet or other large files.
  • Goal: Enable all encodings (including IQP) to use the existing dual-stream pipeline from Parquet and, in later phases, from other large-data sources.
  • Non-overlap with [QDP] User Python API interface features and refactoring — Roadmap #969: #969 addresses pipeline performance (observability, chunk/pool tuning, event-based buffer reuse); this roadmap addresses which encodings can stream and which sources data is read from (streaming encodings + input formats + docs).

Phase 1: IQP / IQP-Z streaming encoding

Deliverables

  • Support encode_from_parquet(path, num_qubits, "iqp" | "iqp-z") so that large Parquet files are processed through the existing dual-stream pipeline.
  • Unit and integration tests; optional small throughput benchmark for Parquet + IQP.

Implementation outline

  1. Add IQP ChunkEncoder in qdp-core/src/encoding/ (following the pattern of amplitude/angle/basis):
    • Implement ChunkEncoder: validate_sample_size, needs_staging_copy, init_state, encode_chunk.
    • IQP full: sample_size = num_qubits + num_qubits*(num_qubits-1)/2; IQP-Z: num_qubits.
    • Reuse kernel calls and length checks from qdp-core/src/gpu/encodings/iqp.rs.
  2. Wire into encode_from_parquet() in encoding/mod.rs: add branches for "iqp" and "iqp-z" calling stream_encode with the appropriate IQP encoder variant.
  3. Tests: Reuse logic from tests/iqp_encoding.rs; add integration test that reads a small Parquet file and runs stream encode for IQP/IQP-Z.
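
The outline above can be sketched roughly as follows. The trait and method names follow the issue text, but the signatures, the `IqpVariant` enum, and the `encoder_for` dispatch helper are illustrative assumptions, not the actual qdp-core definitions:

```rust
// Illustrative sketch only: modeled on the issue text, not the real qdp-core API.

#[derive(Clone, Copy)]
enum IqpVariant {
    Full, // single-qubit terms plus all pairwise interaction terms
    Z,    // diagonal-only variant: one feature per qubit
}

// Hypothetical stand-in for the ChunkEncoder trait mentioned in the outline.
trait ChunkEncoder {
    /// Number of f64 features each sample must provide.
    fn sample_size(&self, num_qubits: usize) -> usize;

    /// Reject chunks whose length is not a multiple of the sample size
    /// (the kind of length check validate_sample_size would perform).
    fn validate_chunk(&self, num_qubits: usize, chunk_len: usize) -> Result<(), String> {
        let s = self.sample_size(num_qubits);
        if s == 0 || chunk_len % s != 0 {
            Err(format!("chunk length {chunk_len} is not a multiple of sample size {s}"))
        } else {
            Ok(())
        }
    }
}

struct IqpEncoder {
    variant: IqpVariant,
}

impl ChunkEncoder for IqpEncoder {
    fn sample_size(&self, n: usize) -> usize {
        match self.variant {
            // IQP full: num_qubits + num_qubits*(num_qubits-1)/2, per the outline.
            IqpVariant::Full => n + n * (n - 1) / 2,
            // IQP-Z: one value per qubit.
            IqpVariant::Z => n,
        }
    }
}

/// Sketch of the branches Phase 1 would add inside encode_from_parquet().
fn encoder_for(name: &str) -> Option<Box<dyn ChunkEncoder>> {
    match name {
        "iqp" => Some(Box::new(IqpEncoder { variant: IqpVariant::Full })),
        "iqp-z" => Some(Box::new(IqpEncoder { variant: IqpVariant::Z })),
        _ => None, // amplitude/angle/basis are handled by the existing branches
    }
}

fn main() {
    let full = encoder_for("iqp").unwrap();
    let z = encoder_for("iqp-z").unwrap();
    assert_eq!(full.sample_size(4), 10); // 4 + 4*3/2
    assert_eq!(z.sample_size(4), 4);
    assert!(full.validate_chunk(4, 30).is_ok()); // 3 samples of size 10
    assert!(full.validate_chunk(4, 25).is_err());
    println!("ok");
}
```

The point of the sketch is that once an IQP encoder satisfies the same trait as amplitude/angle/basis, the existing `stream_encode` path needs no IQP-specific changes beyond the two dispatch branches.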

Key files: qdp-core/src/encoding/mod.rs, new or extended encoding/iqp.rs (streaming), qdp-core/src/gpu/encodings/iqp.rs (existing), qdp-core/tests/.


Phase 2: Additional input formats (streaming readers)

Deliverables

  • At least one large-data–friendly streaming reader implemented and plugged into the encoding pipeline.
  • Candidates (from readers README Future Enhancements): chunked NumPy (large .npy), or HDF5.

Implementation outline

  1. Implement a new reader satisfying StreamingDataReader in qdp-core/src/reader.rs (read_chunk(&mut self, buffer: &mut [f64]) -> Result<usize>).
  2. Integrate with encoding: extend encode_from_* to accept the new reader, or select the reader by path/extension, so that stream_encode can consume data from the new source.
  3. Tests and docs: Unit tests for the new reader; at least one end-to-end test (e.g. amplitude or IQP from the new format). Update qdp/docs/readers/README.md.
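
A minimal sketch of the reader contract described in step 1. The trait name and the `read_chunk` signature follow the issue text; the in-memory `SliceReader` is a hypothetical stand-in for a future chunked-NumPy or HDF5 reader:

```rust
// Illustrative sketch: not the actual qdp-core/src/reader.rs definitions.

// Hypothetical stand-in for StreamingDataReader.
trait StreamingDataReader {
    /// Fill `buffer` with up to buffer.len() values and return how many
    /// were written. A return of 0 signals end of stream.
    fn read_chunk(&mut self, buffer: &mut [f64]) -> Result<usize, String>;
}

/// Toy source that serves a flat f64 slice in chunks, the way a chunked
/// .npy or HDF5 reader would serve data without loading the whole file.
struct SliceReader {
    data: Vec<f64>,
    pos: usize,
}

impl StreamingDataReader for SliceReader {
    fn read_chunk(&mut self, buffer: &mut [f64]) -> Result<usize, String> {
        let remaining = self.data.len() - self.pos;
        let n = remaining.min(buffer.len());
        buffer[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

fn main() {
    let mut reader = SliceReader { data: (0..10).map(|i| i as f64).collect(), pos: 0 };
    let mut buf = [0.0f64; 4];
    let mut total = 0;
    // Drain the stream the way stream_encode would: loop until read_chunk
    // reports 0 values.
    loop {
        let n = reader.read_chunk(&mut buf).unwrap();
        if n == 0 {
            break;
        }
        total += n;
    }
    assert_eq!(total, 10);
    println!("drained {total} values");
}
```

Because the pipeline only sees this trait, any format that can fill an `f64` buffer incrementally plugs in without touching the encoders.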

Key files: qdp-core/src/readers/, qdp-core/src/reader.rs, qdp/docs/readers/README.md.


Phase 3: Baselines and documentation

Deliverables

  • A reproducible benchmark flow (or at least a documented throughput methodology) for "large file + all encodings (including IQP)".
  • Complete Getting Started and Examples for QDP (currently TODO in the docs), making “encode from file” a first-class documented workflow.

Implementation outline

  1. Benchmark: Define and document a small workflow (e.g. in qdp-python/benchmark/ or qdp/docs/) for Parquet + amplitude/angle/basis/iqp; align with [QDP] User Python API interface features and refactoring — Roadmap #969 Phase 2 baseline methodology where useful.
  2. Docs:
    • Getting Started: Install, minimal example, typical encode / encode_from_parquet usage (including IQP).
    • Examples: 2–3 full examples (e.g. in-memory amplitude, Parquet + IQP, DLPack → PyTorch).
    • Optionally: short API summary in the QDP API doc.
  3. Relationship to [QDP] User Python API interface features and refactoring — Roadmap #969: Reuse Phase 2 observability/baseline flow if available, to avoid duplicate tooling.

Phase order and dependencies

  • Phase 1 is independent; only depends on the current pipeline and IQP kernels.
  • Phase 2 builds on the same stream_encode interface (can be parallelized with Phase 1 once reader integration is agreed).
  • Phase 3 can be done in parallel with Phase 1/2; the “large file + IQP” benchmark is most meaningful after Phase 1 is merged.

Suggested order: Land Phase 1 first, then Phase 2; Phase 3 docs can start early, with benchmark steps finalized after Phase 1.


Alternatives considered

  • Only document current behavior: Does not address the missing IQP streaming path or additional formats.
  • Single big PR: Phased approach allows incremental review and reduces risk.

Additional context

  • IQP kernel and GPU encoding already exist: qdp-kernels/src/iqp.cu, qdp-core/src/gpu/encodings/iqp.rs, and qdp-core/tests/iqp_encoding.rs.
  • Streaming pipeline and ChunkEncoder are in qdp-core/src/encoding/ (amplitude, angle, basis); encode_from_parquet is in encoding/mod.rs.
  • Readers design: qdp/docs/readers/README.md.
