Summary
This roadmap defines the implementation path for streaming and large-data support across all QDP encodings: adding IQP / IQP-Z to the Parquet streaming pipeline, introducing additional input formats (e.g. chunked NumPy, HDF5), and completing documentation and baselines so that “encode from file” is a first-class workflow. It is scoped to be comparable in impact to Pipeline Tuning #969, but focuses on feature coverage and input ecosystem rather than pipeline performance.
Motivation
- Gap: `encode_from_parquet()` currently supports only `amplitude`, `angle`, and `basis`. IQP and IQP-Z have kernels and in-memory `encode()` / `encode_batch()`, but no streaming path from Parquet or other large files.
- Goal: Enable all encodings (including IQP) to use the existing dual-stream pipeline from Parquet and, in later phases, from other large-data sources.
- Non-overlap with [QDP] User Python API interface features and refactoring — Roadmap #969: #969 addresses pipeline performance (observability, chunk/pool tuning, event-based buffer reuse), while this roadmap addresses which encodings can stream and where data is read from (streaming encodings + input formats + docs).
Phase 1: IQP / IQP-Z streaming encoding
Deliverables
- Support `encode_from_parquet(path, num_qubits, "iqp" | "iqp-z")` so that large Parquet files are processed through the existing dual-stream pipeline.
- Unit and integration tests; optional small throughput benchmark for Parquet + IQP.
Implementation outline
- Add an IQP `ChunkEncoder` in `qdp-core/src/encoding/`, following the pattern of amplitude/angle/basis (a sizing/validation sketch follows this list):
  - Implement `ChunkEncoder`: `validate_sample_size`, `needs_staging_copy`, `init_state`, `encode_chunk`.
  - IQP full: `sample_size = num_qubits + num_qubits*(num_qubits-1)/2`; IQP-Z: `num_qubits`.
  - Reuse kernel calls and length checks from `qdp-core/src/gpu/encodings/iqp.rs`.
- Wire into `encode_from_parquet()` in `encoding/mod.rs`: add branches for `"iqp"` and `"iqp-z"` that call `stream_encode` with the appropriate IQP encoder variant (a dispatch sketch follows the key-files line).
- Tests: Reuse logic from `tests/iqp_encoding.rs`; add an integration test that reads a small Parquet file and runs stream encoding for IQP/IQP-Z.
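As a concrete starting point, a minimal, self-contained sketch of the sizing/validation side of the IQP chunk encoder. `IqpChunkEncoder` and `IqpVariant` are placeholder names, and the remaining `ChunkEncoder` methods (`init_state`, `encode_chunk`, `needs_staging_copy`) are omitted because they depend on the existing GPU kernel interface; only the sample-size formulas quoted above are taken as given.

```rust
// Placeholder types; the real ChunkEncoder trait in qdp-core has more methods.
enum IqpVariant {
    Full,
    Z,
}

struct IqpChunkEncoder {
    num_qubits: usize,
    variant: IqpVariant,
}

impl IqpChunkEncoder {
    /// Features per sample: full IQP uses single-qubit plus pairwise terms,
    /// IQP-Z only the single-qubit terms (formulas from this roadmap).
    fn sample_size(&self) -> usize {
        match self.variant {
            IqpVariant::Full => self.num_qubits + self.num_qubits * (self.num_qubits - 1) / 2,
            IqpVariant::Z => self.num_qubits,
        }
    }

    /// Mirrors `validate_sample_size`: reject rows whose width does not match
    /// the expected feature count for this variant.
    fn validate_sample_size(&self, row_len: usize) -> Result<(), String> {
        if row_len == self.sample_size() {
            Ok(())
        } else {
            Err(format!(
                "expected {} features per sample, got {}",
                self.sample_size(),
                row_len
            ))
        }
    }
}

fn main() {
    // Worked example: 4 qubits -> 4 + 4*3/2 = 10 features for full IQP, 4 for IQP-Z.
    let full = IqpChunkEncoder { num_qubits: 4, variant: IqpVariant::Full };
    let z = IqpChunkEncoder { num_qubits: 4, variant: IqpVariant::Z };
    assert_eq!(full.sample_size(), 10);
    assert_eq!(z.sample_size(), 4);
    assert!(full.validate_sample_size(10).is_ok());
    assert!(z.validate_sample_size(10).is_err());
}
```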
Key files: `qdp-core/src/encoding/mod.rs`, new or extended `encoding/iqp.rs` (streaming), `qdp-core/src/gpu/encodings/iqp.rs` (existing), `qdp-core/tests/`.
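And a hedged sketch of the dispatch shape for the new branches; `Encoding` and `select_encoding` are illustrative names, not the actual `encode_from_parquet` internals, which would forward the selected variant to `stream_encode`.

```rust
// Illustrative name-to-variant dispatch; the real branches live in encoding/mod.rs.
#[derive(Debug, PartialEq)]
enum Encoding {
    Amplitude,
    Angle,
    Basis,
    Iqp,
    IqpZ,
}

fn select_encoding(name: &str) -> Result<Encoding, String> {
    match name {
        "amplitude" => Ok(Encoding::Amplitude),
        "angle" => Ok(Encoding::Angle),
        "basis" => Ok(Encoding::Basis),
        // New in Phase 1:
        "iqp" => Ok(Encoding::Iqp),
        "iqp-z" => Ok(Encoding::IqpZ),
        other => Err(format!("unsupported encoding: {other}")),
    }
}

fn main() {
    assert_eq!(select_encoding("iqp").unwrap(), Encoding::Iqp);
    assert_eq!(select_encoding("iqp-z").unwrap(), Encoding::IqpZ);
    assert!(select_encoding("iqp2").is_err());
}
```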
Phase 2: Additional input formats (streaming readers)
Deliverables
- At least one large-data–friendly streaming reader implemented and plugged into the encoding pipeline.
- Candidates (from the readers README “Future Enhancements”): chunked NumPy (large `.npy`) or HDF5.
Implementation outline
- Implement a new reader satisfying `StreamingDataReader` in `qdp-core/src/reader.rs` (`read_chunk(&mut self, buffer: &mut [f64]) -> Result<usize>`); a reader sketch follows this list.
- Integrate with encoding: either extend `encode_from_*` to accept the new reader, or select the reader by path/extension, so that `stream_encode` can consume data from the new source (an extension-dispatch sketch follows the key-files line).
- Tests and docs: unit tests for the new reader; at least one end-to-end test (e.g. amplitude or IQP from the new format). Update `qdp/docs/readers/README.md`.
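A minimal sketch of a reader that satisfies the `read_chunk(&mut self, buffer: &mut [f64]) -> Result<usize>` contract quoted above. The trait is re-declared locally as a stand-in (the real one lives in `qdp-core/src/reader.rs` and may use a different error type), and the reader consumes a headerless little-endian `f64` file; a real chunked NumPy reader would first parse the `.npy` header for dtype and shape before streaming the payload.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// Local stand-in for the trait described in this roadmap: fill a caller-provided
// f64 buffer and return how many values were written; 0 signals end of stream.
trait StreamingDataReader {
    fn read_chunk(&mut self, buffer: &mut [f64]) -> io::Result<usize>;
}

// Illustrative reader for a headerless little-endian f64 file.
struct RawF64Reader {
    file: File,
}

impl RawF64Reader {
    fn open(path: &Path) -> io::Result<Self> {
        Ok(Self { file: File::open(path)? })
    }
}

impl StreamingDataReader for RawF64Reader {
    fn read_chunk(&mut self, buffer: &mut [f64]) -> io::Result<usize> {
        let mut written = 0;
        let mut bytes = [0u8; 8];
        while written < buffer.len() {
            // Read one value at a time for clarity; a production reader would
            // read a whole byte block and convert in place.
            match self.file.read_exact(&mut bytes) {
                Ok(()) => {
                    buffer[written] = f64::from_le_bytes(bytes);
                    written += 1;
                }
                // End of file: return however many values were read.
                Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
                Err(e) => return Err(e),
            }
        }
        Ok(written)
    }
}

fn main() -> io::Result<()> {
    let mut reader = RawF64Reader::open(Path::new("data.bin"))?;
    let mut chunk = vec![0.0f64; 1024];
    let n = reader.read_chunk(&mut chunk)?;
    println!("read {n} values");
    Ok(())
}
```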
Key files: `qdp-core/src/readers/`, `qdp-core/src/reader.rs`, `qdp/docs/readers/README.md`.
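For the “select by path/extension” integration option, a small illustrative dispatch; `ReaderKind` and `reader_for` are hypothetical names, not the existing `encode_from_*` API.

```rust
use std::path::Path;

// Route each supported file extension to a reader kind; the real selection
// would sit next to encode_from_* and construct the corresponding reader.
#[derive(Debug, PartialEq)]
enum ReaderKind {
    Parquet,
    ChunkedNpy,
}

fn reader_for(path: &Path) -> Result<ReaderKind, String> {
    match path.extension().and_then(|e| e.to_str()) {
        Some("parquet") => Ok(ReaderKind::Parquet),
        Some("npy") => Ok(ReaderKind::ChunkedNpy),
        other => Err(format!("no streaming reader for extension {:?}", other)),
    }
}

fn main() {
    assert_eq!(reader_for(Path::new("train.parquet")).unwrap(), ReaderKind::Parquet);
    assert_eq!(reader_for(Path::new("train.npy")).unwrap(), ReaderKind::ChunkedNpy);
    assert!(reader_for(Path::new("train.csv")).is_err());
}
```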
Phase 3: Baselines and documentation
Deliverables
- Reproducible throughput description or benchmark flow for “large file + all encodings (including IQP)”.
- Complete Getting Started and Examples for QDP (currently TODO in the docs), making “encode from file” a first-class documented workflow.
Implementation outline
- Benchmark: Define and document a small workflow (e.g. in `qdp-python/benchmark/` or `qdp/docs/`) for Parquet + amplitude/angle/basis/iqp; align with the Phase 2 baseline methodology of [QDP] User Python API interface features and refactoring — Roadmap #969 where useful (a timing-harness sketch follows this list).
- Docs:
  - Getting Started: install, minimal example, typical `encode` / `encode_from_parquet` usage (including IQP).
  - Examples: 2–3 full examples (e.g. in-memory amplitude, Parquet + IQP, DLPack → PyTorch).
  - Optionally: a short API summary in the QDP API doc.
- Relationship to [QDP] User Python API interface features and refactoring — Roadmap #969: Reuse the Phase 2 observability/baseline flow if available, to avoid duplicate tooling.
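A sketch of the timing harness implied by the benchmark bullet above, assuming nothing about the real QDP API: the closure stands in for the actual `encode_from_parquet` call and the row count is supplied by the benchmark definition.

```rust
use std::time::Instant;

// Time one streaming encode per encoding against the same file and report
// rows/sec. `run_encode` and `rows` are assumptions, not existing QDP tooling.
fn bench_encodings<F>(encodings: &[&str], rows: u64, mut run_encode: F)
where
    F: FnMut(&str),
{
    for &name in encodings {
        let start = Instant::now();
        run_encode(name);
        let secs = start.elapsed().as_secs_f64();
        println!("{name}: {:.0} rows/sec ({secs:.3}s)", rows as f64 / secs);
    }
}

fn main() {
    // Placeholder workload; the real flow would call something like
    // encode_from_parquet("bench.parquet", num_qubits, name) here.
    let encodings = ["amplitude", "angle", "basis", "iqp", "iqp-z"];
    bench_encodings(&encodings, 1_000_000, |name| {
        let _ = name; // no-op stand-in for the encode call
    });
}
```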
Phase order and dependencies
- Phase 1 is independent: it depends only on the current pipeline and the existing IQP kernels.
- Phase 2 builds on the same `stream_encode` interface (it can proceed in parallel with Phase 1 once the reader-integration approach is agreed).
- Phase 3 can be done in parallel with Phases 1/2; the “large file + IQP” benchmark is most meaningful after Phase 1 is merged.
Suggested order: Land Phase 1 first, then Phase 2; Phase 3 docs can start early, with benchmark steps finalized after Phase 1.
Alternatives considered
- Only document current behavior: Does not address the missing IQP streaming path or additional formats.
- Single big PR: Phased approach allows incremental review and reduces risk.
Additional context
- IQP kernel and GPU encoding already exist: `qdp-kernels/src/iqp.cu`, `qdp-core/src/gpu/encodings/iqp.rs`, and `qdp-core/tests/iqp_encoding.rs`.
- Streaming pipeline and `ChunkEncoder` are in `qdp-core/src/encoding/` (amplitude, angle, basis); `encode_from_parquet` is in `encoding/mod.rs`.
- Readers design: `qdp/docs/readers/README.md`.