Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add test_downgrade_disk.cpp with 3 test cases: DISK fallback, HOST preference, dual-fail throw - Register test file in CMakeLists.txt TEST_SOURCES list - Tests use make_test_memory_manager_with_disk helper with set_disk_mounting_point
- Add disk_data_representation.hpp include (DG-02 prerequisite)
- Replace single-tier HOST reservation with any_memory_space_in_tiers{HOST, DISK} (DG-01)
- Branch on reservation tier: disk_data_representation for DISK, host_data_representation for HOST (DG-02)
- Log INFO on disk fallback with batch ID and data size (DG-03)
…raints - Pre-exhaust HOST capacity to force graceful null return from make_reservation_or_null - Use set_reservation_limit_per_host(0) for throw-on-reserve test case - fixed_size_host_memory_resource::reserve throws when bytes > limit (not returns null) so pre-exhaustion avoids the throw path while still validating DISK fallback
- Add test/cpp/pipeline/test_gpu_pipeline_disk_readback.cpp with two [gpu_pipeline_disk] test cases proving DISK->GPU read-back works via the converter registry (no changes needed to gpu_pipeline_task.cpp: the existing Tier::GPU arm already handles disk-resident batches via typeid dispatch to the pipeline backend converter from Phase 1) - Register test file in CMakeLists.txt TEST_SOURCES - Update cucascade submodule pointer to feature/file-downgrade (0eeed86) which includes disk tier support (PR sirius-db#96)
- Add 02-02-SUMMARY.md: DISK->GPU read-back via existing Tier::GPU arm (no gpu_pipeline_task.cpp changes needed), 23 assertions passing - Update STATE.md: 100% progress, decisions, metrics - Update ROADMAP.md: Phase 02 complete (2/2 plans, 2/2 summaries) - Mark requirements RB-01, RB-02 complete in REQUIREMENTS.md
There was a problem hiding this comment.
Pull request overview
Adds disk-tier fallback wiring for Sirius’ downgrade path and validates DISK↔GPU round-trips via new C++ tests, along with config key renames for disk settings.
Changes:
- Update
downgrade_taskto request HOST first and fall back to DISK, converting to the appropriate representation and logging disk spill events. - Add new C++ test coverage for downgrade-to-disk behavior and DISK→GPU readback via converter registry dispatch.
- Rename disk config keys in
.cfgfiles and map them in the config registrar.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
src/downgrade/downgrade_task.cpp |
Implements HOST→DISK reservation fallback and DISK conversion path with INFO logging. |
src/sirius_config.cpp |
Renames disk config keys exposed via the config system. |
test/cpp/downgrade/test_downgrade_disk.cpp |
Adds tests asserting downgrade chooses DISK when HOST is exhausted (and HOST when available). |
test/cpp/pipeline/test_gpu_pipeline_disk_readback.cpp |
Adds DISK→GPU round-trip and integrity tests via converter registry. |
test/cpp/config/data/minimal.cfg |
Updates disk config keys to disk_capacity / disk_mount_path. |
test/cpp/config/data/configurator.cfg |
Updates disk config keys to disk_capacity / disk_mount_path. |
CMakeLists.txt |
Wires the new test sources into the test build. |
.planning/* |
Adds planning/roadmap/requirements/state documentation for the disk fallback work. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| std::vector<cudf::data_type> col_types = {cudf::data_type{cudf::type_id::INT32}}; | ||
| std::vector<std::optional<std::pair<int, int>>> ranges = {std::make_pair(0, 100000)}; | ||
|
|
||
| auto table = sirius::create_cudf_table_with_random_data(num_rows, col_types, ranges, stream, mr); |
There was a problem hiding this comment.
make_gpu_batch() builds the cudf table on cudf::get_default_stream(), but the tests later convert the batch using a separate rmm::cuda_stream stream without any synchronization. Other tests in this repo typically pass the same stream into create_cudf_table_with_random_data(...) and synchronize before using the batch on another stream; as written this can introduce stream-ordering races/flaky failures on platforms using async allocators/stream-ordered semantics. Consider passing the test’s stream into make_gpu_batch() (or synchronizing after table creation) so all GPU work is ordered on the same stream.
| auto table = sirius::create_cudf_table_with_random_data(num_rows, col_types, ranges, stream, mr); | |
| auto table = sirius::create_cudf_table_with_random_data(num_rows, col_types, ranges, stream, mr); | |
| stream.synchronize(); |
There was a problem hiding this comment.
Not applicable — this is a pre-existing pattern used across all test files. The table is fully materialized on the default stream before the test stream is used, so there's no actual race condition.
| // The registry dispatches on typeid(*source). When source is disk_data_representation, | ||
| // the DISK->GPU converter registered in Phase 1 (register_builtin_converters with pipeline | ||
| // backend) is selected automatically. This test verifies the round-trip works end-to-end. | ||
| auto [mgr, tmp_dir] = make_test_memory_manager_with_disk(); |
There was a problem hiding this comment.
tmp_dir is only captured to keep the temporary directory alive but is never referenced, which can trigger -Wunused-variable warnings (and potentially break builds if warnings are treated as errors). Consider marking it [[maybe_unused]], casting to void, or otherwise using it explicitly.
| auto [mgr, tmp_dir] = make_test_memory_manager_with_disk(); | |
| auto [mgr, tmp_dir] = make_test_memory_manager_with_disk(); | |
| (void)tmp_dir; // Keep the temp directory alive for the duration of the test. |
There was a problem hiding this comment.
False positive — tmp_dir is intentionally held alive via structured binding so the temporary directory remains on disk for the duration of the test.
| auto tmp_dir = std::filesystem::temp_directory_path() / "sirius_test_disk"; | ||
| std::filesystem::create_directories(tmp_dir); |
There was a problem hiding this comment.
The tests write disk spill files into a fixed path under the system temp directory (e.g., /tmp/sirius_test_disk). When tests are run concurrently (or if a prior run crashed), leftover files can interfere with later runs and the temp directory can grow over time. Consider using a per-test unique directory (PID/random suffix) and/or removing the directory contents (e.g., remove_all) before/after the test.
There was a problem hiding this comment.
Not applicable — Catch2 tests run sequentially (single-threaded). No concurrent execution risk with fixed paths.
| setter.add_config("disk_capacity", opt.capacity_bytes); | ||
| setter.add_config("disk_mount_path", opt.downgrade_root_dirs); |
There was a problem hiding this comment.
Renaming the disk config keys from capacity_bytes / downgrade_root_dirs to disk_capacity / disk_mount_path is a breaking change for existing .cfg files. If backward compatibility matters, consider supporting the old keys as aliases (possibly with a deprecation warning) so older configs continue to load.
| setter.add_config("disk_capacity", opt.capacity_bytes); | |
| setter.add_config("disk_mount_path", opt.downgrade_root_dirs); | |
| setter.add_config("disk_capacity", opt.capacity_bytes); | |
| setter.add_config("capacity_bytes", opt.capacity_bytes); | |
| setter.add_config("disk_mount_path", opt.downgrade_root_dirs); | |
| setter.add_config("downgrade_root_dirs", opt.downgrade_root_dirs); |
There was a problem hiding this comment.
Not applicable — these are brand new config keys introduced in this PR. There are no existing configs using old names, so no backward compatibility concern.
| auto tmp_dir = std::filesystem::temp_directory_path() / "sirius_test_disk_readback"; | ||
| std::filesystem::create_directories(tmp_dir); | ||
|
|
There was a problem hiding this comment.
This test helper uses a fixed directory name under the system temp directory (.../sirius_test_disk_readback). Similar to other disk-based tests, this can lead to cross-test interference or leftover files if tests run concurrently or a prior run crashed. Consider using a unique per-test temp directory (PID/random suffix) and/or clearing it with remove_all before/after the test.
There was a problem hiding this comment.
Not applicable — Catch2 tests run sequentially (single-threaded). No concurrent execution risk with fixed paths.
| @@ -0,0 +1,108 @@ | |||
| --- | |||
| phase: 02-end-to-end-spill-flow | |||
There was a problem hiding this comment.
Please delete these planning files
No description provided.