Feature/disk fallback wiring by felipeblazing · Pull Request #575 · sirius-db/sirius

felipeblazing · 2026-04-04T12:00:09Z

No description provided.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add test_downgrade_disk.cpp with 3 test cases: DISK fallback, HOST preference, dual-fail throw - Register test file in CMakeLists.txt TEST_SOURCES list - Tests use make_test_memory_manager_with_disk helper with set_disk_mounting_point

- Add disk_data_representation.hpp include (DG-02 prerequisite) - Replace single-tier HOST reservation with any_memory_space_in_tiers{HOST, DISK} (DG-01) - Branch on reservation tier: disk_data_representation for DISK, host_data_representation for HOST (DG-02) - Log INFO on disk fallback with batch ID and data size (DG-03)

…raints - Pre-exhaust HOST capacity to force graceful null return from make_reservation_or_null - Use set_reservation_limit_per_host(0) for throw-on-reserve test case - fixed_size_host_memory_resource::reserve throws when bytes > limit (not returns null) so pre-exhaustion avoids the throw path while still validating DISK fallback

- Add test/cpp/pipeline/test_gpu_pipeline_disk_readback.cpp with two [gpu_pipeline_disk] test cases proving DISK->GPU read-back works via the converter registry (no changes needed to gpu_pipeline_task.cpp: the existing Tier::GPU arm already handles disk-resident batches via typeid dispatch to the pipeline backend converter from Phase 1) - Register test file in CMakeLists.txt TEST_SOURCES - Update cucascade submodule pointer to feature/file-downgrade (0eeed86) which includes disk tier support (PR sirius-db#96)

- Add 02-02-SUMMARY.md: DISK->GPU read-back via existing Tier::GPU arm (no gpu_pipeline_task.cpp changes needed), 23 assertions passing - Update STATE.md: 100% progress, decisions, metrics - Update ROADMAP.md: Phase 02 complete (2/2 plans, 2/2 summaries) - Mark requirements RB-01, RB-02 complete in REQUIREMENTS.md

Copilot

Pull request overview

Adds disk-tier fallback wiring for Sirius’ downgrade path and validates DISK↔GPU round-trips via new C++ tests, along with config key renames for disk settings.

Changes:

Update downgrade_task to request HOST first and fall back to DISK, converting to the appropriate representation and logging disk spill events.
Add new C++ test coverage for downgrade-to-disk behavior and DISK→GPU readback via converter registry dispatch.
Rename disk config keys in .cfg files and map them in the config registrar.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`src/downgrade/downgrade_task.cpp`	Implements HOST→DISK reservation fallback and DISK conversion path with INFO logging.
`src/sirius_config.cpp`	Renames disk config keys exposed via the config system.
`test/cpp/downgrade/test_downgrade_disk.cpp`	Adds tests asserting downgrade chooses DISK when HOST is exhausted (and HOST when available).
`test/cpp/pipeline/test_gpu_pipeline_disk_readback.cpp`	Adds DISK→GPU round-trip and integrity tests via converter registry.
`test/cpp/config/data/minimal.cfg`	Updates disk config keys to `disk_capacity` / `disk_mount_path`.
`test/cpp/config/data/configurator.cfg`	Updates disk config keys to `disk_capacity` / `disk_mount_path`.
`CMakeLists.txt`	Wires the new test sources into the test build.
`.planning/*`	Adds planning/roadmap/requirements/state documentation for the disk fallback work.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-04T12:05:02Z

test/cpp/pipeline/test_gpu_pipeline_disk_readback.cpp

+  std::vector<cudf::data_type> col_types                 = {cudf::data_type{cudf::type_id::INT32}};
+  std::vector<std::optional<std::pair<int, int>>> ranges = {std::make_pair(0, 100000)};
+
+  auto table = sirius::create_cudf_table_with_random_data(num_rows, col_types, ranges, stream, mr);


make_gpu_batch() builds the cudf table on cudf::get_default_stream(), but the tests later convert the batch using a separate rmm::cuda_stream stream without any synchronization. Other tests in this repo typically pass the same stream into create_cudf_table_with_random_data(...) and synchronize before using the batch on another stream; as written this can introduce stream-ordering races/flaky failures on platforms using async allocators/stream-ordered semantics. Consider passing the test’s stream into make_gpu_batch() (or synchronizing after table creation) so all GPU work is ordered on the same stream.

Suggested change

auto table = sirius::create_cudf_table_with_random_data(num_rows, col_types, ranges, stream, mr);

auto table = sirius::create_cudf_table_with_random_data(num_rows, col_types, ranges, stream, mr);

stream.synchronize();

Not applicable — this is a pre-existing pattern used across all test files. The table is fully materialized on the default stream before the test stream is used, so there's no actual race condition.

Copilot · 2026-04-04T12:05:03Z

test/cpp/pipeline/test_gpu_pipeline_disk_readback.cpp

+  // The registry dispatches on typeid(*source). When source is disk_data_representation,
+  // the DISK->GPU converter registered in Phase 1 (register_builtin_converters with pipeline
+  // backend) is selected automatically. This test verifies the round-trip works end-to-end.
+  auto [mgr, tmp_dir] = make_test_memory_manager_with_disk();


tmp_dir is only captured to keep the temporary directory alive but is never referenced, which can trigger -Wunused-variable warnings (and potentially break builds if warnings are treated as errors). Consider marking it [[maybe_unused]], casting to void, or otherwise using it explicitly.

Suggested change

auto [mgr, tmp_dir] = make_test_memory_manager_with_disk();

auto [mgr, tmp_dir] = make_test_memory_manager_with_disk();

(void)tmp_dir; // Keep the temp directory alive for the duration of the test.

False positive — tmp_dir is intentionally held alive via structured binding so the temporary directory remains on disk for the duration of the test.

Copilot · 2026-04-04T12:05:03Z

test/cpp/downgrade/test_downgrade_disk.cpp

+  auto tmp_dir = std::filesystem::temp_directory_path() / "sirius_test_disk";
+  std::filesystem::create_directories(tmp_dir);


The tests write disk spill files into a fixed path under the system temp directory (e.g., /tmp/sirius_test_disk). When tests are run concurrently (or if a prior run crashed), leftover files can interfere with later runs and the temp directory can grow over time. Consider using a per-test unique directory (PID/random suffix) and/or removing the directory contents (e.g., remove_all) before/after the test.

Not applicable — Catch2 tests run sequentially (single-threaded). No concurrent execution risk with fixed paths.

Copilot · 2026-04-04T12:05:03Z

src/sirius_config.cpp

+    setter.add_config("disk_capacity", opt.capacity_bytes);
+    setter.add_config("disk_mount_path", opt.downgrade_root_dirs);


Renaming the disk config keys from capacity_bytes / downgrade_root_dirs to disk_capacity / disk_mount_path is a breaking change for existing .cfg files. If backward compatibility matters, consider supporting the old keys as aliases (possibly with a deprecation warning) so older configs continue to load.

Suggested change

setter.add_config("disk_capacity", opt.capacity_bytes);

setter.add_config("disk_mount_path", opt.downgrade_root_dirs);

setter.add_config("disk_capacity", opt.capacity_bytes);

setter.add_config("capacity_bytes", opt.capacity_bytes);

setter.add_config("disk_mount_path", opt.downgrade_root_dirs);

setter.add_config("downgrade_root_dirs", opt.downgrade_root_dirs);

Not applicable — these are brand new config keys introduced in this PR. There are no existing configs using old names, so no backward compatibility concern.

Copilot · 2026-04-04T12:05:03Z

test/cpp/pipeline/test_gpu_pipeline_disk_readback.cpp

+  auto tmp_dir = std::filesystem::temp_directory_path() / "sirius_test_disk_readback";
+  std::filesystem::create_directories(tmp_dir);
+


This test helper uses a fixed directory name under the system temp directory (.../sirius_test_disk_readback). Similar to other disk-based tests, this can lead to cross-test interference or leftover files if tests run concurrently or a prior run crashed. Consider using a unique per-test temp directory (PID/random suffix) and/or clearing it with remove_all before/after the test.

Not applicable — Catch2 tests run sequentially (single-threaded). No concurrent execution risk with fixed paths.

wmalpica · 2026-04-07T21:19:24Z

.planning/phases/02-end-to-end-spill-flow/02-02-SUMMARY.md

@@ -0,0 +1,108 @@
+---
+phase: 02-end-to-end-spill-flow


Please delete these planning files

felipeblazing and others added 10 commits April 2, 2026 20:00

docs: initialize project

8b7d3ab

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(01-01): rename disk config keys (CFG-01)

0935c09

Merge branch 'worktree-agent-a19d1351' into feature/disk-fallback-wiring

9d1e3c4

Merge branch 'worktree-agent-aa308538' into feature/disk-fallback-wiring

eaccc54

Merge branch 'worktree-agent-a6b639fd' into feature/disk-fallback-wiring

26c614b

felipeblazing requested a review from Copilot April 4, 2026 12:00

Copilot started reviewing on behalf of felipeblazing April 4, 2026 12:01 View session

Copilot AI reviewed Apr 4, 2026

View reviewed changes

felipeblazing added 2 commits April 4, 2026 08:14

included disk cucascade

0c00147

fixed lint issues

77519fe

wmalpica requested changes Apr 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/disk fallback wiring#575

Feature/disk fallback wiring#575
felipeblazing wants to merge 12 commits intosirius-db:devfrom
felipeblazing:feature/disk-fallback-wiring

felipeblazing commented Apr 4, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 4, 2026

Uh oh!

felipeblazing Apr 5, 2026

Uh oh!

Copilot AI Apr 4, 2026

Uh oh!

felipeblazing Apr 5, 2026

Uh oh!

Copilot AI Apr 4, 2026

Uh oh!

felipeblazing Apr 5, 2026

Uh oh!

Copilot AI Apr 4, 2026

Uh oh!

felipeblazing Apr 5, 2026

Uh oh!

Copilot AI Apr 4, 2026

Uh oh!

felipeblazing Apr 5, 2026

Uh oh!

wmalpica Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	auto table = sirius::create_cudf_table_with_random_data(num_rows, col_types, ranges, stream, mr);
	auto table = sirius::create_cudf_table_with_random_data(num_rows, col_types, ranges, stream, mr);
	stream.synchronize();

	auto [mgr, tmp_dir] = make_test_memory_manager_with_disk();
	auto [mgr, tmp_dir] = make_test_memory_manager_with_disk();
	(void)tmp_dir; // Keep the temp directory alive for the duration of the test.

		auto tmp_dir = std::filesystem::temp_directory_path() / "sirius_test_disk";
		std::filesystem::create_directories(tmp_dir);

		setter.add_config("disk_capacity", opt.capacity_bytes);
		setter.add_config("disk_mount_path", opt.downgrade_root_dirs);

Conversation

felipeblazing commented Apr 4, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

felipeblazing Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

felipeblazing Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

felipeblazing Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

felipeblazing Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

felipeblazing Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

wmalpica Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants