feat: Add range-based locking for parallel file I/O by a-hirota · Pull Request #847 · rapidsai/kvikio

a-hirota · 2025-09-30T12:36:05Z

Description

This PR implements a range-based locking mechanism for kvikio to enable truly parallel writes to non-overlapping regions of a file. This significantly
improves performance for multi-GPU workloads.

Key Changes

Added RangeLockManager class for managing non-overlapping range locks
Added FileHandleWithRangeLock extending FileHandle with range lock support
Comprehensive test coverage in both C++ and Python

Performance Benefits

Non-overlapping writes can execute in parallel
Reduces contention for multi-GPU file I/O operations
Maintains data integrity by serializing only overlapping ranges
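
The overlap rule behind these points can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `SimpleRangeLockManager`, `try_lock_range`, and `unlock_range` are hypothetical names, and ranges are assumed to be half-open `[start, end)`, in line with the `lock_range(file_offset, file_offset + size)` call shown in the review hunks.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <set>
#include <utility>

// Hypothetical sketch (not the PR's RangeLockManager): tracks held half-open
// byte ranges [start, end) and blocks only callers whose range overlaps one.
class SimpleRangeLockManager {
 public:
  // Block until [start, end) overlaps no held range, then record it as held.
  void lock_range(std::size_t start, std::size_t end) {
    assert(start <= end);
    std::unique_lock<std::mutex> guard(mutex_);
    released_.wait(guard, [&] { return !overlaps_held(start, end); });
    held_.emplace(start, end);
  }

  // Non-blocking variant: return false instead of waiting on a conflict.
  bool try_lock_range(std::size_t start, std::size_t end) {
    assert(start <= end);
    std::lock_guard<std::mutex> guard(mutex_);
    if (overlaps_held(start, end)) { return false; }
    held_.emplace(start, end);
    return true;
  }

  // Release one previously acquired range and wake any waiting callers.
  void unlock_range(std::size_t start, std::size_t end) {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      auto it = held_.find(std::make_pair(start, end));
      if (it != held_.end()) { held_.erase(it); }
    }
    released_.notify_all();
  }

 private:
  // Two half-open ranges overlap when each one starts before the other ends.
  bool overlaps_held(std::size_t start, std::size_t end) const {
    for (auto const& range : held_) {
      if (start < range.second && range.first < end) { return true; }
    }
    return false;
  }

  std::mutex mutex_;
  std::condition_variable released_;
  std::multiset<std::pair<std::size_t, std::size_t>> held_;
};
```

With this rule, adjacent ranges such as `[0, 100)` and `[100, 200)` never wait on each other, while `[50, 150)` waits until both are released. The linear scan over held ranges is chosen for brevity; an interval tree would be the usual choice if many ranges are held at once.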

Usage Example

C++ Usage

#include <kvikio/file_handle_rangelock.hpp>

#include <thread>
#include <vector>

// Create a file handle with range lock support
kvikio::FileHandleWithRangeLock file("output.bin", "w+");

// `size` is the number of bytes each GPU writes (defined elsewhere).
// Multiple threads/GPUs can write to non-overlapping regions in parallel
std::vector<std::thread> threads;

// GPU 0 writes to first half of file: bytes [0, size)
threads.emplace_back([&]() {
    void* gpu0_data = ...;  // GPU 0 data
    auto future = file.pwrite_rangelock(gpu0_data, size, 0);
    future.get();  // Wait for the write to complete
});

// GPU 1 writes to second half, bytes [size, 2 * size) - executes in parallel!
threads.emplace_back([&]() {
    void* gpu1_data = ...;  // GPU 1 data
    auto future = file.pwrite_rangelock(gpu1_data, size, size);
    future.get();  // Wait for the write to complete
});

// Wait for both writer threads to finish
for (auto& t : threads) {
    t.join();
}

Python Usage (Future API)

  import kvikio
  import cupy as cp
  from concurrent.futures import ThreadPoolExecutor

  # When Python bindings are added:
  def write_partition(gpu_id, file_handle, offset, data):
      with cp.cuda.Device(gpu_id):
          # This would execute in parallel for non-overlapping regions
          file_handle.pwrite_rangelock(data, file_offset=offset)

  # Multiple GPUs writing to different file regions
  with kvikio.FileHandleWithRangeLock("output.bin", "w+") as f:
      with ThreadPoolExecutor() as executor:
          futures = []
          for gpu_id in range(num_gpus):
              offset = gpu_id * partition_size
              futures.append(
                  executor.submit(write_partition, gpu_id, f, offset, data[gpu_id])
              )
          # All non-overlapping writes execute in parallel
          for future in futures:
              future.result()

Testing

Added C++ tests in cpp/tests/test_range_lock.cpp
Added Python tests in python/kvikio/tests/test_range_lock.py
Tests cover:
- Non-overlapping parallel writes
- Overlapping range serialization
- Move semantics
- Performance benchmarks

Performance Impact

In our tests with 2 GPUs writing to non-overlapping regions:

Without range lock: Serial execution (one GPU waits for the other)
With range lock: Parallel execution (both GPUs write simultaneously)
Expected speedup: Near-linear with number of GPUs for non-overlapping writes

Implement range-based locking mechanism to enable truly parallel non-overlapping writes to files. This improves performance for multi-GPU workloads by allowing concurrent writes to different file regions while serializing only overlapping operations.

Key changes:
- Add RangeLockManager class for managing non-overlapping range locks
- Add FileHandleWithRangeLock extending FileHandle with range lock support
- Add comprehensive C++ and Python tests for range lock functionality
- Support move semantics for efficient lock transfer

Performance benefits:
- Non-overlapping writes can execute in parallel
- Reduces contention for multi-GPU file I/O operations
- Maintains data integrity by serializing overlapping ranges

copy-pr-bot · 2025-09-30T12:36:11Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Copilot

Pull Request Overview

This PR introduces a range-based locking mechanism for kvikio to enable parallel file I/O operations on non-overlapping file regions. The implementation allows multiple threads or GPUs to write to different sections of a file simultaneously while serializing only overlapping operations.

Key Changes

Added RangeLockManager class for managing overlapping range detection and locking
Added FileHandleWithRangeLock extending FileHandle with range-aware parallel I/O operations
Comprehensive test coverage in both C++ and Python to validate parallel execution and data integrity

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

File	Description
cpp/include/kvikio/range_lock.hpp	Core range lock manager with overlap detection and RAII lock semantics
cpp/include/kvikio/file_handle_rangelock.hpp	Extended file handle with range-locked read/write operations
cpp/tests/test_range_lock.cpp	C++ unit tests for range locking functionality and performance
python/kvikio/tests/test_range_lock.py	Python integration tests for concurrent file operations
cpp/tests/CMakeLists.txt	Build configuration for range lock tests

Copilot · 2025-09-30T12:37:02Z

cpp/include/kvikio/file_handle_rangelock.hpp

+        return std::async(std::launch::deferred, [future = std::move(future),
+                                                  lock = std::move(range_lock)]() mutable {
+            auto result = future.get();
+            // Lock will be automatically released when this lambda exits
+            return result;
+        });


Using std::launch::deferred prevents the operation from running in parallel. The lambda will only execute when .get() is called, defeating the purpose of parallel I/O. Consider using std::launch::async or a different approach to maintain parallelism while ensuring proper lock cleanup.
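
The difference between the two launch policies can be checked in isolation with standard C++ only. The sketch below is not taken from the PR; `runs_on_caller_thread` and `is_lazy` are hypothetical helper names. It shows that a `std::launch::deferred` task is not started until it is waited on and then runs on the waiting thread, whereas a `std::launch::async` task runs on a separate thread. Whether the underlying I/O in the hunk above has already started depends on how the captured `future` was produced, which the hunk does not show; what the deferred lambda certainly delays is the `future.get()` call and the release of `lock`.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Returns true if a task launched with `policy` executes on the thread
// that calls get() on its future.
bool runs_on_caller_thread(std::launch policy) {
  std::thread::id const caller = std::this_thread::get_id();
  std::future<std::thread::id> task =
      std::async(policy, [] { return std::this_thread::get_id(); });
  return task.get() == caller;
}

// Returns true if a task launched with `policy` has not been started
// and will only run once someone waits on the future.
bool is_lazy(std::launch policy) {
  std::future<void> task = std::async(policy, [] {});
  bool const lazy =
      task.wait_for(std::chrono::seconds(0)) == std::future_status::deferred;
  task.get();  // Run (or join) the task before the future goes out of scope.
  return lazy;
}
```

Calling `is_lazy(std::launch::deferred)` yields `true` and `is_lazy(std::launch::async)` yields `false`, which is the behavior the review comment is concerned about.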

Copilot · 2025-09-30T12:37:03Z

cpp/include/kvikio/file_handle_rangelock.hpp

+        return std::async(std::launch::deferred, [future = std::move(future),
+                                                  lock = std::move(range_lock)]() mutable {


Same deferred execution issue as in pwrite_rangelock. This prevents the read operation from running in parallel, contradicting the parallel I/O goals of the feature.

Suggested change

-        return std::async(std::launch::deferred, [future = std::move(future),
-                                                  lock = std::move(range_lock)]() mutable {
+        return std::async(std::launch::async, [future = std::move(future),
+                                                  lock = std::move(range_lock)]() mutable {

madsbk · 2025-09-30T13:41:39Z

/ok to test

copy-pr-bot · 2025-09-30T13:41:43Z

/ok to test

@madsbk, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

madsbk · 2025-09-30T13:45:25Z

/ok to test 68ff040

kingcrimsontianyu · 2025-09-30T14:55:41Z

cpp/include/kvikio/file_handle_rangelock.hpp

+                                              bool sync_default_stream = true) {
+
+        // Acquire range lock for this write
+        auto range_lock = range_lock_manager_.lock_range(file_offset, file_offset + size);


KvikIO has transitioned from a header-only library to a shared library. So all the implementations should go to corresponding .cpp files.

kingcrimsontianyu · 2025-09-30T14:58:35Z

cpp/include/kvikio/file_handle_rangelock.hpp

+ * Copyright (c) 2025, NVIDIA CORPORATION.
+ *
+ * Modified FileHandle with range-based locking support
+ */


Preamble should follow this complete format: https://github.com/rapidsai/kvikio/blob/branch-25.12/cpp/include/kvikio/file_handle.hpp

kingcrimsontianyu · 2025-09-30T15:00:16Z

cpp/include/kvikio/file_handle_rangelock.hpp

+     *
+     * This version acquires a lock only for the specific range being written,
+     * allowing non-overlapping writes to proceed in parallel.
+     */


Function parameters and return values should be commented in Doxygen format.
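
As an illustration of the requested style, a Doxygen comment with `@brief`, `@param`, and `@return` tags might look like the sketch below. The function `ranges_overlap` is a hypothetical helper used only to carry the comment; it is not part of this PR or of KvikIO.

```cpp
#include <cstddef>

/**
 * @brief Check whether two half-open byte ranges overlap.
 *
 * @param start_a Inclusive start offset of the first range.
 * @param end_a Exclusive end offset of the first range.
 * @param start_b Inclusive start offset of the second range.
 * @param end_b Exclusive end offset of the second range.
 * @return true if the ranges share at least one byte, false otherwise.
 */
constexpr bool ranges_overlap(std::size_t start_a,
                              std::size_t end_a,
                              std::size_t start_b,
                              std::size_t end_b) noexcept
{
  return start_a < end_b && start_b < end_a;
}

// Adjacent ranges such as [0, 10) and [10, 20) do not overlap.
static_assert(!ranges_overlap(0, 10, 10, 20), "adjacent ranges must not overlap");
```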

kingcrimsontianyu · 2025-09-30T15:06:45Z

cpp/include/kvikio/file_handle_rangelock.hpp

+
+namespace kvikio {
+
+class FileHandleWithRangeLock : public FileHandle {


I think the class needs to be commented in detail too, to explain the purpose of this class and also key implementation details. For example, when multiple write requests contend on a common range, what would the behavior be?

madsbk · 2025-09-30T15:09:32Z

@a-hirota, thanks for contributing! When you say locking, do you mean locking via this API? We’re not doing any file-based locking here, right?

kingcrimsontianyu · 2025-09-30T15:13:16Z

cpp/include/kvikio/file_handle_rangelock.hpp

+        return std::async(std::launch::deferred, [future = std::move(future),
+                                                  lock = std::move(range_lock)]() mutable {
+            auto result = future.get();
+            // Lock will be automatically released when this lambda exits
+            return result;


For parallel I/O, the step to wait for the chunked tasks is performed in the thread pool (https://github.com/rapidsai/kvikio/blob/branch-25.12/cpp/include/kvikio/parallel_operation.hpp#L175-L184). Would it be possible that this unlock step here follows suit and gets pushed to the thread pool's task queue as well?
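
One way to picture this suggestion is sketched below, with loud assumptions: `std::async(std::launch::async, ...)` stands in for a submission to KvikIO's thread pool, and `ScopedFlag`, `write_then_unlock`, and `wait_until_released` are hypothetical names, not KvikIO APIs. The task moves the lock into a local variable so that it is released when the task body returns on the worker, regardless of when the caller calls `get()`. Relying on destruction of the lambda capture alone may tie the release to the lifetime of the stored callable, which can outlive the task body.

```cpp
#include <atomic>
#include <cstddef>
#include <future>
#include <thread>
#include <utility>

// Hypothetical RAII guard standing in for the PR's range lock object:
// sets the flag while held and clears it on destruction.
class ScopedFlag {
 public:
  explicit ScopedFlag(std::atomic<bool>& held) : held_(&held) { held_->store(true); }
  ScopedFlag(ScopedFlag&& other) noexcept : held_(other.held_) { other.held_ = nullptr; }
  ScopedFlag(ScopedFlag const&) = delete;
  ScopedFlag& operator=(ScopedFlag const&) = delete;
  ScopedFlag& operator=(ScopedFlag&&) = delete;
  ~ScopedFlag() {
    if (held_ != nullptr) { held_->store(false); }
  }

 private:
  std::atomic<bool>* held_;
};

// Acquire the "lock" on the calling thread, then hand it to a worker task.
// std::launch::async is an assumption standing in for a thread-pool submit.
std::future<std::size_t> write_then_unlock(std::atomic<bool>& held, std::size_t nbytes) {
  ScopedFlag lock(held);
  return std::async(std::launch::async, [lock = std::move(lock), nbytes]() mutable {
    // Take ownership in a local so the release happens when this body returns.
    ScopedFlag local(std::move(lock));
    // Placeholder for waiting on the chunked I/O tasks.
    return nbytes;
  });
}

// Spin until the worker has released the lock, without touching the future.
void wait_until_released(std::atomic<bool> const& held) {
  while (held.load()) { std::this_thread::yield(); }
}
```

With a deferred launch policy, `wait_until_released` would never return unless some thread called `get()` or `wait()` on the future, which is the situation the question above aims to avoid.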

kingcrimsontianyu · 2025-10-01T21:57:35Z

Thanks for submitting the PR for review.

For completely new features, it is usually a good practice to file an issue first that explains the need/motivation/objective, before diving into implementation details. I think it would be of great help if you could elaborate on the use case of this range locking feature. Are you developing a database system based on KvikIO? You mentioned that this PR significantly improves the performance for multi-GPU. Is there any benchmark result to share?

Copilot AI review requested due to automatic review settings September 30, 2025 12:36

a-hirota requested review from a team as code owners September 30, 2025 12:36

madsbk added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels on Sep 30, 2025
