feat: Add range-based locking for parallel file I/O#847
feat: Add range-based locking for parallel file I/O#847a-hirota wants to merge 1 commit intorapidsai:mainfrom
Conversation
Implement range-based locking mechanism to enable truly parallel non-overlapping writes to files. This improves performance for multi-GPU workloads by allowing concurrent writes to different file regions while serializing only overlapping operations. Key changes: - Add RangeLockManager class for managing non-overlapping range locks - Add FileHandleWithRangeLock extending FileHandle with range lock support - Add comprehensive C++ and Python tests for range lock functionality - Support move semantics for efficient lock transfer Performance benefits: - Non-overlapping writes can execute in parallel - Reduces contention for multi-GPU file I/O operations - Maintains data integrity by serializing overlapping ranges
There was a problem hiding this comment.
Pull Request Overview
This PR introduces a range-based locking mechanism for kvikio to enable parallel file I/O operations on non-overlapping file regions. The implementation allows multiple threads or GPUs to write to different sections of a file simultaneously while serializing only overlapping operations.
Key Changes
- Added
RangeLockManagerclass for managing overlapping range detection and locking - Added
FileHandleWithRangeLockextendingFileHandlewith range-aware parallel I/O operations - Comprehensive test coverage in both C++ and Python to validate parallel execution and data integrity
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| cpp/include/kvikio/range_lock.hpp | Core range lock manager with overlap detection and RAII lock semantics |
| cpp/include/kvikio/file_handle_rangelock.hpp | Extended file handle with range-locked read/write operations |
| cpp/tests/test_range_lock.cpp | C++ unit tests for range locking functionality and performance |
| python/kvikio/tests/test_range_lock.py | Python integration tests for concurrent file operations |
| cpp/tests/CMakeLists.txt | Build configuration for range lock tests |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| return std::async(std::launch::deferred, [future = std::move(future), | ||
| lock = std::move(range_lock)]() mutable { | ||
| auto result = future.get(); | ||
| // Lock will be automatically released when this lambda exits | ||
| return result; | ||
| }); |
There was a problem hiding this comment.
Using std::launch::deferred prevents the operation from running in parallel. The lambda will only execute when .get() is called, defeating the purpose of parallel I/O. Consider using std::launch::async or a different approach to maintain parallelism while ensuring proper lock cleanup.
| return std::async(std::launch::deferred, [future = std::move(future), | ||
| lock = std::move(range_lock)]() mutable { |
There was a problem hiding this comment.
Same deferred execution issue as in pwrite_rangelock. This prevents the read operation from running in parallel, contradicting the parallel I/O goals of the feature.
| return std::async(std::launch::deferred, [future = std::move(future), | |
| lock = std::move(range_lock)]() mutable { | |
| return std::async(std::launch::async, [future = std::move(future), | |
| lock = std::move(range_lock)]() mutable { |
|
/ok to test |
@madsbk, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/ |
|
/ok to test 68ff040 |
| bool sync_default_stream = true) { | ||
|
|
||
| // Acquire range lock for this write | ||
| auto range_lock = range_lock_manager_.lock_range(file_offset, file_offset + size); |
There was a problem hiding this comment.
KvikIO has transitioned from a header-only library to a shared library. So all the implementations should go to corresponding .cpp files.
| * Copyright (c) 2025, NVIDIA CORPORATION. | ||
| * | ||
| * Modified FileHandle with range-based locking support | ||
| */ |
There was a problem hiding this comment.
Preamble should follow this complete format: https://github.com/rapidsai/kvikio/blob/branch-25.12/cpp/include/kvikio/file_handle.hpp
| * | ||
| * This version acquires a lock only for the specific range being written, | ||
| * allowing non-overlapping writes to proceed in parallel. | ||
| */ |
There was a problem hiding this comment.
Function parameters and return values should be commented in Doxygen format.
|
|
||
| namespace kvikio { | ||
|
|
||
| class FileHandleWithRangeLock : public FileHandle { |
There was a problem hiding this comment.
I think the class needs to be commented in detail too, to explain the purpose of this class and also key implementation details. For example, when multiple write requests contend on a common range, what would the behavior be?
|
@a-hirota, thanks for contributing! When you say locking, do you mean locking via this API? We’re not doing any file-based locking here, right? |
| return std::async(std::launch::deferred, [future = std::move(future), | ||
| lock = std::move(range_lock)]() mutable { | ||
| auto result = future.get(); | ||
| // Lock will be automatically released when this lambda exits | ||
| return result; |
There was a problem hiding this comment.
For parallel I/O, the step to wait for the chunked tasks is performed in the thread pool (https://github.com/rapidsai/kvikio/blob/branch-25.12/cpp/include/kvikio/parallel_operation.hpp#L175-L184). Would it be possible that this unlock step here follows suit and gets pushed to the thread pool's task queue as well?
|
Thanks for submitting the PR for review. For completely new features, it is usually a good practice to file an issue first that explains the need/motivation/objective, before diving into implementation details. I think it would be of great help if you could elaborate on the use case of this range locking feature. Are you developing a database system based on KvikIO? You mentioned that this PR significantly improves the performance for multi-GPU. Is there any benchmark result to share? |
Description
This PR implements range-based locking mechanism for kvikio to enable truly parallel non-overlapping writes to files. This significantly
improves performance for multi-GPU workloads.
Key Changes
RangeLockManagerclass for managing non-overlapping range locksFileHandleWithRangeLockextendingFileHandlewith range lock supportPerformance Benefits
Usage Example
C++ Usage
Python Usage (Future API)
Testing
Performance Impact
In our tests with 2 GPUs writing to non-overlapping regions: