Native Arrow transport path with zero-copy transfer (For reference purpose only) #21253
rishabhmaurya wants to merge 1 commit into opensearch-project:main from
Add zero-serialization path for Arrow data in the Flight transport.

When a response extends ArrowBatchResponse, the framework transfers its typed Arrow vectors over the Flight stream with zero copy; no byte serialization is involved. Java owns all buffer management through the channel allocator. Pipelined batch production is supported.

Signed-off-by: Rishabh Maurya <rishma@amazon.com>
Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>
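The type-based routing described above can be sketched as follows. This is a minimal model, not the PR's code: TransportResponseSketch, ArrowBatchResponseSketch and ResponseDispatcher are hypothetical stand-ins, an int[] stands in for typed Arrow vectors, and an instanceof check is assumed as the detection mechanism. It only shows that a response whose writeTo() is a final no-op gets routed around byte serialization.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a response on the byte-serialized path.
abstract class TransportResponseSketch {
    // Byte path: subclasses serialize their fields into the stream.
    abstract void writeTo(ByteArrayOutputStream out);
}

// Hypothetical stand-in for ArrowBatchResponse: never serializes to bytes.
abstract class ArrowBatchResponseSketch extends TransportResponseSketch {
    @Override
    final void writeTo(ByteArrayOutputStream out) {
        // Intentionally a no-op; the payload travels as vectors, not bytes.
    }

    // Stands in for the producer's typed Arrow vectors.
    abstract int[] batch();
}

// Hypothetical framework hook that picks the transport path by response type.
final class ResponseDispatcher {
    final List<String> sent = new ArrayList<>();

    void send(TransportResponseSketch response) {
        if (response instanceof ArrowBatchResponseSketch) {
            // Native path: hand the vectors over without calling writeTo().
            int[] vectors = ((ArrowBatchResponseSketch) response).batch();
            sent.add("native:" + vectors.length + " values");
        } else {
            // Byte path: serialize first, then ship the bytes.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            response.writeTo(out);
            sent.add("bytes:" + out.size());
        }
    }
}
```

Making writeTo() final in the Arrow base class means a subclass cannot accidentally reintroduce a byte path that the framework would then ignore.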
❌ Gradle check result for ac83ec6: FAILURE
Context
Alternative approach to #21240. Adds a zero-serialization path for Arrow data in the Flight transport without core server changes.
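To make "zero-serialization" concrete, the ownership move behind the TransferPair.transfer() pointer swap listed under Key design decisions can be modeled without Arrow. VectorSketch is hypothetical and is not Arrow's API: it only mimics the documented semantics, where the target takes over the source's buffers and the source is left empty. Real Arrow transfers also move allocator accounting, which this sketch ignores.

```java
import java.util.Arrays;

// Hypothetical model of a vector that owns a single data buffer.
final class VectorSketch {
    private long[] buffer; // owned storage; null once ownership has moved away
    private int valueCount;

    VectorSketch(long[] buffer, int valueCount) {
        this.buffer = buffer;
        this.valueCount = valueCount;
    }

    long get(int index) {
        if (index < 0 || index >= valueCount) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return buffer[index];
    }

    int valueCount() {
        return valueCount;
    }

    long[] bufferRef() {
        return buffer;
    }

    // Zero-copy transfer: the target takes the buffer reference and the
    // source is cleared. No element is read or written.
    void transferTo(VectorSketch target) {
        target.buffer = this.buffer;
        target.valueCount = this.valueCount;
        this.buffer = null;
        this.valueCount = 0;
    }

    // For contrast: a copying path touches every element (O(n) memcpy).
    VectorSketch copy() {
        return new VectorSketch(Arrays.copyOf(buffer, valueCount), valueCount);
    }
}
```

The cost of transferTo() is constant regardless of batch size, while copy() grows with the number of values; that difference is what the native path avoids.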
Summary
Send side
ArrowBatchResponse: abstract base for native Arrow responses. Its writeTo() is a final no-op. The framework detects this and performs a zero-copy transferTo(), moving buffer pointers from the producer's vectors into the channel's shared root on the executor thread.

Receive side
VectorStreamInput.getRoot(): ArrowBatchResponse(StreamInput in) calls this to access typed vectors directly. No factory selection is needed; the handler decides which methods to call.

Allocator access
ArrowFlightChannel: public interface with getAllocator() and a from(channel) unwrap utility. Implemented by FlightTransportChannel and FlightServerChannel.

Buffer management
Java owns all memory through the channel's allocator. Producers create their VectorSchemaRoot from this allocator, and the framework creates a shared root bound to Flight via start(). Each batch is zero-copy transferred into the shared root before putNext(). Pipelined production is supported: each batch has independent buffers, and the executor drains them serially.

Key design decisions
- Implemented in the arrow-flight-rpc plugin
- VectorStreamOutput refactored into an abstract class with ByteSerialized (unchanged byte path) and NativeArrow (no-op writes) implementations
- TransferPair.transfer(): buffer pointer swap, no memcpy

Test plan
- NativeArrowTransportIT: single batch, serial multi-batch, and parallel 100-batch (5 producer threads) runs with data integrity verification
- NativeArrowStreamTransportExampleIT: example plugin demonstrating the API

Design doc
See plugins/arrow-flight-rpc/docs/native-arrow-transport-design.md
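The buffer-management and test-plan items above (independent buffers per batch, a single serial drain into a shared root, and a parallel 100-batch run with 5 producer threads) can be modeled with plain java.util.concurrent. This is a sketch under stated assumptions: PipelineSketch is hypothetical, long[] arrays stand in for Arrow vectors, handing over the array reference stands in for the zero-copy transfer into the shared root, and appending to a list stands in for putNext(). The PR's actual executor wiring may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of pipelined production with a single serial drainer.
final class PipelineSketch {
    static List<long[]> run(int producers, int batches, int rowsPerBatch) {
        BlockingQueue<long[]> ready = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(producers);
        try {
            for (int b = 0; b < batches; b++) {
                final int batchId = b;
                pool.execute(() -> {
                    // Each batch gets its own buffer, so producers never share memory.
                    long[] buf = new long[rowsPerBatch];
                    for (int r = 0; r < rowsPerBatch; r++) {
                        buf[r] = (long) batchId * rowsPerBatch + r;
                    }
                    ready.add(buf);
                });
            }
            List<long[]> sent = new ArrayList<>();
            for (int i = 0; i < batches; i++) {
                // Serial drain: one finished batch at a time. Taking the reference
                // stands in for the zero-copy transfer into the shared root.
                long[] sharedRoot = ready.take();
                sent.add(sharedRoot); // stands in for putNext()
            }
            return sent;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because producers only ever write to their own buffers and a single thread drains the queue, no locking is needed around the shared root; batches may reach it in any order, which is why an integrity check sorts all rows before comparing.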