Description
The following PR added support for micro benchmarking shuffle operations:
The benchmarks shipped in that PR cover several cases. All of them shuffle the same amount of data; only the number of output tasks and the partitioning differ. I'd expect all cases to take roughly the same time, but there is some variation:
Benchmarking shuffle/stream/1producer_1consumer_8partitions_1000000rows_1024batch_Nonecompression:
time: [90.704 ms 91.729 ms 94.261 ms]
shuffle/stream/1producer_1consumer_8partitions_1000000rows_1024batch_Some(LZ4_FRAME)compression:
time: [115.04 ms 125.10 ms 139.33 ms]
Benchmarking shuffle/stream/8producer_1consumer_8partitions_1000000rows_1024batch_Nonecompression:
time: [60.487 ms 60.780 ms 61.057 ms]
Benchmarking shuffle/stream/1producer_8consumer_8partitions_1000000rows_1024batch_Nonecompression:
time: [197.23 ms 197.69 ms 198.10 ms]
Benchmarking shuffle/stream/8producer_8consumer_8partitions_1000000rows_1024batch_Nonecompression:
time: [76.776 ms 77.825 ms 80.544 ms]
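To make the spread easier to compare, the medians above can be normalized against the 1 producer / 1 consumer uncompressed case. This is only a quick sketch: the short case labels (`1p_1c_none` and so on) and the `relative_to` helper are names made up here for illustration, and the numbers are the middle values copied from the output above.

```python
# Median times in ms, copied from the benchmark output above.
# The short keys are hypothetical labels, not names used by the benchmark.
medians_ms = {
    "1p_1c_none": 91.729,
    "1p_1c_lz4": 125.10,
    "8p_1c_none": 60.780,
    "1p_8c_none": 197.69,
    "8p_8c_none": 77.825,
}


def relative_to(baseline, medians):
    """Return each median divided by the baseline case's median."""
    base = medians[baseline]
    return {name: t / base for name, t in medians.items()}


ratios = relative_to("1p_1c_none", medians_ms)

# Print cases from fastest to slowest, relative to the baseline.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name}: {ratio:.2f}x")
```

With these numbers, the 1 producer / 8 consumer case comes out at more than 2x the baseline and more than 3x the fastest case (8 producers / 1 consumer), which is the gap that looks most suspicious.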
I think there should be some low-hanging fruit regarding performance, as well as some more complicated improvements:
- I'm suspicious about `arrow-flight` cloning data unnecessarily
- `RepartitionExec` might be doing something funny depending on the number of output tasks