Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct
Area
CUB
Is your feature request related to a problem? Please describe.
Block reductions return a single value in the first thread, but there are cases where N results are needed in different threads. At the warp level, CUB lets you create "logical" warps, which partition a hardware warp into smaller groups, but this functionality is missing at the block level: all non-exited threads have to participate in a block-level API call.
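For reference, here is a minimal sketch of the existing warp-level feature, using `cub::WarpReduce` with a logical warp width smaller than the hardware warp. The block size, logical warp width, kernel name, and host helper are assumptions chosen for illustration and are not part of the request.

```cuda
#include <cub/cub.cuh>
#include <vector>

constexpr int kDemoBlockThreads = 64;  // assumed block size
constexpr int kLogicalWarpThreads = 8; // assumed logical warp width (must not exceed 32)
constexpr int kLogicalWarps = kDemoBlockThreads / kLogicalWarpThreads;

// Each group of 8 consecutive threads reduces independently.
// The sum is only valid in lane 0 of each logical warp.
__global__ void logical_warp_sums(const int* in, int* out)
{
    using WarpReduce = cub::WarpReduce<int, kLogicalWarpThreads>;
    __shared__ typename WarpReduce::TempStorage temp_storage[kLogicalWarps];

    const int logical_warp_id = threadIdx.x / kLogicalWarpThreads;
    const int lane            = threadIdx.x % kLogicalWarpThreads;

    const int sum = WarpReduce(temp_storage[logical_warp_id]).Sum(in[threadIdx.x]);
    if (lane == 0)
    {
        out[logical_warp_id] = sum;
    }
}

// Hypothetical host helper: fills the input with 0..63 and returns the 8 sums.
std::vector<int> run_logical_warp_sums()
{
    std::vector<int> h_in(kDemoBlockThreads);
    for (int i = 0; i < kDemoBlockThreads; ++i) h_in[i] = i;
    std::vector<int> h_out(kLogicalWarps, 0);

    int* d_in  = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, kDemoBlockThreads * sizeof(int));
    cudaMalloc(&d_out, kLogicalWarps * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), kDemoBlockThreads * sizeof(int), cudaMemcpyHostToDevice);

    logical_warp_sums<<<1, kDemoBlockThreads>>>(d_in, d_out);

    cudaMemcpy(h_out.data(), d_out, kLogicalWarps * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

This yields one result per logical warp, but the group size is capped at one hardware warp, which is the limitation described above.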
Describe the solution you'd like
Two possible solutions:
- Provide segmented / batched block reduction
- Allow sub-partitions of a block
Describe alternatives you've considered
The current workaround is to use CUB's warp reduction and write custom code to combine the warp-level results within the block, but it would be nice for this to be a single CUB API call.
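A minimal sketch of that workaround, assuming a 256-thread block split into four 64-thread segments (two hardware warps each). The segment size, kernel name, and host helper are hypothetical; this is one way to combine the pieces, not an existing CUB API.

```cuda
#include <cub/cub.cuh>
#include <vector>

constexpr int kBlockThreads    = 256; // assumed block size
constexpr int kWarpThreads     = 32;  // hardware warp width
constexpr int kSegmentThreads  = 64;  // assumed segment size (a multiple of 32)
constexpr int kWarps           = kBlockThreads / kWarpThreads;
constexpr int kSegments        = kBlockThreads / kSegmentThreads;
constexpr int kWarpsPerSegment = kSegmentThreads / kWarpThreads;

// Step 1: every warp reduces its 32 values with cub::WarpReduce.
// Step 2: the first thread of each segment adds up that segment's warp sums.
__global__ void segmented_block_sum(const int* in, int* out)
{
    using WarpReduce = cub::WarpReduce<int>;
    __shared__ typename WarpReduce::TempStorage temp_storage[kWarps];
    __shared__ int warp_sums[kWarps];

    const int warp_id = threadIdx.x / kWarpThreads;
    const int lane    = threadIdx.x % kWarpThreads;

    const int warp_sum = WarpReduce(temp_storage[warp_id]).Sum(in[threadIdx.x]);
    if (lane == 0)
    {
        warp_sums[warp_id] = warp_sum; // result is only valid in lane 0
    }
    __syncthreads(); // make all warp sums visible to the block

    if (threadIdx.x % kSegmentThreads == 0)
    {
        const int segment = threadIdx.x / kSegmentThreads;
        int segment_sum   = 0;
        for (int w = 0; w < kWarpsPerSegment; ++w)
        {
            segment_sum += warp_sums[segment * kWarpsPerSegment + w];
        }
        out[segment] = segment_sum; // one result per segment, held by its first thread
    }
}

// Hypothetical host helper: fills the input with 0..255 and returns the 4 segment sums.
std::vector<int> run_segmented_block_sum()
{
    std::vector<int> h_in(kBlockThreads);
    for (int i = 0; i < kBlockThreads; ++i) h_in[i] = i;
    std::vector<int> h_out(kSegments, 0);

    int* d_in  = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, kBlockThreads * sizeof(int));
    cudaMalloc(&d_out, kSegments * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), kBlockThreads * sizeof(int), cudaMemcpyHostToDevice);

    segmented_block_sum<<<1, kBlockThreads>>>(d_in, d_out);

    cudaMemcpy(h_out.data(), d_out, kSegments * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

The shared-memory staging, the barrier, and the serial combine loop are the custom parts that a segmented or sub-partitioned block reduction would hide behind a single call.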
Additional context
No response