[FEA]: Allow sub-partition of a block reduction #7698

@nv-cuchytil

Is this a duplicate?

Area

CUB

Is your feature request related to a problem? Please describe.

Block reductions return a single value in the first thread, but there are cases where N results are needed in different threads. At the warp level, CUB lets you create "logical" warps, which partition a hardware thread group further, but this functionality is missing at the block level: all non-exited threads have to participate in a block API call.
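For reference, a minimal sketch of the warp-level "logical" warp feature mentioned above, using the `LOGICAL_WARP_THREADS` template parameter of `cub::WarpReduce`. The kernel name, the host helper `run_logical_warp_sums`, and the sizes (128 threads, groups of 8) are assumptions chosen for illustration, and CUDA error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>

#include <cub/cub.cuh>
#include <cuda_runtime.h>

// Illustrative sizes (assumptions, not taken from the issue).
constexpr int kThreads        = 128;                          // threads per block
constexpr int kLogicalThreads = 8;                            // threads per logical warp
constexpr int kGroups         = kThreads / kLogicalThreads;   // 16 results per block

// Each group of 8 consecutive threads produces its own sum.
__global__ void logical_warp_sums(const int* in, int* out)
{
  // WarpReduce specialized for logical warps of 8 threads.
  using WarpReduce = cub::WarpReduce<int, kLogicalThreads>;

  // One TempStorage per logical warp.
  __shared__ typename WarpReduce::TempStorage temp_storage[kGroups];

  const int group = threadIdx.x / kLogicalThreads;
  const int lane  = threadIdx.x % kLogicalThreads;

  // The aggregate is only valid in lane 0 of each logical warp.
  const int sum = WarpReduce(temp_storage[group]).Sum(in[threadIdx.x]);
  if (lane == 0)
  {
    out[group] = sum;
  }
}

// Hypothetical host helper: launches a single block and returns the 16 sums.
std::vector<int> run_logical_warp_sums(const std::vector<int>& host_in)
{
  assert(host_in.size() == static_cast<std::size_t>(kThreads));

  int* d_in  = nullptr;
  int* d_out = nullptr;
  cudaMalloc(&d_in, kThreads * sizeof(int));
  cudaMalloc(&d_out, kGroups * sizeof(int));
  cudaMemcpy(d_in, host_in.data(), kThreads * sizeof(int), cudaMemcpyHostToDevice);

  logical_warp_sums<<<1, kThreads>>>(d_in, d_out);

  std::vector<int> host_out(kGroups);
  cudaMemcpy(host_out.data(), d_out, kGroups * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return host_out;
}
```

This gives N results per block only while each partition fits inside one hardware warp; partitions larger than 32 threads are the case the request is about.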

Describe the solution you'd like

Two possible solutions:

  • Provide segmented / batched block reduction
  • Allow sub-partitions of a block

Describe alternatives you've considered

The current workaround is to use CUB's warp reduction and then implement something custom to combine the warp-level reductions across the block, but it would be nice for this to be a single CUB API call.
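A rough sketch of what such a workaround can look like, assuming a 256-thread block split into two 128-thread partitions whose size is a multiple of the warp size. The kernel name, the helper `run_segmented_block_sum`, and the simple serial combine step are illustrative assumptions, not the code actually in use; CUDA error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>

#include <cub/cub.cuh>
#include <cuda_runtime.h>

// Illustrative sizes (assumptions, not taken from the issue).
constexpr int kBlockThreads    = 256;
constexpr int kSegmentThreads  = 128;  // must be a multiple of the warp size here
constexpr int kWarpThreads     = 32;
constexpr int kWarps           = kBlockThreads / kWarpThreads;     // 8
constexpr int kWarpsPerSegment = kSegmentThreads / kWarpThreads;   // 4
constexpr int kSegments        = kBlockThreads / kSegmentThreads;  // 2

__global__ void segmented_block_sum(const int* in, int* out)
{
  using WarpReduce = cub::WarpReduce<int>;

  __shared__ typename WarpReduce::TempStorage temp_storage[kWarps];
  __shared__ int warp_sums[kWarps];

  const int warp = threadIdx.x / kWarpThreads;
  const int lane = threadIdx.x % kWarpThreads;

  // Step 1: ordinary CUB warp reduction; the result is valid in lane 0.
  const int warp_sum = WarpReduce(temp_storage[warp]).Sum(in[threadIdx.x]);
  if (lane == 0)
  {
    warp_sums[warp] = warp_sum;
  }
  __syncthreads();

  // Step 2: custom combine; the first thread of each segment adds up
  // the partial sums of the warps that belong to its segment.
  if (threadIdx.x % kSegmentThreads == 0)
  {
    const int segment = threadIdx.x / kSegmentThreads;
    int total = 0;
    for (int w = 0; w < kWarpsPerSegment; ++w)
    {
      total += warp_sums[segment * kWarpsPerSegment + w];
    }
    out[segment] = total;
  }
}

// Hypothetical host helper: launches a single block and returns one sum per segment.
std::vector<int> run_segmented_block_sum(const std::vector<int>& host_in)
{
  assert(host_in.size() == static_cast<std::size_t>(kBlockThreads));

  int* d_in  = nullptr;
  int* d_out = nullptr;
  cudaMalloc(&d_in, kBlockThreads * sizeof(int));
  cudaMalloc(&d_out, kSegments * sizeof(int));
  cudaMemcpy(d_in, host_in.data(), kBlockThreads * sizeof(int), cudaMemcpyHostToDevice);

  segmented_block_sum<<<1, kBlockThreads>>>(d_in, d_out);

  std::vector<int> host_out(kSegments);
  cudaMemcpy(host_out.data(), d_out, kSegments * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return host_out;
}
```

The shared-memory staging, the barrier, and the combine loop are the parts a segmented or sub-partitioned block reduction would fold into the library call.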

Additional context

No response
