Skip to content

Fuse concat with reduce_by_index #2418

@FluxusMagna

Description

@FluxusMagna

This optimisation is conceptually very similar to how concat is fused into scatter. This is important for algorithms that produce many points to be reduced per item. A fairly minimal example of something that should fuse but currently does not is

entry test n =
    let (is, vs) = tabulate n (\i -> ((i, (i+1) % n), (1f32, 1f32))) |> unzip
    let unzip_concat as = unzip as |> uncurry concat
    in hist (+) 0f32 n (unzip_concat is) (unzip_concat vs)

here the lack off fusion is not too problematic, as the number of items to reduce scales with the array, but once the number of items to reduce increases beyond that there can be some really bad scaling issues, as (is, vs) needs to be manifested due to the lack of fusion. Unlike scatter where the number of points to be scattered are usually less than or equal to the size of the destination array, in reduce_by_index this is often not the case.

I have some code used to transfer point-value pairs a onto coarser grid using polynomial interpolation, and this produces many items to reduce per point to be transferred, especially in higher dimensions. This is necessary for implementing Particle Mesh Ewald (PME) summation which is a common method in Molecular Dynamics (MD) and more generally for solving linear partial differential equations with complex geometry.

As an example of the scaling issues, this is the results for 3d cubic interpolation where each point is reduced to 4x4x4 bins with different weights. The grid size is kept constant, only the number of points to be reduced is changed.

grid_test.fut:cubic_bench (no tuning file):
[3][100000]i32:         4475μs (95% CI: [    4464.6,     4510.0])
[3][1000000]i32:       40286μs (95% CI: [   39761.9,    42289.3])
[3][10000000]i32:   69413528μs |███       |  3 runs ^C

The smaller systems are also slower than expected, but as the array that needs to be manifested grows, the function becomes unusably slow.

In the worst case here, the intermediate data, that would be fused away, is far larger than the input data or result grid.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions