GPU scalar indexing when transpose(::CuArray) #970

@mattsignorelli

Description

Oddly, x = CUDA.zeros(1,4) works fine, but x = transpose(CUDA.zeros(4)) triggers a scalar indexing error, even though both are 1×4.

MWE:

using KernelAbstractions, CUDA
import DifferentiationInterface as DI
import ForwardDiff  # loads the backend behind DI.AutoForwardDiff()

@kernel function foo!(y, x)
    i = @index(Global)
    a = 2*i - 1
    b = 2*i
    offset = (i-1)*4
    y[a] = (offset+1)*x[a] + (offset+2)*x[b]
    y[b] = (offset+3)*x[a] + (offset+4)*x[b]
end

kernel! = foo!(CUDA.CUDABackend())
f!(y,x) = kernel!(y, x, ndrange=2)

# This works fine:
x = CUDA.zeros(1,4)
y = CUDA.rand(1,4)
prep = DI.prepare_jacobian(f!, y, DI.AutoForwardFromPrimitive(DI.AutoForwardDiff()), x);
jac = CUDA.zeros(4, 4)  # assumption: preallocated 4×4 Jacobian buffer
DI.value_and_jacobian!(f!, y, jac, prep, DI.AutoForwardFromPrimitive(DI.AutoForwardDiff()), x)
#= Output:
4×4 CuArray{Float32, 2, CUDA.DeviceMemory}:
 1.0  2.0  0.0  0.0
 3.0  4.0  0.0  0.0
 0.0  0.0  5.0  6.0
 0.0  0.0  7.0  8.0
=#

# This causes scalar indexing:
x = transpose(CUDA.zeros(4))
y = transpose(CUDA.rand(4))
prep = DI.prepare_jacobian(f!, y, DI.AutoForwardFromPrimitive(DI.AutoForwardDiff()), x);
jac = CUDA.zeros(4, 4)  # assumption: preallocated 4×4 Jacobian buffer
DI.value_and_jacobian!(f!, y, jac, prep, DI.AutoForwardFromPrimitive(DI.AutoForwardDiff()), x)
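
One possible explanation, which is an assumption and not confirmed here, is that transpose of a vector returns a lazy LinearAlgebra.Transpose wrapper instead of a dense array, so some code path may fall back to generic element-by-element indexing. The CPU-only sketch below only illustrates the type difference. It does not reproduce the GPU error, and the reshape at the end is a hypothetical workaround, not a verified fix.

```julia
using LinearAlgebra

a = zeros(Float32, 1, 4)              # dense 1×4 Matrix
b = transpose(zeros(Float32, 4))      # lazy 1×4 wrapper around a Vector

a_is_dense   = a isa Matrix           # true
b_is_wrapper = b isa Transpose        # true
b_is_dense   = b isa Matrix           # false

# Hypothetical workaround: build the 1×4 shape with reshape, which
# returns a dense array of the same shape instead of a wrapper.
c = reshape(zeros(Float32, 4), 1, 4)
c_is_dense = c isa Matrix             # true
same_shape = size(c) == size(b)       # true
```

On the GPU, the analogous call would be reshape(CUDA.zeros(4), 1, 4), assuming it returns a plain CuArray as it does for CPU arrays.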
