feat: support GEMM (C = alpha*A*B + beta*C) in matmul kernels #1280

@antimora

Summary

The current matmul kernels implement C = A * B (effectively alpha=1, beta=0), always overwriting the output tensor. Supporting the full GEMM interface C = alpha * A * B + beta * C would enable fused matmul+bias operations without extra kernel launches.

Motivation

Burn is adding a fused linear op to ModuleOps (tracel-ai/burn#4737) so backends can optimize the common y = x @ W + b pattern. With beta=1 support, the bias could be pre-loaded into the output tensor, and a single matmul kernel call would produce C = A * B + bias with no intermediate allocation or separate add kernel.

This is the single most common operation in transformer models (every attention projection and FFN layer), so even small gains compound across dozens of layers.

Proposed API

Add optional alpha and beta parameters to the matmul launch path:

pub fn matmul<R: CubeRuntime>(
    lhs: CubeTensor<R>,
    rhs: CubeTensor<R>,
    out: Option<CubeTensor<R>>,
    strategy: MatmulStrategy,
    out_dtype: DType,
    alpha: f32,  // default 1.0
    beta: f32,   // default 0.0
) -> Result<CubeTensor<R>, MatmulSetupError>

When beta != 0, the kernel would accumulate into the existing output values instead of overwriting them.
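To make the proposed semantics concrete, here is a plain-CPU reference of what the kernel would compute. The `gemm` function below is a hypothetical illustration of the contract (row-major, naive triple loop), not the CubeCL kernel itself:

```rust
// Reference semantics for the proposal: C = alpha * A * B + beta * C.
// A is m x k, B is k x n, C is m x n, all row-major.
// This is an illustrative CPU sketch, not the actual matmul kernel.
fn gemm(alpha: f32, a: &[f32], b: &[f32], beta: f32, c: &mut [f32], m: usize, k: usize, n: usize) {
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            // beta == 0 overwrites (current behavior); beta != 0 accumulates
            // into whatever is already in the output buffer.
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0]; // 2x2
    let b = [1.0, 0.0, 0.0, 1.0]; // 2x2 identity
    let mut c = [10.0, 10.0, 10.0, 10.0]; // output pre-loaded with a "bias"
    gemm(1.0, &a, &b, 1.0, &mut c, 2, 2, 2);
    // A * I + 10 in every slot:
    assert_eq!(c, [11.0, 12.0, 13.0, 14.0]);
}
```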

Use Case

Burn's linear forward pass would then become:

fn linear(x: Tensor, weight: Tensor, bias: Tensor) -> Tensor {
    // Pre-load the bias into the output buffer, broadcast to the matmul output shape.
    let out = bias.broadcast_to(output_shape); // or unsqueeze + expand
    // alpha = 1.0, beta = 1.0: accumulate x @ weight on top of the bias.
    matmul(x, weight, Some(out), strategy, dtype, 1.0, 1.0)
}

One kernel launch instead of two (matmul + add).
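The fused pattern can be sketched end to end on the CPU: broadcast the bias into the output buffer, then a single GEMM pass with alpha = 1, beta = 1 produces the same result as matmul followed by add. `linear_fused` is a hypothetical helper for illustration only:

```rust
// CPU sketch of the fused linear forward: pre-load the broadcast bias into
// `out`, then one GEMM with alpha = 1, beta = 1 replaces matmul + add.
// `x` is batch x in_features, `w` is in_features x out_features, row-major.
fn linear_fused(x: &[f32], w: &[f32], bias: &[f32], batch: usize, inf: usize, outf: usize) -> Vec<f32> {
    // Step 1: broadcast the bias across the batch dimension into the output.
    let mut out: Vec<f32> = (0..batch * outf).map(|i| bias[i % outf]).collect();
    // Step 2: a single GEMM pass with beta = 1 accumulates x @ w on top.
    for i in 0..batch {
        for j in 0..outf {
            let mut acc = 0.0;
            for p in 0..inf {
                acc += x[i * inf + p] * w[p * outf + j];
            }
            out[i * outf + j] += acc; // alpha = 1, beta = 1
        }
    }
    out
}

fn main() {
    let x = [1.0, 2.0, 3.0, 4.0]; // 2x2
    let w = [1.0, 1.0, 1.0, 1.0]; // 2x2
    let bias = [0.5, -0.5];
    let out = linear_fused(&x, &w, &bias, 2, 2, 2);
    // Unfused check: x @ w = [[3, 3], [7, 7]], plus bias per output column.
    assert_eq!(out, [3.5, 2.5, 7.5, 6.5]);
}
```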

Labels: enhancement (New feature or request)