[Question] Zero-initialization guarantees for `jax_kernel` outputs and best practices for reductions #1190

liblaf · 2026-01-23T14:10:02Z

Hi,

I am using `warp.jax_experimental.jax_kernel` to integrate Warp kernels into a JAX workflow. I have a question regarding the initialization of pure output arrays.

In my experiments, output arrays allocated via `jax_kernel` appear to be zero-initialized. However, I'd like to confirm whether this is guaranteed behavior or an implementation detail of the current allocator that I shouldn't rely on.

Looking at `warp/_src/jax_experimental/ffi.py`, it appears that `FfiKernel` receives output buffers from the XLA `call_frame` and passes them to the kernel. I didn't see any explicit `memset` or zeroing logic in the callback.

Questions:

1. Are pure output arrays guaranteed to be zero-initialized?
2. If not, what is the recommended pattern to zero the output array before computation?

Here is the pattern I am currently exploring:

import jax
import jax.numpy as jnp
import warp as wp
import warp.jax_experimental
from jax import Array


@wp.kernel
def warp_kernel(
    inputs: wp.array(dtype=wp.float32), outputs: wp.array1d(dtype=wp.float32)
) -> None:
    idx = wp.tid()
    # Potentially unsafe: outputs[0] may hold garbage if the buffer is not zero-initialized.
    outputs[0] += inputs[idx]


jax_kernel = warp.jax_experimental.jax_kernel(warp_kernel)


@jax.jit
def fun(x: Array) -> Array:
    y: Array
    (y,) = jax_kernel(x, output_dims={"outputs": (1,)}, launch_dims=x.shape)
    return y[0]


def main() -> None:
    N: int = 10000000
    x: Array = jnp.ones((N,))
    y: Array = fun(x)
    assert jnp.allclose(y, jnp.sum(x))


if __name__ == "__main__":
    main()

Thanks for the help!

Answered by nvlukasz

nvlukasz · 2026-01-23T14:53:18Z

Collaborator

Hi @liblaf,

You are correct, Warp doesn't explicitly zero-initialize the output buffers. That would be wasteful for pure output buffers that would just be overwritten by the kernel launch. I don't know whether XLA initializes the output buffers in any way, but I wouldn't count on it (for the same reasons).

The best way to ensure initialization is to use in-out arguments. You can initialize the in-out array however you need (not just to zero), then modify it in the kernel.

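import jax
import jax.numpy as jnp
import warp as wp
import warp.jax_experimental
from jax import Array


@wp.kernel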
def warp_kernel(
    inputs: wp.array(dtype=wp.float32), output: wp.array(dtype=wp.float32)
) -> None:
    idx = wp.tid()
    wp.atomic_add(output, 0, inputs[idx])  # <--- note atomic_add()

jax_kernel = warp.jax_experimental.jax_kernel(
    warp_kernel,
    in_out_argnames=["output"],  # <--- note in_out_argnames
)

@jax.jit
def fun(x: Array) -> Array:
    output: Array = jnp.zeros(1)
    (y,) = jax_kernel(x, output)
    return y[0]

def main() -> None:
    N: int = 10000000
    x: Array = jnp.ones((N,))
    y: Array = fun(x)
    assert jnp.allclose(y, jnp.sum(x))

if __name__ == "__main__":
    main()

Note that JAX doesn't allow modifying arrays in-place, so the output array is a modified copy of the input.
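
The copy semantics can be seen in plain JAX without Warp. The following is a minimal sketch, with `initial` and `updated` as purely illustrative names: a functional update returns a new array and leaves the original untouched, which mirrors what happens to the in-out argument passed to `jax_kernel`.

```python
import jax.numpy as jnp

# Illustrative names only; this demonstrates JAX's functional-update
# semantics and does not involve Warp.
initial = jnp.zeros(1)
updated = initial.at[0].add(3.0)  # returns a new array

print(initial)  # the original array is unchanged
print(updated)  # the new array carries the update
```

Under `jax.jit`, XLA may be able to reuse the underlying buffer, but from the caller's point of view the result is always a separate array.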

Also note that wp.atomic_add() should be used for this kind of reduction, otherwise there are race conditions and results might be incorrect.
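
To make the race concrete, here is a small pure-Python sketch that simulates one possible interleaving of a non-atomic read-modify-write. It is not how a GPU actually schedules threads; `racy_sum` and `atomic_sum` are hypothetical helpers written only for this illustration.

```python
def racy_sum(inputs):
    """Simulate pairs of threads that both read before either writes."""
    output = 0.0
    for a, b in zip(inputs[0::2], inputs[1::2]):
        read_a = output      # thread A reads the accumulator
        read_b = output      # thread B reads the same stale value
        output = read_a + a  # thread A writes its result
        output = read_b + b  # thread B overwrites it, losing A's update
    return output


def atomic_sum(inputs):
    """Each read-modify-write completes before the next one starts."""
    output = 0.0
    for value in inputs:
        output = output + value
    return output


inputs = [1.0] * 8
print(racy_sum(inputs))    # 4.0: half of the updates are lost
print(atomic_sum(inputs))  # 8.0
```

In a real kernel the number of lost updates depends on scheduling, so the wrong result is usually also non-deterministic.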

1 reply

liblaf Jan 23, 2026
Author

Thanks for your answer, it is really useful.

> Also note that wp.atomic_add() should be used for this kind of reduction, otherwise there are race conditions and results might be incorrect.

The docs say that `wp.atomic_add` is invoked automatically when using the syntax `arr[i] += value`, so I think it should be safe. But anyway, explicit is better than implicit.
