
PoC Set variables async #11171

Open
BorisTheBrave wants to merge 10 commits into pydata:main from BorisTheBrave:set_variables_async

Conversation


@BorisTheBrave BorisTheBrave commented Feb 14, 2026

This is my proposal for resolving #10622.

As suggested here, the idea is:

  • Internally, we use zarr's async API
  • Change the sync methods of xarray to use zarr's sync() function, which opts xarray methods into zarr's built-in event loop

The advantage of using the async API is that it becomes easy to resolve parallelism issues such as #9455. I demonstrate this for Store.set_variables. It works for users without any changes on their part, and it can be applied incrementally to the codebase without poisoning everything with async methods. It doesn't require any new capabilities from zarr, or any significant structural changes.

zarr's synchronous methods are simply wrappers around the async ones anyway, so this just lifts the wrapper one level higher.
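To make that concrete, here is a minimal sketch of the pattern (not the actual PR code), assuming zarr-python 3's AsyncGroup.create_array and its internal zarr.core.sync.sync helper; the helper names below are mine:

import asyncio

from zarr.core.sync import sync  # zarr's bridge from sync callers onto its event loop

async def _async_set_variable(async_group, name, variable):
    # Hypothetical per-variable helper: create the zarr array (metadata only)
    # for one xarray variable using zarr's async API.
    return await async_group.create_array(name, shape=variable.shape, dtype=variable.dtype)

async def _async_set_variables(async_group, variables):
    # Issue all metadata writes concurrently so the store round-trips overlap
    # instead of running one after another.
    await asyncio.gather(
        *(_async_set_variable(async_group, name, var) for name, var in variables.items())
    )

def set_variables(async_group, variables):
    # The public entry point stays synchronous: it just drives the async
    # implementation on zarr's built-in event loop.
    return sync(_async_set_variables(async_group, variables))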

The specific change here only parallelizes writing the metadata of a dataset's variables. It doesn't help much for deeply nested trees, and it does not parallelize writing data; that still has to go through dask, which the user must opt into. One consequence is that I've re-ordered write operations: all metadata is now written first, then all data, rather than interleaving the two. This is causing a few tests to fail.

I've had to duplicate a lot of code into separate v3 and v2 branches, since xarray supports both major versions of zarr but only v3 exposes an async API.

Another downside is that, because I'm co-opting zarr's event loop, this approach only works in the zarr backend. That limits its scope compared with some other solutions, and makes it a bit more awkward to implement (the async code can't easily leak into the rest of the xarray code base).

This is just a PoC for discussion. A full implementation might:

  • Add tests
  • Change existing tests to understand the new order of data writing
  • Expose async versions of xarray's sync API
  • Extend the async area of the code to more things
  • Figure out async ArrayWriter
  • Think about limiting concurrency
  • Avoid duplicating code, either:
    • Drop support for v2
    • Create an async->sync store shim so the async code can be the only path (I've prepared def _zarr_async_group() for this); a rough sketch of the idea follows this list
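For the shim option, here is a rough illustration of the idea (class and method names are mine, not the PR's): wrap a synchronous zarr v2 group behind the same async surface the v3 path uses, so the async code can remain the only path:

import asyncio

class AsyncGroupShim:
    """Presents a synchronous (zarr v2) group through an async interface."""

    def __init__(self, sync_group):
        self._group = sync_group

    async def create_array(self, name, *, shape, dtype):
        # Run the blocking v2 call in a worker thread so other metadata writes
        # scheduled with asyncio.gather can still make progress concurrently.
        return await asyncio.to_thread(
            self._group.create_dataset, name, shape=shape, dtype=dtype
        )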

I did some quick performance benchmarking to confirm that this change has the desired effect:

# %%
import shutil
import time

import numpy as np
import xarray as xr
from zarr.storage import LocalStore
from zarr.testing.store import LatencyStore

# Set up a dataset with many variables
data_vars = {f'x{i}': (('a', 'b'), np.arange(6).reshape(2, 3)) for i in range(10)}
ds = xr.Dataset(data_vars)
ds = ds.chunk()  # Parallelize writing the data via dask

# Start from an empty store
shutil.rmtree('test.zarr', ignore_errors=True)

# Wrap the store in a latency store to make concurrency obvious
store = LocalStore('test.zarr')
store = LatencyStore(store, set_latency=1)

start = time.time()
ds.to_zarr(store, mode='w')
end = time.time()
print(f"Time taken: {end - start:.3f} seconds")

Before: Time taken: 24.215 seconds
After: Time taken: 6.131 seconds

github-actions bot added the topic-backends, topic-zarr, topic-typing, and io labels Feb 14, 2026
io = [
"netCDF4>=1.6.0",
"h5netcdf>=1.4.0",
"h5netcdf[h5py]>=1.4.0",

@BorisTheBrave (Author)

I've isolated these toml changes to a separate PR #11172

