Open
BorisTheBrave
commented
Feb 14, 2026
io = [
    "netCDF4>=1.6.0",
    "h5netcdf>=1.4.0",
    "h5netcdf[h5py]>=1.4.0",
Author
I've isolated these toml changes to a separate PR #11172
This is my proposal for resolving #10622.
As suggested here, the idea is:
sync() function, which opts xarray methods into zarr's built-in event loop
The advantage of using the async API is that it's easy to resolve all the parallelism issues such as #9455. I demonstrate this for Store.set_variables. It will work for users without them making any changes, and can be applied incrementally to the codebase without poisoning everything with async methods. It doesn't require any new capabilities from zarr, or any significant structural changes.
zarr's synchronous methods are simply a wrapper around async anyway, so this is just lifting the wrapper one level higher.
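To make the "lifting the wrapper" idea concrete, here is a minimal stdlib-only sketch. It assumes a background event loop running in a daemon thread; the names `sync`, `write_metadata`, `write_all_metadata` and `set_variables` below are hypothetical stand-ins for illustration, not zarr's or xarray's actual code.

```python
import asyncio
import threading

# Hypothetical stand-in for a shared background event loop.
# This is an assumption for illustration, not zarr's actual implementation.
_loop = asyncio.new_event_loop()
_thread = threading.Thread(target=_loop.run_forever, daemon=True)
_thread.start()


def sync(coro):
    """Run a coroutine on the background loop and block until it finishes."""
    future = asyncio.run_coroutine_threadsafe(coro, _loop)
    return future.result()


async def write_metadata(name, latency=0.05):
    # Simulates one high-latency store request for a variable's metadata.
    await asyncio.sleep(latency)
    return name


async def write_all_metadata(names):
    # Issue every metadata write concurrently instead of one after another.
    return await asyncio.gather(*(write_metadata(n) for n in names))


def set_variables(names):
    # Synchronous entry point: callers never see the async code underneath.
    return sync(write_all_metadata(names))
```

Calling `set_variables(["a", "b", "c"])` from ordinary synchronous code returns once all three simulated writes have completed, and the caller does not need to know an event loop is involved.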
The specific change I've done here just parallelizes writing the metadata of variables for a dataset. It doesn't help much for deeply nested trees. It also does not parallelize writing data. That still has to be done via dask, which has to be opted in by the user. One consequence of this is I've re-ordered write operations. It now writes all metadata, then all data, rather than interleaving. This is causing a few tests to fail.
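The re-ordering can be pictured with a small sketch. The two functions below are hypothetical and only model the order of operations described above; they do not reproduce xarray's actual write path.

```python
def interleaved_order(names):
    # Previous behaviour as described: each variable's metadata is
    # immediately followed by that variable's data.
    ops = []
    for name in names:
        ops.append(("metadata", name))
        ops.append(("data", name))
    return ops


def phased_order(names):
    # Behaviour in this PoC as described: all metadata first, then all data.
    metadata_ops = [("metadata", name) for name in names]
    data_ops = [("data", name) for name in names]
    return metadata_ops + data_ops
```

Both orderings perform the same set of operations, so a test that asserts on the exact sequence of store writes could fail even though the final store contents are the same.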
I've had to duplicate a lot of code into a v3 and a v2 branch, as it looks like xarray is designed to support both versions of zarr, but there's no async API in v2.
Another downside is that as I'm co-opting zarr's event loop, this approach only works in the zarr backend. That limits the scope compared with some other solutions, and makes it a bit more awkward to implement (as the async code can't leak into the rest of the xarray code base easily).
This is just a PoC for discussion. A full implementation might
def _zarr_async_group() for this)
I did some quick performance benchmarking to determine that this change has the desired effect:
Before: Time taken: 24.215 seconds
After: Time taken: 6.131 seconds