Suggestion the default of chunks kwarg from 'auto' to {} or None #779
Description
Is your feature request related to a problem? Please describe.
I’ve noticed a error when upgrading from Xarray 2025.9.1 to 2026.2.0. In the older version, chunks='auto' handled these datasets without issue. However, the newer version now triggers a NotImplementedError when encountering object dtypes. This suggests that the internal size-estimation logic in recent Xarray/Dask updates is now strictly enforcing a check that was previously bypassed
```
Failed to load dataset with key='bvf2-theta-an-gauss.bvf2-theta-an-gauss'

You can use `cat['bvf2-theta-an-gauss.bvf2-theta-an-gauss'].df` to inspect the assets/files for this key.

  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/intake_esm/source.py", line 292, in _open_dataset
    datasets = dask.compute(*datasets)
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/dask/base.py", line 685, in compute
    results = schedule(expr, keys, **kwargs)
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/intake_esm/source.py", line 67, in _delayed_open_ds
    return _open_dataset(*args, **kwargs)
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/intake_esm/source.py", line 109, in _open_dataset
    ds = xr.open_dataset(url, **xarray_open_kwargs)
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/xarray/backends/api.py", line 613, in open_dataset
    backend_ds,
    ^^^^^^^^^^
    ...<13 lines>...
    chunked_array_type,
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/xarray/backends/api.py", line 308, in _dataset_from_backend_dataset
    ds,
    ^^
    ...<9 lines>...
    inline_array,
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/xarray/backends/api.py", line 251, in _chunk_ds
    var._data,
    ^
    ...<5 lines>...
    preferred_chunks=var.encoding.get("preferred_chunks", {}),
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/xarray/namedarray/utils.py", line 239, in _get_chunk
    chunk_shape,
    ^^
    ...<5 lines>...
    limit=limit,
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/xarray/namedarray/daskmanager.py", line 57, in normalize_chunks
    chunks,
    ^^^^
    ...<5 lines>...
    dtype=dtype,
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/dask/array/core.py", line 3302, in auto_chunks
    "Can not use auto rechunking with object dtype. "
    ...<2 lines>...
NotImplementedError: Can not use auto rechunking with object dtype. We are unable to estimate the size in bytes of object data
```
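The failure can be reproduced at the Dask level without the catalog, using a minimal object-dtype array (a sketch; the array contents are made up for illustration):

```python
import numpy as np
import dask.array as da

# An object-dtype array, e.g. variable-length strings.
arr = np.array(["a", "b", "c"], dtype=object)

# An explicit chunk spec works: no per-element byte size is needed.
d = da.from_array(arr, chunks=-1)
print(d.chunks)  # ((3,),)

# 'auto' has to estimate bytes per element, which is undefined for
# object dtype, so Dask raises NotImplementedError.
try:
    da.from_array(arr, chunks="auto")
except NotImplementedError as err:
    print(type(err).__name__)
```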
Describe the solution you'd like
This error generally doesn't occur in standard xarray.open_dataset workflows because the default for chunks is None. However, since intake-esm defaults to chunks='auto', it forces a re-evaluation of chunk sizes that fails on object dtypes. I suggest changing the default chunks kwarg to either None (to align with xarray's default) or {} (to enable Dask while respecting the dataset's native on-disk structure).
Describe alternatives you've considered
I suggest changing the default for the chunks kwarg from 'auto' to None. This would align more closely with user expectations; if no xarray_open_kwargs are provided, most users assume xarray's native defaults apply. Alternatively, if a Dask-compatible default is preferred, using chunks={} would be a safer choice, as it preserves the original on-disk chunking and avoids the NotImplementedError currently triggered by object dtypes during 'auto' rechunking.
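The difference between the proposed defaults can be illustrated on an in-memory dataset (a sketch using `Dataset.chunk`, which accepts the same `chunks` argument as `open_dataset`; the variable and data are made up):

```python
import numpy as np
import xarray as xr

# A variable with object dtype, as some backends produce for strings.
ds = xr.Dataset({"v": ("x", np.array(["a", "b", "c"], dtype=object))})

# chunks={} keeps the data Dask-backed with explicit chunk sizes
# (one chunk per array here), so no byte-size estimate is needed.
chunked = ds.chunk({})
print(chunked["v"].chunks)  # ((3,),)

# chunks='auto' asks Dask to size chunks from an estimated
# bytes-per-element, which is undefined for object dtype.
try:
    ds.chunk("auto")
except NotImplementedError as err:
    print(type(err).__name__)
```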
Additional context
I'm happy to open a PR to address this. I'd appreciate your thoughts on the best approach, as well as any insight into why auto chunking is being triggered for these specific dtypes. Thanks! I will also open a related issue in the xarray repository.
The following code will reproduce the error above:

```python
import intake

catalog_url = "https://data.gdex.ucar.edu/d640000/catalogs/d640000-osdf.json"
cat = intake.open_esm_datastore(catalog_url)
cat_analysis = cat.search(variable='bvf2-theta-an-gauss')
dict_datasets = cat_analysis.to_dataset_dict(xarray_open_kwargs={'engine': 'kerchunk'})
```
The conda env yml:

```yaml
name: arco_test
channels:
  - conda-forge
  - defaults
dependencies:
  - python>3.11
  - pip
  # Core Data & Geospatial
  - xarray ==2026.2.0
  - netcdf4
  - zarr
  - fastparquet
  - kerchunk
  - dask-jobqueue
  - intake-esm >=2025.12.12
  # Visualization & Utilities
  - matplotlib
  - jupyterlab
  - pip:
      - pelicanfs>=1.3.1
```