Suggest changing the default of the chunks kwarg from 'auto' to {} or None #779

@chiaweh2

Description

Description

Is your feature request related to a problem? Please describe.
I’ve noticed an error when upgrading from Xarray 2025.9.1 to 2026.2.0. In the older version, chunks='auto' handled these datasets without issue. However, the newer version triggers a NotImplementedError when encountering object dtypes. This suggests that the internal size-estimation logic in recent Xarray/Dask updates is now strictly enforcing a check that was previously bypassed.

Failed to load dataset with key='bvf2-theta-an-gauss.bvf2-theta-an-gauss'
                 You can use `cat['bvf2-theta-an-gauss.bvf2-theta-an-gauss'].df` to inspect the assets/files for this key.
                 
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/intake_esm/source.py", line 292, in _open_dataset
    datasets = dask.compute(*datasets)
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/dask/base.py", line 685, in compute
    results = schedule(expr, keys, **kwargs)
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/intake_esm/source.py", line 67, in _delayed_open_ds
    return _open_dataset(*args, **kwargs)
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/intake_esm/source.py", line 109, in _open_dataset
    ds = xr.open_dataset(url, **xarray_open_kwargs)
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/xarray/backends/api.py", line 613, in open_dataset
        backend_ds,
        ^^^^^^^^^^
    ...<13 lines>...
        chunked_array_type,
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/xarray/backends/api.py", line 308, in _dataset_from_backend_dataset
        ds,
        ^^
    ...<9 lines>...
        inline_array,
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/xarray/backends/api.py", line 251, in _chunk_ds
        var._data,
        ^
    ...<5 lines>...
        preferred_chunks=var.encoding.get("preferred_chunks", {}),
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/xarray/namedarray/utils.py", line 239, in _get_chunk
        chunk_shape,
        ^^
    ...<5 lines>...
        limit=limit,
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/xarray/namedarray/daskmanager.py", line 57, in normalize_chunks
        chunks,
        ^^^^
    ...<5 lines>...
        dtype=dtype,
  File "/<REDACTED_PATH>/conda-envs/<ENV_NAME>/lib/python3.14/site-packages/dask/array/core.py", line 3302, in auto_chunks
    "Can not use auto rechunking with object dtype. "
  ...<2 lines>...
NotImplementedError: Can not use auto rechunking with object dtype. We are unable to estimate the size in bytes of object data

Describe the solution you'd like
This error generally doesn't occur in standard xarray.open_dataset workflows because the default for chunks is None. However, since intake-esm defaults to chunks='auto', it forces a re-evaluation of chunk sizes that fails on object dtypes. I suggest changing the default chunks kwarg to either None (to align with xarray's default) or {} (to enable Dask while respecting the dataset's native on-disk structure).
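As a minimal illustration of the difference (a toy sketch, not the intake-esm code path itself — the `labels` variable here is a stand-in for the object-dtype variables in the real catalog): chunking with `{}` keeps each variable as a single native chunk, while `'auto'` raises the error above.

```python
import numpy as np
import xarray as xr

# Toy dataset with an object-dtype variable (hypothetical stand-in).
ds = xr.Dataset({"labels": ("x", np.array(["a", "b", "c"], dtype=object))})

# chunks={} wraps each variable in dask using its native (full-size) chunks.
chunked = ds.chunk({})
print(chunked["labels"].chunks)  # ((3,),)

# chunks="auto" needs a per-element byte size, which object dtype lacks.
try:
    ds.chunk("auto")
except NotImplementedError as e:
    print("auto chunking failed:", e)
```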

Describe alternatives you've considered
I suggest changing the default for the chunks kwarg from 'auto' to None. This would align more closely with user expectations; if no xarray_open_kwargs are provided, most users assume xarray's native defaults apply. Alternatively, if a Dask-compatible default is preferred, using chunks={} would be a safer choice, as it preserves the original on-disk chunking and avoids the NotImplementedError currently triggered by object dtypes during 'auto' rechunking.
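The underlying Dask behavior that `'auto'` triggers can be seen directly, without xarray or intake-esm in the loop (a sketch assuming a recent Dask version):

```python
import numpy as np
import dask.array as da

arr = np.array(["x", "y", "z"], dtype=object)

# Explicit chunk sizes work: dask never needs a byte-size estimate.
x = da.from_array(arr, chunks=2)
print(x.chunks)  # ((2, 1),)

# chunks="auto" goes through dask's auto_chunks, which refuses object
# dtype because it cannot estimate the size in bytes per element.
try:
    da.from_array(arr, chunks="auto")
except NotImplementedError as e:
    print(type(e).__name__, e)
```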

Additional context
I’m happy to open a PR to address this. I'd appreciate your thoughts on the best approach, as well as any insight into why auto chunking is being triggered for these specific dtypes. Thanks! I will also open a related issue on the xarray tracker.

The following code reproduces the error above:

import intake

# Open the GDEX OSDF intake-esm catalog
catalog_url = "https://data.gdex.ucar.edu/d640000/catalogs/d640000-osdf.json"
cat = intake.open_esm_datastore(catalog_url)

# Select the variable whose dataset fails to load
cat_analysis = cat.search(variable='bvf2-theta-an-gauss')

# Fails with NotImplementedError because intake-esm defaults to chunks='auto'
dict_datasets = cat_analysis.to_dataset_dict(xarray_open_kwargs={'engine': 'kerchunk'})

The conda environment YAML:

name: arco_test
channels:
  - conda-forge
  - defaults
dependencies:
  - python>3.11
  - pip
  # Core Data & Geospatial
  - xarray == 2026.2.0
  - netcdf4
  - zarr
  - fastparquet
  - kerchunk
  - dask-jobqueue
  - intake-esm >=2025.12.12
  # Visualization & Utilities
  - matplotlib
  - jupyterlab
  - pip:
    - pelicanfs>=1.3.1
