roc bug, possibly due to dimension names? #442

@paololucchino

Description

Bug: Incorrect ROC AUC calculation, possibly related to dimension order in multi-dimensional arrays

When computing ROC AUC on multi-dimensional arrays where observations and forecasts share the same dimension names but differ in dimension order, xss.roc() produces results that differ substantially from sklearn's roc_auc_score, which is used here as ground truth.

Code Sample, a copy-pastable example if possible

import numpy as np
import xarray as xr
import xskillscore as xss
from sklearn.metrics import roc_auc_score

# Create test data with specific seed
np.random.seed(1512)
obs_raw = xr.DataArray(
    np.random.normal(0.5, 0.2, size=(20, 10)),
    coords=[("time", np.arange(20)), ("points", np.arange(10))],
)
da_obs = (obs_raw > 0.5).astype(int)

# Create forecast with different dimension order via broadcasting
alpha = xr.DataArray(np.linspace(0, 1, num=10), coords=[("points", np.arange(10))])
error = xr.DataArray(np.random.normal(0.0, 0.03, size=20), coords=[("time", np.arange(20))])
da_forecast = alpha + obs_raw + error

print(f"da_obs dims: {da_obs.dims}, shape: {da_obs.shape}")
print(f"da_forecast dims: {da_forecast.dims}, shape: {da_forecast.shape}")
# Output: da_obs dims: ('time', 'points'), da_forecast dims: ('points', 'time')

# Compute using xskillscore
xss_result = xss.roc(da_obs, da_forecast, dim="time", return_results="area")

# Compare against sklearn (ground truth) for each point
print("\nComparison with sklearn:")
print(f"{'Point':<6} {'sklearn':<10} {'xskillscore':<12} {'Error':<10}")
print("-" * 40)

for point in range(10):
    obs_p = da_obs.isel(points=point).values
    fc_p = da_forecast.isel(points=point).values

    sklearn_auc = roc_auc_score(obs_p, fc_p)
    xss_auc = xss_result.isel(points=point).values
    # Use a separate name so the forecast-noise DataArray `error` above is not overwritten
    abs_err = abs(sklearn_auc - xss_auc)

    print(f"{point:<6} {sklearn_auc:<10.6f} {xss_auc:<12.6f} {abs_err:<10.6f}")

Output:

da_obs dims: ('time', 'points'), shape: (20, 10)
da_forecast dims: ('points', 'time'), shape: (10, 20)

Comparison with sklearn:
Point  sklearn    xskillscore  Error
----------------------------------------
0      0.939394   0.939394     0.000000
1      0.979798   0.979798     0.000000
2      1.000000   0.990909     0.009091
3      1.000000   1.000000     0.000000
4      0.958333   0.845833     0.112500
5      0.958333   0.733333     0.225000
6      0.927083   0.200000     0.727083
7      1.000000   0.140909     0.859091
8      0.968750   0.175000     0.793750
9      1.000000   0.000000     1.000000

For point 9 specifically:

  • The data shows strong positive correlation (0.869 between obs and forecasts)
  • High forecast values consistently correspond to positive observations
  • sklearn correctly returns AUC = 1.0 (perfect classifier)
  • xskillscore incorrectly returns AUC = 0.0 (the score of a perfectly inverted classifier)
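
To make the two extremes above concrete, here is a minimal sketch of ROC AUC computed as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The function name `pairwise_auc` and the toy data are illustrative assumptions and are not taken from xskillscore or from the arrays in this issue.

```python
def pairwise_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Illustrative helper (hypothetical name); ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, not from the issue: every positive outscores every negative.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.8, 0.9, 0.2, 0.7]

auc_aligned = pairwise_auc(labels, scores)                    # scores rank the labels perfectly
auc_flipped = pairwise_auc([1 - y for y in labels], scores)   # same scores, inverted labels

print(auc_aligned, auc_flipped)
```

An AUC of exactly 0.0 therefore means every negative outscored every positive, which is what one would expect if the scores were paired with the wrong or inverted labels rather than if the forecast were simply poor.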

Expected Output

xskillscore should produce results matching sklearn regardless of dimension order, as long as dimension names match.
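
A possible workaround to try while the cause is unconfirmed is to put the forecast into the same dimension order as the observations before scoring. The sketch below uses only xarray's `DataArray.transpose`; the helper name `match_dim_order` and the toy arrays are assumptions for illustration, and it has not been verified here that this removes the discrepancy in `xss.roc`.

```python
import numpy as np
import xarray as xr


def match_dim_order(reference, other):
    """Reorder the dimensions of `other` to follow `reference` (hypothetical helper)."""
    if set(reference.dims) != set(other.dims):
        raise ValueError("both arrays must have the same dimension names")
    return other.transpose(*reference.dims)


# Toy arrays with the same shapes and dimension orders as in the report.
obs = xr.DataArray(np.zeros((20, 10)), dims=("time", "points"))
fc = xr.DataArray(np.arange(200.0).reshape(10, 20), dims=("points", "time"))

fc_aligned = match_dim_order(obs, fc)
print(fc_aligned.dims, fc_aligned.shape)
```

With the arrays from the code sample, `match_dim_order(da_obs, da_forecast)` could be passed to `xss.roc` in place of `da_forecast` and the sklearn comparison rerun; if the errors vanish, that would support the dimension-order hypothesis.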

Environment:

  • xskillscore version: 0.0.26
  • xarray version: 2025.3.1
  • numpy version: 1.26.4
