
Handle zero error scale gracefully in scaled metrics#3059

Merged
dennisbader merged 12 commits into unit8co:master from mahi-ma:feature/scaled-metrics-zero-division
Apr 10, 2026
Conversation

@mahi-ma
Contributor

@mahi-ma mahi-ma commented Apr 5, 2026

Summary

Scaled metrics (ase, sse, mase, msse, rmsse) previously raised a hard ValueError when the insample series had zero error scale (i.e., constant or perfectly seasonal signals). This made batch evaluation pipelines brittle — a single problematic series would abort the entire run.

There are two distinct cases when the error scale (denominator) is zero:

  • Case 1 — denominator zero, numerator non-zero: The naive error is zero (constant or perfectly seasonal insample) but the model prediction error is not. There's no meaningful scale to compare against, so returning np.nan is the right call — it signals "undefined" rather than crashing.
  • Case 2 — both zero: The model prediction error is also zero, meaning the model is at least as good as the naive baseline. Returning 1.0 here makes semantic sense (performance on par with naive). Note: we cannot distinguish whether the model trivially is the seasonal naive or made a non-trivial prediction that happens to match — 1.0 is the right practical default.
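The two cases can be sketched with plain NumPy (a minimal illustration only; `scaled_error` is a hypothetical name, not the darts implementation):

```python
import numpy as np

def scaled_error(pred_err, scale):
    """Toy scaled metric: pred_err / scale, with the two zero-scale cases.

    Case 1: scale == 0, pred_err != 0 -> NaN (no meaningful scale to compare against)
    Case 2: scale == 0, pred_err == 0 -> 1.0 (on par with the naive baseline)
    """
    pred_err = np.asarray(pred_err, dtype=float)
    scale = np.asarray(scale, dtype=float)
    return np.where(
        scale != 0.0,
        # safe division: only evaluated where scale is non-zero
        np.divide(pred_err, scale, out=np.zeros_like(pred_err), where=scale != 0.0),
        np.where(pred_err != 0.0, np.nan, 1.0),
    )

print(scaled_error([2.0, 0.0, 3.0], [4.0, 0.0, 0.0]))  # [0.5, 1.0, nan]
```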

This PR introduces a zero_division parameter (modelled after scikit-learn's convention) to all scaled metrics that controls this behaviour:

  • "warn" (default) — applies the smart defaults described above (np.nan for Case 1, 1.0 for Case 2) and emits a UserWarning.
  • "raise" — preserves the legacy ValueError behaviour for users who want strict validation.
  • Numeric value (e.g. 0.0, np.nan, 1.0) — fills all zero-scale entries with the given value regardless of the numerator, giving full control to the caller. Useful for automated pipelines where NaN propagation causes downstream failures.
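A toy dispatcher for the three modes might look like the following (the `apply_zero_division` helper and its exact semantics are illustrative of the behaviour described above, not the darts code):

```python
import warnings
import numpy as np

def apply_zero_division(pred_err, scale, zero_division="warn"):
    """Hypothetical sketch of the three zero_division modes."""
    pred_err = np.asarray(pred_err, dtype=float)
    scale = np.asarray(scale, dtype=float)
    zero_mask = scale == 0.0
    if zero_mask.any() and zero_division == "raise":
        # legacy strict behaviour
        raise ValueError("cannot normalize: insample error scale is zero")
    # divide only where the scale is non-zero
    safe = np.divide(pred_err, scale, out=np.zeros_like(pred_err), where=~zero_mask)
    if isinstance(zero_division, (int, float)):
        # numeric fill (note: np.nan is a float, so it lands here too)
        return np.where(zero_mask, float(zero_division), safe)
    # "warn": NaN when the numerator is non-zero, 1.0 for the 0/0 case
    if zero_mask.any():
        warnings.warn("zero error scale encountered", UserWarning)
    return np.where(zero_mask, np.where(pred_err != 0.0, np.nan, 1.0), safe)
```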

Design decision: no m=1 fallback

When the seasonal naive (m=m) produces zero scale, we do not silently fall back to the non-seasonal naive (m=1). Rationale: MASE with m=8 and MASE with m=1 measure against different baselines. If the user chose m=8, the zero scale tells them the seasonal naive is already perfect on this data — that's useful information. Silently switching denominators would change the metric's semantics and be hard to debug. The zero_division parameter gives callers full control instead.

Changes

  • darts/metrics/utils.py: Added _safe_scaled_divide() helper using a clean single-pass np.where pattern for vectorized conditional division, replacing the previous hard raise in _get_error_scale.
  • darts/metrics/metrics.py: Added zero_division parameter to ase, mase, sse, msse, and rmsse. Updated each to use _safe_scaled_divide instead of raw division.
  • darts/tests/metrics/test_metrics.py: Added comprehensive test_scaled_errors_zero_division covering all three modes, constant and seasonal insample series, 0/0 edge cases, and explicit np.nan fill values. Updated test_season to verify the new default warning behaviour.

Test plan

  • test_scaled_errors_zero_division — covers "warn", "raise", and numeric fill for all 5 scaled metrics
  • test_season — updated to verify warning is emitted by default and "raise" still raises
  • Run full test suite: pytest darts/tests/metrics/test_metrics.py

… msse, rmsse)

Previously, scaled metrics raised a hard ValueError when the insample
series had zero error scale (constant or perfectly seasonal signals).
This made batch evaluation pipelines brittle — a single problematic
series would abort the entire run.

This commit introduces a `zero_division` parameter (modelled after
scikit-learn's convention) that controls the behaviour:
- "warn" (default): returns NaN or 1.0 depending on the numerator,
  with a UserWarning
- "raise": preserves the legacy ValueError behaviour
- numeric value: fills all zero-scale entries with the given value

Adds `_safe_scaled_divide` helper in metrics/utils.py and comprehensive
tests covering all three modes, constant and seasonal insample series,
and edge cases like 0/0.
@mahi-ma mahi-ma requested a review from dennisbader as a code owner April 5, 2026 06:04
mahi-ma added 2 commits April 4, 2026 23:11
… rationale

- Replace mask + safe division + post-hoc assignment with clean nested
  np.where, matching the idiomatic pattern for vectorized conditional
  division.
- Add docstring note explaining why 1.0 is the right default for the
  0/0 case (model matches naive, but we cannot distinguish trivial from
  non-trivial predictions).
- Update all 5 metric docstrings with the same clarification.
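For illustration, the difference between the two patterns on a toy example (variable names are made up; this is not the darts code):

```python
import numpy as np

num = np.array([2.0, 0.0, 3.0])  # model prediction errors
den = np.array([4.0, 0.0, 0.0])  # naive error scales

# Before: mask + safe division + post-hoc assignment
mask = den == 0.0
out = np.divide(num, den, out=np.zeros_like(num), where=~mask)
out[mask & (num != 0.0)] = np.nan  # Case 1: undefined scale
out[mask & (num == 0.0)] = 1.0    # Case 2: on par with naive

# After: a single nested np.where expression
out2 = np.where(
    ~mask,
    np.divide(num, den, out=np.zeros_like(num), where=~mask),
    np.where(num != 0.0, np.nan, 1.0),
)
```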
@codecov

codecov bot commented Apr 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.72%. Comparing base (e803ef6) to head (aec9826).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3059      +/-   ##
==========================================
- Coverage   95.79%   95.72%   -0.07%     
==========================================
  Files         158      158              
  Lines       17293    17303      +10     
==========================================
- Hits        16565    16564       -1     
- Misses        728      739      +11     


Collaborator

@jakubchlapek jakubchlapek left a comment


Hey @mahi-ma, thanks for your work and @Whatsonyourmind for the valid suggestions :)

I also agree with the chosen directions and the design decisions. I think the code looks good and clean, I like the zero_division parameter solution.

Regarding the code review there are a few minor things, mainly regarding the codebase conventions, which I commented on below. I can also see the CI/CD is failing at ruff, so would also need updating. If you didn't use the pre-commit I would suggest running ruff check --fix :). Feel free to update the changelog accordingly under the Unreleased Improved section and credit yourself :).

mahi-ma added 4 commits April 6, 2026 11:45
- Replace warnings.warn with logger.warning as per library standard
- Add explicit validation for zero_division parameter values
- Switch tests from pytest.warns to caplog.at_level pattern
- Fix import sorting and apply black formatting
- Add changelog entry under Unreleased Improved section
@mahi-ma
Contributor Author

mahi-ma commented Apr 7, 2026

Hey @mahi-ma, thanks for your work and @Whatsonyourmind for the valid suggestions :)

I also agree with the chosen directions and the design decisions. I think the code looks good and clean, I like the zero_division parameter solution.

Regarding the code review there are a few minor things, mainly regarding the codebase conventions, which I commented on below. I can also see the CI/CD is failing at ruff, so would also need updating. If you didn't use the pre-commit I would suggest running ruff check --fix :). Feel free to update the changelog accordingly under the Unreleased Improved section and credit yourself :).

The ruff version I was using was wrong, which is why the linting error kept coming up. I've fixed it now, so the check should pass.

@mahi-ma
Contributor Author

mahi-ma commented Apr 7, 2026

This is good to go now. Once I get approvals, I will sync it with the master branch and it can be merged.
@jakubchlapek @dennisbader please review

Collaborator

@dennisbader dennisbader left a comment


Really nice PR, thanks a lot @mahi-ma and also congrats to your first contribution 🚀

Your implementation is solid, and I like the solution. I did push some changes to remove the support for float in the zero_division parameter. While this is implemented for some sklearn metrics (such as precision, where 0. actually means infinite error), for our scaled metrics I think it would rather create ambiguity.

For example, setting it to 0. would also score bad model forecasts as 0. (because the scale is zero). In that case it's safer to return NaN.

I believe it's fine for now to treat on-par forecasts with zero scale error as 1., since the meaning behind it is actually correct (as good as naive).

Apart from that I added some more tests for multivariate, and probabilistic multi-quantile support. Everything looks good now, and once the tests passed, I'll merge 💯

Thanks again 👏

@dennisbader dennisbader merged commit 5711b61 into unit8co:master Apr 10, 2026
9 checks passed