
Handle zero error scale gracefully in scaled metrics#3059

Merged
dennisbader merged 12 commits into unit8co:master from mahi-ma:feature/scaled-metrics-zero-division
Apr 10, 2026
Conversation

@mahi-ma
Contributor

@mahi-ma mahi-ma commented Apr 5, 2026

Summary

Scaled metrics (ase, sse, mase, msse, rmsse) previously raised a hard ValueError when the insample series had zero error scale (i.e., constant or perfectly seasonal signals). This made batch evaluation pipelines brittle — a single problematic series would abort the entire run.

There are two distinct cases when the error scale (denominator) is zero:

  • Case 1 — denominator zero, numerator non-zero: The naive error is zero (constant or perfectly seasonal insample) but the model prediction error is not. There's no meaningful scale to compare against, so returning np.nan is the right call — it signals "undefined" rather than crashing.
  • Case 2 — both zero: The model prediction error is also zero, meaning the model is at least as good as the naive baseline. Returning 1.0 here makes semantic sense (performance on par with naive). Note: we cannot distinguish whether the model trivially is the seasonal naive or made a non-trivial prediction that happens to match — 1.0 is the right practical default.
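The two cases can be sketched with plain NumPy (a minimal illustration only; `scaled_error` is a hypothetical name, not the darts implementation):

```python
import numpy as np

def scaled_error(pred_err, scale):
    """Toy scaled metric: pred_err / scale, with the two zero-scale cases.

    Case 1: scale == 0, pred_err != 0 -> NaN (no meaningful scale to compare against)
    Case 2: scale == 0, pred_err == 0 -> 1.0 (on par with the naive baseline)
    """
    pred_err = np.asarray(pred_err, dtype=float)
    scale = np.asarray(scale, dtype=float)
    return np.where(
        scale != 0.0,
        # safe division: only evaluated where scale is non-zero
        np.divide(pred_err, scale, out=np.zeros_like(pred_err), where=scale != 0.0),
        np.where(pred_err != 0.0, np.nan, 1.0),
    )

print(scaled_error([2.0, 0.0, 3.0], [4.0, 0.0, 0.0]))  # [0.5, 1.0, nan]
```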

This PR introduces a zero_division parameter (modelled after scikit-learn's convention) to all scaled metrics that controls this behaviour:

  • "warn" (default) — applies the smart defaults described above (np.nan for Case 1, 1.0 for Case 2) and emits a UserWarning.
  • "raise" — preserves the legacy ValueError behaviour for users who want strict validation.
  • Numeric value (e.g. 0.0, np.nan, 1.0) — fills all zero-scale entries with the given value regardless of the numerator, giving full control to the caller. Useful for automated pipelines where NaN propagation causes downstream failures.
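A toy dispatcher for the three modes might look like the following (the `apply_zero_division` helper and its exact semantics are illustrative of the behaviour described above, not the darts code):

```python
import warnings
import numpy as np

def apply_zero_division(pred_err, scale, zero_division="warn"):
    """Hypothetical sketch of the three zero_division modes."""
    pred_err = np.asarray(pred_err, dtype=float)
    scale = np.asarray(scale, dtype=float)
    zero_mask = scale == 0.0
    if zero_mask.any() and zero_division == "raise":
        # legacy strict behaviour
        raise ValueError("cannot normalize: insample error scale is zero")
    # divide only where the scale is non-zero
    safe = np.divide(pred_err, scale, out=np.zeros_like(pred_err), where=~zero_mask)
    if isinstance(zero_division, (int, float)):
        # numeric fill (note: np.nan is a float, so it lands here too)
        return np.where(zero_mask, float(zero_division), safe)
    # "warn": NaN when the numerator is non-zero, 1.0 for the 0/0 case
    if zero_mask.any():
        warnings.warn("zero error scale encountered", UserWarning)
    return np.where(zero_mask, np.where(pred_err != 0.0, np.nan, 1.0), safe)
```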

Design decision: no m=1 fallback

When the seasonal naive (m=m) produces zero scale, we do not silently fall back to the non-seasonal naive (m=1). Rationale: MASE with m=8 and MASE with m=1 measure against different baselines. If the user chose m=8, the zero scale tells them the seasonal naive is already perfect on this data — that's useful information. Silently switching denominators would change the metric's semantics and be hard to debug. The zero_division parameter gives callers full control instead.

Changes

  • darts/metrics/utils.py: Added _safe_scaled_divide() helper using a clean single-pass np.where pattern for vectorized conditional division, replacing the previous hard raise in _get_error_scale.
  • darts/metrics/metrics.py: Added zero_division parameter to ase, mase, sse, msse, and rmsse. Updated each to use _safe_scaled_divide instead of raw division.
  • darts/tests/metrics/test_metrics.py: Added comprehensive test_scaled_errors_zero_division covering all three modes, constant and seasonal insample series, 0/0 edge cases, and explicit np.nan fill values. Updated test_season to verify the new default warning behaviour.

Test plan

  • test_scaled_errors_zero_division — covers "warn", "raise", and numeric fill for all 5 scaled metrics
  • test_season — updated to verify warning is emitted by default and "raise" still raises
  • Run full test suite: pytest darts/tests/metrics/test_metrics.py

… msse, rmsse)

Previously, scaled metrics raised a hard ValueError when the insample
series had zero error scale (constant or perfectly seasonal signals).
This made batch evaluation pipelines brittle — a single problematic
series would abort the entire run.

This commit introduces a `zero_division` parameter (modelled after
scikit-learn's convention) that controls the behaviour:
- "warn" (default): returns NaN or 1.0 depending on the numerator,
  with a UserWarning
- "raise": preserves the legacy ValueError behaviour
- numeric value: fills all zero-scale entries with the given value

Adds `_safe_scaled_divide` helper in metrics/utils.py and comprehensive
tests covering all three modes, constant and seasonal insample series,
and edge cases like 0/0.
@mahi-ma mahi-ma requested a review from dennisbader as a code owner April 5, 2026 06:04
mahi-ma added 2 commits April 4, 2026 23:11
… rationale

- Replace mask + safe division + post-hoc assignment with clean nested
  np.where, matching the idiomatic pattern for vectorized conditional
  division.
- Add docstring note explaining why 1.0 is the right default for the
  0/0 case (model matches naive, but we cannot distinguish trivial from
  non-trivial predictions).
- Update all 5 metric docstrings with the same clarification.
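For illustration, the difference between the two patterns on a toy example (variable names are made up; this is not the darts code):

```python
import numpy as np

num = np.array([2.0, 0.0, 3.0])  # model prediction errors
den = np.array([4.0, 0.0, 0.0])  # naive error scales

# Before: mask + safe division + post-hoc assignment
mask = den == 0.0
out = np.divide(num, den, out=np.zeros_like(num), where=~mask)
out[mask & (num != 0.0)] = np.nan  # Case 1: undefined scale
out[mask & (num == 0.0)] = 1.0    # Case 2: on par with naive

# After: a single nested np.where expression
out2 = np.where(
    ~mask,
    np.divide(num, den, out=np.zeros_like(num), where=~mask),
    np.where(num != 0.0, np.nan, 1.0),
)
```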
@codecov

codecov bot commented Apr 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.72%. Comparing base (e803ef6) to head (aec9826).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3059      +/-   ##
==========================================
- Coverage   95.79%   95.72%   -0.07%     
==========================================
  Files         158      158              
  Lines       17293    17303      +10     
==========================================
- Hits        16565    16564       -1     
- Misses        728      739      +11     


Collaborator

@jakubchlapek jakubchlapek left a comment


Hey @mahi-ma, thanks for your work and @Whatsonyourmind for the valid suggestions :)

I also agree with the chosen directions and the design decisions. I think the code looks good and clean, I like the zero_division parameter solution.

Regarding the code review there are a few minor things, mainly regarding the codebase conventions, which I commented on below. I can also see the CI/CD is failing at ruff, so would also need updating. If you didn't use the pre-commit I would suggest running ruff check --fix :). Feel free to update the changelog accordingly under the Unreleased Improved section and credit yourself :).

mahi-ma added 4 commits April 6, 2026 11:45
- Replace warnings.warn with logger.warning as per library standard
- Add explicit validation for zero_division parameter values
- Switch tests from pytest.warns to caplog.at_level pattern
- Fix import sorting and apply black formatting
- Add changelog entry under Unreleased Improved section
@mahi-ma
Contributor Author

mahi-ma commented Apr 7, 2026

Hey @mahi-ma, thanks for your work and @Whatsonyourmind for the valid suggestions :)

I also agree with the chosen directions and the design decisions. I think the code looks good and clean, I like the zero_division parameter solution.

Regarding the code review there are a few minor things, mainly regarding the codebase conventions, which I commented on below. I can also see the CI/CD is failing at ruff, so would also need updating. If you didn't use the pre-commit I would suggest running ruff check --fix :). Feel free to update the changelog accordingly under the Unreleased Improved section and credit yourself :).

The ruff version I was using was wrong, which is why the linting error kept coming up. I've fixed it now, so the check should pass.

@mahi-ma
Contributor Author

mahi-ma commented Apr 7, 2026

This is good to go now. Once I get approvals, I will sync it with the master branch and it can be merged.
@jakubchlapek @dennisbader please review

Collaborator

@dennisbader dennisbader left a comment


Really nice PR, thanks a lot @mahi-ma and also congrats to your first contribution 🚀

Your implementation is solid, and I like the solution. I did push some changes to remove the support for float in the zero_division parameter. While this is implemented for some sklearn metrics (such as precision, where 0. actually means infinite error), for our scaled metrics I think it would rather create ambiguity.

For example, setting it to 0. would also score bad model forecasts as 0. (because the scale is zero). In that case it's safer to return NaN.

I believe it's fine for now to treat on-par forecasts with zero scale error as 1., since the meaning behind it is actually correct (as good as naive).

Apart from that I added some more tests for multivariate, and probabilistic multi-quantile support. Everything looks good now, and once the tests passed, I'll merge 💯

Thanks again 👏

@dennisbader dennisbader merged commit 5711b61 into unit8co:master Apr 10, 2026
9 checks passed