Handle zero error scale gracefully in scaled metrics #3059
dennisbader merged 12 commits into unit8co:master
… msse, rmsse)

Previously, scaled metrics raised a hard ValueError when the insample series had zero error scale (constant or perfectly seasonal signals). This made batch evaluation pipelines brittle: a single problematic series would abort the entire run.

This commit introduces a `zero_division` parameter (modelled after scikit-learn's convention) that controls the behaviour:

- "warn" (default): returns NaN or 0.0 depending on the numerator, with a UserWarning
- "raise": preserves the legacy ValueError behaviour
- numeric value: fills all zero-scale entries with the given value

Adds `_safe_scaled_divide` helper in metrics/utils.py and comprehensive tests covering all three modes, constant and seasonal insample series, and edge cases like 0/0.
… rationale

- Replace mask + safe division + post-hoc assignment with clean nested np.where, matching the idiomatic pattern for vectorized conditional division.
- Add docstring note explaining why 1.0 is the right default for the 0/0 case (model matches naive, but we cannot distinguish trivial from non-trivial predictions).
- Update all 5 metric docstrings with the same clarification.
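The nested `np.where` pattern mentioned in the commit above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the function name `safe_scaled_divide` and its signature are assumptions, and it implements only the default fallbacks described in the commit (1.0 for 0/0, NaN for a non-zero error over a zero scale).

```python
import numpy as np


def safe_scaled_divide(errors, scale):
    """Hypothetical stand-in for the PR's `_safe_scaled_divide` helper."""
    errors = np.asarray(errors, dtype=float)
    scale = np.asarray(scale, dtype=float)
    zero_scale = scale == 0.0
    # Swap zero denominators for 1.0 so the division itself never divides by zero.
    safe_scale = np.where(zero_scale, 1.0, scale)
    return np.where(
        zero_scale,
        # Zero scale: 0/0 is treated as "on par with naive", x/0 as undefined.
        np.where(errors == 0.0, 1.0, np.nan),
        # Non-zero scale: ordinary scaled error.
        errors / safe_scale,
    )


# Three entries: 0/0, 2/0, and a regular 3/1.5 division.
result = safe_scaled_divide([0.0, 2.0, 3.0], [0.0, 0.0, 1.5])
```

Dividing by `safe_scale` rather than `scale` matters because `np.where` evaluates both branches eagerly, so a raw `errors / scale` would still trigger NumPy's divide-by-zero runtime warnings.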
Codecov Report

✅ All modified and coverable lines are covered by tests.

Coverage diff, master vs. #3059:

| | master | #3059 | +/- |
|---|---|---|---|
| Coverage | 95.79% | 95.72% | -0.07% |
| Files | 158 | 158 | |
| Lines | 17293 | 17303 | +10 |
| Hits | 16565 | 16564 | -1 |
| Misses | 728 | 739 | +11 |
Hey @mahi-ma, thanks for your work and @Whatsonyourmind for the valid suggestions :)
I also agree with the chosen direction and the design decisions. The code looks good and clean, and I like the `zero_division` parameter solution.
Regarding the code review, there are a few minor things, mainly about codebase conventions, which I commented on below. I can also see that the CI/CD is failing at ruff, so that needs fixing too. If you didn't use pre-commit, I would suggest running `ruff check --fix` :). Feel free to update the changelog accordingly under the Unreleased Improved section and credit yourself :).
- Replace warnings.warn with logger.warning as per library standard
- Add explicit validation for zero_division parameter values
- Switch tests from pytest.warns to caplog.at_level pattern
- Fix import sorting and apply black formatting
- Add changelog entry under Unreleased Improved section
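As a rough illustration of the first two bullets, the sketch below validates a `zero_division` argument and reports a zero scale through a logger instead of `warnings.warn`. Everything here is an assumption for illustration: the logger name, both function names, and the messages are made up, and the accepted values mirror what the PR supported at this point in the thread ("warn", "raise", or a number).

```python
import logging
import numbers

logger = logging.getLogger("scaled_metrics_sketch")  # hypothetical logger name


def check_zero_division(zero_division):
    """Accept "warn", "raise", or a real number; reject everything else."""
    if isinstance(zero_division, str):
        if zero_division not in ("warn", "raise"):
            raise ValueError(f"Unknown zero_division mode: {zero_division!r}")
        return zero_division
    if isinstance(zero_division, numbers.Real) and not isinstance(zero_division, bool):
        return float(zero_division)
    raise ValueError(f"Unsupported zero_division value: {zero_division!r}")


def report_zero_scale(zero_division):
    """Raise or log, depending on the validated mode (hypothetical helper)."""
    mode = check_zero_division(zero_division)
    if mode == "raise":
        raise ValueError("The error scale is zero.")
    if mode == "warn":
        logger.warning("The error scale is zero; returning fallback values.")


class ListHandler(logging.Handler):
    """Collects log messages in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
report_zero_scale("warn")  # logged, not raised
```

In a pytest suite, the `caplog` fixture with `caplog.at_level(logging.WARNING)` would play the role of `ListHandler`, which is why the tests had to move away from `pytest.warns` once the warning became a log record.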
The ruff version I was using was wrong, which is why the linting error kept coming up. I have fixed it now, so the check should pass.
This is good to go now. Once I get approvals, I will sync it with the master branch and merge it.
dennisbader left a comment
Really nice PR, thanks a lot @mahi-ma, and congrats on your first contribution 🚀
Your implementation is solid, and I like the solution. I did push some changes to remove the support for float in the `zero_division` parameter. While this is implemented for some sklearn metrics (such as precision, where 0. actually means infinite error), for our scaled metrics I think it would rather create ambiguity.
For example, setting it to 0. would also score bad model forecasts as 0. (because the scale is zero). In that case it's safer to return NaN.
I believe it's fine for now to treat on-par forecasts with zero scale error as 1., since the meaning behind it is actually correct (as good as naive).
Apart from that, I added some more tests for multivariate and probabilistic multi-quantile support. Everything looks good now, and once the tests pass, I'll merge 💯
Thanks again 👏
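The ambiguity described in this review comment can be made concrete with a small NumPy sketch. It does not use the darts API: `scaled_mae` and its `fill` argument are hypothetical, and the scale is computed as the mean absolute one-step naive error on a constant insample series.

```python
import numpy as np

insample = np.full(10, 5.0)  # constant series, so the naive error is zero
scale = float(np.mean(np.abs(np.diff(insample))))

actual = np.full(4, 5.0)
good_forecast = np.full(4, 5.0)   # matches the actual values exactly
bad_forecast = np.full(4, 50.0)   # far off


def scaled_mae(actual, forecast, scale, fill=None):
    """Hypothetical scaled MAE with an optional numeric fill for zero scale."""
    error = float(np.mean(np.abs(actual - forecast)))
    if scale != 0.0:
        return error / scale
    if fill is not None:
        return fill  # numeric fill ignores the numerator entirely
    return 1.0 if error == 0.0 else float("nan")


# With a numeric fill of 0.0, both forecasts get the same "perfect" score.
filled_good = scaled_mae(actual, good_forecast, scale, fill=0.0)
filled_bad = scaled_mae(actual, bad_forecast, scale, fill=0.0)

# Without a fill, the two cases stay distinguishable.
default_good = scaled_mae(actual, good_forecast, scale)
default_bad = scaled_mae(actual, bad_forecast, scale)
```

The filled scores cannot tell the exact forecast from the one that is off by 45, whereas the numerator-aware fallback reports the first as on par with naive and the second as undefined.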
Summary

Scaled metrics (`ase`, `sse`, `mase`, `msse`, `rmsse`) previously raised a hard `ValueError` when the insample series had zero error scale (i.e., constant or perfectly seasonal signals). This made batch evaluation pipelines brittle: a single problematic series would abort the entire run.

There are two distinct cases when the error scale (denominator) is zero:

- Case 1, non-zero error over zero scale: `np.nan` is the right call, since it signals "undefined" rather than crashing.
- Case 2, zero error over zero scale (0/0): `1.0` here makes semantic sense (performance on par with naive). Note: we cannot distinguish whether the model trivially is the seasonal naive or made a non-trivial prediction that happens to match, so `1.0` is the right practical default.

This PR introduces a `zero_division` parameter (modelled after scikit-learn's convention) to all scaled metrics that controls this behaviour:

- `"warn"` (default): applies the smart defaults described above (`np.nan` for Case 1, `1.0` for Case 2) and emits a `UserWarning`.
- `"raise"`: preserves the legacy `ValueError` behaviour for users who want strict validation.
- Numeric value (`0.0`, `np.nan`, `1.0`): fills all zero-scale entries with the given value regardless of the numerator, giving full control to the caller. Useful for automated pipelines where `NaN` propagation causes downstream failures.

Design decision: no m=1 fallback
When the seasonal naive (m=m) produces zero scale, we do not silently fall back to the non-seasonal naive (m=1). Rationale: MASE with m=8 and MASE with m=1 measure against different baselines. If the user chose m=8, the zero scale tells them the seasonal naive is already perfect on this data, which is useful information. Silently switching denominators would change the metric's semantics and be hard to debug. The `zero_division` parameter gives callers full control instead.

Changes
- `darts/metrics/utils.py`: Added `_safe_scaled_divide()` helper using a clean single-pass `np.where` pattern for vectorized conditional division, replacing the previous hard `raise` in `_get_error_scale`.
- `darts/metrics/metrics.py`: Added `zero_division` parameter to `ase`, `mase`, `sse`, `msse`, and `rmsse`. Updated each to use `_safe_scaled_divide` instead of raw division.
- `darts/tests/metrics/test_metrics.py`: Added comprehensive `test_scaled_errors_zero_division` covering all three modes, constant and seasonal insample series, 0/0 edge cases, and explicit `np.nan` fill values. Updated `test_season` to verify the new default warning behaviour.

Test plan

- `test_scaled_errors_zero_division`: covers `"warn"`, `"raise"`, and numeric fill for all 5 scaled metrics
- `test_season`: updated to verify warning is emitted by default and `"raise"` still raises
- `pytest darts/tests/metrics/test_metrics.py`
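To illustrate the "no m=1 fallback" decision from this description, here is a small NumPy sketch of how the seasonal and non-seasonal naive scales differ on a perfectly seasonal series. The helper `naive_scale` is hypothetical and uses the usual MASE-style definition (mean absolute m-step difference of the insample series); it is not the library's `_get_error_scale`.

```python
import numpy as np

# One period of 8 arbitrary values, repeated 4 times: perfectly seasonal with m=8.
pattern = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0, 1.0])
insample = np.tile(pattern, 4)


def naive_scale(series, m):
    """Mean absolute error of the m-step (seasonal) naive forecast in-sample."""
    return float(np.mean(np.abs(series[m:] - series[:-m])))


scale_m8 = naive_scale(insample, 8)  # seasonal naive is perfect here
scale_m1 = naive_scale(insample, 1)  # non-seasonal naive is not
```

Here the m=8 scale is exactly zero while the m=1 scale is positive, so a silent fallback would replace an undefined comparison against a perfect seasonal baseline with a finite score against a much weaker one.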