Fix zero/division safety gaps in utility and inference paths #7855

Draft
harshang03 wants to merge 1 commit into deepspeedai:master from harshang03:fix/issue-7838-zero-guards

@harshang03

Describe your changes

  • Added a shared non-zero divisor validator and wired it into group divisibility checks and inference ceil_div.
  • Added strict steps_per_output validation in ThroughputTimer so invalid values fail early instead of triggering modulo-by-zero at runtime.
  • Hardened HPU FP8 dequantization to reject zero or non-finite scales before inverse-scale computation.
  • Added targeted regression tests for groups, timer, inference utils, and HPU quantizer scale validation.
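For reviewers, the first two bullets can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the names `ensure_nonzero_divisor` and `validate_steps_per_output` are hypothetical, and the assumption that `steps_per_output` must be a positive integer is mine rather than something stated in the diff.

```python
def ensure_nonzero_divisor(value, name="divisor"):
    """Hypothetical shared validator: reject non-integer or zero divisors."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} must be an int, got {type(value).__name__}")
    if value == 0:
        raise ValueError(f"{name} must be non-zero")
    return value


def ceil_div(a, b):
    """Ceiling division that fails with a clear message on a zero divisor."""
    ensure_nonzero_divisor(b, "b")
    # -(-a // b) rounds toward positive infinity using integer arithmetic only.
    return -(-a // b)


def validate_steps_per_output(steps_per_output):
    """Hypothetical early check for a reporting interval (assumed positive int)."""
    ensure_nonzero_divisor(steps_per_output, "steps_per_output")
    if steps_per_output < 0:
        raise ValueError("steps_per_output must be positive")
    return steps_per_output
```

The point of validating up front is that a bad value surfaces at construction time with a named parameter in the message, rather than later as a bare `ZeroDivisionError` from a `%` or `//` deep in the step loop.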

Screenshot or video (only for visual changes)

  • N/A

GitHub Issue Link (if applicable)

Testing Plan

  • Explanation of why no additional tests are needed:
    • Added focused unit tests that directly cover each reported failure mode and guard path.
  • Unit Tests (JS and/or Python):
    • ./.venv/bin/python -m pytest tests/unit/utils/test_groups.py tests/unit/utils/test_timer.py tests/unit/inference/test_inference_utils.py tests/unit/ops/fp_quantizer/test_fp_quantizer_scale_validation.py
  • E2E Tests:
    • Not run (change is utility-level and covered by unit tests).
  • Any manual testing needed?:
    • No.

Contribution License Agreement
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

Add explicit validation for divisor inputs in groups and inference utilities, enforce valid throughput report intervals, and reject invalid HPU dequantization scales to avoid ZeroDivisionError and silent inf/nan propagation.
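The scale check described above might look like the following pure-Python sketch. It is illustrative only: `dequantize_with_checked_scale` is a hypothetical name, the list-based arithmetic stands in for the tensor operations of the HPU FP8 path, and the assumption that dequantization multiplies by `1 / scale` is inferred from the phrase "inverse-scale computation" rather than taken from the diff.

```python
import math


def dequantize_with_checked_scale(values, scale):
    """Hypothetical stand-in for a dequantization step with a guarded scale."""
    # Reject NaN, +inf, and -inf first, then zero, before computing 1 / scale.
    if not math.isfinite(scale):
        raise ValueError(f"scale must be finite, got {scale!r}")
    if scale == 0:
        raise ValueError("scale must be non-zero")
    inv_scale = 1.0 / scale
    return [v * inv_scale for v in values]
```

Plain Python floats already raise on `1.0 / 0.0`, but floating-point tensor division by zero typically produces `inf` instead of an exception, which is why an explicit check is useful on that path.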