Fix: Integer overflow in calculate_overlap_ratio (utils.py:248) a… #5007
Closed
albcunha wants to merge 22 commits into PaddlePaddle:develop
…addlePaddle#4961)
* bugfix: unexpected change of the constant IMAGE_LABELS
* update doc

Co-authored-by: duqiemng <1640472053@qq.com>

* vllm 0.10.2 needs transformers 4.x
* update

* fix(doc_vlm): cancel pending futures on batch request failure
  When a batch of requests is sent to the VLM service and one fails, the remaining pending futures are now properly cancelled to avoid wasting VLM service resources.
* chore: remove test file and documentation for async cancellation fix

…ePaddle#4996)
* Use cache mount for genai docker (PaddlePaddle#4954)
* Fix HPS order bug (PaddlePaddle#4955)
* Fix transformers version (PaddlePaddle#4956)
* Fix HPS and remove scipy from required deps (PaddlePaddle#4957)
* [Cherry-Pick]bugfix: unexpected change of the constant IMAGE_LABELS (PaddlePaddle#4961)
* bugfix: unexpected change of the constant IMAGE_LABELS
* update doc
* [METAX] add ppdoclayv3 to METAX_GPU_WHITELIST (PaddlePaddle#4959)
* vllm 0.10.2 needs transformers 4.x (PaddlePaddle#4963)
* vllm 0.10.2 needs transformers 4.x
* update
* Bump version to 3.4.1
* Support setting PDF rendering scale factor (PaddlePaddle#4967)
* Fix/doc vlm async cancellation (PaddlePaddle#4969) (PaddlePaddle#4971)
* fix(doc_vlm): cancel pending futures on batch request failure
  When a batch of requests is sent to the VLM service and one fails, the remaining pending futures are now properly cancelled to avoid wasting VLM service resources.
* chore: remove test file and documentation for async cancellation fix
* Fix typo (PaddlePaddle#4982)
* Revert "Fix typo (PaddlePaddle#4982)"
  This reverts commit 0a936ba.
* feat(ROCm): Add ROCm 7.0 compatibility patches
* version

---------
Co-authored-by: Lin Manhui <bob1998425@hotmail.com>
Co-authored-by: changdazhou <142379845+changdazhou@users.noreply.github.com>
Co-authored-by: SuperNova <91192235+handsomecoderyang@users.noreply.github.com>
Co-authored-by: duqiemng <1640472053@qq.com>
Co-authored-by: zhang-prog <69562787+zhang-prog@users.noreply.github.com>
Co-authored-by: Bobholamovic <mhlin425@whu.edu.cn>
Co-authored-by: Bvicii <98971614+scyyh11@users.noreply.github.com>

* Support setting expiration for BOS URLs
* Fix docs
* Fix bugs
…t paddleocr

### What it does

Update `calculate_overlap_ratio` in `utils.py` to use NumPy types and functions.

### Problem

`calculate_overlap_ratio` in `paddlex/inference/pipelines/layout_parsing/utils.py` triggers a `RuntimeWarning: overflow encountered in scalar multiply` at line 248:

```python
inter_area = inter_width * inter_height
```

### Root cause

The root cause is at lines 237-238:

```python
bbox1 = np.array(bbox1)
bbox2 = np.array(bbox2)
```

`np.array()` without an explicit `dtype` preserves the input's original type. When bounding boxes come from detection models as `int32` arrays, all subsequent arithmetic stays in `int32`, which maxes out at ~2.1 billion. Two moderately large bounding box dimensions multiplied together (e.g., 50000 × 50000 = 2.5 billion) exceed this limit, producing an overflow and incorrect overlap ratios.

### Fix

Two changes in `calculate_overlap_ratio`:

1. **Cast inputs to `float64`**, which prevents overflow in all downstream arithmetic:

```python
bbox1 = np.array(bbox1, dtype=np.float64)
bbox2 = np.array(bbox2, dtype=np.float64)
```

2. **Use `np.multiply` with explicit dtype** as a belt-and-suspenders measure on the exact line that overflows:

```python
inter_area = np.multiply(inter_width, inter_height, dtype=np.float64)
```

### Why `float64`?

- `float64` is NumPy's default float type and supports values up to ~1.8×10³⁰⁸
- The function returns a floating-point ratio (0.0–1.0), and `calculate_bbox_area` already uses `float` internally, so `float64` keeps all arithmetic in one consistent type
- `int64` would also prevent the overflow, but the intermediate values would be implicitly upcast to float at the division step anyway

### Impact

This function is called by several other functions in the same module (`_get_minbox_if_overlap_by_ratio`, `remove_overlap_blocks`, `shrink_supplement_region_bbox`) and is imported directly by `xycut_enhanced/xycuts.py` via `from ..utils import calculate_overlap_ratio`.
The fix is fully backward-compatible: the function signature, behavior, and return type are unchanged.
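
To make the failure mode concrete, here is a minimal sketch that reproduces the overflow on an `int32` box and shows the effect of casting to `float64`. `intersection_area` is a hypothetical helper written for this illustration, not the PaddleX function; it only mirrors the dtype handling described above.

```python
import warnings

import numpy as np


def intersection_area(bbox1, bbox2, dtype=None):
    """Intersection area of two [x1, y1, x2, y2] boxes.

    Hypothetical helper for illustration; not the PaddleX implementation.
    dtype=None mimics np.array() without an explicit dtype.
    """
    b1 = np.array(bbox1, dtype=dtype)
    b2 = np.array(bbox2, dtype=dtype)
    inter_width = max(0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    inter_height = max(0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    return inter_width * inter_height


# A box as a detection model might return it: int32 coordinates.
box = np.array([0, 0, 50000, 50000], dtype=np.int32)

with warnings.catch_warnings():
    # Silence the RuntimeWarning so the demo output stays clean.
    warnings.simplefilter("ignore", RuntimeWarning)
    wrapped = intersection_area(box, box)  # arithmetic stays in int32 and wraps

safe = intersection_area(box, box, dtype=np.float64)  # arithmetic in float64

print(wrapped)
print(safe)
```

With `int32` inputs the first call returns a negative area, because 2.5 billion exceeds the `int32` maximum of 2,147,483,647; the `float64` call returns the expected 2.5e9.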
Thanks for your contribution!
Author
This bug occurs in PaddleOCR when it calls PaddleX.
Author
Monkey patch to use while the PR is under review:
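
The patch code itself is not included in this comment as captured here. As an illustration only, one way to write such a monkey patch is to wrap the existing function so that both boxes are cast to `float64` before it runs. The helper names (`wrap_with_float64`, `patch_module`, `apply_to_paddlex`) are hypothetical, and the two module paths are inferred from the PR description rather than verified, so treat this as a sketch under those assumptions.

```python
import functools
import importlib
import types

import numpy as np


def wrap_with_float64(func):
    """Return a wrapper that casts the two bbox arguments to float64 first."""

    @functools.wraps(func)
    def wrapper(bbox1, bbox2, *args, **kwargs):
        bbox1 = np.asarray(bbox1, dtype=np.float64)
        bbox2 = np.asarray(bbox2, dtype=np.float64)
        return func(bbox1, bbox2, *args, **kwargs)

    return wrapper


def patch_module(module, name="calculate_overlap_ratio"):
    """Replace module.<name> with the float64 wrapper; safe to call twice."""
    func = getattr(module, name)
    if not getattr(func, "_float64_patched", False):
        wrapped = wrap_with_float64(func)
        wrapped._float64_patched = True
        setattr(module, name, wrapped)


def apply_to_paddlex():
    """Patch PaddleX. Module paths are assumptions; verify them locally."""
    base = "paddlex.inference.pipelines.layout_parsing"
    # xycuts.py holds its own reference via `from ..utils import ...`,
    # so it has to be patched separately from utils.
    for path in (base + ".utils", base + ".xycut_enhanced.xycuts"):
        patch_module(importlib.import_module(path))


# Demo on a stand-in module so the sketch runs without PaddleX installed.
demo = types.ModuleType("demo_utils")


def _area_in_input_dtype(bbox1, bbox2):
    # Stand-in for the unpatched function: arithmetic keeps the input dtype.
    b1, b2 = np.array(bbox1), np.array(bbox2)
    width = min(b1[2], b2[2]) - max(b1[0], b2[0])
    height = min(b1[3], b2[3]) - max(b1[1], b2[1])
    return width * height


demo.calculate_overlap_ratio = _area_in_input_dtype
patch_module(demo)

box = np.array([0, 0, 50000, 50000], dtype=np.int32)
area = demo.calculate_overlap_ratio(box, box)
print(area)
```

In an application, `apply_to_paddlex()` would be called once before running the pipeline. Wrapping instead of reimplementing keeps the original logic intact and relies only on the fact, stated in the PR description, that `np.array()` preserves the dtype of its input.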
Collaborator
Please submit the PR to the develop branch first.