Cumulative defect fixes from recent Transformers PRs #41
Open
…into fix/43242
Route EP through the standard (non-zero3) loading path when both EP and is_deepspeed_zero3_enabled() are active, then let deepspeed.initialize() wrap the EP-sharded model afterwards.
- Add PreTrainedModel.has_ep property; use it in tp_plan
- get_init_context: meta device for EP+DS (not zero.Init)
- from_pretrained: clear device_map for EP+DS
- _load_pretrained_model: skip zero3 path for EP+DS, pass model.tp_plan
- _move_missing_keys_from_meta_to_device: do not early-return for EP+DS
- _initialize_missing_keys: standard init (no GatheredParameters) for EP+DS
- configuration_utils: strip distributed_config from serialized config
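The routing rule in this commit message can be read as a small truth table. The sketch below is only an illustration of that rule: `use_zero3_loading_path` is a hypothetical helper name, not a function from Transformers, and the two booleans stand in for `PreTrainedModel.has_ep` and `is_deepspeed_zero3_enabled()`.

```python
def use_zero3_loading_path(has_ep: bool, zero3_enabled: bool) -> bool:
    """Hypothetical helper: decide whether loading should take the ZeRO-3 path.

    Per the commit message, EP combined with ZeRO-3 goes through the standard
    (non-zero3) path, so the ZeRO-3 path is taken only when EP is not in use.
    """
    return zero3_enabled and not has_ep


# Enumerate all four combinations of (has_ep, zero3_enabled).
routing = {
    (has_ep, zero3): use_zero3_loading_path(has_ep, zero3)
    for has_ep in (False, True)
    for zero3 in (False, True)
}
```

Only the combination without EP and with ZeRO-3 enabled selects the ZeRO-3 path; every other combination falls back to standard loading.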
`generation_config.eos_token_id` can be `int | list[int]`, but the Whisper long-form generation code compared it as a scalar in two places, causing either silently wrong behavior or a RuntimeError. Normalize it to a list and use membership checks instead of equality.
Made-with: Cursor
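A minimal sketch of the normalize-then-check-membership pattern described above, using plain Python lists. The helper names `normalize_eos_token_ids` and `ends_with_eos` and the token IDs in the usage lines are assumptions made for illustration; they are not the code from the Whisper generation module.

```python
from __future__ import annotations


def normalize_eos_token_ids(eos_token_id: int | list[int] | None) -> list[int]:
    """Return eos_token_id as a list, whether it was an int, a list, or None."""
    if eos_token_id is None:
        return []
    if isinstance(eos_token_id, int):
        return [eos_token_id]
    return list(eos_token_id)


def ends_with_eos(token_ids: list[int], eos_token_id: int | list[int] | None) -> bool:
    """Check the last token by membership in the normalized list, not by equality."""
    eos_ids = normalize_eos_token_ids(eos_token_id)
    return bool(token_ids) and token_ids[-1] in eos_ids


# Illustrative token IDs only; real values come from the model's generation config.
single = ends_with_eos([5, 7], 7)
multiple = ends_with_eos([5, 7], [7, 9])
```

Normalizing once at the boundary means every later check can assume a list, so the scalar and list cases share one code path.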
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
LoRA hotswapping was added in huggingface#41297. Due to changes in huggingface#43261, it stopped working. This PR restores the functionality. The tests already cover this and are failing, but probably no one noticed because they are slow tests. On main, they fail with mismatched sizes, which is expected because the padding of the LoRA weights is not being applied. With this PR, I can confirm that the tests pass locally. Since the two PRs were released together in v5, there was never a Transformers release with working hotswapping functionality.
Notes: The hotswap path does not use _load_pretrained_model, which means that loading the state_dict if not present is required. I hoisted that functionality from the TP path, where it already existed, to reuse the same logic. I also apply weight renamings for that reason. Moreover, I moved the inference model logic to a local function, again to avoid duplicating it.
All-defects flow status
Processed terminal records: 134
Cumulative defect fixes from recent Transformers PRs
This PR is generated by the all-defects mergeability flow. It accumulates defect-fix PRs from huggingface/transformers that could be applied cleanly to the current base.
- all-defects
- evalstate/transformers:main
- f342edbf6c
Status counts
Category counts
Validation
Each applied defect fix was followed by the configured lightweight validation profile:
- compileall -q src/transformers
- utils/checkers.py ruff_check,ruff_format,init_isort,sort_auto_mappings
- utils/tests_fetcher.py ... && pytest ... when impacted pytest targets are selected
Note: this is intentionally not an end-to-end or slow-test validation pass.
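To illustrate what the first validation step checks, here is a small sketch using the standard-library `compileall` module on a temporary directory. The helper name `sources_compile` and the two throwaway files are assumptions for the example; the flow itself runs the step against src/transformers.

```python
import compileall
import tempfile
from pathlib import Path


def sources_compile(root: str) -> bool:
    """Byte-compile every .py file under root; True only if all of them compile."""
    # quiet=1 prints errors only, matching the -q flag of `python -m compileall`.
    return bool(compileall.compile_dir(root, quiet=1))


with tempfile.TemporaryDirectory() as tmp:
    # A syntactically valid file passes the check.
    Path(tmp, "ok.py").write_text("x = 1\n")
    good_result = sources_compile(tmp)

    # Adding a file with a syntax error makes the whole check fail.
    Path(tmp, "broken.py").write_text("def f(:\n")
    bad_result = sources_compile(tmp)
```

This only catches syntax-level breakage, which is consistent with the note that the profile is not an end-to-end validation pass.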
Details
A detailed status table is posted as a PR comment and is also available locally in:
- .mergeability/defect-merge-state.jsonl
- .mergeability/pr-classifications.jsonl
- all-defects-report.md