
Cumulative defect fixes from recent Transformers PRs #41

Open
evalstate wants to merge 219 commits into main from all-defects

Conversation

@evalstate
Owner

Cumulative defect fixes from recent Transformers PRs

This PR is generated by the all-defects mergeability flow. It accumulates defect-fix PRs from huggingface/transformers that could be applied cleanly to the current base.

  • Source branch: all-defects
  • Base: evalstate/transformers:main
  • Head: f342edbf6c
  • PRs classified: 134
  • PRs with terminal state: 134
  • Applied/merged/already-present defect fixes: 65
  • Aborted defect fixes: 5
  • Validation failures reverted: 1
  • Non-defect skipped: 63

Status counts

  • aborted: 5
  • already_present: 6
  • applied: 1
  • merged: 58
  • skipped: 63
  • validation_failed: 1

Category counts

  • defect: 71
  • documentation: 12
  • feature: 30
  • other: 21

Validation

Each applied defect fix was followed by the configured lightweight validation profile:

  1. compileall -q src/transformers
  2. utils/checkers.py ruff_check,ruff_format,init_isort,sort_auto_mappings
  3. utils/tests_fetcher.py ... && pytest ... when impacted pytest targets are selected

Note: this is intentionally not an end-to-end or slow-test validation pass.
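For reference, step 1 of the profile can be reproduced with the standard library alone. This is a minimal sketch, not the flow's actual implementation, and the `check_compiles` helper name is illustrative:

```python
import compileall

def check_compiles(path: str) -> bool:
    # Byte-compile every .py file under `path`; quiet=2 suppresses
    # per-file output. Returns True only if nothing failed to compile,
    # which is the property `compileall -q` checks in the profile.
    return bool(compileall.compile_dir(path, quiet=2))
```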

Details

A detailed status table is posted as a PR comment and is also available locally in:

  • .mergeability/defect-merge-state.jsonl
  • .mergeability/pr-classifications.jsonl
  • all-defects-report.md
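The .jsonl files above hold one JSON record per line. A minimal reader could look like the following sketch; the record schema (e.g. a "status" field) is an assumption about the flow's output, not documented behavior:

```python
import json
from collections import Counter

def load_records(path: str) -> list[dict]:
    # One JSON object per line; skip blank lines.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def status_counts(records: list[dict]) -> Counter:
    # Tally per-PR terminal statuses, mirroring the "Status counts"
    # section of this report (field name "status" is assumed).
    return Counter(r.get("status") for r in records)
```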

jasiecky and others added 30 commits January 13, 2026 12:31
Route EP through the standard (non-zero3) loading path when both EP
and is_deepspeed_zero3_enabled() are active, then let deepspeed.initialize()
wrap the EP-sharded model afterwards.

- Add PreTrainedModel.has_ep property; use it in tp_plan
- get_init_context: meta device for EP+DS (not zero.Init)
- from_pretrained: clear device_map for EP+DS
- _load_pretrained_model: skip zero3 path for EP+DS, pass model.tp_plan
- _move_missing_keys_from_meta_to_device: do not early-return for EP+DS
- _initialize_missing_keys: standard init (no GatheredParameters) for EP+DS
- configuration_utils: strip distributed_config from serialized config
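The routing decision described above reduces to a single predicate. The sketch below is hypothetical: the function name and boolean flags are illustrative, not the actual transformers internals:

```python
def use_zero3_init(zero3_enabled: bool, has_ep: bool) -> bool:
    # With expert parallelism (EP) active, skip the ZeRO-3 zero.Init
    # loading path: the model is initialized on the meta device and
    # EP-sharded first, and deepspeed.initialize() wraps it afterwards.
    return zero3_enabled and not has_ep
```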
`generation_config.eos_token_id` can be `int | list[int]`, but the
whisper long-form generation code compared it as a scalar in two
places, causing silent wrong behavior or a RuntimeError. Normalize
to a list and use membership checks instead of equality.

Made-with: Cursor
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
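The normalization described above amounts to the following pattern (a sketch; the helper name is hypothetical):

```python
def eos_ids_as_list(eos_token_id) -> list[int]:
    # generation_config.eos_token_id may be int, list[int], or None;
    # normalize to a list so call sites can use membership checks
    # (`token in eos_ids`) instead of scalar equality.
    if eos_token_id is None:
        return []
    if isinstance(eos_token_id, int):
        return [eos_token_id]
    return list(eos_token_id)
```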
evalstate and others added 29 commits April 28, 2026 12:13
LoRA hotswapping was added in huggingface#41297. Due to changes in huggingface#43261, it
stopped working. This PR restores the functionality.

The tests already cover this and are failing, but probably no one
noticed because they're slow tests. On main, they fail with mismatched
sizes, which is expected as the padding of the LoRA weights is not being
applied. With this PR, I can confirm that the tests pass locally.

Since the two PRs were released together in v5, there has never been a
Transformers release with working hotswapping functionality.

Notes:

The hotswap path does not use _load_pretrained_model, so the state_dict
must be loaded explicitly when it is not already present. I hoisted that
functionality out of the TP path, where it already existed, so both
paths re-use the same logic. I also apply weight renamings for the same
reason.

Moreover, I moved the inference-model logic into a local function, again
to avoid duplication.
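The hoisted logic follows a common shape: load the state_dict only when the caller did not supply one, then apply weight renamings, so the TP and hotswap paths share one code path. A hypothetical sketch, with illustrative names and signature rather than the real API:

```python
from typing import Callable, Optional

def ensure_state_dict(
    state_dict: Optional[dict],
    load_fn: Callable[[], dict],
    renamings: dict,
) -> dict:
    # Load lazily only when the caller did not pass a state_dict,
    # then remap old weight names to their current names.
    if state_dict is None:
        state_dict = load_fn()
    return {renamings.get(k, k): v for k, v in state_dict.items()}
```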
@evalstate
Owner Author

All-defects flow status

Processed terminal records: 134

PR Category Status Method Validation Summary
huggingface#45679 other skipped TST Run fast PEFT tests in normal CI — not a defect fix
huggingface#45678 defect merged passed Fix shared config mutation issue in flash_attn_from_config — merged cleanly; validation commands exited 0
huggingface#45677 defect merged passed No serving in quality docker image — merged cleanly; validation commands exited 0
huggingface#45675 defect merged passed Fix UnboundLocalError in shard_and_distribute_module for replicated parameters — merged cleanly; validation commands exited 0
huggingface#45673 feature skipped Laguna XS.2 implementation — not a defect fix
huggingface#45671 defect merged passed Update latest revision for Phi-4-multimodal test — merged cleanly; validation commands exited 0
huggingface#45670 defect merged passed [nit] glmasr should be in AutoModelForMultimodalLM — merged cleanly; validation commands exited 0
huggingface#45669 documentation skipped zero_shot_object_detection ValueError fix for python 3.13 — not a defect fix
huggingface#45668 feature skipped [GGUF] Add support for Qwen3.5 MoE (qwen35moe arch) — not a defect fix
huggingface#45667 other skipped chore(typing): add ty type checking for 3 pipeline files — not a defect fix
huggingface#45681 defect merged passed Restore TokenizersBackend override for DeepSeek V3/R1 tokenizer dispatch — merged cleanly; validation commands exited 0
huggingface#45680 defect aborted change got reverted — overlaps with already merged huggingface#45681 DeepSeek tokenizer dispatch fix; conflicts in same AutoTokenizer logic and regression tests, so applying both would be duplicate/speculative
huggingface#45666 feature skipped Extended n-to-1 kernel fusion via KernelConfig — not a defect fix
huggingface#45665 defect merged passed Fix pageable H2D copies in Gated DeltaNet PyTorch fallback — merged cleanly; validation commands exited 0
huggingface#45664 documentation skipped Doc translate to Persian(farsi) — not a defect fix
huggingface#45662 defect merged passed Fix EP + FSDP2: experts silently overwritten by rank-0 broadcast — merged cleanly; validation commands exited 0
huggingface#45661 defect merged passed [Weight Converter] More fine-grained mappings on classes, scoping for every transforms (including weight converter) — merged cleanly; validation commands exited 0
huggingface#45660 documentation skipped [docs] cpu offloading — not a defect fix
huggingface#45659 documentation skipped [docs] dtype — not a defect fix
huggingface#45658 defect merged passed Fix NameError: PeftConfigLike triggered by PreTrainedModel.__init_subclass__ — merged cleanly; validation commands exited 0
huggingface#45655 defect merged passed Fix the order of cls.config resolution — merged cleanly; validation commands exited 0
huggingface#45654 feature skipped [CB] Refactor any model-related code in a separate class — not a defect fix
huggingface#45653 feature skipped [CB] Better overall script and decode bucketting — not a defect fix
huggingface#45652 defect merged passed Fix colmodernvbert tests — merged cleanly; validation commands exited 0
huggingface#45651 defect merged passed [Trainer] Optimize LengthGroupedSampler computation with select_columns and tqdm — merged cleanly; validation commands exited 0
huggingface#45650 defect merged passed Fix KeyError for flash_attn in import_utils.py on Python 3.13 — merged cleanly; validation commands exited 0
huggingface#45649 defect merged passed Fix OOM regression for FSDP2 + cpu_ram_efficient_loading on large models — merged cleanly; validation commands exited 0
huggingface#45648 defect merged passed Fix SDPA inference tolerances for MPS backend — merged cleanly; validation commands exited 0
huggingface#45645 defect merged passed Fix xdist collisions for captured_info artifacts and preserve CI debug logs — merged cleanly; validation commands exited 0
huggingface#45643 feature skipped Add DeepSeek V4 — not a defect fix
huggingface#45642 defect merged passed Fix trust_remote_code local cache collisions for local models (huggingface#45632) — fixed local trust_remote_code cache key collisions and validated cleanly
huggingface#45641 defect merged passed Fix NameError in serving CLI due to conditional import asymmetry — serving CLI conditional import NameError fix merged and validated cleanly
huggingface#45640 feature skipped 🚨🚨🚨 [Trainer] Default to FSDP2, simplify API around fsdp + fsdp_config — not a defect fix
huggingface#45639 defect aborted Make patched testing debug logs xdist-safe — codebase moved on: xdist captured_info handling was already merged via huggingface#45645 with different helper names and notification artifact aggregation; this PR conflicts in testing_u
huggingface#45638 feature skipped Add Multi-Token Prediction (MTP) support for Qwen3.5 — not a defect fix
huggingface#45637 feature skipped Add Multi-Token Prediction (MTP) support for Qwen3.5 — not a defect fix
huggingface#45635 other skipped qa: speed up dtype regex weight load + reduce dtype tests to 3 random — not a defect fix
huggingface#45634 feature skipped DeepGEMM BF16, isolation, refactor — not a defect fix
huggingface#45633 other skipped CircleCI with torch 2.11 — not a defect fix
huggingface#45631 other skipped chore: bump doc-builder SHA for main doc build workflow — not a defect fix
huggingface#45630 feature skipped Add new model: Kimi2-6 — not a defect fix
huggingface#45629 other skipped Allow more artifacts to be download in CI — not a defect fix
huggingface#45628 defect merged passed [MistralCommonBackend] Soften validation mode and apply_chat_template arguments check — Mistral common backend validation softening merged cleanly; validation commands exited 0
huggingface#45627 defect merged passed Processing Utils: honor pre-built sub-processor kwargs in from_pretrained — AutoProcessor prebuilt sub-processor kwarg fix merged cleanly; validation commands exited 0
huggingface#45626 feature skipped [Model] Add PP-FormulaNet Model Support — not a defect fix
huggingface#45625 defect merged passed Add supports_gradient_checkpointing to NemotronHPreTrainedModel — NemotronH gradient checkpointing support flag merged cleanly; validation commands exited 0
huggingface#45624 defect merged passed Skip failing offloading tests — Gemma4 offloading test skip merged cleanly; validation commands exited 0
huggingface#45623 other skipped Glm5 change — not a defect fix
huggingface#45622 defect merged passed Fix peft constructors — PEFT constructor fix merged cleanly; validation commands exited 0
huggingface#45621 feature skipped Better Grouped GEMM + EP — not a defect fix
huggingface#45620 defect aborted Fix TypeError in video_processor_class_from_name when torchvision is not installed — codebase moved on: PR branch appears based on an obsolete fork and replays broad repository changes while only intending a video proces
huggingface#45619 defect merged passed Remove unnecessary generate warnings — generation warning cleanup was already present in cumulative branch; validation commands exited 0
huggingface#45618 feature skipped Add MTP speculative decoding via MTPCandidateGenerator — not a defect fix
huggingface#45617 feature skipped Add Multi-Token Prediction (MTP) inference support — not a defect fix
huggingface#45616 feature skipped Add DeepSeek V4 — not a defect fix
huggingface#45615 defect merged passed fix(qianfan_ocr): add XPU expectations — Qianfan OCR XPU test expectation fix merged cleanly; validation commands exited 0
huggingface#45614 defect validation_failed failed Add missing requests dependency to transformers[serving] — ruff_format failed after merge: setup.py would be reformatted
huggingface#45613 feature skipped [New Model] Add MiniCPM3 support — not a defect fix
huggingface#45612 documentation skipped [docs] update model cards — not a defect fix
huggingface#45611 defect merged passed Raise clear error for problem_type="single_label_classification" with num_labels=1 — invalid single-label classification config error merged cleanly; validation commands exited 0
huggingface#45610 defect aborted Fix configuration reading and error handling for kernels — codebase moved on: conversion_mapping registry has since gained Laguna and generated mapping changes around qwen3_5_moe_text/pp_chart2table; PR intent is small F
huggingface#45609 feature skipped make it possible to ser/deser HF MoE models with torchao — not a defect fix
huggingface#45608 documentation skipped Python code in model docs — not a defect fix
huggingface#45607 other skipped Add regression test for Gemma4 audio relative positional range — not a defect fix
huggingface#45604 feature skipped Agent first cli with skill — not a defect fix
huggingface#45606 defect merged passed [gemma4] infer from config instead of hardcoding — clean merge; validation commands passed
huggingface#45605 defect merged passed Processing Utils: continue when content is a string — clean merge; validation commands passed
huggingface#45603 defect merged passed [Auto] Pass kwargs to fixed_cross_entropy (cluster-43240-3): merged 2 of 2 PRs — clean merge of clustered fixed_cross_entropy kwargs fix; validation commands passed
huggingface#45602 defect merged passed [AMD CI] Fix expectations for Gemma3n — clean merge; validation commands passed
huggingface#45601 defect merged passed fix: compute auxiliary losses when denoising is disabled in D-FINE — clean merge; validation commands passed
huggingface#45599 defect merged passed qa: more lazy loading — clean merge; validation commands passed
huggingface#45682 defect merged passed FIX Restore LoRA hotswapping functionality — LoRA hotswapping fix merged cleanly; validation commands exited 0
huggingface#45681 defect merged passed Restore TokenizersBackend override for DeepSeek V3/R1 tokenizer dispatch — TokenizersBackend dispatch fix was already present in cumulative branch; validation commands exited 0
huggingface#45680 defect aborted change got reverted — codebase moved on: tokenizer regression is already covered by the newer 45681 change in the cumulative branch; this older alternative conflicts in AutoTokenizer dispatch and overlapping auto tokeniz
huggingface#45679 other skipped TST Run fast PEFT tests in normal CI — not a defect fix
huggingface#45678 defect merged passed Fix shared config mutation issue in flash_attn_from_config — shared config mutation fix was already present in cumulative branch; validation commands exited 0
huggingface#45677 other skipped No serving in quality docker image — not a defect fix
huggingface#45675 defect merged passed Fix UnboundLocalError in shard_and_distribute_module for replicated parameters — tensor-parallel UnboundLocalError fix was already present in cumulative branch; validation commands exited 0
huggingface#45673 feature skipped Laguna XS.2 implementation — not a defect fix
huggingface#45671 defect merged passed Update latest revision for Phi-4-multimodal test — Phi-4 multimodal test revision fix was already present in cumulative branch; validation commands exited 0
huggingface#45670 defect merged passed [nit] glmasr should be in AutoModelForMultimodalLM — auto-model mapping fix was already present in cumulative branch; validation commands exited 0
huggingface#45598 defect merged merge passed Align latest model attention function dispatch — attention dispatch alignment merged cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45597 feature skipped Add Granite 4.1 Vision (granite4_vision) — not a defect fix
huggingface#45596 defect merged merge passed fix 2 failed test cases for blt model on XPU — BLT XPU test expectation fix merged cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45595 feature skipped Add unified Cache-layer management for GLM-5 DSA Indexer keys — not a defect fix
huggingface#45594 defect merged merge passed fix(utils): Resolve backbone utils test regressions — backbone utility test regression fix merged cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45592 defect merged merge passed fix padding side issue for fast_vlm tests — fast_vlm padding-side test fix merged cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45591 defect merged merge passed [nemotron_h] respect _no_reinit flag on dt_bias and out_proj.weight — Nemotron-H _no_reinit guard fix merged cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45590 defect already_present none passed fix huggingface#45588: guard s_aux against None in flash_attention_forward — s_aux None guard was already present in cumulative branch with equivalent formatting; attempted merge conflicted only on that already-applied line and was
huggingface#45589 defect merged merge passed Fix AttributeError on s_aux=None in flash_attention_forward — flash_attention s_aux None guard merged cleanly as an empty tree merge because equivalent fix was already present; validation commands exited 0
huggingface#45587 documentation skipped [docs] cb memory management — not a defect fix
huggingface#45586 feature skipped Add Audio-Visual Flamingo model — not a defect fix
huggingface#45585 other skipped qa: bumped mlinter and allow local override — not a defect fix
huggingface#45583 other skipped Update dev version — not a defect fix
huggingface#45582 defect merged merge passed generate: drop stale num_return_sequences warning on continuous batching path — stale continuous-batching num_return_sequences warning fix merged cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45581 other skipped Add automated reviewer assignment script — not a defect fix
huggingface#45580 feature skipped [Privacy Filter] Add model — not a defect fix
huggingface#45579 other skipped Update assign_reviewers.py — not a defect fix
huggingface#45578 defect merged merge passed Remove attribute_map from GptOssConfig — GptOssConfig attribute_map removal merged cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45577 defect merged merge passed Allow for registered experts from kernels hub — registered expert kernel loading fix merged cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45576 documentation skipped docs(pipeline): fix num_workers docstring default from 8 to 0 — not a defect fix
huggingface#45575 defect already_present none passed fix(generation): remove stale warning for num_return_sequences in paged generate — equivalent stale num_return_sequences warning removal is already present from the earlier cumulative generation fix
huggingface#45574 feature skipped Fix typos — not a defect fix
huggingface#45573 defect merged merge passed fix transformers + torchao nvfp4 serialization — merged torchao NVFP4 serialization fix cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45572 other skipped refactor(Dots1): drop Dots1MoE override to pass (inherits from DSV3 MoE) — not a defect fix
huggingface#45570 defect merged merge passed Fix whisper long-form generation when eos_token_id is a list — merged Whisper eos_token_id list handling fix cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45569 feature skipped Proper nemotron H and 3 and 2 — not a defect fix
huggingface#45568 defect merged merge passed Gemma4: fix failed test cases — merged Gemma4 failing-test fixes cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45567 other skipped Move some conversion mappings to PrefixChange — not a defect fix
huggingface#45566 defect merged merge passed fix: raise clear error when tokenizer config uses v5 list format on older versions — merged tokenizer v5 extra_special_tokens list guard cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45565 defect already_present none passed fix: remove stale num_return_sequences warning in paged generate — equivalent stale num_return_sequences warning removal is already present from the earlier cumulative generation fix
huggingface#45564 defect merged merge passed Gemma3n and Gemma4 cannot use rotary kernel — Gemma3n/Gemma4 rotary kernel disablement merged cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45562 defect merged merge passed Updated the image cache for Paddle models according to the latest API — Paddle model cached image/API test updates merged cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45560 documentation skipped Update torchao usage for XPU and CPU — not a defect fix
huggingface#45559 defect already_present none passed Drop noisy generate warnings when do_sample=False (or num_beams=1) — generation config user-set-attribute warning suppression is already present in a refined form; direct PR head also includes many unrelated branch diffe
huggingface#45558 feature skipped feat(trainer): log individual losses from loss_dict — not a defect fix
huggingface#45556 documentation skipped Add image processors refactor to v5 migration guide — not a defect fix
huggingface#45555 other skipped perf: avoid recomputing rotary_emb for each layer in some Google and ModernBERT models — not a defect fix
huggingface#45554 documentation skipped [docs] multi-turn tool calling — not a defect fix
huggingface#45553 documentation skipped [docs] per-request sampling params — not a defect fix
huggingface#45552 defect applied patch passed Remove warnings for modernbert — applied minimal auto_docstring modular file inclusion to suppress ModernBERT import warnings; validation commands exited 0 with no pytest targets selected
huggingface#45551 feature skipped Add ForSequenceClassification heads for the OLMo family — not a defect fix
huggingface#45550 other skipped Add runner selection for mi325 GPU type — not a defect fix
huggingface#45549 defect merged merge passed fix: apply channel averaging correctly in audio feature extractors — merged audio channel-averaging axis fixes cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45548 defect merged merge passed Fix EP + DeepSpeed ZeRO-3 loading via accelerate launch — merged EP plus DeepSpeed ZeRO-3 loading guard cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45547 defect already_present none passed Add disable_mmap kwarg to from_pretrained with hf-mount auto-detection — disable_mmap and hf-mount safetensors fallback are already present in the cumulative branch; direct merge only conflicted on the same load_state_di
huggingface#45546 feature skipped feat: Add GGUF loading support for Llama 4 (text) — not a defect fix
huggingface#45544 defect already_present none passed fix table update versions — Python 3.10 patch-version table-update condition is already present; direct merge conflicted only with newer transformers-mlinter dependency version 0.1.1
huggingface#45543 other skipped ci: OTEL support — not a defect fix
huggingface#45541 defect merged merge passed Fix local_files_only tokenizer fallback when tokenizer files are missing (Issue 45538) — merged local_files_only tokenizer fallback fix cleanly; validation commands exited 0 with no pytest targets selected
huggingface#45540 defect merged merge passed Fix cross-attention cache layer type for T5Gemma2 long inputs — merged T5Gemma2 cross-attention cache fix cleanly; tree already had equivalent contents and validation commands exited 0 with no pytest targets selected
