Add GLM5 SFT support #1844

Open
samaritan1998 wants to merge 1 commit into THUDM:main from samaritan1998:add-glm5-sft-support

@samaritan1998

Summary

  • Add a GLM5-specific SFT loss mask type that follows GLM-style stop markers.
  • Add a GLM5 SFT launch script using sft_loss and the existing GLM5 Megatron model config.
  • Add unit coverage for multi-turn GLM5 masking, tool calls, and step_loss_mask handling.

Details

The GLM5 mask generation renders the full chat template, supervises assistant spans, and treats <|user|>, <|observation|>, and <|endoftext|> as stop boundaries. This mirrors ms-swift's GLM5/GLM4.7 behavior, where <|user|> is the suffix used to teach the model to stop instead of appending tokenizer EOS.
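The masking rule described above can be illustrated with a minimal sketch. It is not the PR's implementation: it works on token strings rather than token IDs, the function name glm5_loss_mask is hypothetical, and it assumes that the stop marker closing an assistant span is itself supervised (so the model learns to emit it) while the <|assistant|> role marker is not.

```python
# Hypothetical sketch of GLM-style loss masking over token strings.
# Assumption: the real code operates on token IDs from the rendered chat template.
ASSISTANT = "<|assistant|>"
STOP_MARKERS = {"<|user|>", "<|observation|>", "<|endoftext|>"}


def glm5_loss_mask(tokens):
    """Return a 0/1 mask: 1 for assistant tokens and the stop marker that ends them."""
    mask = [0] * len(tokens)
    in_assistant = False
    for i, tok in enumerate(tokens):
        if tok == ASSISTANT:
            # The role marker comes from the prompt side, so it is not supervised.
            in_assistant = True
            continue
        if in_assistant:
            mask[i] = 1
            if tok in STOP_MARKERS:
                # Assumption: supervise the stop marker, then leave the span.
                in_assistant = False
    return mask


multi_turn = ["<|user|>", "hi", "<|assistant|>", "hello", "<|user|>",
              "bye", "<|assistant|>", "ok", "<|endoftext|>"]
print(glm5_loss_mask(multi_turn))  # [0, 0, 0, 1, 1, 0, 0, 1, 1]
```

In a tool-call turn the same rule ends the assistant span at <|observation|>, so the tool output that follows stays unsupervised until the next <|assistant|> marker.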

Validation

  • Ran GLM5 mask tests and adjacent Qwen3.5 mask tests via direct Python invocation with a lightweight transformers stub, because this local environment does not have pytest or transformers installed.
  • bash -n scripts/run-glm5-744B-A40B-sft.sh
  • git diff --check
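The stub used for this validation is not shown in the PR. One common way to run such tests without transformers installed is to register a placeholder module in sys.modules before importing the code under test. The sketch below assumes the code under test only needs the name AutoTokenizer to exist at import time; install_transformers_stub is a hypothetical helper.

```python
import sys
import types


def install_transformers_stub():
    """Register a placeholder 'transformers' module if none has been imported yet."""
    stub = types.ModuleType("transformers")

    class AutoTokenizer:
        # Placeholder only: tests are expected to pass in their own fake tokenizer.
        @classmethod
        def from_pretrained(cls, *args, **kwargs):
            raise NotImplementedError("stub: no real tokenizer available")

    stub.AutoTokenizer = AutoTokenizer
    # setdefault keeps an already-imported real package untouched.
    return sys.modules.setdefault("transformers", stub)


module = install_transformers_stub()
import transformers  # resolves to whatever is registered in sys.modules

print(transformers is module)
```

A stub like this only proves the masking logic in isolation; running the same tests under pytest with the real tokenizer would still be needed to check token-ID-level behavior.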


@samaritan1998 samaritan1998 left a comment

Local validation passed for GLM5 loss mask behavior and script syntax.

@samaritan1998 samaritan1998 marked this pull request as ready for review April 20, 2026 04:52