In hf_config, the LLM-side parameters of VLM models (the example below is qwen2-vl) are wrapped inside text_config, so LLM-related keys cannot be retrieved from the top level, which breaks subsequent functionality.

[WARNING] [mcore_adapter.models.converter.template]: key='vocab_size' not exists in hf_config for get_hf_config_value
[WARNING] [mcore_adapter.models.converter.template]: key='intermediate_size' not exists in hf_config for get_hf_config_value
[WARNING] [mcore_adapter.models.converter.template]: key='attention_dropout' not exists in hf_config for get_hf_config_value
[... warnings for several more keys omitted ...]
This eventually causes an initialization error:
[rank1]: Traceback (most recent call last):
[rank1]: File "/yisu/LlamaFactory-main/src/llamafactory/launcher.py", line 185, in
[rank1]: run_exp()
[rank1]: File "/yisu/LlamaFactory-main/src/llamafactory/train/tuner.py", line 139, in run_exp
[rank1]: _training_function(config={"args": args, "callbacks": callbacks})
[rank1]: File "/yisu/LlamaFactory-main/src/llamafactory/train/tuner.py", line 98, in _training_function
[rank1]: run_sft_mca(model_args, data_args, training_args, finetuning_args, callbacks)
[rank1]: File "/yisu/LlamaFactory-main/src/llamafactory/train/mca/workflow.py", line 229, in run_sft
[rank1]: model = AutoModel.from_pretrained(model_args.model_name_or_path, training_args)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/opt/conda/lib/python3.12/site-packages/mcore_adapter/models/auto/modeling_auto.py", line 58, in from_pretrained
[rank1]: return model_class.from_pretrained(model_name_or_path, *args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/opt/conda/lib/python3.12/site-packages/mcore_adapter/models/model_factory.py", line 213, in from_pretrained
[rank1]: config = cls.config_class.from_pretrained(model_name_or_path, args)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/opt/conda/lib/python3.12/site-packages/mcore_adapter/models/model_config.py", line 184, in from_pretrained
[rank1]: config.post_init()
[rank1]: File "/opt/conda/lib/python3.12/site-packages/mcore_adapter/models/model_config.py", line 48, in post_init
[rank1]: self.post_init()
[rank1]: File "/opt/conda/lib/python3.12/site-packages/mcore_adapter/models/qwen2_vl/config_qwen2_vl.py", line 28, in post_init
[rank1]: super().post_init()
[rank1]: File "/opt/conda/lib/python3.12/site-packages/mcore_adapter/models/model_config.py", line 379, in post_init
[rank1]: super().post_init()
[rank1]: File "/opt/conda/lib/python3.12/site-packages/megatron/core/transformer/transformer_config.py", line 934, in post_init
[rank1]: self.kv_channels = self.hidden_size // self.num_attention_heads
[rank1]: ~~~~~~~~~~~~~~~~~^^~~~~~~~~~~~~~~~~~~~~~~~~~
[rank1]: ZeroDivisionError: integer division or modulo by zero
Pure-LLM models do not have this problem.
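A minimal sketch of the idea behind a fix: when a key is missing at the top level of a VLM hf_config, fall back to the nested text_config before warning and giving up. The function name `get_hf_config_value` comes from the warning messages above, but its real signature in mcore_adapter may differ; the config objects here are simulated with `SimpleNamespace`, and the attribute values are illustrative, not taken from an actual qwen2-vl checkpoint.

```python
from types import SimpleNamespace


def get_hf_config_value(hf_config, key, default=None):
    """Read `key` from hf_config, falling back to hf_config.text_config.

    VLM configs such as qwen2-vl nest the LLM parameters (vocab_size,
    intermediate_size, num_attention_heads, ...) under `text_config`,
    so a top-level lookup alone misses them.
    """
    if hasattr(hf_config, key):
        return getattr(hf_config, key)
    text_config = getattr(hf_config, "text_config", None)
    if text_config is not None and hasattr(text_config, key):
        return getattr(text_config, key)
    return default


# A qwen2-vl-style config: the LLM parameters live under text_config.
vlm_config = SimpleNamespace(
    model_type="qwen2_vl",
    text_config=SimpleNamespace(
        vocab_size=152064,
        intermediate_size=29568,
        num_attention_heads=40,
    ),
)

# Without the fallback these lookups would fail and num_attention_heads
# would stay 0, producing the ZeroDivisionError in kv_channels above.
print(get_hf_config_value(vlm_config, "vocab_size"))           # 152064
print(get_hf_config_value(vlm_config, "num_attention_heads"))  # 40
```

With this fallback, the `kv_channels = hidden_size // num_attention_heads` computation in Megatron's `TransformerConfig.post_init` would receive the real head count instead of a zero default.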