When using LocalBackend (colocate mode), I noticed that the LoRA adapter was not being synced properly. If I killed the run after several steps and resumed from a checkpoint, the reward would climb abruptly, because ART was forced to reload the last checkpoint as the first/base model. Upon inspection, the adapter is correctly making its way to vLLM, but the OpenAI serving layer never "sees" the new adapter. As a result, rollouts (which obtain the current model via model.get_inference_name()) keep seeing/using the initial/base adapter (step 0), and the whole training process silently falls apart due to stale inference.
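For anyone wanting to confirm the same symptom: vLLM's OpenAI-compatible server lists its registered LoRA adapters under /v1/models, so you can check whether the step-N adapter ever became visible to the serving layer. A minimal sketch, assuming the default local port and the f"{model_name}@{step}" adapter naming from my setup:

```python
from openai import OpenAI

# Local vLLM OpenAI-compatible endpoint (base_url/port are assumptions).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

served = [m.id for m in client.models.list()]
print("Models/adapters visible to the OpenAI serving layer:", served)

# Hypothetical adapter name for step 3, following the "name@step" scheme.
expected = "my-model@3"
if expected not in served:
    print(f"{expected} is missing: rollouts are still hitting an older adapter")
```

In my runs, only the step-0 adapter ever showed up here, even though vLLM itself had accepted the newer ones.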
This seems to be a regression that may have affected multiple versions, because I do remember this working properly in last year's versions.
While I don't have the means to properly submit a PR at the moment, I wanted to share one possible solution here in case it helps the maintainers:
The problem arises in unsloth/service.py: at the end of _train_shared(), the new adapter is added to vLLM, but it is never registered with _openai_serving_models. A quick fix is to add the patch below right after the llm.add_lora() block:
```python
lora_name = f"{self.model_name}@{new_step}"
lora_request = LoRARequest(
    lora_name=lora_name,
    lora_int_id=self._next_lora_id(),
    lora_path=checkpoint_dir,
)
added = await llm.add_lora(lora_request)
if not added:
    raise RuntimeError(
        f"Failed to add LoRA adapter for step {new_step} at {checkpoint_dir}"
    )
# -- Patch here: mirror the new adapter into the OpenAI serving registry,
# so that requests addressed to the step-N name actually resolve to it.
import art.vllm.server as _vllm_server_mod

serving_models = _vllm_server_mod._openai_serving_models
if serving_models is not None:
    serving_models.lora_requests[lora_name] = lora_request
    logger.info(
        "Registered '%s' in OpenAI serving models registry", lora_name
    )
else:
    logger.warning(
        "_openai_serving_models is None: LoRA loaded into vLLM "
        "but NOT registered in the OpenAI serving layer. Inference requests "
        "may still use the previous adapter."
    )
# --
self._latest_step = new_step
```
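To double-check that the patch took effect, a quick sanity check after a training step can compare the serving registry against the name you expect (the adapter name below is hypothetical; substitute whatever f"{self.model_name}@{new_step}" yields in your run):

```python
import art.vllm.server as _vllm_server_mod

serving_models = _vllm_server_mod._openai_serving_models
lora_name = "my-model@3"  # hypothetical: adapter name for the step just trained

assert serving_models is not None, "OpenAI serving layer not initialized"
assert lora_name in serving_models.lora_requests, (
    f"{lora_name} reached vLLM but was never registered with the serving layer"
)
```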
Notes:
- Affected versions: I believe multiple versions are affected; I personally hit this on 0.5.16.
- I acknowledge this may not be the best place to fix it, but I've verified that ART works correctly after applying this patch.
- I don't know whether train_sft is affected as well, but if so, I believe it could be patched in exactly the same way.
@bradhilton this is what we briefly discussed on Discord.
I hope it helps.