
feat: implement single-GPU (24GB) INT4 QLoRA + offload-rollout + external PRM for OpenClaw-Combine and test #97

Open
wuwanchun wants to merge 2 commits into Gen-Verse:main from wuwanchun:main

Conversation


@wuwanchun wuwanchun commented Apr 20, 2026

This PR adds support for running OpenClaw-Combine on a single 24 GB GPU using INT4 QLoRA. I’ve included new scripts, updated documentation, and made significant changes to the PRM API server to support external PRM scoring via any OpenAI-compatible API.

Single-GPU INT4 QLoRA Support

  • Added a new script, run_qwen3_4b_openclaw_combine_qlora_single_gpu_int4.sh, that enables full training and inference with QLoRA (NF4 quantization) on a single 24 GB GPU. This includes rollout offloading and integration with the external PRM API.
  • Added single-GPU memory management features: colocated training + rollout on one GPU, rollout offloading during training (--offload-rollout), sequence truncation control (SLIME_TRAIN_MAX_SEQ_LEN), and chunked log-prob/entropy computation (SLIME_LOGIT_CHUNK_SIZE).
  • Updated documentation in README.md, openclaw-combine/README.md, and openclaw-test/README.md to provide detailed setup, environment variables, and step-by-step instructions for single-GPU INT4 QLoRA workflows.
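The chunked log-prob/entropy idea above can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the function name is hypothetical, and only the `SLIME_LOGIT_CHUNK_SIZE` environment variable comes from the PR.

```python
import os

# Illustrative sketch: process a long sequence of per-token values in
# fixed-size chunks so peak memory stays bounded, mirroring the idea
# behind SLIME_LOGIT_CHUNK_SIZE. Function and variable names here are
# hypothetical.
CHUNK_SIZE = int(os.environ.get("SLIME_LOGIT_CHUNK_SIZE", "1024"))

def chunked_sum_logprobs(token_logprobs, chunk_size=CHUNK_SIZE):
    """Accumulate a running sum over chunks instead of materializing
    one large intermediate buffer for the whole sequence."""
    total = 0.0
    for start in range(0, len(token_logprobs), chunk_size):
        chunk = token_logprobs[start:start + chunk_size]  # one bounded slice
        total += sum(chunk)
    return total
```

The point of the chunking is that peak allocation scales with `chunk_size` rather than with sequence length, which is what makes long-sequence training fit on a 24 GB card.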

External PRM API Integration

  • Enhanced openclaw_opd_api_server.py to support PRM scoring via external OpenAI-compatible APIs. This includes new environment and config options, request logic, and result parsing. Local PRM is now optional; all PRM-related evaluations can be routed to an external service.
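A minimal sketch of routing PRM scoring through an OpenAI-compatible chat-completions payload. The system prompt, the convention of returning scores as a JSON list, and both function names are assumptions for illustration; the PR's actual request logic and parsing may differ.

```python
import json

def build_prm_request(model, steps):
    """Pack solution steps into an OpenAI-style chat-completions payload.
    The prompt wording and JSON-list convention are hypothetical."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Score each solution step from 0 to 1; "
                        "reply with a JSON list of floats."},
            {"role": "user", "content": json.dumps(steps)},
        ],
    }

def parse_prm_response(response):
    """Extract per-step scores from the first choice's message content,
    assuming the model replied with a JSON list of numbers."""
    content = response["choices"][0]["message"]["content"]
    return [float(s) for s in json.loads(content)]
```

Because the payload and response follow the standard chat-completions shape, the same code path can target any OpenAI-compatible endpoint just by changing the base URL and model name.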

Single-GPU Evaluation Scripts

  • Added student_chat_single_gpu.py and teacher_chat_single_gpu.py. These scripts include logic to automatically poll and resume when the rollout engine is offloaded for training, ensuring reliable evaluation in single-GPU colocate mode.
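The poll-and-resume behavior can be sketched as a simple retry loop: when the rollout engine is offloaded for a training step, the evaluation client waits for it to come back instead of failing. The probe callback and function name below are hypothetical; in practice the probe would be something like a health-check request to the engine.

```python
import time

def wait_for_engine(probe, interval=1.0, max_tries=60):
    """Poll `probe()` until it reports the rollout engine is loaded,
    sleeping `interval` seconds between attempts. Returns True once the
    engine is available, False after `max_tries` failed attempts."""
    for _ in range(max_tries):
        if probe():          # e.g. an HTTP health check in practice
            return True
        time.sleep(interval)
    return False
```

In single-GPU colocate mode this loop runs before each evaluation request, so a chat script can overlap with training without manual coordination.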

What has been tested:

  • 10 problems with a maximum of 8 turns per conversation, for both student chat and teacher chat running on a single RTX 4090 (24 GB).

  • Saving the INT4 model and optimizers.

With these changes, OpenClaw-RL is now accessible to users with a single 24 GB GPU. It also enables seamless integration with external LLM APIs for reward modeling and evaluation.
