
feat: implement single-GPU (24GB) INT4 QLoRA + offload-rollout + external PRM for OpenClaw-Combine and test #97

Open
wuwanchun wants to merge 2 commits into Gen-Verse:main from wuwanchun:main

Conversation


@wuwanchun wuwanchun commented Apr 20, 2026

This PR adds support for running OpenClaw-Combine on a single 24 GB GPU using INT4 QLoRA. I’ve included new scripts, updated documentation, and made significant changes to the PRM API server to support external PRM scoring via any OpenAI-compatible API.

Single-GPU INT4 QLoRA Support

  • Added a new script, run_qwen3_4b_openclaw_combine_qlora_single_gpu_int4.sh, that enables full training and inference with QLoRA (NF4 quantization) on a single 24 GB GPU. This includes rollout offloading and integration with the external PRM API.
  • Added single-GPU memory management features: colocated training + rollout on one GPU, rollout offloading during training (--offload-rollout), sequence truncation control (SLIME_TRAIN_MAX_SEQ_LEN), and chunked log-prob/entropy computation (SLIME_LOGIT_CHUNK_SIZE).
  • Updated documentation in README.md, openclaw-combine/README.md, and openclaw-test/README.md to provide detailed setup, environment variables, and step-by-step instructions for single-GPU INT4 QLoRA workflows.
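The chunked log-prob/entropy idea above can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the function name is hypothetical, and only the `SLIME_LOGIT_CHUNK_SIZE` environment variable comes from the PR.

```python
import os

# Illustrative sketch: process a long sequence of per-token values in
# fixed-size chunks so peak memory stays bounded, mirroring the idea
# behind SLIME_LOGIT_CHUNK_SIZE. Function and variable names here are
# hypothetical.
CHUNK_SIZE = int(os.environ.get("SLIME_LOGIT_CHUNK_SIZE", "1024"))

def chunked_sum_logprobs(token_logprobs, chunk_size=CHUNK_SIZE):
    """Accumulate a running sum over chunks instead of materializing
    one large intermediate buffer for the whole sequence."""
    total = 0.0
    for start in range(0, len(token_logprobs), chunk_size):
        chunk = token_logprobs[start:start + chunk_size]  # one bounded slice
        total += sum(chunk)
    return total
```

The point of the chunking is that peak allocation scales with `chunk_size` rather than with sequence length, which is what makes long-sequence training fit on a 24 GB card.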

External PRM API Integration

  • Enhanced openclaw_opd_api_server.py to support PRM scoring via external OpenAI-compatible APIs. This includes new environment and config options, request logic, and result parsing. Local PRM is now optional; all PRM-related evaluations can be routed to an external service.
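A minimal sketch of routing PRM scoring through an OpenAI-compatible chat-completions payload. The system prompt, the convention of returning scores as a JSON list, and both function names are assumptions for illustration; the PR's actual request logic and parsing may differ.

```python
import json

def build_prm_request(model, steps):
    """Pack solution steps into an OpenAI-style chat-completions payload.
    The prompt wording and JSON-list convention are hypothetical."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Score each solution step from 0 to 1; "
                        "reply with a JSON list of floats."},
            {"role": "user", "content": json.dumps(steps)},
        ],
    }

def parse_prm_response(response):
    """Extract per-step scores from the first choice's message content,
    assuming the model replied with a JSON list of numbers."""
    content = response["choices"][0]["message"]["content"]
    return [float(s) for s in json.loads(content)]
```

Because the payload and response follow the standard chat-completions shape, the same code path can target any OpenAI-compatible endpoint just by changing the base URL and model name.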

Single-GPU Evaluation Scripts

  • Added student_chat_single_gpu.py and teacher_chat_single_gpu.py. These scripts include logic to automatically poll and resume when the rollout engine is offloaded for training, ensuring reliable evaluation in single-GPU colocate mode.
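The poll-and-resume behavior can be sketched as a simple retry loop: when the rollout engine is offloaded for a training step, the evaluation client waits for it to come back instead of failing. The probe callback and function name below are hypothetical; in practice the probe would be something like a health-check request to the engine.

```python
import time

def wait_for_engine(probe, interval=1.0, max_tries=60):
    """Poll `probe()` until it reports the rollout engine is loaded,
    sleeping `interval` seconds between attempts. Returns True once the
    engine is available, False after `max_tries` failed attempts."""
    for _ in range(max_tries):
        if probe():          # e.g. an HTTP health check in practice
            return True
        time.sleep(interval)
    return False
```

In single-GPU colocate mode this loop runs before each evaluation request, so a chat script can overlap with training without manual coordination.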

What has been tested:

  • 10 problems with a maximum of 8 turns per conversation, for both student chat and teacher chat running on a single RTX 4090 (24 GB).

  • Saving the INT4 model and optimizers.

With these changes, OpenClaw-RL is now accessible to users with a single 24 GB GPU. It also enables seamless integration with external LLM APIs for reward modeling and evaluation.
