feat: implement single-GPU (24GB) INT4 QLoRA + offload-rollout + external PRM for OpenClaw-Combine and test #97
Open
wuwanchun wants to merge 2 commits into Gen-Verse:main from
This PR adds support for running OpenClaw-Combine on a single 24 GB GPU using INT4 QLoRA. I’ve included new scripts, updated documentation, and made significant changes to the PRM API server to support external PRM scoring via any OpenAI-compatible API.
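For readers unfamiliar with QLoRA, the memory savings come from storing the base model's weights in 4-bit NF4 while training only small LoRA adapters. A minimal sketch of that setup with `transformers` and `peft` follows; the rank, alpha, and target modules are illustrative assumptions, not the values this PR's script actually uses.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# NF4 4-bit quantization: weights stored as 4-bit NormalFloat, matmuls
# computed in bfloat16, double quantization for extra savings -- the
# standard QLoRA recipe.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on the attention projections; r/alpha here are
# placeholder values, not taken from the PR.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = get_peft_model(model, lora_config)  # only the adapters are trainable
```

With this configuration the frozen 4-bit base weights plus bf16 adapter gradients are what make a 4B-parameter model fit on a single 24 GB card.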
Single-GPU INT4 QLoRA Support
Added a new script, run_qwen3_4b_openclaw_combine_qlora_single_gpu_int4.sh, that enables full training and inference with QLoRA (NF4 quantization) on a single 24 GB GPU. This includes rollout offloading (--offload-rollout), sequence truncation control (SLIME_TRAIN_MAX_SEQ_LEN), chunked log-prob/entropy computation (SLIME_LOGIT_CHUNK_SIZE), and integration with the external PRM API.

Updated README.md, openclaw-combine/README.md, and openclaw-test/README.md with detailed setup, environment variables, and step-by-step instructions for single-GPU INT4 QLoRA workflows. [1] [2] [3] [4]

External PRM API Integration

Reworked the PRM API server to support external PRM scoring via any OpenAI-compatible API.
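Since the PRM server speaks to any OpenAI-compatible endpoint, the scoring path reduces to posting candidate reasoning steps to a `/chat/completions` route and parsing a score out of the reply. A hedged stdlib-only sketch; the endpoint path, system prompt, and single-number score format are assumptions, not this PR's actual protocol.

```python
import json
import urllib.request


def build_prm_request(question: str, steps: list[str], model: str = "prm-model") -> dict:
    """Build an OpenAI-compatible chat payload asking a PRM to score steps."""
    joined = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(steps))
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Score the reasoning steps from 0 to 1. Reply with a single number."},
            {"role": "user", "content": f"Question: {question}\n{joined}"},
        ],
        "temperature": 0.0,  # deterministic scoring
    }


def parse_prm_score(response: dict) -> float:
    """Extract a numeric score from a chat-completion response body."""
    text = response["choices"][0]["message"]["content"].strip()
    return float(text)


def score_steps(base_url: str, api_key: str, payload: dict) -> float:
    """POST the payload to any OpenAI-compatible server and return the score."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_prm_score(json.load(resp))
```

Pointing `base_url` at a local vLLM server or a hosted API works the same way, which is what makes the "any OpenAI-compatible API" claim practical.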
Single-GPU Evaluation Scripts
Added student_chat_single_gpu.py and teacher_chat_single_gpu.py. These scripts include logic to automatically poll and resume when the rollout engine is offloaded for training, ensuring reliable evaluation in single-GPU colocate mode.

What has been tested:
- 10 problems with a maximum of 8 turns per conversation, for both student chat and teacher chat, running on a single RTX 4090 (24 GB).
- Saving the INT4 model and optimizers.
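The poll-and-resume behavior the evaluation scripts rely on can be sketched as a generic retry loop: probe the rollout engine, back off while it is offloaded for a training step, and resume once it answers again. The probe, timeout, and interval below are illustrative assumptions, not the scripts' exact logic.

```python
import time


def wait_for_rollout_engine(probe, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll `probe()` until it returns True (engine back online) or `timeout` elapses.

    In colocate mode the rollout engine is offloaded during training, so
    requests fail transiently; rather than erroring out, the evaluation
    script simply waits and retries.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if probe():  # e.g. a cheap GET against the engine's health endpoint
                return True
        except OSError:
            pass  # connection refused while the engine is offloaded
        time.sleep(interval)
    return False
```

An evaluation loop would call this before each batch of requests, so a conversation that spans several training steps survives repeated offload/reload cycles.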
With these changes, OpenClaw-RL is now much more accessible for users with a single high-memory GPU. It also enables seamless integration with external LLM APIs for reward modeling and evaluation.