Rollout phase is very slow #88

@wizkdc

I'm using gui-rl on 8×H20 (96 GB), with 4 GPUs for inference and 4 for training. I'm running /root/workspace/pw_2/OpenClaw-RL/gui-rl/gui_qwen35_4b_rl.sh with two environments in parallel.
(RolloutManager pid=3181587) [2026-04-16 20:09:15] generate_with_gui.py:40 - [GUI] action sample=0 step=3 action=pyautogui.click(440, 277) reward=0.0000 done=False
...
(RolloutManager pid=3181587) [2026-04-16 20:13:46] generate_with_gui.py:40 - [GUI] action sample=0 step=4 action=pyautogui.click(607, 265) reward=0.0000 done=False

As you can see, about 4.5 minutes passed between the two steps (20:09:15 to 20:13:46).
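The per-step delay can be read straight off the `[GUI] action` lines. The following is a minimal sketch, not part of the OpenClaw-RL code base: the function name `step_gaps` and the regular expression are illustrative assumptions based only on the two log lines quoted above.

```python
import re
from datetime import datetime

# The two action lines quoted above.
LOG = """\
(RolloutManager pid=3181587) [2026-04-16 20:09:15] generate_with_gui.py:40 - [GUI] action sample=0 step=3 action=pyautogui.click(440, 277) reward=0.0000 done=False
(RolloutManager pid=3181587) [2026-04-16 20:13:46] generate_with_gui.py:40 - [GUI] action sample=0 step=4 action=pyautogui.click(607, 265) reward=0.0000 done=False
"""

# Timestamp, sample index, and step index of each action line.
PATTERN = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\].*?\[GUI\] action sample=(\d+) step=(\d+)"
)

def step_gaps(log_text):
    """Map (sample, step) to the seconds elapsed since that sample's previous step."""
    last_seen = {}
    gaps = {}
    for line in log_text.splitlines():
        m = PATTERN.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        sample, step = int(m.group(2)), int(m.group(3))
        if sample in last_seen:
            gaps[(sample, step)] = (ts - last_seen[sample]).total_seconds()
        last_seen[sample] = ts
    return gaps

print(step_gaps(LOG))  # {(0, 4): 271.0}
```

For the two lines above this gives 271 seconds (4 min 31 s) between step 3 and step 4 of sample 0.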

Some prefill/decode logs are below:
(SGLangEngine pid=3186370) [2026-04-16 20:11:29] Prefill batch, #new-seq: 1, #new-token: 8192, #cached-token: 0, full token usage: 0.02, mamba usage: 0.01, #running-req: 0, #queue-req: 0, input throughput (token/s): 14.39, cuda graph: False
(SGLangEngine pid=3186373) [2026-04-16 20:11:55] INFO: 10.0.0.137:44000 - "GET /health HTTP/1.1" 200 OK [repeated 4x across cluster]
(SGLangEngine pid=3186373) [2026-04-16 20:12:55] INFO: 10.0.0.137:41748 - "GET /health HTTP/1.1" 200 OK [repeated 4x across cluster]
(SGLangEngine pid=3186370) [2026-04-16 20:13:39] Prefill batch, #new-seq: 2, #new-token: 8192, #cached-token: 0, full token usage: 0.05, mamba usage: 0.02, #running-req: 0, #queue-req: 0, input throughput (token/s): 63.15, cuda graph: False
(SGLangEngine pid=3186370) [2026-04-16 20:13:39] Prefill batch, #new-seq: 1, #new-token: 2252, #cached-token: 0, full token usage: 0.05, mamba usage: 0.02, #running-req: 1, #queue-req: 0, input throughput (token/s): 15753.25, cuda graph: False
(SGLangEngine pid=3186370) [2026-04-16 20:13:39] Decode batch, #running-req: 2, #full token: 16214, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 0.22, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:40] Decode batch, #running-req: 2, #full token: 16294, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.51, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:40] Decode batch, #running-req: 2, #full token: 16374, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.71, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:41] Decode batch, #running-req: 2, #full token: 16454, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.58, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:41] Decode batch, #running-req: 2, #full token: 16534, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.65, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:42] Decode batch, #running-req: 2, #full token: 16614, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.63, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:42] Decode batch, #running-req: 2, #full token: 16694, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.49, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:43] Decode batch, #running-req: 2, #full token: 16774, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.76, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:43] Decode batch, #running-req: 2, #full token: 16854, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.46, #queue-req: 0
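To see where the engine itself pauses, the same kind of timestamp arithmetic can be applied to the `Prefill batch` / `Decode batch` lines. This is a rough sketch under stated assumptions: `largest_batch_gap` is a hypothetical helper, and the log lines are the ones quoted above, truncated after the fields used here.

```python
import re
from datetime import datetime

# Engine lines from the log above, truncated after the fields used here.
ENGINE_LOG = """\
(SGLangEngine pid=3186370) [2026-04-16 20:11:29] Prefill batch, #new-seq: 1, #new-token: 8192
(SGLangEngine pid=3186373) [2026-04-16 20:11:55] INFO: 10.0.0.137:44000 - "GET /health HTTP/1.1" 200 OK
(SGLangEngine pid=3186373) [2026-04-16 20:12:55] INFO: 10.0.0.137:41748 - "GET /health HTTP/1.1" 200 OK
(SGLangEngine pid=3186370) [2026-04-16 20:13:39] Prefill batch, #new-seq: 2, #new-token: 8192
(SGLangEngine pid=3186370) [2026-04-16 20:13:39] Prefill batch, #new-seq: 1, #new-token: 2252
(SGLangEngine pid=3186370) [2026-04-16 20:13:39] Decode batch, #running-req: 2, #full token: 16214
"""

# Only Prefill/Decode lines match; health checks are skipped.
LINE = re.compile(r"\(SGLangEngine pid=(\d+)\) \[([^\]]+)\] (Prefill|Decode) batch")

def largest_batch_gap(log_text):
    """Return (pid, seconds, start, end) for the longest pause between
    consecutive Prefill/Decode lines of the same engine process."""
    last = {}
    best = None
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        pid = int(m.group(1))
        ts = datetime.strptime(m.group(2), "%Y-%m-%d %H:%M:%S")
        if pid in last:
            gap = (ts - last[pid]).total_seconds()
            if best is None or gap > best[1]:
                best = (pid, gap, last[pid], ts)
        last[pid] = ts
    return best

pid, seconds, start, end = largest_batch_gap(ENGINE_LOG)
print(pid, seconds, start.time(), end.time())  # 3186370 130.0 20:11:29 20:13:39
```

In this excerpt the longest pause is 130 seconds, between the prefill batch logged at 20:11:29 and the next one at 20:13:39. Dividing 8192 tokens by 130 s gives roughly 63 token/s, close to the 63.15 token/s input throughput on the 20:13:39 line, though how SGLang computes that figure is not confirmed here.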

Is this speed normal?
