I'm using gui-rl on 8× H20 96G, with 4 GPUs for inference and 4 for training, running /root/workspace/pw_2/OpenClaw-RL/gui-rl/gui_qwen35_4b_rl.sh with two environments in parallel.
(RolloutManager pid=3181587) [2026-04-16 20:09:15] generate_with_gui.py:40 - [GUI] action sample=0 step=3 action=pyautogui.click(440, 277) reward=0.0000 done=False
...
(RolloutManager pid=3181587) [2026-04-16 20:13:46] generate_with_gui.py:40 - [GUI] action sample=0 step=4 action=pyautogui.click(607, 265) reward=0.0000 done=False
As shown above, there is a gap of over four minutes between step 3 and step 4.
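For reference, the gap can be computed directly from the two RolloutManager timestamps quoted above (a quick sketch; it only assumes the `YYYY-MM-DD HH:MM:SS` log format shown in the lines):

```python
from datetime import datetime

# Timestamps copied from the two [GUI] action log lines above
t_step3 = datetime.fromisoformat("2026-04-16 20:09:15")
t_step4 = datetime.fromisoformat("2026-04-16 20:13:46")

# Elapsed wall-clock time between emitting step 3 and step 4
gap_seconds = (t_step4 - t_step3).total_seconds()
print(gap_seconds)  # 271.0 seconds, i.e. about 4.5 minutes
```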
Some of the PD (prefill/decode) logs from the same window:
(SGLangEngine pid=3186370) [2026-04-16 20:11:29] Prefill batch, #new-seq: 1, #new-token: 8192, #cached-token: 0, full token usage: 0.02, mamba usage: 0.01, #running-req: 0, #queue-req: 0, input throughput (token/s): 14.39, cuda graph: False
(SGLangEngine pid=3186373) [2026-04-16 20:11:55] INFO: 10.0.0.137:44000 - "GET /health HTTP/1.1" 200 OK [repeated 4x across cluster]
(SGLangEngine pid=3186373) [2026-04-16 20:12:55] INFO: 10.0.0.137:41748 - "GET /health HTTP/1.1" 200 OK [repeated 4x across cluster]
(SGLangEngine pid=3186370) [2026-04-16 20:13:39] Prefill batch, #new-seq: 2, #new-token: 8192, #cached-token: 0, full token usage: 0.05, mamba usage: 0.02, #running-req: 0, #queue-req: 0, input throughput (token/s): 63.15, cuda graph: False
(SGLangEngine pid=3186370) [2026-04-16 20:13:39] Prefill batch, #new-seq: 1, #new-token: 2252, #cached-token: 0, full token usage: 0.05, mamba usage: 0.02, #running-req: 1, #queue-req: 0, input throughput (token/s): 15753.25, cuda graph: False
(SGLangEngine pid=3186370) [2026-04-16 20:13:39] Decode batch, #running-req: 2, #full token: 16214, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 0.22, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:40] Decode batch, #running-req: 2, #full token: 16294, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.51, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:40] Decode batch, #running-req: 2, #full token: 16374, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.71, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:41] Decode batch, #running-req: 2, #full token: 16454, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.58, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:41] Decode batch, #running-req: 2, #full token: 16534, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.65, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:42] Decode batch, #running-req: 2, #full token: 16614, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.63, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:42] Decode batch, #running-req: 2, #full token: 16694, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.49, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:43] Decode batch, #running-req: 2, #full token: 16774, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.76, #queue-req: 0
(SGLangEngine pid=3186370) [2026-04-16 20:13:43] Decode batch, #running-req: 2, #full token: 16854, full token usage: 0.05, mamba num: 4, mamba usage: 0.02, cuda graph: True, gen throughput (token/s): 161.46, #queue-req: 0
Is this speed normal?