This is the run script:
run_qwen3_4b_openclaw_opd_topk_lora.sh
Running it directly fails to start the main inference server:
(SGLangEngine pid=3080582) [2026-04-23 11:02:56] INFO: Started server process [3080920]
(SGLangEngine pid=3080582) [2026-04-23 11:02:56] INFO: Waiting for application startup.
(SGLangEngine pid=3080582) [2026-04-23 11:02:56] Using default chat sampling params from model generation config: {'repetition_penalty': 1.0, 'temperature': 0.6, 'top_k': 20, 'top_p': 0.95}
(SGLangEngine pid=3080582) thread '' (3080920) panicked at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/rayon-core-1.13.0/src/registry.rs:171:10:
(SGLangEngine pid=3080582) The global thread pool has not been initialized. ThreadPoolBuildError { kind: IOError(Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }) }
(SGLangEngine pid=3080582) note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
(SGLangEngine pid=3080582) [2026-04-23 11:02:56] ERROR: Traceback (most recent call last):
(SGLangEngine pid=3080582) File "/var/ai-cloud/project/qwenpaw-rl-dev/lib/python3.12/site-packages/starlette/routing.py", line 694, in lifespan
(SGLangEngine pid=3080582) async with self.lifespan_context(app) as maybe_state:
(SGLangEngine pid=3080582) File "/var/ai-cloud/project/python3.12/lib/python3.12/contextlib.py", line 210, in __aenter__
(SGLangEngine pid=3080582) return await anext(self.gen)
(SGLangEngine pid=3080582) File "/var/ai-cloud/project/qwenpaw-rl-dev/lib/python3.12/site-packages/fastapi/routing.py", line 201, in merged_lifespan
(SGLangEngine pid=3080582) async with original_context(app) as maybe_original_state:
(SGLangEngine pid=3080582) File "/var/ai-cloud/project/python3.12/lib/python3.12/contextlib.py", line 210, in __aenter__
(SGLangEngine pid=3080582) return await anext(self.gen)
(SGLangEngine pid=3080582) File "/var/ai-cloud/project/zhongkaipeng/qwenpaw_train/own_train_packages/sglang-d566816d838ce92d3ae044209f7d67eaa58ce74a/python/sglang/srt/entrypoints/http_server.py", line 308, in lifespan
(SGLangEngine pid=3080582) fast_api_app.state.openai_serving_rerank = OpenAIServingRerank(
(SGLangEngine pid=3080582) File "/var/ai-cloud/project/zhongkaipeng/qwenpaw_train/own_train_packages/sglang-d566816d838ce92d3ae044209f7d67eaa58ce74a/python/sglang/srt/entrypoints/openai/serving_rerank.py", line 212, in __init__
(SGLangEngine pid=3080582) self._yes_token_id, self._no_token_id = _get_yes_no_token_ids(
(SGLangEngine pid=3080582) File "/var/ai-cloud/project/zhongkaipeng/qwenpaw_train/own_train_packages/sglang-d566816d838ce92d3ae044209f7d67eaa58ce74a/python/sglang/srt/entrypoints/openai/serving_rerank.py", line 30, in _get_yes_no_token_ids
(SGLangEngine pid=3080582) yes_tokens = tokenizer.encode("yes", add_special_tokens=False)
(SGLangEngine pid=3080582) File "/var/ai-cloud/project/qwenpaw-rl-dev/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2732, in encode
(SGLangEngine pid=3080582) encoded_inputs = self.encode_plus(
(SGLangEngine pid=3080582) File "/var/ai-cloud/project/qwenpaw-rl-dev/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 3123, in encode_plus
(SGLangEngine pid=3080582) return self._encode_plus(
(SGLangEngine pid=3080582) File "/var/ai-cloud/project/qwenpaw-rl-dev/lib/python3.12/site-packages/transformers/tokenization_utils_fast.py", line 627, in _encode_plus
(SGLangEngine pid=3080582) batched_output = self._batch_encode_plus(
(SGLangEngine pid=3080582) File "/var/ai-cloud/project/qwenpaw-rl-dev/lib/python3.12/site-packages/transformers/tokenization_utils_fast.py", line 553, in _batch_encode_plus
(SGLangEngine pid=3080582) encodings = self._tokenizer.encode_batch(
(SGLangEngine pid=3080582) pyo3_runtime.PanicException: The global thread pool has not been initialized. ThreadPoolBuildError { kind: IOError(Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }) }
(SGLangEngine pid=3080582) [2026-04-23 11:02:56] ERROR: Application startup failed. Exiting.
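The panic above is the Rust rayon thread pool inside the HF tokenizers library failing with EAGAIN ("Resource temporarily unavailable") while spawning worker threads. A quick diagnostic (my own sketch, not part of the original report) is to check whether the container's process/thread limit is low, since on Linux the max-user-processes rlimit also counts threads:

```shell
# Diagnostic sketch: EAGAIN on thread creation usually means the
# max-user-processes limit (which counts threads on Linux) is exhausted
# or set very low inside the container. Inspect the per-process limit:
ulimit -u
# And the system-wide thread ceiling:
cat /proc/sys/kernel/threads-max
```

If `ulimit -u` is small relative to the number of engine processes times CPU cores, rayon's attempt to spawn one worker per core will hit EAGAIN exactly as shown in the traceback.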
If one line is added:
export TOKENIZERS_PARALLELISM=false
then the script starts up normally, but every time qwenpaw calls a tool, the PRM reports a CUDA OOM error and the PRM inference service on one of the GPUs crashes.
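For reference, the workaround can be sketched as follows (assumption: these lines go near the top of run_qwen3_4b_openclaw_opd_topk_lora.sh, before the SGLang engine is launched; the RAYON_NUM_THREADS line is an additional, optional knob not mentioned in the original report):

```shell
# TOKENIZERS_PARALLELISM=false makes HF tokenizers skip building its
# rayon thread pool, avoiding the EAGAIN panic seen at startup.
export TOKENIZERS_PARALLELISM=false
# Optional: cap rayon's worker count in case other Rust components still
# try to spawn one thread per core under a tight process/thread limit.
export RAYON_NUM_THREADS=4
```

Note this only sidesteps the thread-pool panic; the subsequent PRM CUDA OOM on tool calls looks like a separate memory-budgeting issue and likely needs its own fix (e.g. reserving less KV cache for the PRM engine).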