Reminder
System Info
在4090 的机器上启动approachingai/ktransformers:v0.5.3 docker 镜像. 根据文档 https://github.com/kvcache-ai/ktransformers/blob/main/kt-kernel/README.md#complete-example-qwen3-30b-a3b 启动 Qwen3-30B-A3B,
具体而言, 命令是:
python -m sglang.launch_server
--host 0.0.0.0
--port 30000
--model /host_root/models/Qwen3-30B-A3B
--kt-weight-path /host_root/models/Qwen3-30B-A3B
--kt-cpuinfer 64
--kt-threadpool-count 2
--kt-num-gpu-experts 32
--kt-method BF16
--attention-backend flashinfer
--trust-remote-code
--mem-fraction-static 0.80
--chunked-prefill-size 16384
--max-running-requests 4
--served-model-name Qwen3
--enable-mixed-chunk
--tensor-parallel-size 1
--enable-p2p-check
--disable-shared-experts-fusion
--kt-gpu-prefill-token-threshold 4096
--kt-enable-dynamic-expert-update
大模型幻觉严重,基本不知所云.
使用sglang accuracy test 工具, 运行:
python -m sglang.test.run_eval --eval-name mmlu --port 30000 --num-examples 10 --max-tokens 8192
10 道题总得分为0. 部分输出如下:
——————————————————————————————
我机器配置如下:
nvidia-smi
Thu Apr 23 06:15:21 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.51.03 Driver Version: 575.51.03 CUDA Version: 12.9 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 On | 00000000:65:00.0 Off | Off |
| 44% 29C P8 9W / 450W | 22717MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 4090 On | 00000000:65:01.0 Off | Off |
| 45% 31C P8 21W / 450W | 3MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA GeForce RTX 4090 On | 00000000:67:00.0 Off | Off |
| 45% 29C P8 14W / 450W | 3MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA GeForce RTX 4090 On | 00000000:67:01.0 Off | Off |
| 44% 27C P8 15W / 450W | 3MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA GeForce RTX 4090 On | 00000000:69:00.0 Off | Off |
| 45% 29C P8 8W / 450W | 3MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA GeForce RTX 4090 On | 00000000:69:01.0 Off | Off |
| 44% 28C P8 26W / 450W | 3MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA GeForce RTX 4090 On | 00000000:6B:00.0 Off | Off |
| 44% 26C P8 29W / 450W | 3MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA GeForce RTX 4090 On | 00000000:6B:01.0 Off | Off |
| 43% 28C P8 12W / 450W | 3MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
————————————————————————————————————————————————————
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 52 bits physical, 57 bits virtual
CPU(s): 180
On-line CPU(s) list: 0-179
Thread(s) per core: 2
Core(s) per socket: 45
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 143
Model name: Intel(R) Xeon(R) Platinum 8457C
Stepping: 8
CPU MHz: 2599.889
BogoMIPS: 5199.77
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 4.2 MiB
L1i cache: 2.8 MiB
L2 cache: 180 MiB
L3 cache: 195 MiB
NUMA node0 CPU(s): 0-89
NUMA node1 CPU(s): 90-179
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Unknown: No mitigations
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; TSX disabled
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_goo
d nopl xtopology nonstop_tsc cpuid pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdra
nd hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpci
d avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 wbnoinvd arat
avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid cldemote movdiri movdir64b md_clear
arch_capabilities
Reproduction
Others
No response
Reminder
System Info
在4090 的机器上启动approachingai/ktransformers:v0.5.3 docker 镜像. 根据文档 https://github.com/kvcache-ai/ktransformers/blob/main/kt-kernel/README.md#complete-example-qwen3-30b-a3b 启动 Qwen3-30B-A3B,
具体而言, 命令是:
python -m sglang.launch_server
--host 0.0.0.0
--port 30000
--model /host_root/models/Qwen3-30B-A3B
--kt-weight-path /host_root/models/Qwen3-30B-A3B
--kt-cpuinfer 64
--kt-threadpool-count 2
--kt-num-gpu-experts 32
--kt-method BF16
--attention-backend flashinfer
--trust-remote-code
--mem-fraction-static 0.80
--chunked-prefill-size 16384
--max-running-requests 4
--served-model-name Qwen3
--enable-mixed-chunk
--tensor-parallel-size 1
--enable-p2p-check
--disable-shared-experts-fusion
--kt-gpu-prefill-token-threshold 4096
--kt-enable-dynamic-expert-update
大模型幻觉严重,基本不知所云.
使用sglang accuracy test 工具, 运行:
python -m sglang.test.run_eval --eval-name mmlu --port 30000 --num-examples 10 --max-tokens 8192
10 道题总得分为0. 部分输出如下:
——————————————————————————————
我机器配置如下:
nvidia-smi
Thu Apr 23 06:15:21 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.51.03 Driver Version: 575.51.03 CUDA Version: 12.9 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 On | 00000000:65:00.0 Off | Off |
| 44% 29C P8 9W / 450W | 22717MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 4090 On | 00000000:65:01.0 Off | Off |
| 45% 31C P8 21W / 450W | 3MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA GeForce RTX 4090 On | 00000000:67:00.0 Off | Off |
| 45% 29C P8 14W / 450W | 3MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA GeForce RTX 4090 On | 00000000:67:01.0 Off | Off |
| 44% 27C P8 15W / 450W | 3MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA GeForce RTX 4090 On | 00000000:69:00.0 Off | Off |
| 45% 29C P8 8W / 450W | 3MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA GeForce RTX 4090 On | 00000000:69:01.0 Off | Off |
| 44% 28C P8 26W / 450W | 3MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA GeForce RTX 4090 On | 00000000:6B:00.0 Off | Off |
| 44% 26C P8 29W / 450W | 3MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA GeForce RTX 4090 On | 00000000:6B:01.0 Off | Off |
| 43% 28C P8 12W / 450W | 3MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
————————————————————————————————————————————————————
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 52 bits physical, 57 bits virtual
CPU(s): 180
On-line CPU(s) list: 0-179
Thread(s) per core: 2
Core(s) per socket: 45
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 143
Model name: Intel(R) Xeon(R) Platinum 8457C
Stepping: 8
CPU MHz: 2599.889
BogoMIPS: 5199.77
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 4.2 MiB
L1i cache: 2.8 MiB
L2 cache: 180 MiB
L3 cache: 195 MiB
NUMA node0 CPU(s): 0-89
NUMA node1 CPU(s): 90-179
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Unknown: No mitigations
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; TSX disabled
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_goo
d nopl xtopology nonstop_tsc cpuid pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdra
nd hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpci
d avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 wbnoinvd arat
avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid cldemote movdiri movdir64b md_clear
arch_capabilities
Reproduction
Others
No response