`--kt-method FP8` crashes on sm_86 (RTX 3090) even with `--kt-num-gpu-experts 0`
Environment
- GPU: RTX 3090 (sm_86)
- CPU: AMD EPYC 7663 (AVX2)
- KTransformers: v0.5.3
- sgl-kernel: 0.3.21
- Model: Qwen/Qwen3.5-122B-A10B-FP8
Steps to reproduce
```shell
python -m sglang.launch_server \
  --model /path/to/Qwen3.5-122B-A10B-FP8 \
  --kt-weight-path /path/to/Qwen3.5-122B-A10B-FP8 \
  --kt-method FP8 \
  --kt-cpuinfer 56 \
  --kt-threadpool-count 1 \
  --kt-num-gpu-experts 0 \
  --attention-backend triton \
  --tensor-parallel-size 2 \
  --trust-remote-code \
  --disable-shared-experts-fusion \
  --disable-custom-all-reduce
```
What happens
AVX2 FP8 MoE layers initialize successfully on CPU:
```
TP MOE layer 47, pool: 0x6512e0e0, expert num: 256, num_experts_per_tok: 8
Created AVX2_FP8_MOE_TP 0 at numa 0
```
Model loads, KV cache allocates, then during CUDA graph capture:
```
triton.compiler.errors.CompilationError: at 1:0:
def fused_moe_kernel(
^
ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
```
Setting `--moe-runner-backend cutlass` instead hits:
```
AssertionError: cutlass_fp8 MoE requires CUDA 12.0+ with SM90 or CUDA 12.4+ with SM89
```
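Both failures are consistent with architecture gating: Triton's `fp8e4nv` is only available from sm_89 (Ada) onward, and the cutlass path asserts SM90 (or SM89 with CUDA 12.4+). A minimal sketch of that gating, assuming the rules stated by the two error messages above (these helper names are mine, not KTransformers or Triton code):

```python
def supported_triton_fp8(cc: tuple[int, int]) -> set[str]:
    """Which Triton fp8 dtypes a GPU of compute capability `cc` can use.

    Illustrative only: mirrors the reported error messages,
    not actual Triton source.
    """
    dtypes = {"fp8e4b15", "fp8e5"}  # available on sm_80/sm_86 per the error
    if cc >= (8, 9):                # Ada (sm_89) and newer
        dtypes.add("fp8e4nv")       # the dtype fused_moe_kernel needs
    return dtypes


def cutlass_fp8_ok(cc: tuple[int, int], cuda: tuple[int, int]) -> bool:
    """Mirror of the cutlass assertion: SM90 with CUDA 12.0+,
    or SM89 with CUDA 12.4+."""
    if cc >= (9, 0) and cuda >= (12, 0):
        return True
    if cc == (8, 9) and cuda >= (12, 4):
        return True
    return False
```

On this setup, `supported_triton_fp8((8, 6))` contains no `fp8e4nv` and `cutlass_fp8_ok((8, 6), ...)` is `False` for any CUDA version, matching both errors.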
Notes
- `--kt-num-gpu-experts` is set to 0
- The AVX2 FP8 CPU kernels load and initialize without error
- The crash occurs in the GPU-side MoE kernel compilation path
- `--kt-method BF16` does not crash
- RTX 3090 (sm_86) supports `fp8e4b15` and `fp8e5` per the Triton error message, but not `fp8e4nv`
- The AVX2 tutorial lists RTX 3090 as supported hardware and includes an FP8 example
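Until the GPU-side MoE compilation is gated on architecture, a possible user-side workaround is to pick the method from the device's compute capability. A hypothetical sketch (`pick_kt_method` is my name, not a KTransformers API); in practice the tuple would come from `torch.cuda.get_device_capability()`:

```python
def pick_kt_method(cc: tuple[int, int]) -> str:
    """Choose a --kt-method the GPU-side MoE kernel can compile.

    Assumption: fp8e4nv (required by the Triton fused-MoE kernel)
    is only available on sm_89 and newer, so older GPUs fall back
    to BF16, which does not crash on this setup.
    """
    return "FP8" if cc >= (8, 9) else "BF16"
```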