KVSwitch: Accelerating Distributed LLM Inference with In-Network Prefix-Aware Routing

KVSwitch offloads prefix-aware routing for distributed LLM inference from centralized layer-7 routers into the network fabric itself. A client SDK tokenizes each prompt, computes a cumulative prefix hash chain, and embeds it in a compact shim header. Programmable switches perform hierarchical TCAM matching and per-prefix weighted ECMP routing at line rate, coordinated by a cache-event-driven SDN controller that keeps the forwarding state synchronized with distributed KV cache state.
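The client-side step described above can be illustrated with a minimal sketch. It is not the repository's SDK: the block size, the truncated SHA-256 digest, the one-byte-count header layout, and the function names `prefix_hash_chain`, `pack_shim_header`, and `longest_cached_prefix` are all assumptions chosen for illustration. Token IDs are taken as plain integers so that no tokenizer is needed.

```python
import hashlib
import struct
from typing import Dict, List, Optional, Sequence

BLOCK_SIZE = 16  # tokens per hashed block (assumed, not taken from the repo)
HASH_BYTES = 4   # truncated digest width per chain entry (assumed)


def prefix_hash_chain(token_ids: Sequence[int],
                      block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Return one truncated hash per full token block.

    Entry i depends on blocks 0..i, so two prompts share entry i only if
    they share the first (i + 1) * block_size tokens.
    """
    chain: List[bytes] = []
    prev = b""  # full digest of the previous link, empty for the first block
    for i in range(len(token_ids) // block_size):
        block = token_ids[i * block_size:(i + 1) * block_size]
        payload = prev + struct.pack(f"!{block_size}I", *block)
        prev = hashlib.sha256(payload).digest()
        chain.append(prev[:HASH_BYTES])
    return chain


def pack_shim_header(chain: Sequence[bytes]) -> bytes:
    """Hypothetical layout: a 1-byte entry count followed by the entries."""
    if len(chain) > 255:
        raise ValueError("too many chain entries for a 1-byte count")
    return struct.pack("!B", len(chain)) + b"".join(chain)


def longest_cached_prefix(chain: Sequence[bytes],
                          table: Dict[bytes, str]) -> Optional[str]:
    """Software stand-in for the switch lookup: the deepest matching entry wins."""
    for entry in reversed(chain):
        if entry in table:
            return table[entry]
    return None
```

In this sketch a prompt that shares only its first block with a cached prompt still matches at depth one, which is the property a hierarchical match relies on. The real header format and match hierarchy are defined by the P4 program in p4/.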

Figure: Architecture overview of KVSwitch.

This repository contains a research prototype built on BMv2 and Mininet with a trace-driven LLM serving simulator. On its evaluation topology, KVSwitch reduces median TTFT by up to 27% and tail TTFT by up to 76% relative to a state-of-the-art layer-7 prefix-aware router.

Running Experiments

Experiments run inside Docker with a pre-built Mininet + BMv2 image. No local dependency installation is required; Docker pulls the image automatically on first run.

Prerequisites

Docker (with --privileged support)
Compiled P4 artifacts in build/p4/kvswitch/ (included in the repo)
ShareGPT dataset at data/ShareGPT_V3_unfiltered_cleaned_split.json (download)
Access to meta-llama/Llama-3.2-3B-Instruct on HuggingFace (request access)
HuggingFace model cache (for tokenizer; downloaded on first run)
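A quick pre-flight check can catch a missing dataset or artifact directory before a long run. The sketch below is not part of the repository; `missing_prerequisites` is a hypothetical helper that only checks the two paths listed above and whether a `docker` executable is on PATH. It does not verify `--privileged` support or HuggingFace access.

```python
import shutil
from pathlib import Path
from typing import List

# Paths copied from the prerequisites list, relative to the repository root.
REQUIRED_PATHS = [
    "build/p4/kvswitch",
    "data/ShareGPT_V3_unfiltered_cleaned_split.json",
]


def missing_prerequisites(repo_root: str = ".") -> List[str]:
    """Return a description of each prerequisite that could not be found."""
    root = Path(repo_root)
    missing = [p for p in REQUIRED_PATHS if not (root / p).exists()]
    if shutil.which("docker") is None:
        missing.append("docker (not found on PATH)")
    return missing


if __name__ == "__main__":
    for item in missing_prerequisites():
        print(f"missing: {item}")
```

Run it from the repository root; no output means the checked items were found.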

Pre-computed results

We provide evaluation results (download) containing all raw results, profiling traces, experimental logs, and generated figures.

To use the pre-computed results, extract the archive into the repository root:

unzip kvswitch-results.zip -d .

This populates results/ with experiment data and figures. You can then rerun the analysis notebooks (see notebooks) without running experiments:

jupyter notebook notebooks/result_analysis.ipynb

Run all experiments

bash exp/run_exp.sh

Results are saved to results/exp/.

Run specific experiments

bash exp/run_exp.sh 1          # Microbenchmark: routing overhead
bash exp/run_exp.sh 2          # End-to-end: rate sweep
bash exp/run_exp.sh 3a         # Ablation: ECMP vs pinning
bash exp/run_exp.sh 3b         # Ablation: warm-up impact
bash exp/run_exp.sh 4a         # Sensitivity: prefix sharing ratio
bash exp/run_exp.sh 4b         # Sensitivity: KV cache capacity
bash exp/run_exp.sh 4c         # Sensitivity: number of workers
bash exp/run_exp.sh 2 4b       # Multiple experiments

Run a single evaluation

bash exp/run_eval.sh --baselines l4_ecmp,l7_rr,l7_pa,kvswitch \
  --num-requests 200 --request-rate 10
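For a custom sweep outside the predefined experiments, the same invocation can be generated programmatically. This is a minimal sketch, assuming only the three flags shown above; `eval_command` is a hypothetical helper, and the rates in the loop are illustrative values, not ones used in the evaluation.

```python
import shlex
from typing import List, Sequence


def eval_command(baselines: Sequence[str], num_requests: int,
                 request_rate: float) -> List[str]:
    """Build the argv list for one exp/run_eval.sh invocation."""
    return [
        "bash", "exp/run_eval.sh",
        "--baselines", ",".join(baselines),
        "--num-requests", str(num_requests),
        "--request-rate", str(request_rate),
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(cmd, check=True) to execute it from the repo root.
    for rate in (5, 10, 20):  # illustrative request rates
        print(shlex.join(eval_command(["l7_pa", "kvswitch"], 200, rate)))
```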

Rebuild the Docker image

Pass --build to run_eval.sh to recompile the P4 program and rebuild the Docker image:

bash exp/run_eval.sh --build --baselines kvswitch --num-requests 50

Recompile P4 artifacts

Pre-compiled artifacts are committed in the repo for zero-setup reproduction. If you modify p4/, recompile and recommit:

bash scripts/compile_p4.sh p4/kvswitch.p4 build/p4/kvswitch

The script uses a locally installed p4c if one is available; otherwise it falls back to a p4c Docker image (built from p4lang/p4c, tagged as p4c:latest).

Development Setup

Local installation is needed only for development (editing code, running tests, profiling). Experiments run in Docker and do not require it.

bash scripts/install.sh

This creates a Python 3.12 virtual environment in .venv and installs all dependencies (including vLLM for GPU profiling). Installation may take a long time because of the vLLM build.

Run tests

uv run pytest tests/ -q

Lint and format

uv run bash scripts/format.sh --all
