feat: add kvcache samples #123

Merged

Alanxtl merged 5 commits into apache:main from Chen-BUPT:kvcache on Mar 13, 2026


Chen-BUPT commented Feb 20, 2026

No description provided.

Chen-BUPT added 2 commits on February 20, 2026 13:25

feat: add kvcache samples (bbf4d2e)

fix some problems (450237b)

AlexStocks requested a review from Copilot on February 20, 2026 07:08

Copilot started reviewing on behalf of AlexStocks on February 20, 2026 07:09

Copilot AI reviewed and left a comment

Pull request overview

Adds KVCache sample projects (mock + real-engine/BYOE) to demonstrate Pixiu KVCache routing and provide runnable verification scripts/tests for local and external environments.

Changes:

Introduce a fully local “mock” sample (controller + two engines) with one-command runner, request script, and Go integration test.
Introduce a “real-engine” BYOE sample with Pixiu config template, request script, Go smoke tests, and a metrics/latency verification script.
Add English/Chinese README documentation plus a top-level KVCache samples index.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 13 comments.


File	Description
ai/kvcache/real-engine/verify.sh	End-to-end BYOE verification script (config render, run workloads, parse lookup + metrics).
ai/kvcache/real-engine/test/pixiu_test.go	BYOE environment probe + smoke request test.
ai/kvcache/real-engine/request.sh	Manual request helper for BYOE Pixiu gateway.
ai/kvcache/real-engine/pixiu/conf.yaml	Envsubst-driven Pixiu config template enabling KVCache + proxy filters.
ai/kvcache/real-engine/README.md	Real-engine sample usage documentation (EN).
ai/kvcache/real-engine/README_zh.md	Real-engine sample usage documentation (ZH).
ai/kvcache/mock/test/pixiu_test.go	Local mock integration test validating routing and side-effect calls.
ai/kvcache/mock/server/engine-b/main.go	Mock engine B HTTP server (chat completions, stats/reset).
ai/kvcache/mock/server/engine-a/main.go	Mock engine A HTTP server (tokenize + chat completions, stats/reset).
ai/kvcache/mock/server/controller/main.go	Mock LMCache controller (lookup/pin/compress/evict + stats/reset).
ai/kvcache/mock/run.sh	One-command startup + validation runner for the mock sample.
ai/kvcache/mock/request.sh	Manual request + stats script for the mock sample.
ai/kvcache/mock/pixiu/conf.yaml	Pixiu config wired to the mock controller/engines.
ai/kvcache/mock/README.md	Mock sample usage documentation (EN).
ai/kvcache/mock/README_zh.md	Mock sample usage documentation (ZH).
ai/kvcache/README.md	KVCache samples index (EN).
ai/kvcache/README_zh.md	KVCache samples index (ZH).


ai/kvcache/mock/run.sh (outdated, resolved)

ai/kvcache/real-engine/test/pixiu_test.go (outdated, resolved)

ai/kvcache/real-engine/verify.sh (outdated)

Comment on lines +214 to +234

+                local tokenize_resp
+                tokenize_resp="$(curl -sS -H 'Content-Type: application/json' -X POST "${VLLM_ENDPOINT}/tokenize" -d "${tokenize_body}")"
+                local tokens_json
+                tokens_json="$(jq -c '.tokens // []' <<<"${tokenize_resp}")"
+                if [[ "${tokens_json}" == "[]" ]]; then
+                  echo "lookup_probe_error: tokenize returned empty tokens"
+                  exit 1
+                fi
+                local lookup_body
+                lookup_body="$(jq -nc --argjson t "${tokens_json}" '{tokens:$t}')"
+                local lookup_resp
+                lookup_resp="$(curl -sS -H 'Content-Type: application/json' -X POST "${LMCACHE_ENDPOINT}/lookup" -d "${lookup_body}")"
+                local preferred
+                preferred="$(jq -r '.layout_info | to_entries | max_by(.value["1"]) | .key // empty' <<<"${lookup_resp}")"
+                if [[ -z "${preferred}" ]]; then
+                  echo "lookup_probe_error: cannot parse preferred endpoint from lookup response"
+                  exit 1
+                fi

Copilot AI Feb 20, 2026

lookup_probe assumes successful JSON responses from /tokenize and /lookup; with set -e, any non-JSON/partial response will cause jq to exit non-zero and the script will terminate with a jq error instead of a clear message. Consider checking HTTP status codes (e.g., curl -f with captured status/body) and making the jq queries resilient (e.g., defaulting .layout_info to {}) so failures produce actionable errors.

Copilot uses AI. Check for mistakes.
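A minimal sketch of the approach this comment suggests, not the code that was merged. post_json and preferred_endpoint are hypothetical helper names. The sketch assumes the /lookup response has the shape the quoted jq query expects: layout_info maps each endpoint to an object, and the endpoint with the largest value under key "1" is preferred. curl writes the body to a temp file and the HTTP status via -w '%{http_code}', so a non-2xx reply or a transport error becomes a readable message before jq runs, and layout_info defaults to {} so a missing field reaches the existing "cannot parse preferred endpoint" error instead of a raw jq failure.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for lookup_probe in verify.sh.
set -euo pipefail

# post_json URL BODY
# Prints the response body when the HTTP status is 2xx; otherwise reports the
# status and body on stderr and returns 1 (transport errors included).
post_json() {
  local url="$1" body="$2" out status
  out="$(mktemp)"
  # -o keeps the body separate from the status code printed by -w.
  # '|| true' stops set -e from aborting on curl errors (e.g. connection
  # refused); the status check below reports them instead.
  status="$(curl -sS -o "${out}" -w '%{http_code}' \
    -H 'Content-Type: application/json' -X POST "${url}" -d "${body}")" || true
  if [[ ! "${status}" =~ ^2[0-9][0-9]$ ]]; then
    echo "lookup_probe_error: POST ${url} failed: status=${status:-none}" >&2
    cat "${out}" >&2
    rm -f "${out}"
    return 1
  fi
  cat "${out}"
  rm -f "${out}"
}

# preferred_endpoint LOOKUP_JSON
# Prints the layout_info key whose value["1"] is largest. A missing or null
# layout_info is treated as {}, so jq only errors on malformed JSON or a
# layout_info of an unexpected type.
preferred_endpoint() {
  local preferred
  if ! preferred="$(jq -r '(.layout_info // {}) | to_entries | max_by(.value["1"]) | .key // empty' \
      <<<"$1" 2>/dev/null)"; then
    echo "lookup_probe_error: lookup response is not JSON of the expected shape" >&2
    return 1
  fi
  if [[ -z "${preferred}" ]]; then
    echo "lookup_probe_error: cannot parse preferred endpoint from lookup response" >&2
    return 1
  fi
  printf '%s\n' "${preferred}"
}
```

Inside lookup_probe, each request would then read, for example, tokenize_resp="$(post_json "${VLLM_ENDPOINT}/tokenize" "${tokenize_body}")" || exit 1, and the selection step would become preferred="$(preferred_endpoint "${lookup_resp}")" || exit 1.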

ai/kvcache/mock/run.sh (resolved)

ai/kvcache/mock/run.sh (outdated, resolved)

ai/kvcache/mock/test/pixiu_test.go (resolved)

ai/kvcache/real-engine/verify.sh

+                if [[ -n "${PIXIU_PID}" ]] && kill -0 "${PIXIU_PID}" >/dev/null 2>&1; then
+                  kill "${PIXIU_PID}" >/dev/null 2>&1
+                  wait "${PIXIU_PID}" >/dev/null 2>&1
+                fi

Copilot AI Feb 20, 2026

WORK_DIR is created with mktemp but never removed. This leaves temp directories/logs under /tmp across runs; consider extending cleanup() to rm -rf "$WORK_DIR" (and ensure it still prints paths when needed).

Suggested change

      
-                fi
+                fi
+                if [[ -n "${WORK_DIR:-}" && -d "${WORK_DIR}" ]]; then
+                  rm -rf "${WORK_DIR}"
+                fi
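A sketch of a fuller cleanup() along the lines of this suggestion. It assumes verify.sh installs cleanup() as an EXIT trap under set -e (the quoted diff shows only its body). KEEP_WORK_DIR is a hypothetical switch, and keeping the directory when the script exits non-zero is only one reading of "still prints paths when needed".

```shell
#!/usr/bin/env bash
# Sketch only: cleanup() for verify.sh that also removes WORK_DIR.
set -euo pipefail

WORK_DIR="$(mktemp -d)"
PIXIU_PID=""   # set to the gateway's PID after it is started in the background

cleanup() {
  local rc=$?   # status the script is exiting with when run as the EXIT trap
  if [[ -n "${PIXIU_PID}" ]] && kill -0 "${PIXIU_PID}" >/dev/null 2>&1; then
    kill "${PIXIU_PID}" >/dev/null 2>&1 || true
    # wait reports the killed process's non-zero status; '|| true' keeps
    # set -e from ending the trap before the directory is handled.
    wait "${PIXIU_PID}" >/dev/null 2>&1 || true
  fi
  if [[ -n "${WORK_DIR:-}" && -d "${WORK_DIR}" ]]; then
    if [[ "${rc}" -ne 0 || "${KEEP_WORK_DIR:-0}" == "1" ]]; then
      # Keep the logs for debugging and say where they are.
      echo "work dir kept: ${WORK_DIR}" >&2
    else
      rm -rf "${WORK_DIR}"
    fi
  fi
}
trap cleanup EXIT
```

The || true guards are the main change to the quoted lines: under set -e, the non-zero status that wait returns for a killed process can otherwise stop the trap before rm -rf is reached.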


ai/kvcache/real-engine/verify.sh (outdated, resolved)

ai/kvcache/real-engine/verify.sh

Comment on lines +189 to +203

+                  local result
+                  result="$(curl -sS -o /tmp/kvcache-real-${mode}.out -w '%{http_code} %{time_total}' \
+                    -H 'Content-Type: application/json' \
+                    -X POST "${PIXIU_URL}/v1/chat/completions" \
+                    -d "${body}")"
+                  local status
+                  status="${result%% *}"
+                  local timing
+                  timing="${result##* }"
+                  if [[ "${status}" != "200" ]]; then
+                    echo "request failed in ${mode} mode: status=${status}"
+                    cat /tmp/kvcache-real-${mode}.out
+                    exit 1

Copilot AI Feb 20, 2026

run_load writes responses to a fixed file (/tmp/kvcache-real-${mode}.out). This will overwrite across runs and mixes baseline/cached output between parallel executions; consider writing these files into WORK_DIR (or using mktemp) and including the request index in the filename.
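One way to apply this suggestion, sketched under assumptions: response bodies go under WORK_DIR with the mode and a zero-padded request index in the filename, so baseline and cached runs, repeated requests, and separate invocations (each with its own mktemp directory) do not overwrite each other. response_file and send_request are hypothetical names, the curl call mirrors the quoted run_load code, and the PIXIU_URL default is only a placeholder.

```shell
#!/usr/bin/env bash
# Sketch only: per-request response files for run_load in verify.sh.
set -euo pipefail

# One directory per invocation, so parallel runs never share files.
WORK_DIR="${WORK_DIR:-$(mktemp -d)}"
# Placeholder default; point this at your Pixiu gateway.
PIXIU_URL="${PIXIU_URL:-http://127.0.0.1:8888}"

# response_file MODE INDEX -> e.g. $WORK_DIR/kvcache-real-cached-007.out
response_file() {
  printf '%s/kvcache-real-%s-%03d.out\n' "${WORK_DIR}" "$1" "$2"
}

# send_request MODE INDEX BODY
# Prints "<status> <time_total>" on HTTP 200; otherwise prints any saved
# body on stderr and returns 1.
send_request() {
  local mode="$1" idx="$2" body="$3" out result status timing
  out="$(response_file "${mode}" "${idx}")"
  result="$(curl -sS -o "${out}" -w '%{http_code} %{time_total}' \
    -H 'Content-Type: application/json' \
    -X POST "${PIXIU_URL}/v1/chat/completions" \
    -d "${body}")" || true
  status="${result%% *}"
  timing="${result##* }"
  if [[ "${status}" != "200" ]]; then
    echo "request ${idx} failed in ${mode} mode: status=${status:-none}" >&2
    if [[ -f "${out}" ]]; then cat "${out}" >&2; fi
    return 1
  fi
  printf '%s %s\n' "${status}" "${timing}"
}
```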


ai/kvcache/real-engine/verify.sh (outdated, resolved)


fix:fix some bugs and add headers (9a60793)

Alanxtl reviewed and left a comment (edited):

Fix the CI.
Can this sample be added as an integration test?
Please update the corresponding description in the root README.


feat:add integrate test (1a7102a)

Alanxtl approved these changes and left a comment:

Please add an introduction to this sample in the root README.


feat: add descriptions to readme (5935af4)

Chen-BUPT (author) commented Mar 13, 2026, quoting Alanxtl:

Please add an introduction to this sample in the root README.

done

Alanxtl approved these changes


Alanxtl merged commit 0af1179 into apache:main

2 checks passed


Labels

None yet