Pull request overview
Adds KVCache sample projects (mock + real-engine/BYOE) to demonstrate Pixiu KVCache routing and provide runnable verification scripts/tests for local and external environments.
Changes:
- Introduce a fully local “mock” sample (controller + two engines) with one-command runner, request script, and Go integration test.
- Introduce a “real-engine” BYOE sample with Pixiu config template, request script, Go smoke tests, and a metrics/latency verification script.
- Add English/Chinese README documentation plus a top-level KVCache samples index.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| ai/kvcache/real-engine/verify.sh | End-to-end BYOE verification script (config render, run workloads, parse lookup + metrics). |
| ai/kvcache/real-engine/test/pixiu_test.go | BYOE environment probe + smoke request test. |
| ai/kvcache/real-engine/request.sh | Manual request helper for BYOE Pixiu gateway. |
| ai/kvcache/real-engine/pixiu/conf.yaml | Envsubst-driven Pixiu config template enabling KVCache + proxy filters. |
| ai/kvcache/real-engine/README.md | Real-engine sample usage documentation (EN). |
| ai/kvcache/real-engine/README_zh.md | Real-engine sample usage documentation (ZH). |
| ai/kvcache/mock/test/pixiu_test.go | Local mock integration test validating routing and side-effect calls. |
| ai/kvcache/mock/server/engine-b/main.go | Mock engine B HTTP server (chat completions, stats/reset). |
| ai/kvcache/mock/server/engine-a/main.go | Mock engine A HTTP server (tokenize + chat completions, stats/reset). |
| ai/kvcache/mock/server/controller/main.go | Mock LMCache controller (lookup/pin/compress/evict + stats/reset). |
| ai/kvcache/mock/run.sh | One-command startup + validation runner for the mock sample. |
| ai/kvcache/mock/request.sh | Manual request + stats script for the mock sample. |
| ai/kvcache/mock/pixiu/conf.yaml | Pixiu config wired to the mock controller/engines. |
| ai/kvcache/mock/README.md | Mock sample usage documentation (EN). |
| ai/kvcache/mock/README_zh.md | Mock sample usage documentation (ZH). |
| ai/kvcache/README.md | KVCache samples index (EN). |
| ai/kvcache/README_zh.md | KVCache samples index (ZH). |
ai/kvcache/real-engine/verify.sh
    local tokenize_resp
    tokenize_resp="$(curl -sS -H 'Content-Type: application/json' -X POST "${VLLM_ENDPOINT}/tokenize" -d "${tokenize_body}")"

    local tokens_json
    tokens_json="$(jq -c '.tokens // []' <<<"${tokenize_resp}")"
    if [[ "${tokens_json}" == "[]" ]]; then
      echo "lookup_probe_error: tokenize returned empty tokens"
      exit 1
    fi

    local lookup_body
    lookup_body="$(jq -nc --argjson t "${tokens_json}" '{tokens:$t}')"
    local lookup_resp
    lookup_resp="$(curl -sS -H 'Content-Type: application/json' -X POST "${LMCACHE_ENDPOINT}/lookup" -d "${lookup_body}")"

    local preferred
    preferred="$(jq -r '.layout_info | to_entries | max_by(.value["1"]) | .key // empty' <<<"${lookup_resp}")"
    if [[ -z "${preferred}" ]]; then
      echo "lookup_probe_error: cannot parse preferred endpoint from lookup response"
      exit 1
    fi
`lookup_probe` assumes successful JSON responses from `/tokenize` and `/lookup`; with `set -e`, any non-JSON or partial response causes `jq` to exit non-zero, so the script terminates with a `jq` error instead of a clear message. Consider checking HTTP status codes (e.g., `curl -f` with captured status/body) and making the `jq` queries resilient (e.g., defaulting `.layout_info` to `{}`) so that failures produce actionable errors.
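One way to act on this suggestion is sketched below. It is a minimal sketch, not the script's actual code: `post_json` and `pick_preferred` are hypothetical helper names, and the approach of appending the status code with `-w` (rather than `curl -f`) is an assumption chosen so that both the status and the body stay available for the error message.

```bash
#!/usr/bin/env bash
# Sketch only: post_json and pick_preferred are hypothetical helper names.

# POST a JSON body and print the response. Fails with a clear message on
# transport errors, non-2xx statuses, or a body that is not a JSON object.
post_json() {
  local url="$1" body="$2" raw status payload
  # -w appends the HTTP status on its own final line after the response body.
  if ! raw="$(curl -sS -H 'Content-Type: application/json' -X POST "${url}" \
      -d "${body}" -w '\n%{http_code}')"; then
    echo "lookup_probe_error: request to ${url} failed" >&2
    return 1
  fi
  status="$(printf '%s\n' "${raw}" | tail -n 1)"
  payload="$(printf '%s\n' "${raw}" | sed '$d')"
  case "${status}" in
    2??) ;;
    *)
      echo "lookup_probe_error: ${url} returned HTTP ${status}" >&2
      return 1
      ;;
  esac
  if ! printf '%s' "${payload}" | jq -e 'type == "object"' >/dev/null 2>&1; then
    echo "lookup_probe_error: ${url} returned a body that is not a JSON object" >&2
    return 1
  fi
  printf '%s\n' "${payload}"
}

# Read a lookup response on stdin and print the preferred endpoint key.
# Defaulting .layout_info to {} yields empty output instead of a jq error.
pick_preferred() {
  jq -r '(.layout_info // {}) | to_entries | max_by(.value["1"]) | .key // empty'
}

# Possible use inside lookup_probe:
#   lookup_resp="$(post_json "${LMCACHE_ENDPOINT}/lookup" "${lookup_body}")" || exit 1
#   preferred="$(printf '%s' "${lookup_resp}" | pick_preferred)"
```

Because every call that can fail is wrapped in a conditional, the helpers report their own error before `set -e` has a chance to abort the script on a bare `jq` failure.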
    if [[ -n "${PIXIU_PID}" ]] && kill -0 "${PIXIU_PID}" >/dev/null 2>&1; then
      kill "${PIXIU_PID}" >/dev/null 2>&1
      wait "${PIXIU_PID}" >/dev/null 2>&1
    fi
`WORK_DIR` is created with `mktemp` but never removed. This leaves temp directories and logs under `/tmp` across runs; consider extending `cleanup()` to run `rm -rf "$WORK_DIR"` (and ensure it still prints paths when needed).
Suggested change:

    fi
    if [[ -n "${WORK_DIR:-}" && -d "${WORK_DIR}" ]]; then
      rm -rf "${WORK_DIR}"
    fi
    local result
    result="$(curl -sS -o /tmp/kvcache-real-${mode}.out -w '%{http_code} %{time_total}' \
      -H 'Content-Type: application/json' \
      -X POST "${PIXIU_URL}/v1/chat/completions" \
      -d "${body}")"

    local status
    status="${result%% *}"
    local timing
    timing="${result##* }"

    if [[ "${status}" != "200" ]]; then
      echo "request failed in ${mode} mode: status=${status}"
      cat /tmp/kvcache-real-${mode}.out
      exit 1
`run_load` writes responses to a fixed file (`/tmp/kvcache-real-${mode}.out`). That file is overwritten on every request and every run, and parallel executions can mix baseline and cached output; consider writing these files into `WORK_DIR` (or using `mktemp`) and including the request index in the filename.
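A minimal sketch of that change follows, assuming `WORK_DIR` and `PIXIU_URL` are already set as in the script. `response_path` and `run_request` are hypothetical helper names, and the filename layout is only one possible choice.

```bash
#!/usr/bin/env bash
# Sketch only: response_path and run_request are hypothetical helper names.
# WORK_DIR and PIXIU_URL are assumed to be set by the surrounding script.

# Build a per-run, per-request output path under WORK_DIR.
response_path() {
  local mode="$1" index="$2"
  printf '%s/kvcache-real-%s-%s.out' "${WORK_DIR}" "${mode}" "${index}"
}

# Send one chat-completions request and print the total time on HTTP 200.
run_request() {
  local mode="$1" index="$2" body="$3" out result status timing
  out="$(response_path "${mode}" "${index}")"
  if ! result="$(curl -sS -o "${out}" -w '%{http_code} %{time_total}' \
      -H 'Content-Type: application/json' \
      -X POST "${PIXIU_URL}/v1/chat/completions" \
      -d "${body}")"; then
    echo "request failed in ${mode} mode: curl error (request ${index})" >&2
    return 1
  fi
  status="${result%% *}"
  timing="${result##* }"
  if [ "${status}" != "200" ]; then
    echo "request failed in ${mode} mode: status=${status} (see ${out})" >&2
    return 1
  fi
  printf '%s\n' "${timing}"
}
```

Since `WORK_DIR` comes from `mktemp`, each run gets its own directory, and the request index keeps responses within a run from overwriting one another. If `cleanup()` later removes `WORK_DIR`, the error path would need to print the response body before exiting rather than only pointing at the file.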
Alanxtl left a comment
Please add an introduction to this sample in the root-directory README.
Done.