
feat: add kvcache samples #123

Merged
Alanxtl merged 5 commits into apache:main from Chen-BUPT:kvcache
Mar 13, 2026

@Chen-BUPT
Contributor

No description provided.

Contributor

Copilot AI left a comment

Pull request overview

Adds KVCache sample projects (mock + real-engine/BYOE) to demonstrate Pixiu KVCache routing and provide runnable verification scripts/tests for local and external environments.

Changes:

  • Introduce a fully local “mock” sample (controller + two engines) with one-command runner, request script, and Go integration test.
  • Introduce a “real-engine” BYOE sample with Pixiu config template, request script, Go smoke tests, and a metrics/latency verification script.
  • Add English/Chinese README documentation plus a top-level KVCache samples index.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| ai/kvcache/real-engine/verify.sh | End-to-end BYOE verification script (config render, run workloads, parse lookup + metrics). |
| ai/kvcache/real-engine/test/pixiu_test.go | BYOE environment probe + smoke request test. |
| ai/kvcache/real-engine/request.sh | Manual request helper for BYOE Pixiu gateway. |
| ai/kvcache/real-engine/pixiu/conf.yaml | Envsubst-driven Pixiu config template enabling KVCache + proxy filters. |
| ai/kvcache/real-engine/README.md | Real-engine sample usage documentation (EN). |
| ai/kvcache/real-engine/README_zh.md | Real-engine sample usage documentation (ZH). |
| ai/kvcache/mock/test/pixiu_test.go | Local mock integration test validating routing and side-effect calls. |
| ai/kvcache/mock/server/engine-b/main.go | Mock engine B HTTP server (chat completions, stats/reset). |
| ai/kvcache/mock/server/engine-a/main.go | Mock engine A HTTP server (tokenize + chat completions, stats/reset). |
| ai/kvcache/mock/server/controller/main.go | Mock LMCache controller (lookup/pin/compress/evict + stats/reset). |
| ai/kvcache/mock/run.sh | One-command startup + validation runner for the mock sample. |
| ai/kvcache/mock/request.sh | Manual request + stats script for the mock sample. |
| ai/kvcache/mock/pixiu/conf.yaml | Pixiu config wired to the mock controller/engines. |
| ai/kvcache/mock/README.md | Mock sample usage documentation (EN). |
| ai/kvcache/mock/README_zh.md | Mock sample usage documentation (ZH). |
| ai/kvcache/README.md | KVCache samples index (EN). |
| ai/kvcache/README_zh.md | KVCache samples index (ZH). |


Comment on lines +214 to +234
local tokenize_resp
tokenize_resp="$(curl -sS -H 'Content-Type: application/json' -X POST "${VLLM_ENDPOINT}/tokenize" -d "${tokenize_body}")"

local tokens_json
tokens_json="$(jq -c '.tokens // []' <<<"${tokenize_resp}")"
if [[ "${tokens_json}" == "[]" ]]; then
  echo "lookup_probe_error: tokenize returned empty tokens"
  exit 1
fi

local lookup_body
lookup_body="$(jq -nc --argjson t "${tokens_json}" '{tokens:$t}')"
local lookup_resp
lookup_resp="$(curl -sS -H 'Content-Type: application/json' -X POST "${LMCACHE_ENDPOINT}/lookup" -d "${lookup_body}")"

local preferred
preferred="$(jq -r '.layout_info | to_entries | max_by(.value["1"]) | .key // empty' <<<"${lookup_resp}")"
if [[ -z "${preferred}" ]]; then
  echo "lookup_probe_error: cannot parse preferred endpoint from lookup response"
  exit 1
fi
Copilot AI Feb 20, 2026

lookup_probe assumes successful JSON responses from /tokenize and /lookup; with set -e, any non-JSON/partial response will cause jq to exit non-zero and the script will terminate with a jq error instead of a clear message. Consider checking HTTP status codes (e.g., curl -f with captured status/body) and making the jq queries resilient (e.g., defaulting .layout_info to {}) so failures produce actionable errors.
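
One way to act on this suggestion is sketched below. It captures the HTTP status alongside the body (rather than using `curl -f`) and defaults `.layout_info` to `{}` so that a missing field yields an empty result instead of a jq error. The helper names `split_response`, `fetch_json`, and `pick_preferred` are hypothetical and do not exist in verify.sh; the jq filter otherwise mirrors the one in the snippet above.

```shell
# Sketch only: these helpers are illustrative and not part of verify.sh.
set -euo pipefail

# Split "<body>\n<status>" (the shape produced by curl -w '\n%{http_code}')
# into the globals HTTP_BODY and HTTP_STATUS.
split_response() {
  local raw="$1" nl=$'\n'
  HTTP_STATUS="${raw##*"${nl}"}"   # text after the last newline
  HTTP_BODY="${raw%"${nl}"*}"      # text before the last newline
}

# POST a JSON body; print the response body on success, or a clear
# error on stderr when the transport fails or the status is not 2xx.
fetch_json() {
  local label="$1" url="$2" body="$3" raw
  if ! raw="$(curl -sS -H 'Content-Type: application/json' -X POST "${url}" \
      -d "${body}" -w '\n%{http_code}')"; then
    echo "lookup_probe_error: ${label} request failed" >&2
    return 1
  fi
  split_response "${raw}"
  if [[ "${HTTP_STATUS}" != 2* ]]; then
    echo "lookup_probe_error: ${label} returned HTTP ${HTTP_STATUS}" >&2
    return 1
  fi
  printf '%s' "${HTTP_BODY}"
}

# Print the preferred endpoint key, or nothing when layout_info is
# missing, empty, or the response is not valid JSON.
pick_preferred() {
  jq -r '(.layout_info // {}) | to_entries | max_by(.value["1"]) | .key // empty' \
    <<<"$1" 2>/dev/null || true
}
```

With helpers like these, `lookup_probe` could call `lookup_resp="$(fetch_json lookup "${LMCACHE_ENDPOINT}/lookup" "${lookup_body}")"` and then test whether `pick_preferred "${lookup_resp}"` is empty, so every failure path ends in a `lookup_probe_error:` message rather than a raw jq error.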

if [[ -n "${PIXIU_PID}" ]] && kill -0 "${PIXIU_PID}" >/dev/null 2>&1; then
  kill "${PIXIU_PID}" >/dev/null 2>&1
  wait "${PIXIU_PID}" >/dev/null 2>&1
fi
Copilot AI Feb 20, 2026

WORK_DIR is created with mktemp but never removed. This leaves temp directories/logs under /tmp across runs; consider extending cleanup() to rm -rf "$WORK_DIR" (and ensure it still prints paths when needed).

Suggested change
fi
fi
if [[ -n "${WORK_DIR:-}" && -d "${WORK_DIR}" ]]; then
rm -rf "${WORK_DIR}"
fi

Comment on lines +189 to +203
local result
result="$(curl -sS -o /tmp/kvcache-real-${mode}.out -w '%{http_code} %{time_total}' \
  -H 'Content-Type: application/json' \
  -X POST "${PIXIU_URL}/v1/chat/completions" \
  -d "${body}")"

local status
status="${result%% *}"
local timing
timing="${result##* }"

if [[ "${status}" != "200" ]]; then
  echo "request failed in ${mode} mode: status=${status}"
  cat /tmp/kvcache-real-${mode}.out
  exit 1
Copilot AI Feb 20, 2026

run_load writes responses to a fixed file (/tmp/kvcache-real-${mode}.out). The file is overwritten on every request and every run, and parallel executions of the script can clobber each other's output; consider writing these files into WORK_DIR (or using mktemp) and including the request index in the filename.
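
A minimal sketch of the per-request naming suggested here, assuming `WORK_DIR` is already set by the script's `mktemp` call and that the loop counter is a plain decimal integer. The helper name `response_file` is hypothetical.

```shell
# Sketch only: response_file is a hypothetical helper, not part of verify.sh.
set -euo pipefail

# Build an output path that is unique per run (WORK_DIR), per mode,
# and per request index, e.g. <WORK_DIR>/kvcache-real-baseline-003.out
response_file() {
  local mode="$1" index="$2"
  printf '%s/kvcache-real-%s-%03d.out' \
    "${WORK_DIR:?WORK_DIR must be set}" "${mode}" "${index}"
}
```

Inside `run_load`, the curl call could then use `-o "$(response_file "${mode}" "${i}")"` (with `i` standing in for whatever loop counter the function uses), and the error path could `cat` that same path.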

Contributor

@Alanxtl Alanxtl left a comment

  1. Fix CI.
  2. Can this sample be added as an integration test?
  3. Update the corresponding description in the root README.

Contributor

@Alanxtl Alanxtl left a comment

Add an introduction to this sample in the root README.

@Chen-BUPT
Contributor Author

> Add an introduction to this sample in the root README.

done

@Alanxtl Alanxtl merged commit 0af1179 into apache:main Mar 13, 2026
2 checks passed