expose Hami MCP by haitwang-cloud · Pull Request #1925 · Project-HAMi/HAMi

haitwang-cloud · 2026-06-04T05:39:47Z

What type of PR is this?
/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Signed-off-by: Tim <tim.wang03@sap.com>

hami-robot · 2026-06-04T05:39:52Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: haitwang-cloud
Once this PR has been reviewed and has the lgtm label, please assign wawa0210 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gemini-code-assist

Code Review

This pull request introduces a comprehensive execution plan (hami-mcp-server.md) to add a Model Context Protocol (MCP) server to HAMi for exposing read-only GPU and scheduler states. The review feedback provides valuable improvements to the plan, including correcting Kubernetes RBAC constraints for ConfigMaps, expanding secret redaction to cover init and ephemeral containers, reusing existing HAMi client initialization utilities, refining static check regexes to avoid false positives in tests, and adjusting the E2E test client constructor to be more idiomatic for Ginkgo suites.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-04T05:41:04Z

+# ServiceAccount: hami-mcp-server
+# ClusterRole verbs (HARD LIMIT):
+#   - get/list/watch on: nodes, pods, namespaces, configmaps (hami-scheduler-config only)
+#   - NO write verbs anywhere
+#   - NO get on secrets


In Kubernetes RBAC, resourceNames does not restrict list or watch the way it restricts get. A list or watch request is only authorized against a resourceNames rule when the client includes a matching metadata.name field selector, so an ordinary unfiltered list or watch of ConfigMaps under this ClusterRole is rejected with 403 Forbidden. Since the MCP server likely only needs to read the specific hami-scheduler-config ConfigMap, either grant only the get verb for that ConfigMap name under resourceNames, or allow listing all ConfigMaps.

Suggested change

-# ServiceAccount: hami-mcp-server
-# ClusterRole verbs (HARD LIMIT):
-#   - get/list/watch on: nodes, pods, namespaces, configmaps (hami-scheduler-config only)
-#   - NO write verbs anywhere
-#   - NO get on secrets
+# ServiceAccount: hami-mcp-server
+# ClusterRole verbs (HARD LIMIT):
+#   - get/list/watch on: nodes, pods, namespaces
+#   - get on: configmaps (restricted to hami-scheduler-config via resourceNames)
+#   - NO write verbs anywhere
+#   - NO get on secrets
gemini-code-assist · 2026-06-04T05:41:04Z

+
+### 3.B Output redaction
+`pkg/mcp/redact/redact.go` strips:
+- All container `env` entries whose name matches `(?i)(token|secret|password|key|cred)`


The redaction plan currently only mentions stripping secrets from "container env entries". Kubernetes pods can also run initContainers and ephemeralContainers, which can similarly contain sensitive environment variables. To prevent accidental leaks, the redaction logic in pkg/mcp/redact/redact.go should explicitly process all container types (spec.containers, spec.initContainers, and spec.ephemeralContainers).

Suggested change

-- All container `env` entries whose name matches `(?i)(token|secret|password|key|cred)`
+- All container, initContainer, and ephemeralContainer `env` entries whose name matches `(?i)(token|secret|password|key|cred)`
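To make the scope of the suggestion concrete, here is a minimal sketch of redaction applied to all three container lists. It is not the PR's `pkg/mcp/redact/redact.go`: the `EnvVar`, `Container`, and `PodSpec` types are simplified local stand-ins for the `corev1` types (the real `EphemeralContainer` is a separate type), and `RedactPodSpec` is a hypothetical name. Only the regular expression comes from the plan.

```go
package main

import (
	"fmt"
	"regexp"
)

// sensitiveName is the pattern quoted in the plan.
var sensitiveName = regexp.MustCompile(`(?i)(token|secret|password|key|cred)`)

// Simplified local stand-ins (assumption) for the corev1 types.
type EnvVar struct {
	Name  string
	Value string
}

type Container struct {
	Name string
	Env  []EnvVar
}

type PodSpec struct {
	Containers          []Container
	InitContainers      []Container
	EphemeralContainers []Container
}

// redactEnv returns a copy of env without entries whose name looks sensitive.
func redactEnv(env []EnvVar) []EnvVar {
	kept := make([]EnvVar, 0, len(env))
	for _, e := range env {
		if !sensitiveName.MatchString(e.Name) {
			kept = append(kept, e)
		}
	}
	return kept
}

// redactContainers applies redactEnv to every container without mutating the input.
func redactContainers(cs []Container) []Container {
	out := make([]Container, len(cs))
	for i, c := range cs {
		out[i] = Container{Name: c.Name, Env: redactEnv(c.Env)}
	}
	return out
}

// RedactPodSpec covers regular, init, and ephemeral containers alike.
func RedactPodSpec(spec PodSpec) PodSpec {
	return PodSpec{
		Containers:          redactContainers(spec.Containers),
		InitContainers:      redactContainers(spec.InitContainers),
		EphemeralContainers: redactContainers(spec.EphemeralContainers),
	}
}

func main() {
	spec := PodSpec{
		Containers:     []Container{{Name: "app", Env: []EnvVar{{Name: "API_TOKEN", Value: "x"}, {Name: "LOG_LEVEL", Value: "debug"}}}},
		InitContainers: []Container{{Name: "init", Env: []EnvVar{{Name: "DB_PASSWORD", Value: "y"}}}},
	}
	fmt.Printf("%+v\n", RedactPodSpec(spec))
}
```

Note that the pattern is a substring match, so it also drops harmless names that merely contain `key` or `cred`; that errs on the side of over-redaction.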

gemini-code-assist · 2026-06-04T05:41:04Z

+pkg/mcp/
+  server.go                    # Server constructor, tool/resource registration
+  client/
+    k8s.go                     # Wraps client-go: list nodes, pods, configmaps


HAMi already has a robust, centralized Kubernetes client initialization package in pkg/util/client/client.go which automatically handles loading kubeconfig from the environment (KUBECONFIG) or falling back to in-cluster config. Instead of creating a new custom client wrapper in pkg/mcp/client/k8s.go, you should reuse pkg/util/client to avoid code duplication and ensure consistent client configuration across the project.

Suggested change

-    k8s.go                     # Wraps client-go: list nodes, pods, configmaps
+    k8s.go                     # Uses pkg/util/client to list nodes, pods, configmaps

gemini-code-assist · 2026-06-04T05:41:04Z

+# Forbid write verbs in mcp packages
+if grep -rE '\.(Update|Create|Delete|Patch|Apply)\(' pkg/mcp/ cmd/mcp-server/ 2>/dev/null; then
+    echo "ERROR: mcp packages must not call mutating K8s APIs"
+    exit 1
+fi


The proposed static check uses a simple grep pattern \.(Update|Create|Delete|Patch|Apply)\( to forbid write operations. This regex is highly prone to false positives. For example, calling a completely unrelated local method or builder pattern like mcpServer.Create(...) or tools.Create(...) will trigger a build failure. Consider refining the regex to target K8s client interfaces specifically, or using a more robust AST-based linter/static analysis tool to enforce read-only API usage.

Suggested change

-# Forbid write verbs in mcp packages
-if grep -rE '\.(Update|Create|Delete|Patch|Apply)\(' pkg/mcp/ cmd/mcp-server/ 2>/dev/null; then
-    echo "ERROR: mcp packages must not call mutating K8s APIs"
-    exit 1
-fi
+# Forbid write verbs in mcp packages (excluding test files to avoid false positives on mocks/fixtures)
+if grep -rE --exclude='*_test.go' '\.(Update|Create|Delete|Patch|Apply)\(' pkg/mcp/ cmd/mcp-server/ 2>/dev/null; then
+    echo "ERROR: mcp packages must not call mutating K8s APIs"
+    exit 1
+fi
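The AST-based alternative mentioned in the comment could start from something like the sketch below, which uses only the standard `go/parser` and `go/ast` packages. `findMutatingCalls` is a hypothetical helper, not part of the PR. Unlike grep, it ignores matches inside comments and string literals, but it still matches on method name alone, so a call such as `mcpServer.Create(...)` would be flagged; narrowing to Kubernetes client receivers would need type information (for example via `go/types`), which this sketch does not attempt.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// mutating lists the method names the grep check looks for.
var mutating = map[string]bool{
	"Update": true, "Create": true, "Delete": true, "Patch": true, "Apply": true,
}

// findMutatingCalls parses one Go source file and reports every method-style
// call (x.Name(...)) whose name is in the mutating set.
func findMutatingCalls(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var hits []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		if sel, ok := call.Fun.(*ast.SelectorExpr); ok && mutating[sel.Sel.Name] {
			pos := fset.Position(sel.Sel.Pos())
			hits = append(hits, fmt.Sprintf("%s:%d: call to .%s()", pos.Filename, pos.Line, sel.Sel.Name))
		}
		return true
	})
	return hits, nil
}

// sample is illustrative input only; it is parsed, never compiled.
const sample = `package demo

// Writing .Delete( in a comment does not count as a call.
func run(c client) {
	msg := "pods.Create(ctx) inside a string does not count either"
	_ = msg
	c.Pods().List()
	c.Pods().Delete("x")
}
`

func main() {
	hits, err := findMutatingCalls("sample.go", sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, h := range hits {
		fmt.Println(h)
	}
}
```

A real check would walk `pkg/mcp/` and `cmd/mcp-server/`, skip `*_test.go` files as the suggested change does, and exit non-zero when any hit is reported.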

Add a Model Context Protocol (MCP) server that exposes HAMi GPU scheduling state to AI assistants (Claude Desktop, Claude Code, Cursor, etc.).

## New Components
- cmd/mcp-server: Standalone MCP server binary with CLI flags
- pkg/mcp/server: MCP server constructor with tool/resource registration
- pkg/mcp/client: K8s and Prometheus client wrappers (read-only)
- pkg/mcp/tools: 5 MCP tools for GPU queries
- pkg/mcp/redact: Secret redaction for sensitive data
- pkg/mcp/resources: HAMi config resource

## MCP Tools
- list_gpu_nodes: List GPU nodes with vendor/count/memory info
- list_gpu_pods: List pods with GPU resource requests
- get_quota_usage: Get GPU quota usage per namespace
- get_gpu_metrics: Query Prometheus GPU metrics
- describe_node: Detailed node info with GPU devices

## Build & Deployment
- Makefile: Add mcp-server and docker-mcp targets
- docker/Dockerfile.mcp: Multi-stage distroless image
- charts/hami/templates/mcp-server/: Helm templates
- docs/mcp-server.md: User documentation

## E2E Tests
- Full MCP protocol stack tests with in-memory transports
- Fake K8s clientset and stubbed Prometheus server
- Coverage for all 5 tools, resources, and error handling

## Security
- Read-only K8s API access (no write verbs)
- RBAC with minimal permissions (no secrets access)
- Automatic redaction of sensitive data in responses

Signed-off-by: Tim <tim@example.com>
Signed-off-by: Tim <tim.wang03@sap.com>

init commit

4bdfa54

Signed-off-by: Tim <tim.wang03@sap.com>

hami-robot Bot added kind/feature new function dco-signoff: yes labels Jun 4, 2026

hami-robot Bot requested review from archlitchi and chaunceyjiang June 4, 2026 05:39

github-actions Bot removed the kind/feature new function label Jun 4, 2026

haitwang-cloud marked this pull request as draft June 4, 2026 05:39

hami-robot Bot added do-not-merge/work-in-progress size/L labels Jun 4, 2026

haitwang-cloud mentioned this pull request Jun 4, 2026

Exposing MCP for HAMi #1923

Open

gemini-code-assist Bot reviewed Jun 4, 2026


hami-robot Bot added size/XXL and removed size/L labels Jun 8, 2026

haitwang-cloud wants to merge 2 commits into Project-HAMi:master from haitwang-cloud:expose-hami-mcp
