Hermes

Note: This project is still incubating and is a proof of concept.

Hermes is a Kubernetes cluster analyzer for RDMA-capable GPU infrastructure. It scans a cluster to detect RDMA networking capabilities and GPU topology, then selects node pairs suited to high-speed interconnect testing.

Supports CoreWeave, GKE, OpenShift, and generic Kubernetes environments.

Installation

From Release Binary

Download the latest release for your platform:

# macOS (Apple Silicon)
curl -LO https://github.com/llm-d-incubation/hermes/releases/latest/download/hermes-darwin-arm64.tar.gz
tar xzf hermes-darwin-arm64.tar.gz
sudo mv hermes /usr/local/bin/

# macOS (Intel)
curl -LO https://github.com/llm-d-incubation/hermes/releases/latest/download/hermes-darwin-amd64.tar.gz
tar xzf hermes-darwin-amd64.tar.gz
sudo mv hermes /usr/local/bin/

# Linux (x86_64)
curl -LO https://github.com/llm-d-incubation/hermes/releases/latest/download/hermes-linux-amd64.tar.gz
tar xzf hermes-linux-amd64.tar.gz
sudo mv hermes hca-probe /usr/local/bin/
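
For scripted installs, the platform choice above can be automated. The following is a minimal sketch, not part of hermes: the helper name `hermes_asset` is hypothetical, and it only knows the three release assets listed above. Any other platform is reported as unsupported rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with hermes): map `uname -s` / `uname -m`
# values to one of the release asset names documented above.
hermes_asset() {
    os="$1"
    arch="$2"
    case "$os/$arch" in
        Darwin/arm64)  echo "hermes-darwin-arm64.tar.gz" ;;
        Darwin/x86_64) echo "hermes-darwin-amd64.tar.gz" ;;
        Linux/x86_64)  echo "hermes-linux-amd64.tar.gz" ;;
        *)
            # Only the platforms listed in this README are handled.
            echo "unsupported platform: $os/$arch" >&2
            return 1
            ;;
    esac
}

# Example use (requires network access), left commented out on purpose:
# asset="$(hermes_asset "$(uname -s)" "$(uname -m)")" &&
#   curl -LO "https://github.com/llm-d-incubation/hermes/releases/latest/download/${asset}" &&
#   tar xzf "${asset}"
```

After extraction, move the binaries into place as shown for your platform above.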

From Source

cargo install --path .

Quick Start

# scan cluster
hermes scan

# show only RDMA-capable nodes
hermes scan --ib-only

# preview RDMA test manifests
hermes self-test --dry-run

# run RDMA self-test
hermes self-test --namespace default

Platform Examples

# CoreWeave
KUBECONFIG=~/path/to/cwconfig hermes scan

# GKE
gcloud container clusters get-credentials CLUSTER_NAME && hermes scan

# OpenShift (with proxy)
HTTPS_PROXY=http://proxy-ip:port hermes scan

Self-Test Framework

Prerequisite: Most self-tests require JobSet to be installed on the cluster:

kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.10.1/manifests.yaml
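
To confirm the prerequisite before running a self-test, one option is to look for the JobSet CustomResourceDefinition. This is a minimal sketch: the function name `jobset_installed` is hypothetical and not part of hermes, and it assumes the upstream JobSet CRD name `jobsets.jobset.x-k8s.io`.

```shell
#!/bin/sh
# Hypothetical helper (not part of hermes): succeeds if the JobSet CRD
# is registered in the cluster selected by the current kubeconfig context.
jobset_installed() {
    kubectl get crd jobsets.jobset.x-k8s.io >/dev/null 2>&1
}

# Example use:
# if jobset_installed; then
#     echo "JobSet CRD found"
# else
#     echo "JobSet CRD not found; install it first" >&2
# fi
```

The check only verifies that the CRD exists; it does not confirm that the JobSet controller is healthy.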

The self-test framework automatically deploys RDMA workloads on the node pairs that hermes selects:

# preview what will be deployed
hermes self-test --dry-run

# run UCX-based data transfer test
hermes self-test --namespace default

# OpenShift RoCE (auto-detects the SR-IOV network; pass --sriov-network to set it explicitly)
hermes self-test --namespace test-ns

# keep resources after test
hermes self-test --no-cleanup

How it works: scans the cluster → selects an optimal node pair (same fabric/zone) → renders test manifests → deploys jobs → monitors completion → cleans up
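
Because the self-test deploys real workloads, it can be useful to preview the manifests first and deploy only if the preview succeeds. The wrapper below is a minimal sketch using only the two invocations shown above. The function name `run_selftest` is hypothetical, and it assumes that `hermes self-test --dry-run` exits non-zero on failure.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of hermes): preview manifests, then run
# the self-test in the given namespace only if the preview succeeded.
run_selftest() {
    ns="${1:-default}"
    # Assumption: a failed dry run returns a non-zero exit status.
    hermes self-test --dry-run || return 1
    hermes self-test --namespace "$ns"
}

# Example use:
# run_selftest test-ns
```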

Available workloads: nixl-transfer-test (default), deepgemm-minimal-test

Output Formats

hermes scan --format json    # JSON output
hermes scan --format table   # table view (default)
hermes scan --save-to report.json   # save results to a file

License

MIT
