Virtual Kubelet provider that maps Kubernetes pods to Cocoon MicroVMs.
vk-cocoon is the host-side bridge between the Kubernetes API and the cocoon runtime running on a single node. It satisfies the virtual-kubelet provider contract by translating pod CRUD into cocoon CLI calls and reporting per-VM status back to the kubelet.
| Layer | Package | Responsibility |
|---|---|---|
| Application | `package main` | Entry point, node registration, metrics server, VM event watcher startup |
| Provider | `provider/cocoon/` | Provider struct with lifecycle methods (`CreatePod` / `DeletePod` / `UpdatePod` / `GetPodStatus`), startup reconcile, orphan policy, VM event watcher, pod eviction |
| Provider iface | `provider/` | Shared provider interface and node-capacity helpers |
| Cocoon CLI | `vm/` | `Runtime` interface + the default `CocoonCLI` implementation that shells out to `sudo cocoon …` (including `WatchEvents` via `cocoon vm status --event --format json`) |
| Snapshot SDK | `snapshots/` | Wraps the epoch SDK as a `RegistryClient` interface, plus `Puller` and `Pusher` that stream snapshots and cloud images via `epoch/snapshot` and `epoch/cloudimg` |
| Network | `network/` | dnsmasq lease parser used to resolve a freshly cloned VM's IP, plus the ICMPv4 `Pinger` the probe loop uses to check guest reachability |
| Guest exec | `guest/` | SSH executor (Linux) and RDP help-text shim (Windows) |
| Probes | `probes/` | Per-pod probe agents that run a caller-supplied health check on a ticker, update the in-memory readiness map, and invoke an `onUpdate` callback so the async provider can push fresh status through v-k's notify hook |
| Metrics | `metrics/` | Prometheus collectors for pod lifecycle, snapshot pull / push, VM table size, orphans |
| Build metadata | `version/` | ldflags-injected version / revision / built-at strings |
On CreatePod, vk-cocoon:

- Parses `meta.VMSpec` from the pod annotations.
- If a VM with `spec.VMName` already exists locally, adopts it (idempotent on restart).
- Otherwise branches on `spec.Managed` first, then `spec.Mode`:
  - `Managed=false` (static / externally-managed VMs, e.g. Windows toolboxes on an external QEMU host): skip the runtime entirely and adopt the pre-assigned `VMID` / `IP` / `VNCPort` the operator pre-wrote into the `VMRuntime` annotations. `Managed` is the single source of truth for "vk-cocoon owns this VM's lifecycle".
  - Mode `clone` (default, `Managed=true`): pull the snapshot from epoch via `Puller.PullSnapshot` if not cached locally, then `Runtime.Clone(from=spec.Image | spec.ForkFrom, to=spec.VMName)`.
  - Mode `run` (`Managed=true`): `Runtime.Run(image=spec.Image, name=spec.VMName)`.
- Resolves the IP from the dnsmasq lease file by MAC.
- `meta.VMRuntime{VMID, IP}.Apply(pod)` writes the runtime annotations back so the operator and other consumers can pick them up. `VNCPort` is intentionally left unset here — cloud-hypervisor has no VNC server, so only the pre-seeded static-toolbox path ever carries a non-zero value.
- Launches a per-pod probe agent in `probes/` (see Readiness probing below). The agent's first probe runs synchronously so the initial notify push already reflects reachability; later probes run on a ticker and call back into the provider whenever readiness flips so the async notify hook re-fires.
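The branching above can be condensed into a small decision function. The sketch below is illustrative, not the real code: `vmSpec` is a hypothetical pared-down stand-in for `meta.VMSpec`, and the returned action strings are invented labels for the four paths.

```go
package main

import "fmt"

// vmSpec is a hypothetical, pared-down stand-in for meta.VMSpec.
type vmSpec struct {
	VMName   string
	Image    string
	ForkFrom string
	Mode     string // "clone" (default) or "run"
	Managed  bool
}

// decideCreateAction mirrors the CreatePod branching: adopt an existing VM,
// adopt pre-seeded runtime info for unmanaged VMs, or clone/run a new one.
func decideCreateAction(spec vmSpec, existsLocally bool) string {
	switch {
	case existsLocally:
		return "adopt-existing" // idempotent on restart
	case !spec.Managed:
		return "adopt-static" // operator pre-wrote VMID/IP/VNCPort annotations
	case spec.Mode == "run":
		return "run" // Runtime.Run(image, name)
	default:
		return "clone" // pull snapshot if needed, then Runtime.Clone
	}
}

func main() {
	fmt.Println(decideCreateAction(vmSpec{VMName: "tb-0", Managed: true}, false)) // clone
}
```

Note that the adopt-existing check wins over everything else, which is what makes CreatePod safe to replay after a vk-cocoon restart.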
On DeletePod, vk-cocoon:

- Decodes `meta.VMSpec`.
- `meta.ShouldSnapshotVM(spec)` — the shared cocoon-common decoder — decides whether to snapshot before destroy:
  - `always`: `Runtime.SnapshotSave`, then `Pusher.PushSnapshot(tag=meta.DefaultSnapshotTag)` to epoch.
  - `main-only`: same, but only when the VM name ends in `-0` (slot 0 = the main agent).
  - `never`: skip snapshots entirely.
- `Runtime.Remove(vmID)` destroys the VM.
- Forgets the pod from the in-memory tables.
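The snapshot-policy dispatch is simple enough to sketch. `shouldSnapshot` below is a hypothetical local analogue of `meta.ShouldSnapshotVM`, written against the three policy values described above:

```go
package main

import (
	"fmt"
	"strings"
)

// shouldSnapshot is a hypothetical analogue of meta.ShouldSnapshotVM: it maps
// the policy string and VM name to a snapshot-before-destroy decision.
func shouldSnapshot(policy, vmName string) bool {
	switch policy {
	case "always":
		return true
	case "main-only":
		// Slot 0 (name suffix "-0") is the main agent.
		return strings.HasSuffix(vmName, "-0")
	default: // "never" and anything unrecognized
		return false
	}
}

func main() {
	fmt.Println(shouldSnapshot("main-only", "agent-0")) // true
}
```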
The only update vk-cocoon honors is a `HibernateState` transition. Anything else is a no-op (the operator deletes and recreates the pod for genuine spec changes).
| Transition | Behavior |
|---|---|
| `false → true` | `Runtime.SnapshotSave` → `Pusher.PushSnapshot(tag=meta.HibernateSnapshotTag)` → `Runtime.Remove`. Pod stays alive (`PodRunning`) so K8s controllers do not recreate it. Compensating rollback: if `Runtime.Remove` fails after a successful push, vk-cocoon best-effort `Registry.DeleteManifest`s the hibernate tag so the operator does not observe Hibernated while the local VM is still running. Push and Save are idempotent, so a compensated retry re-publishes the tag cleanly on the next attempt. |
| `true → false` (with no live VM) | `Puller.PullSnapshot(tag=meta.HibernateSnapshotTag)` → `Runtime.Clone` → drop the hibernation tag from epoch. |
The operator's `CocoonHibernation` reconciler tracks the transition by polling `epoch.GetManifest(vmName, "hibernate")`.
Cluster state is the source of truth. There is no persistent `pods.json` file. On every restart vk-cocoon:

- Lists every pod scheduled to its node via `fieldSelector=spec.nodeName=<VK_NODE_NAME>`.
- Lists every VM the cocoon runtime knows about via `Runtime.List`.
- Adopts each pod carrying a `vm.cocoonstack.io/id` annotation by matching the VM ID against the runtime list.
- Walks unmatched VMs through the configured `VK_ORPHAN_POLICY`:
  - `alert` (default): log + bump `vk_cocoon_orphan_vm_total`, leave the VM alone.
  - `destroy`: remove the VM.
  - `keep`: no log, no metric.

A pod whose annotated VM ID does not appear in the local runtime list logs a warning and is left for `CreatePod` to recreate on the next reconcile.
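The matching pass above amounts to a set intersection over VM IDs. The following sketch (function name and return shape are invented for illustration) shows the three buckets the reconcile produces:

```go
package main

import "fmt"

// reconcile sketches the startup pass: pods carry a VM ID annotation, the
// runtime reports live VM IDs; VMs no pod claims are orphans, and pods whose
// VM vanished are left for CreatePod to recreate on the next reconcile.
func reconcile(podVMIDs, runtimeVMIDs []string) (adopted, orphans, missing []string) {
	live := map[string]bool{}
	for _, id := range runtimeVMIDs {
		live[id] = true
	}
	claimed := map[string]bool{}
	for _, id := range podVMIDs {
		if live[id] {
			adopted = append(adopted, id)
			claimed[id] = true
		} else {
			missing = append(missing, id) // warn; CreatePod recreates later
		}
	}
	for _, id := range runtimeVMIDs {
		if !claimed[id] {
			orphans = append(orphans, id) // handled per VK_ORPHAN_POLICY
		}
	}
	return adopted, orphans, missing
}

func main() {
	a, o, m := reconcile([]string{"vm-1", "vm-9"}, []string{"vm-1", "vm-2"})
	fmt.Println(a, o, m) // [vm-1] [vm-2] [vm-9]
}
```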
vk-cocoon implements v-k's `NotifyPods` interface, so the framework treats it as an async provider: Kubernetes only sees the pod status vk-cocoon actively pushes through notify, and v-k never polls `GetPodStatus` on its own. That makes a real per-pod probe loop load-bearing — any status change that happens after `CreatePod` returns is invisible to the cluster unless vk-cocoon re-fires notify.

The `probes/` package owns that loop:
- `CreatePod` (and the startup reconcile) call `Manager.Start(key, probe, onUpdate)`. The probe closure the provider supplies performs three checks in order:
  - The tracked VM still exists.
  - If the in-memory VM record has no IP, retry the dnsmasq lease file by MAC and write it back via `setVMIP`.
  - `Pinger.Ping(ctx, ip)` — a single ICMPv4 echo. This matches the cocoon Windows golden image contract (`windows/autounattend.xml` explicitly opens `icmpv4:8` and disables all firewall profiles), and it decouples readiness from specific services so the same probe works for Linux and Windows guests alike.
- The first probe runs synchronously inside `Start` so the refreshStatus/notify pass that `CreatePod` does before returning already reflects the initial reachability decision.
- A background goroutine re-runs the probe on a ticker (2 s cold-start, 5 s once Ready) and invokes `onUpdate` after 3 consecutive failures flip readiness back to false. `onUpdate` re-reads the pod, rebuilds the status, and calls notify so the kubelet observes the change.
- `DeletePod` calls `Manager.Forget`, which cancels the per-pod goroutine; `Manager.Close` is called once at shutdown to tear down every remaining agent.
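The flip logic — ready on any success, not-ready only after 3 consecutive failures, with `onUpdate` fired only on actual transitions — can be sketched as a small state machine (the `readinessTracker` type is a hypothetical simplification of the probe agent):

```go
package main

import "fmt"

// readinessTracker sketches the per-pod agent's flip logic: readiness goes
// true on any success and back to false only after threshold consecutive
// failures, firing onUpdate only when the value actually changes.
type readinessTracker struct {
	ready     bool
	failures  int
	threshold int // 3 in the behaviour described above
	onUpdate  func(ready bool)
}

func (t *readinessTracker) observe(probeOK bool) {
	if probeOK {
		t.failures = 0
		if !t.ready {
			t.ready = true
			t.onUpdate(true) // notify hook re-fires
		}
		return
	}
	t.failures++
	if t.ready && t.failures >= t.threshold {
		t.ready = false
		t.onUpdate(false)
	}
}

func main() {
	var events []bool
	t := &readinessTracker{threshold: 3, onUpdate: func(r bool) { events = append(events, r) }}
	for _, ok := range []bool{true, false, false, false} {
		t.observe(ok)
	}
	fmt.Println(events) // [true false]
}
```

Suppressing `onUpdate` for repeated identical states keeps the notify hook from spamming the kubelet with no-op status pushes.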
In addition to the periodic probe, vk-cocoon subscribes to cocoon's real-time VM event stream via `cocoon vm status --event --format json`. This provides sub-second detection of VM state changes (`DELETED`, stopped, error) without waiting for the next probe tick.

The watcher goroutine (`vmWatchLoop`) runs for the lifetime of the process with automatic restart on subprocess failure (2 s backoff). When an event arrives:
| Event | Inspect result | Action |
|---|---|---|
| `DELETED` | VM not found | `evictPod`: delete pod from API server → operator recreates |
| `MODIFIED` (state ≠ running) | state = stopped/error | `cocoon vm start` (in-place restart, preserves disk/network) |
| `MODIFIED` (state ≠ running) | state = running | False alarm — ignore |
A 30-second restart cooldown (`restartCooldown`) prevents tight restart loops when a VM keeps crashing. If the cooldown has not elapsed since the last restart, the pod is evicted instead so the operator can do a clean recreation.
Detection latency comparison:
| Mechanism | Worst-case latency |
|---|---|
| Probe only (old: 15 s × 5 failures) | ~75 s |
| Probe only (current: 5 s × 3 failures) | ~24 s |
| VM event watcher | < 1 s |
If the ICMP raw socket cannot be opened — typically because the binary is running without `CAP_NET_RAW` — the provider falls back to `network.NopPinger` and the probe degrades to "an IP was resolved == Ready". That is weaker than a real end-to-end ping but still strictly better than the previous behaviour of marking the pod Ready the instant `cocoon vm clone`/`run` returned. The systemd unit in `packaging/vk-cocoon.service` grants `AmbientCapabilities=CAP_NET_RAW` so the production path gets the real pinger.
| Variable | Default | Description |
|---|---|---|
| `KUBECONFIG` | unset | Path to kubeconfig (in-cluster config used otherwise). |
| `VK_NODE_NAME` | `cocoon-pool` | Virtual node name registered with the K8s API. |
| `VK_LOG_LEVEL` | `info` | `projecteru2/core/log` level. |
| `EPOCH_URL` | `http://epoch.cocoon-system.svc:8080` | Epoch base URL. |
| `EPOCH_TOKEN` | unset | Bearer token (only needed for `/v2/` pushes; `/dl/` is anonymous). |
| `VK_LEASES_PATH` | `/var/lib/misc/dnsmasq.leases` | dnsmasq lease file (must match cocoon-net). |
| `VK_COCOON_BIN` | `/usr/local/bin/cocoon` | Path to the cocoon CLI binary. |
| `VK_SSH_PASSWORD` | unset | SSH password for `kubectl logs` / `exec` against Linux guests. |
| `VK_ORPHAN_POLICY` | `alert` | `alert`, `destroy`, or `keep`. |
| `VK_NODE_IP` | auto-detected | Override the virtual node's InternalIP address (first non-loopback IPv4 used otherwise). |
| `VK_NODE_POOL` | `default` | Cocoon pool label stamped onto the registered node. |
| `VK_PROVIDER_ID` | unset | Cloud-provider ProviderID for the virtual node (e.g. `gce://<project>/<zone>/<instance>`). Prevents cloud node lifecycle controllers from deleting the virtual node. |
| `VK_TLS_CERT` | `/etc/cocoon/vk/tls/vk-kubelet.crt` | Path to the kubelet serving TLS certificate. |
| `VK_TLS_KEY` | `/etc/cocoon/vk/tls/vk-kubelet.key` | Path to the kubelet serving TLS private key. |
| `VK_METRICS_ADDR` | `:9091` | Plain-HTTP Prometheus listener. |
vk-cocoon is a host-level binary, not a Kubernetes Deployment. The recommended path is the supplied systemd unit:

```shell
sudo install -m 0755 ./vk-cocoon /usr/local/bin/vk-cocoon
sudo install -m 0644 packaging/vk-cocoon.service /etc/systemd/system/vk-cocoon.service
sudo install -m 0644 packaging/vk-cocoon.env.example /etc/cocoon/vk-cocoon.env
# edit /etc/cocoon/vk-cocoon.env to your environment
sudo systemctl daemon-reload
sudo systemctl enable --now vk-cocoon
```

The unit reads `/etc/cocoon/kubeconfig` for cluster credentials and `/etc/cocoon/vk-cocoon.env` for the variables above.
```shell
make all    # full pipeline: deps + fmt + lint + test + build
make build  # build vk-cocoon binary
make test   # vet + race-detected tests
make lint   # golangci-lint on linux + darwin
make fmt    # gofumpt + goimports
make help   # show all targets
```

The Makefile detects Go workspace mode (`go env GOWORK`) and skips `go mod tidy` when active so cross-module references resolve through `go.work` without forcing a release of cocoon-common or epoch.
| Project | Role |
|---|---|
| cocoon | The MicroVM runtime vk-cocoon shells out to. |
| cocoon-common | CRD types, annotation contract, shared helpers. |
| cocoon-operator | CocoonSet and CocoonHibernation reconcilers. |
| cocoon-webhook | Admission webhook for sticky scheduling and CocoonSet validation. |
| epoch | Snapshot registry; vk-cocoon pulls and pushes via epoch/snapshot + epoch/cloudimg. |
| cocoon-net | Per-host networking provisioning (dnsmasq + iptables); vk-cocoon reads its lease file. |