diff --git a/README.md b/README.md index bd8118e..0f3748c 100644 --- a/README.md +++ b/README.md @@ -1,19 +1,20 @@ # cocoon-net -VPC-native networking setup for [Cocoon](https://github.com/cocoonstack/cocoon) VM nodes on GKE and Volcengine. It provisions cloud networking resources and configures the Linux host so that Windows and Linux VMs obtain VPC-routable IPs directly via DHCP — no overlay network, no iptables DNAT, no `kubectl port-forward` required. +VPC-native networking for [Cocoon](https://github.com/cocoonstack/cocoon) VM nodes. Provisions cloud networking resources and runs an embedded DHCP server so VMs obtain VPC-routable IPs directly -- no overlay network, no iptables DNAT, no external DHCP server dependency. ## Overview -- **Platform auto-detection** -- identifies GKE or Volcengine via instance metadata +- **Embedded DHCP server** on `cni0` bridge, replacing the external DHCP server dependency +- **Dynamic /32 host routes** added on DHCP lease, removed on expiry +- **Platform auto-detection** via instance metadata (GKE or Volcengine) - **Cloud resource provisioning** -- GKE alias IP ranges or Volcengine ENI secondary IPs -- **Host networking** -- cni0 bridge, dnsmasq DHCP, /32 host routes, iptables FORWARD + NAT +- **Host networking** -- cni0 bridge, sysctl, iptables FORWARD + NAT - **CNI integration** -- generates conflist for Kubernetes pod networking -- **State management** -- persists pool state to `/var/lib/cocoon/net/pool.json` -- **Adopt mode** -- bring existing hand-provisioned nodes under management without cloud API calls -- **Idempotent re-runs** -- skips config writes and service restarts when nothing changed -- **Dry-run mode** -- preview all changes before applying +- **State management** -- pool state persisted to `/var/lib/cocoon/net/pool.json` +- **Adopt mode** -- bring existing hand-provisioned nodes under management +- **Daemon mode** -- runs as a long-lived systemd service -### Supported platforms +### Supported Platforms | Platform | Mechanism | Max IPs/node | |---|---|---| @@ -22,205 +23,164 @@ VPC-native networking setup for [Cocoon](https://github.com/cocoonstack/cocoon) ## Architecture -On each node, `cocoon-net init` runs through these steps: - -1. **Detect** the cloud platform via instance metadata (auto, or `--platform` flag). -2. **Provision** cloud networking: - - **GKE**: adds a secondary IP range to the subnet, assigns alias IPs to the instance NIC, fixes GCE guest-agent route hijack. - - **Volcengine**: creates a dedicated /24 subnet, creates and attaches 7 secondary ENIs, assigns 20 secondary private IPs per ENI. -3. **Configure** the node: - - Creates `cni0` bridge with gateway IP. - - Generates `/etc/dnsmasq-cni.d/cni0.conf` with contiguous DHCP ranges and restarts `dnsmasq-cni`. - - Adds `/32` host routes for each VM IP pointing to `cni0`. - - Installs iptables FORWARD rules between secondary NICs and `cni0`, plus NAT MASQUERADE for outbound. - - Applies sysctl (`ip_forward=1`, `rp_filter=0`). -4. **Generate** `/etc/cni/net.d/30-dnsmasq-dhcp.conflist`. -5. **Save** pool state to `/var/lib/cocoon/net/pool.json`. +``` +cocoon-net init cocoon-net daemon + | | + v v +Cloud provisioning Node setup (sysctl, bridge, iptables, CNI conflist) +(alias IPs / ENIs) | + | v + v DHCP server on cni0 +pool.json <---------- | + v + On lease: add /32 route + On release: del /32 route +``` -## Installation +**Two-phase operation**: + +1. `cocoon-net init` (or `adopt`) -- one-time cloud provisioning + state persistence +2. `cocoon-net daemon` -- long-running service: node setup + DHCP + dynamic routing -### Download +## Installation ```bash curl -sL https://github.com/cocoonstack/cocoon-net/releases/latest/download/cocoon-net_Linux_x86_64.tar.gz | tar xz sudo install -m 0755 cocoon-net /usr/local/bin/ ``` -### Build from source +Build from source: ```bash git clone https://github.com/cocoonstack/cocoon-net.git cd cocoon-net -make build # produces ./cocoon-net -``` - -## Configuration - -### Flags - -| Flag | Default | Description | -|---|---|---| -| `--platform` | auto-detect | Force platform (`gke` or `volcengine`) | -| `--node-name` | (required) | Virtual node name (e.g. `cocoon-pool`) | -| `--subnet` | (required) | VM subnet CIDR (e.g. `172.20.100.0/24`) | -| `--pool-size` | `140` (init) / `253` (adopt) | Number of IPs in the pool | -| `--gateway` | first IP in subnet | Gateway IP on `cni0` | -| `--primary-nic` | auto-detect | Host primary NIC (`ens4` on GKE, `eth0` on Volcengine) | -| `--dns` | `8.8.8.8,1.1.1.1` | Comma-separated DNS servers for DHCP clients | -| `--state-dir` | `/var/lib/cocoon/net` | State directory for `pool.json` | -| `--dry-run` | `false` | Show what would be done, without making changes | -| `--manage-iptables` | `false` | (adopt only) Let cocoon-net write FORWARD + NAT rules | - -### Environment variables - -| Variable | Default | Description | -|---|---|---| -| `COCOON_NET_LOG_LEVEL` | `info` | Log level (`debug`, `info`, `warn`, `error`) | - -### Credentials - -**GKE** - -Uses application default credentials or the GCE instance service account. Requires `roles/compute.networkAdmin` or equivalent. - -**Volcengine** - -Reads from `~/.volcengine/config.json`: - -```json -{ - "access_key_id": "AKxxxx", - "secret_access_key": "xxxx", - "region": "cn-hongkong" -} -``` - -Or environment variables: - -```bash -export VOLCENGINE_ACCESS_KEY_ID=AKxxxx -export VOLCENGINE_SECRET_ACCESS_KEY=xxxx -export VOLCENGINE_REGION=cn-hongkong +make build ``` ## Usage -### init — full node setup +### init -- provision cloud networking ```bash sudo cocoon-net init \ + --platform gke \ --node-name cocoon-pool \ --subnet 172.20.100.0/24 \ --pool-size 140 ``` -With all flags: +### daemon -- run DHCP server (systemd service) ```bash -sudo cocoon-net init \ - --node-name cocoon-pool \ - --subnet 172.20.100.0/24 \ - --pool-size 140 \ - --gateway 172.20.100.1 \ - --platform volcengine \ - --primary-nic eth0 \ - --dns "8.8.8.8,1.1.1.1" \ - --state-dir /var/lib/cocoon/net +sudo cocoon-net daemon ``` -Dry run (show what would be done): +The daemon loads the pool from `pool.json`, configures host networking, and starts the embedded DHCP server. Host routes are managed dynamically: added when a VM gets a lease, removed when the lease expires. -```bash -sudo cocoon-net init --node-name cocoon-pool --subnet 172.20.100.0/24 --dry-run +Systemd unit: + +```ini +[Unit] +Description=cocoon-net VPC networking daemon +After=network-online.target +Wants=network-online.target + +[Service] +ExecStart=/usr/local/bin/cocoon-net daemon +Restart=always +RestartSec=5 + +[Install] +WantedBy=multi-user.target ``` -### adopt — bring an existing node under cocoon-net management +### adopt -- bring an existing node under management -For nodes whose cloud networking (alias IP range or secondary ENIs) was already provisioned by hand, `adopt` runs only the host-side configuration (bridge, sysctl, routes, dnsmasq, CNI conflist) and writes the pool state file — without calling any cloud APIs. +For nodes whose cloud networking was already provisioned by hand: ```bash sudo cocoon-net adopt \ + --platform gke \ --node-name cocoon-pool \ --subnet 172.20.0.0/24 ``` -By default, adopt preserves the host's existing iptables rules. Opt in to let cocoon-net manage them: +### status -- show pool state ```bash -sudo cocoon-net adopt \ - --node-name cocoon-pool \ - --subnet 172.20.0.0/24 \ - --manage-iptables +cocoon-net status ``` -Cloud-side teardown must be done manually on adopted nodes — cocoon-net will not undo what it did not provision. - -### status — show pool state +### teardown -- remove cloud networking resources ```bash -sudo cocoon-net status +sudo cocoon-net teardown ``` -Output: +## Flags -``` -Platform: volcengine -Node: cocoon-pool -Subnet: 172.20.100.0/24 -Gateway: 172.20.100.1 -IPs: 140 -Updated: 2026-04-04T06:00:00Z -ENIs: 7 -SubnetID: subnet-xxx -``` +| Flag | Default | Description | +|---|---|---| +| `--platform` | (required) | Cloud platform (`gke` or `volcengine`) | +| `--node-name` | (required) | Virtual node name | +| `--subnet` | (required) | VM subnet CIDR (e.g. `172.20.100.0/24`) | +| `--pool-size` | `140` (init) / `253` (adopt) | Number of IPs in the pool | +| `--gateway` | first IP in subnet | Gateway IP on `cni0` | +| `--primary-nic` | auto-detect | Host primary NIC | +| `--dns` | `8.8.8.8,1.1.1.1` | DNS servers for DHCP clients | +| `--state-dir` | `/var/lib/cocoon/net` | State directory for `pool.json` | +| `--lease-file` | `/var/lib/cocoon/net/leases.json` | DHCP lease persistence file | +| `--dry-run` | `false` | Preview changes without applying | +| `--skip-iptables` | `false` | (daemon) Skip iptables setup | +| `--manage-iptables` | `false` | (adopt) Let cocoon-net write iptables rules | -### teardown — remove cloud networking +### Environment Variables -```bash -sudo cocoon-net teardown -``` +| Variable | Default | Description | +|---|---|---| +| `COCOON_NET_LOG_LEVEL` | `info` | Log level (`debug`, `info`, `warn`, `error`) | -### CNI integration +## CNI Integration -Both `init` and `adopt` generate `/etc/cni/net.d/30-dnsmasq-dhcp.conflist`: +Both `init` and `adopt` generate `/etc/cni/net.d/30-cocoon-dhcp.conflist`: ```json { "cniVersion": "1.0.0", - "name": "dnsmasq-dhcp", - "plugins": [ - { - "type": "bridge", - "bridge": "cni0", - "isGateway": false, - "ipMasq": false, - "ipam": {} - } - ] + "name": "cocoon-dhcp", + "plugins": [{ + "type": "bridge", + "bridge": "cni0", + "isGateway": false, + "ipMasq": false, + "ipam": {} + }] } ``` -The CNI IPAM is intentionally empty — Windows guests obtain their IP directly from dnsmasq running on `cni0`. Using `"type": "dhcp"` would cause a dual-DHCP conflict. - -In a CocoonSet, use `network: dnsmasq-dhcp` to route VM pods through this CNI: +IPAM is intentionally empty -- VMs obtain IPs from the embedded DHCP server. In a CocoonSet: ```yaml spec: agent: - network: dnsmasq-dhcp + network: cocoon-dhcp os: windows ``` +## Credentials + +**GKE**: Uses application default credentials or GCE instance service account (`roles/compute.networkAdmin`). + +**Volcengine**: Reads from `~/.volcengine/config.json` or environment variables (`VOLCENGINE_ACCESS_KEY_ID`, `VOLCENGINE_SECRET_ACCESS_KEY`, `VOLCENGINE_REGION`). + ## Development ```bash -make build # build binary -make test # run tests with coverage -make lint # run golangci-lint (linux + darwin) -make fmt # format code with gofumpt and goimports -make deps # tidy modules -make clean # remove artifacts -make help # show all targets +make build # build binary +make test # run tests with coverage +make lint # golangci-lint (linux + darwin) +make fmt # gofumpt + goimports +make help # show all targets ``` ### Guides @@ -232,11 +192,11 @@ make help # show all targets | Project | Role | |---|---| +| [cocoon](https://github.com/cocoonstack/cocoon) | MicroVM engine (Cloud Hypervisor + Firecracker) | | [cocoon-common](https://github.com/cocoonstack/cocoon-common) | Shared metadata, Kubernetes, and logging helpers | | [cocoon-operator](https://github.com/cocoonstack/cocoon-operator) | CocoonSet and Hibernation CRDs | | [cocoon-webhook](https://github.com/cocoonstack/cocoon-webhook) | Admission webhook for sticky scheduling | -| [epoch](https://github.com/cocoonstack/epoch) | Snapshot storage backend | -| [vk-cocoon](https://github.com/cocoonstack/vk-cocoon) | Virtual kubelet provider managing VM lifecycle | +| [vk-cocoon](https://github.com/cocoonstack/vk-cocoon) | Virtual kubelet provider | ## License diff --git a/cmd/adopt_cmd.go b/cmd/adopt.go similarity index 68% rename from cmd/adopt_cmd.go rename to cmd/adopt.go index a3cae60..33286bb 100644 --- a/cmd/adopt_cmd.go +++ b/cmd/adopt.go @@ -16,24 +16,19 @@ import ( // on the adopt subcommand: by default adopt preserves the host's existing // firewall rules, and the operator must opt in with --manage-iptables to // have cocoon-net rewrite them. -var flagManageIPTables bool +var ( + flagManageIPTables bool +) func newAdoptCmd() *cobra.Command { cmd := &cobra.Command{ Use: "adopt", Short: "Adopt an existing manually-provisioned node into cocoon-net state", Long: `Adopt configures a node whose cloud networking (alias IP range or -secondary ENIs) already exists. cocoon-net will overwrite the dnsmasq -config, CNI conflist, bridge, iptables, sysctl, and pool state file from -its own templates while leaving the cloud-side allocation untouched. - -Use this on hosts that were provisioned by hand (or by an older script) -before cocoon-net existed, to bring them under cocoon-net management -without calling any cloud APIs. - -Required flags: - --node-name the virtual node name (e.g. cocoon-pool, cocoon-pool-2) - --subnet the existing alias range CIDR (e.g. 172.20.0.0/24)`, +secondary ENIs) already exists. cocoon-net will configure the bridge, +CNI conflist, sysctl, and write the pool state file while leaving +the cloud-side allocation untouched. Run 'cocoon-net daemon' after +adopt to start the embedded DHCP server.`, RunE: runAdopt, } @@ -63,22 +58,12 @@ func runAdopt(cmd *cobra.Command, _ []string) error { return fmt.Errorf("compute ip list: %w", err) } - // Auto-detect platform name purely for the state file. - // Default to GKE when neither metadata server answers. platformName := flagPlatform - if platformName == "" { - if plat, derr := detectPlatform(ctx); derr == nil { - platformName = plat.Name() - } else { - logger.Warnf(ctx, "platform auto-detect failed (%v), defaulting to %s", derr, platform.PlatformGKE) - platformName = platform.PlatformGKE - } - } - primaryNIC := flagPrimaryNIC if primaryNIC == "" { primaryNIC = platform.DefaultNIC(platformName) } + secondaryNICs := platform.DefaultSecondaryNICs(platformName) if flagDryRun { fmt.Println("[dry-run] would adopt node with config:") @@ -97,25 +82,24 @@ func runAdopt(cmd *cobra.Command, _ []string) error { fmt.Printf(" manage-iptables: %v\n", flagManageIPTables) fmt.Println() fmt.Println("would write:") - fmt.Println(" /etc/cni/net.d/30-dnsmasq-dhcp.conflist") - fmt.Println(" /etc/dnsmasq-cni.d/cni0.conf") + fmt.Println(" /etc/cni/net.d/30-cocoon-dhcp.conflist") fmt.Printf(" %s/pool.json\n", flagStateDir) iptablesPlan := "skipped (preserve existing rules)" if flagManageIPTables { iptablesPlan = "(re)applied" } - fmt.Printf("would (re)apply: bridge cni0, sysctl, host routes, dnsmasq-cni restart; iptables %s\n", iptablesPlan) + fmt.Printf("would (re)apply: bridge cni0, sysctl; iptables %s\n", iptablesPlan) + fmt.Println("routes and DHCP managed by 'cocoon-net daemon'") fmt.Println("would NOT touch: cloud alias IP range / ENIs (preserved as-is)") return nil } nodeCfg := &node.Config{ - Gateway: gateway, - SubnetCIDR: flagSubnet, - IPs: ips, - DNSServers: dnsServers, - PrimaryNIC: primaryNIC, - SkipIPTables: !flagManageIPTables, + Gateway: gateway, + SubnetCIDR: flagSubnet, + PrimaryNIC: primaryNIC, + SecondaryNICs: secondaryNICs, + SkipIPTables: !flagManageIPTables, } if err := node.Setup(ctx, nodeCfg); err != nil { return fmt.Errorf("node setup: %w", err) @@ -123,12 +107,14 @@ func runAdopt(cmd *cobra.Command, _ []string) error { logger.Info(ctx, "node networking configured (adopted, cloud side untouched)") state := &pool.State{ - Platform: platformName, - NodeName: flagNodeName, - Subnet: flagSubnet, - Gateway: gateway, - IPs: ips, - StateDir: flagStateDir, + Platform: platformName, + NodeName: flagNodeName, + Subnet: flagSubnet, + Gateway: gateway, + PrimaryNIC: primaryNIC, + SecondaryNICs: secondaryNICs, + IPs: ips, + StateDir: flagStateDir, } if err := state.Save(ctx); err != nil { return fmt.Errorf("save pool state: %w", err) diff --git a/cmd/daemon.go b/cmd/daemon.go new file mode 100644 index 0000000..3199162 --- /dev/null +++ b/cmd/daemon.go @@ -0,0 +1,149 @@ +package cmd + +import ( + "fmt" + "net" + "os" + "path/filepath" + "strconv" + "strings" + "syscall" + + "github.com/projecteru2/core/log" + "github.com/spf13/cobra" + + "github.com/cocoonstack/cocoon-net/dhcp" + "github.com/cocoonstack/cocoon-net/node" + "github.com/cocoonstack/cocoon-net/platform" + "github.com/cocoonstack/cocoon-net/pool" +) + +const ( + defaultLeaseFile = "/var/lib/cocoon/net/leases.json" + pidFile = "/run/cocoon-net.pid" +) + +func newDaemonCmd() *cobra.Command { + cmd := &cobra.Command{ + Use: "daemon", + Short: "Run as a long-lived service: setup node networking and serve DHCP", + Long: `Daemon mode loads the IP pool from the state file, configures host +networking (sysctl, bridge, iptables), and starts an embedded DHCP server +on cni0. Host routes (/32) are added dynamically when leases are granted +and removed when they expire.`, + RunE: runDaemon, + } + cmd.Flags().String("state-dir", defaultStateDir, "directory containing pool.json") + cmd.Flags().String("lease-file", defaultLeaseFile, "path to lease persistence file") + cmd.Flags().StringSlice("dns", []string{"8.8.8.8", "1.1.1.1"}, "DNS servers for DHCP clients") + cmd.Flags().Bool("skip-iptables", false, "skip iptables setup (for pre-configured nodes)") + return cmd +} + +func runDaemon(cmd *cobra.Command, _ []string) error { + ctx := cmd.Context() + logger := log.WithFunc("cmd.daemon") + + if err := acquirePIDFile(); err != nil { + return err + } + defer os.Remove(pidFile) //nolint:errcheck + + stateDir, _ := cmd.Flags().GetString("state-dir") + leaseFile, _ := cmd.Flags().GetString("lease-file") + dnsStrs, _ := cmd.Flags().GetStringSlice("dns") + skipIPTables, _ := cmd.Flags().GetBool("skip-iptables") + + // Load pool state. + state, err := pool.Load(ctx, stateDir) + if err != nil { + return fmt.Errorf("load pool state: %w (run 'cocoon-net init' first)", err) + } + logger.Infof(ctx, "pool loaded: %d IPs, subnet %s, gateway %s", len(state.IPs), state.Subnet, state.Gateway) + + // Setup host networking (idempotent). + primaryNIC := state.PrimaryNIC + if primaryNIC == "" { + primaryNIC = platform.DefaultNIC(state.Platform) + } + if setupErr := node.Setup(ctx, &node.Config{ + Gateway: state.Gateway, + SubnetCIDR: state.Subnet, + PrimaryNIC: primaryNIC, + SecondaryNICs: state.SecondaryNICs, + SkipIPTables: skipIPTables, + }); setupErr != nil { + return fmt.Errorf("node setup: %w", setupErr) + } + + // Parse network config. + gateway := net.ParseIP(state.Gateway).To4() + if gateway == nil { + return fmt.Errorf("invalid gateway: %s", state.Gateway) + } + + _, ipNet, err := net.ParseCIDR(state.Subnet) + if err != nil { + return fmt.Errorf("invalid subnet: %w", err) + } + + poolIPs := parseIPs(state.IPs) + if len(poolIPs) == 0 { + return fmt.Errorf("no valid IPs in pool") + } + dnsIPs := parseIPs(dnsStrs) + + // Start DHCP server (blocks until ctx canceled). + srv := dhcp.New(dhcp.Config{ + Interface: node.BridgeName, + Gateway: gateway, + SubnetMask: ipNet.Mask, + DNSServers: dnsIPs, + LeaseFile: leaseFile, + }, poolIPs) + + logger.Info(ctx, "starting DHCP daemon") + return srv.Run(ctx) +} + +// acquirePIDFile writes the current PID to /run/cocoon-net.pid and fails +// if another instance is already running. +func acquirePIDFile() error { + if err := checkExistingPID(); err != nil { + return err + } + if err := os.MkdirAll(filepath.Dir(pidFile), 0o755); err != nil { //nolint:gosec + return fmt.Errorf("create pid dir: %w", err) + } + return os.WriteFile(pidFile, []byte(strconv.Itoa(os.Getpid())), 0o644) //nolint:gosec +} + +func checkExistingPID() error { + data, err := os.ReadFile(pidFile) + if err != nil { + return nil // no PID file, safe to proceed + } + pid, err := strconv.Atoi(strings.TrimSpace(string(data))) + if err != nil { + return nil // corrupt PID file, overwrite + } + proc, err := os.FindProcess(pid) + if err != nil { + return nil + } + if proc.Signal(syscall.Signal(0)) == nil { + return fmt.Errorf("another cocoon-net daemon is running (pid %d)", pid) + } + return nil // stale PID, process dead +} + +// parseIPs converts a string slice to IPv4 addresses, skipping invalid entries. +func parseIPs(strs []string) []net.IP { + ips := make([]net.IP, 0, len(strs)) + for _, s := range strs { + if ip := net.ParseIP(s).To4(); ip != nil { + ips = append(ips, ip) + } + } + return ips +} diff --git a/cmd/helpers.go b/cmd/helpers.go index 4fecf0c..2e01de0 100644 --- a/cmd/helpers.go +++ b/cmd/helpers.go @@ -22,7 +22,8 @@ var ( // registerCommonFlags binds the flags shared by init and adopt subcommands. func registerCommonFlags(cmd *cobra.Command, defaultPoolSize int) { - cmd.Flags().StringVar(&flagPlatform, "platform", "", "force platform (gke|volcengine); auto-detect if empty") + cmd.Flags().StringVar(&flagPlatform, "platform", "", "cloud platform (gke|volcengine)") + _ = cmd.MarkFlagRequired("platform") cmd.Flags().StringVar(&flagNodeName, "node-name", "", "virtual node name (required)") cmd.Flags().StringVar(&flagSubnet, "subnet", "", "VM subnet CIDR, e.g. 172.20.100.0/24 (required)") cmd.Flags().IntVar(&flagPoolSize, "pool-size", defaultPoolSize, "number of IPs in the pool") diff --git a/cmd/init_cmd.go b/cmd/init.go similarity index 79% rename from cmd/init_cmd.go rename to cmd/init.go index 29fa0f8..a09e501 100644 --- a/cmd/init_cmd.go +++ b/cmd/init.go @@ -50,15 +50,9 @@ func runInit(cmd *cobra.Command, _ []string) error { return nil } - var plat platform.CloudPlatform - var err error - if flagPlatform != "" { - plat, err = newPlatform(flagPlatform) - } else { - plat, err = detectPlatform(ctx) - } + plat, err := newPlatform(flagPlatform) if err != nil { - return fmt.Errorf("detect platform: %w", err) + return fmt.Errorf("init platform: %w", err) } logger.Infof(ctx, "platform: %s", plat.Name()) @@ -69,11 +63,10 @@ func runInit(cmd *cobra.Command, _ []string) error { logger.Infof(ctx, "provisioned %d IPs on subnet %s", len(result.IPs), result.SubnetCIDR) nodeCfg := &node.Config{ - Gateway: result.Gateway, - SubnetCIDR: result.SubnetCIDR, - IPs: result.IPs, - DNSServers: cfg.DNSServers, - PrimaryNIC: result.PrimaryNIC, + Gateway: result.Gateway, + SubnetCIDR: result.SubnetCIDR, + PrimaryNIC: result.PrimaryNIC, + SecondaryNICs: result.SecondaryNICs, } if err := node.Setup(ctx, nodeCfg); err != nil { return fmt.Errorf("node setup: %w", err) @@ -81,12 +74,14 @@ func runInit(cmd *cobra.Command, _ []string) error { logger.Info(ctx, "node networking configured") state := &pool.State{ - Platform: result.Platform, - NodeName: cfg.NodeName, - Subnet: result.SubnetCIDR, - Gateway: result.Gateway, - IPs: result.IPs, - StateDir: flagStateDir, + Platform: result.Platform, + NodeName: cfg.NodeName, + Subnet: result.SubnetCIDR, + Gateway: result.Gateway, + PrimaryNIC: result.PrimaryNIC, + SecondaryNICs: result.SecondaryNICs, + IPs: result.IPs, + StateDir: flagStateDir, } if err := state.Save(ctx); err != nil { return fmt.Errorf("save pool state: %w", err) diff --git a/cmd/root.go b/cmd/root.go index ced677b..3d48e45 100644 --- a/cmd/root.go +++ b/cmd/root.go @@ -5,7 +5,6 @@ import ( "fmt" "os" "os/signal" - "sync" "syscall" "github.com/projecteru2/core/log" @@ -30,6 +29,7 @@ func NewRootCmd() *cobra.Command { rootCmd.AddCommand(newInitCmd()) rootCmd.AddCommand(newAdoptCmd()) + rootCmd.AddCommand(newDaemonCmd()) rootCmd.AddCommand(newStatusCmd()) rootCmd.AddCommand(newTeardownCmd()) @@ -75,39 +75,3 @@ func newPlatform(name string) (platform.CloudPlatform, error) { return nil, fmt.Errorf("unknown platform: %s (valid: %s, %s)", name, platform.PlatformGKE, platform.PlatformVolcengine) } } - -// detectPlatform auto-detects the cloud platform by probing metadata endpoints concurrently. -func detectPlatform(ctx context.Context) (platform.CloudPlatform, error) { - type result struct { - plat platform.CloudPlatform - } - ch := make(chan result, 2) - - var wg sync.WaitGroup - wg.Add(2) - go func() { - defer wg.Done() - if gke.Detect(ctx) { - ch <- result{plat: &gke.GKE{}} - } - }() - go func() { - defer wg.Done() - if volcengine.Detect(ctx) { - ch <- result{plat: &volcengine.Volcengine{}} - } - }() - - // Close channel once both probes finish. - go func() { - wg.Wait() - close(ch) - }() - - for r := range ch { - if r.plat != nil { - return r.plat, nil - } - } - return nil, fmt.Errorf("could not detect cloud platform — set --platform explicitly (%s|%s)", platform.PlatformGKE, platform.PlatformVolcengine) -} diff --git a/cmd/status_cmd.go b/cmd/status.go similarity index 100% rename from cmd/status_cmd.go rename to cmd/status.go diff --git a/cmd/teardown_cmd.go b/cmd/teardown.go similarity index 100% rename from cmd/teardown_cmd.go rename to cmd/teardown.go diff --git a/dhcp/lease.go b/dhcp/lease.go new file mode 100644 index 0000000..cdc1089 --- /dev/null +++ b/dhcp/lease.go @@ -0,0 +1,163 @@ +package dhcp + +import ( + "encoding/json" + "net" + "os" + "sync" + "time" +) + +// lease represents a single DHCP lease. +type lease struct { + MAC net.HardwareAddr + IP net.IP + Expiry time.Time +} + +// leaseEntry is the JSON-serializable form of a lease. +type leaseEntry struct { + MAC string `json:"mac"` + IP string `json:"ip"` + Expiry string `json:"expiry"` +} + +// leaseStore manages active leases with persistence to a JSON file. +type leaseStore struct { + mu sync.RWMutex + leases map[string]*lease // keyed by MAC string + filePath string +} + +func newLeaseStore(filePath string) *leaseStore { + return &leaseStore{ + leases: make(map[string]*lease), + filePath: filePath, + } +} + +func (s *leaseStore) add(mac net.HardwareAddr, ip net.IP, duration time.Duration) { + s.mu.Lock() + defer s.mu.Unlock() + s.leases[mac.String()] = &lease{ + MAC: mac, + IP: ip.To4(), + Expiry: time.Now().Add(duration), + } +} + +func (s *leaseStore) remove(mac net.HardwareAddr) { + s.mu.Lock() + defer s.mu.Unlock() + delete(s.leases, mac.String()) +} + +func (s *leaseStore) ipForMAC(mac net.HardwareAddr) net.IP { + s.mu.RLock() + defer s.mu.RUnlock() + if l, ok := s.leases[mac.String()]; ok && time.Now().Before(l.Expiry) { + return l.IP + } + return nil +} + +func (s *leaseStore) isLeasedTo(mac net.HardwareAddr, ip net.IP) bool { + s.mu.RLock() + defer s.mu.RUnlock() + l, ok := s.leases[mac.String()] + return ok && l.IP.Equal(ip) && time.Now().Before(l.Expiry) +} + +// isLeasedToOther returns true if ip is actively leased to a different MAC. +func (s *leaseStore) isLeasedToOther(mac net.HardwareAddr, ip net.IP) bool { + s.mu.RLock() + defer s.mu.RUnlock() + now := time.Now() + for k, l := range s.leases { + if l.IP.Equal(ip) && now.Before(l.Expiry) && k != mac.String() { + return true + } + } + return false +} + +func (s *leaseStore) activeLeases() []lease { + s.mu.RLock() + defer s.mu.RUnlock() + now := time.Now() + var active []lease + for _, l := range s.leases { + if now.Before(l.Expiry) { + active = append(active, *l) + } + } + return active +} + +// expireOld removes expired leases and returns them. +func (s *leaseStore) expireOld() []lease { + s.mu.Lock() + defer s.mu.Unlock() + now := time.Now() + var expired []lease + for k, l := range s.leases { + if now.After(l.Expiry) { + expired = append(expired, *l) + delete(s.leases, k) + } + } + return expired +} + +func (s *leaseStore) save() error { + s.mu.RLock() + defer s.mu.RUnlock() + var entries []leaseEntry + for _, l := range s.leases { + entries = append(entries, leaseEntry{ + MAC: l.MAC.String(), + IP: l.IP.String(), + Expiry: l.Expiry.Format(time.RFC3339), + }) + } + data, err := json.MarshalIndent(entries, "", " ") + if err != nil { + return err + } + // Atomic write: temp file + rename to prevent corruption on crash. + tmp := s.filePath + ".tmp" + if err := os.WriteFile(tmp, data, 0o644); err != nil { //nolint:gosec + return err + } + return os.Rename(tmp, s.filePath) +} + +func (s *leaseStore) load() error { + data, err := os.ReadFile(s.filePath) //nolint:gosec + if err != nil { + return err + } + var entries []leaseEntry + if err := json.Unmarshal(data, &entries); err != nil { + return err + } + s.mu.Lock() + defer s.mu.Unlock() + now := time.Now() + for _, e := range entries { + mac, parseErr := net.ParseMAC(e.MAC) + if parseErr != nil { + continue + } + ip := net.ParseIP(e.IP).To4() + if ip == nil { + continue + } + expiry, parseErr := time.Parse(time.RFC3339, e.Expiry) + if parseErr != nil || now.After(expiry) { + continue + } + s.leases[mac.String()] = &lease{MAC: mac, IP: ip, Expiry: expiry} + } + return nil +} diff --git a/dhcp/offer.go b/dhcp/offer.go new file mode 100644 index 0000000..e4eea17 --- /dev/null +++ b/dhcp/offer.go @@ -0,0 +1,103 @@ +package dhcp + +import ( + "net" + "sync" + "time" +) + +// pendingOffer tracks an IP offered to a MAC that hasn't been committed yet. +type pendingOffer struct { + IP net.IP + Expiry time.Time +} + +// pendingOffers manages IPs that have been OFFERed but not yet ACKed. +// If the client never sends REQUEST, the offer expires and the IP +// is returned to the pool by the cleanup loop. +type pendingOffers struct { + mu sync.RWMutex + offers map[string]*pendingOffer // keyed by MAC string + timeout time.Duration +} + +func newPendingOffers(timeout time.Duration) *pendingOffers { + return &pendingOffers{ + offers: make(map[string]*pendingOffer), + timeout: timeout, + } +} + +// add records a pending offer for mac. If this MAC already has a pending +// offer for a different IP, the old IP is returned so the caller can +// release it back to the pool. +func (p *pendingOffers) add(mac net.HardwareAddr, ip net.IP) net.IP { + p.mu.Lock() + defer p.mu.Unlock() + key := mac.String() + var oldIP net.IP + if old, ok := p.offers[key]; ok && !old.IP.Equal(ip.To4()) { + oldIP = old.IP + } + p.offers[key] = &pendingOffer{ + IP: ip.To4(), + Expiry: time.Now().Add(p.timeout), + } + return oldIP +} + +// remove deletes the pending offer for mac and returns the offered IP +// so the caller can release it back to the pool. Returns nil if no +// pending offer exists. +func (p *pendingOffers) remove(mac net.HardwareAddr) net.IP { + p.mu.Lock() + defer p.mu.Unlock() + key := mac.String() + old, ok := p.offers[key] + if !ok { + return nil + } + delete(p.offers, key) + return old.IP +} + +// ipForMAC returns the offered IP if still valid. If the offer has expired, +// it is removed and the IP is returned as the second value so the caller +// can release it back to the pool. +func (p *pendingOffers) ipForMAC(mac net.HardwareAddr) (active, expired net.IP) { + p.mu.Lock() + defer p.mu.Unlock() + key := mac.String() + o, ok := p.offers[key] + if !ok { + return nil, nil + } + if time.Now().Before(o.Expiry) { + return o.IP, nil + } + // Expired — reclaim immediately instead of waiting for cleanupLoop. + delete(p.offers, key) + return nil, o.IP +} + +func (p *pendingOffers) isOfferedTo(mac net.HardwareAddr, ip net.IP) bool { + p.mu.RLock() + defer p.mu.RUnlock() + o, ok := p.offers[mac.String()] + return ok && o.IP.Equal(ip) && time.Now().Before(o.Expiry) +} + +// expireOld removes expired offers and returns their IPs for pool reclamation. +func (p *pendingOffers) expireOld() []net.IP { + p.mu.Lock() + defer p.mu.Unlock() + now := time.Now() + var expired []net.IP + for k, o := range p.offers { + if now.After(o.Expiry) { + expired = append(expired, o.IP) + delete(p.offers, k) + } + } + return expired +} diff --git a/dhcp/pool.go b/dhcp/pool.go new file mode 100644 index 0000000..a229e0f --- /dev/null +++ b/dhcp/pool.go @@ -0,0 +1,77 @@ +package dhcp + +import ( + "net" + "sync" +) + +// ipPool tracks which IPs from the fixed pool are free or in use. +type ipPool struct { + mu sync.RWMutex + free map[uint32]net.IP // IPs not yet leased + used map[uint32]bool // currently leased IPs +} + +func newIPPool(ips []net.IP) *ipPool { + free := make(map[uint32]net.IP, len(ips)) + for _, ip := range ips { + free[ipToUint32(ip)] = ip.To4() + } + return &ipPool{ + free: free, + used: make(map[uint32]bool), + } +} + +// allocate returns an available IP and removes it from the free set. +func (p *ipPool) allocate() net.IP { + p.mu.Lock() + defer p.mu.Unlock() + for k, ip := range p.free { + delete(p.free, k) + p.used[k] = true + return ip + } + return nil +} + +// release returns an IP to the free pool. +func (p *ipPool) release(ip net.IP) { + if ip == nil { + return + } + p.mu.Lock() + defer p.mu.Unlock() + k := ipToUint32(ip.To4()) + delete(p.used, k) + p.free[k] = ip.To4() +} + +// markUsed moves an IP from free to used (for lease restoration). +func (p *ipPool) markUsed(ip net.IP) { + p.mu.Lock() + defer p.mu.Unlock() + k := ipToUint32(ip.To4()) + delete(p.free, k) + p.used[k] = true +} + +// isFree checks if an IP is in the free (unallocated) set. +func (p *ipPool) isFree(ip net.IP) bool { + p.mu.RLock() + defer p.mu.RUnlock() + _, ok := p.free[ipToUint32(ip.To4())] + return ok +} + +// freeCount returns the number of unallocated IPs. +func (p *ipPool) freeCount() int { + p.mu.RLock() + defer p.mu.RUnlock() + return len(p.free) +} + +func ipToUint32(ip net.IP) uint32 { + ip = ip.To4() + return uint32(ip[0])<<24 | uint32(ip[1])<<16 | uint32(ip[2])<<8 | uint32(ip[3]) //nolint:mnd +} diff --git a/dhcp/route_linux.go b/dhcp/route_linux.go new file mode 100644 index 0000000..1bcd8f8 --- /dev/null +++ b/dhcp/route_linux.go @@ -0,0 +1,43 @@ +//go:build linux + +package dhcp + +import ( + "fmt" + "net" + + "github.com/vishvananda/netlink" +) + +// resolveLinkIndex resolves a network interface name to its kernel index. +func resolveLinkIndex(iface string) (int, error) { + link, err := netlink.LinkByName(iface) + if err != nil { + return 0, fmt.Errorf("resolve link %s: %w", iface, err) + } + return link.Attrs().Index, nil +} + +// addRoute adds a /32 host route for ip via the given link index. +func addRoute(ip net.IP, linkIndex int) error { + route := &netlink.Route{ + Dst: &net.IPNet{IP: ip.To4(), Mask: net.CIDRMask(32, 32)}, + LinkIndex: linkIndex, + } + if err := netlink.RouteReplace(route); err != nil { + return fmt.Errorf("route replace %s/32: %w", ip, err) + } + return nil +} + +// delRoute removes the /32 host route for ip. +func delRoute(ip net.IP, linkIndex int) error { + route := &netlink.Route{ + Dst: &net.IPNet{IP: ip.To4(), Mask: net.CIDRMask(32, 32)}, + LinkIndex: linkIndex, + } + if err := netlink.RouteDel(route); err != nil { + return fmt.Errorf("route del %s/32: %w", ip, err) + } + return nil +} diff --git a/dhcp/route_stub.go b/dhcp/route_stub.go new file mode 100644 index 0000000..b31fbc0 --- /dev/null +++ b/dhcp/route_stub.go @@ -0,0 +1,21 @@ +//go:build !linux + +package dhcp + +import ( + "errors" + "fmt" + "net" +) + +func resolveLinkIndex(_ string) (int, error) { + return 0, fmt.Errorf("resolve link: %w", errors.ErrUnsupported) +} + +func addRoute(ip net.IP, _ int) error { + return fmt.Errorf("add route %s: %w", ip, errors.ErrUnsupported) +} + +func delRoute(ip net.IP, _ int) error { + return fmt.Errorf("del route %s: %w", ip, errors.ErrUnsupported) +} diff --git a/dhcp/server.go b/dhcp/server.go new file mode 100644 index 0000000..d3d3b95 --- /dev/null +++ b/dhcp/server.go @@ -0,0 +1,315 @@ +package dhcp + +import ( + "context" + "fmt" + "net" + "sync" + "time" + + "github.com/insomniacslk/dhcp/dhcpv4" + "github.com/insomniacslk/dhcp/dhcpv4/server4" + "github.com/projecteru2/core/log" +) + +const ( + defaultLeaseTime = 24 * time.Hour + leaseCleanupInterval = time.Minute + offerTimeout = 60 * time.Second +) + +// Config holds DHCP server parameters. +type Config struct { + Interface string + Gateway net.IP + SubnetMask net.IPMask + DNSServers []net.IP + LeaseTime time.Duration + LeaseFile string +} + +// Server is an embedded DHCPv4 server that allocates IPs from a fixed pool, +// manages leases, and adds/removes /32 host routes on lease events. +type Server struct { + conf Config + pool *ipPool + leases *leaseStore + offers *pendingOffers + srv *server4.Server + + linkIndex int // cached kernel interface index for route operations + mu sync.Mutex + ctx context.Context + stopped bool +} + +// New creates a DHCP server. IPs are the allocatable pool (excluding gateway). +func New(conf Config, ips []net.IP) *Server { + if conf.LeaseTime == 0 { + conf.LeaseTime = defaultLeaseTime + } + return &Server{ + conf: conf, + pool: newIPPool(ips), + leases: newLeaseStore(conf.LeaseFile), + offers: newPendingOffers(offerTimeout), + } +} + +// Run starts the DHCP server and blocks until ctx is canceled. +func (s *Server) Run(ctx context.Context) error { + logger := log.WithFunc("dhcp.Run") + + s.ctx = ctx + + linkIdx, resolveErr := resolveLinkIndex(s.conf.Interface) + if resolveErr != nil { + return fmt.Errorf("resolve interface %s: %w", s.conf.Interface, resolveErr) + } + s.linkIndex = linkIdx + + if err := s.leases.load(); err != nil { + logger.Warnf(ctx, "load leases: %v (starting fresh)", err) + } else { + s.restoreLeases(ctx) + } + + laddr := &net.UDPAddr{IP: net.IPv4zero, Port: dhcpv4.ServerPort} + srv, err := server4.NewServer(s.conf.Interface, laddr, s.handler) + if err != nil { + return fmt.Errorf("create DHCP server: %w", err) + } + s.srv = srv + + logger.Infof(ctx, "DHCP server listening on %s (pool: %d IPs, gateway: %s)", + s.conf.Interface, s.pool.freeCount(), s.conf.Gateway) + + go s.cleanupLoop(ctx) + + errCh := make(chan error, 1) + go func() { errCh <- srv.Serve() }() + + select { + case <-ctx.Done(): + s.mu.Lock() + s.stopped = true + s.mu.Unlock() + _ = srv.Close() + _ = s.leases.save() + logger.Info(ctx, "DHCP server stopped") + return nil + case err := <-errCh: + s.mu.Lock() + stopped := s.stopped + s.mu.Unlock() + if stopped { + return nil + } + return fmt.Errorf("DHCP server: %w", err) + } +} + +// handler processes each DHCP packet. +func (s *Server) handler(conn net.PacketConn, peer net.Addr, msg *dhcpv4.DHCPv4) { + if msg.OpCode != dhcpv4.OpcodeBootRequest { + return + } + + mac := msg.ClientHWAddr + + switch msg.MessageType() { + case dhcpv4.MessageTypeDiscover: + s.handleDiscover(conn, peer, msg, mac) + case dhcpv4.MessageTypeRequest: + s.handleRequest(conn, peer, msg, mac) + case dhcpv4.MessageTypeRelease: + s.handleRelease(mac) + } +} + +func (s *Server) handleDiscover(conn net.PacketConn, peer net.Addr, msg *dhcpv4.DHCPv4, mac net.HardwareAddr) { + logger := log.WithFunc("dhcp.handleDiscover") + + // Re-offer existing lease first. + ip := s.leases.ipForMAC(mac) + if ip == nil { + // Check if we already have a pending offer for this MAC. + var staleIP net.IP + ip, staleIP = s.offers.ipForMAC(mac) + if staleIP != nil { + s.pool.release(staleIP) + } + } + if ip == nil { + // Allocate a new IP from the free pool. + ip = s.pool.allocate() + if ip == nil { + logger.Warnf(s.ctx, "DISCOVER from %s: pool exhausted", mac) + return + } + // Track as pending offer (not yet committed as lease). + // If this MAC had a stale offer for a different IP, release it. + if oldIP := s.offers.add(mac, ip); oldIP != nil { + s.pool.release(oldIP) + } + } + + resp, err := s.buildReply(msg, dhcpv4.MessageTypeOffer, ip) + if err != nil { + logger.Warnf(s.ctx, "build OFFER for %s: %v", mac, err) + return + } + + if _, err := conn.WriteTo(resp.ToBytes(), peer); err != nil { + logger.Warnf(s.ctx, "send OFFER to %s: %v", mac, err) + return + } + logger.Infof(s.ctx, "OFFER %s -> %s", ip, mac) +} + +func (s *Server) handleRequest(conn net.PacketConn, peer net.Addr, msg *dhcpv4.DHCPv4, mac net.HardwareAddr) { + logger := log.WithFunc("dhcp.handleRequest") + + reqIP := msg.RequestedIPAddress() + if reqIP == nil || reqIP.IsUnspecified() { + reqIP = msg.ClientIPAddr + } + if reqIP == nil || reqIP.IsUnspecified() { + logger.Warnf(s.ctx, "REQUEST from %s: no IP requested", mac) + return + } + + // Validate: the IP must be either free, offered to this MAC, or already + // leased to this MAC. Reject if it's leased to a different MAC. + if s.leases.isLeasedToOther(mac, reqIP) { + s.sendNAK(conn, peer, msg) + logger.Warnf(s.ctx, "NAK %s -> %s (leased to another client)", reqIP, mac) + return + } + if !s.pool.isFree(reqIP) && !s.offers.isOfferedTo(mac, reqIP) && !s.leases.isLeasedTo(mac, reqIP) { + s.sendNAK(conn, peer, msg) + logger.Warnf(s.ctx, "NAK %s -> %s (not available)", reqIP, mac) + return + } + + // Commit: move from pending/free to leased. + // Release the offered IP back to pool if client requested a different one. + if oldIP := s.offers.remove(mac); oldIP != nil && !oldIP.Equal(reqIP) { + s.pool.release(oldIP) + } + s.pool.markUsed(reqIP) + s.leases.add(mac, reqIP, s.conf.LeaseTime) + + if err := addRoute(reqIP, s.linkIndex); err != nil { + logger.Warnf(s.ctx, "add route %s: %v", reqIP, err) + } + + resp, err := s.buildReply(msg, dhcpv4.MessageTypeAck, reqIP) + if err != nil { + logger.Warnf(s.ctx, "build ACK for %s: %v", mac, err) + return + } + + if _, err := conn.WriteTo(resp.ToBytes(), peer); err != nil { + logger.Warnf(s.ctx, "send ACK to %s: %v", mac, err) + return + } + + _ = s.leases.save() + logger.Infof(s.ctx, "ACK %s -> %s", reqIP, mac) +} + +func (s *Server) handleRelease(mac net.HardwareAddr) { + logger := log.WithFunc("dhcp.handleRelease") + + ip := s.leases.ipForMAC(mac) + if ip == nil { + return + } + + s.leases.remove(mac) + s.pool.release(ip) + + if err := delRoute(ip, s.linkIndex); err != nil { + logger.Warnf(s.ctx, "del route %s: %v", ip, err) + } + + _ = s.leases.save() + logger.Infof(s.ctx, "RELEASE %s <- %s", ip, mac) +} + +// buildReply constructs a DHCP reply with standard options. +func (s *Server) buildReply(req *dhcpv4.DHCPv4, msgType dhcpv4.MessageType, ip net.IP) (*dhcpv4.DHCPv4, error) { + return dhcpv4.NewReplyFromRequest(req, + dhcpv4.WithMessageType(msgType), + dhcpv4.WithYourIP(ip), + dhcpv4.WithServerIP(s.conf.Gateway), + dhcpv4.WithOption(dhcpv4.OptSubnetMask(s.conf.SubnetMask)), + dhcpv4.WithOption(dhcpv4.OptRouter(s.conf.Gateway)), + dhcpv4.WithOption(dhcpv4.OptDNS(s.conf.DNSServers...)), + dhcpv4.WithOption(dhcpv4.OptIPAddressLeaseTime(s.conf.LeaseTime)), + dhcpv4.WithOption(dhcpv4.OptServerIdentifier(s.conf.Gateway)), + ) +} + +func (s *Server) sendNAK(conn net.PacketConn, peer net.Addr, msg *dhcpv4.DHCPv4) { + resp, err := dhcpv4.NewReplyFromRequest(msg, + dhcpv4.WithMessageType(dhcpv4.MessageTypeNak), + dhcpv4.WithServerIP(s.conf.Gateway), + dhcpv4.WithOption(dhcpv4.OptServerIdentifier(s.conf.Gateway)), + ) + if err != nil { + return + } + if _, err := conn.WriteTo(resp.ToBytes(), peer); err != nil { + log.WithFunc("dhcp.sendNAK").Warnf(s.ctx, "send NAK: %v", err) + } +} + +// restoreLeases re-adds /32 routes for all non-expired leases on startup. +func (s *Server) restoreLeases(ctx context.Context) { + logger := log.WithFunc("dhcp.restoreLeases") + active := s.leases.activeLeases() + for _, l := range active { + s.pool.markUsed(l.IP) + if err := addRoute(l.IP, s.linkIndex); err != nil { + logger.Warnf(ctx, "restore route %s: %v", l.IP, err) + } + } + if len(active) > 0 { + logger.Infof(ctx, "restored %d active leases", len(active)) + } +} + +// cleanupLoop periodically removes expired leases and abandoned offers. +func (s *Server) cleanupLoop(ctx context.Context) { + logger := log.WithFunc("dhcp.cleanup") + ticker := time.NewTicker(leaseCleanupInterval) + defer ticker.Stop() + + for { + select { + case <-ctx.Done(): + return + case <-ticker.C: + // Reclaim abandoned offers. + for _, ip := range s.offers.expireOld() { + s.pool.release(ip) + logger.Infof(ctx, "reclaimed abandoned offer %s", ip) + } + + // Expire old leases. + expired := s.leases.expireOld() + for _, l := range expired { + s.pool.release(l.IP) + if err := delRoute(l.IP, s.linkIndex); err != nil { + logger.Warnf(ctx, "del expired route %s: %v", l.IP, err) + } + logger.Infof(ctx, "expired lease %s <- %s", l.IP, l.MAC) + } + if len(expired) > 0 { + _ = s.leases.save() + } + } + } +} diff --git a/docs/gke.md b/docs/gke.md index e6830a0..db84507 100644 --- a/docs/gke.md +++ b/docs/gke.md @@ -26,8 +26,8 @@ GKE VPC (e.g. 10.0.0.0/8) 1. GKE subnets support **secondary IP ranges** — named CIDR blocks that can be assigned as alias IPs to GCE instances. 2. Each cocoonset node gets a `/24` alias IP range (e.g. `172.20.100.0/24`) — all IPs in this range are **VPC-routable** to that instance. -3. cocoon-net assigns a subset of the alias IPs to the `cni0` bridge's DHCP pool. -4. VMs obtain IPs via DHCP from dnsmasq; those IPs are within the VPC-routed alias range. +3. cocoon-net assigns the alias IPs to its embedded DHCP server pool. +4. VMs obtain IPs via DHCP from the embedded server; those IPs are within the VPC-routed alias range. 5. Any GKE pod or node can reach VM IPs directly via L3 routing — no iptables DNAT needed. **Key caveat**: The GCE guest agent installs a `local` route for the alias CIDR in the kernel's `local` routing table, which causes the host to respond to those IPs itself (blackholing VM traffic). cocoon-net removes this route and installs a cron job to remove it on reboot. @@ -43,6 +43,7 @@ GKE VPC (e.g. 10.0.0.0/8) ```bash sudo cocoon-net init \ + --platform gke \ --node-name cocoon-pool \ --subnet 172.20.100.0/24 \ --pool-size 140 \ @@ -54,23 +55,26 @@ This will: 2. Add the secondary range `cocoon-pods=172.20.100.0/24` to the node's subnet 3. Assign the alias IP `172.20.100.0/24` to `nic0` of the instance 4. Remove the local route installed by the GCE guest agent -5. Configure `cni0` bridge, dnsmasq DHCP, host routes, iptables, sysctl -6. Write CNI conflist to `/etc/cni/net.d/30-dnsmasq-dhcp.conflist` +5. Configure `cni0` bridge, iptables, sysctl +6. Write CNI conflist to `/etc/cni/net.d/30-cocoon-dhcp.conflist` 7. Save pool state to `/var/lib/cocoon/net/pool.json` +After init, run `cocoon-net daemon` to start the embedded DHCP server. Host routes (/32) are added dynamically when VMs obtain leases. + ## Adopting existing nodes For GKE nodes that were already provisioned by hand (alias IP range assigned, bridge configured), use `adopt` to bring them under cocoon-net management without calling any cloud APIs: ```bash sudo cocoon-net adopt \ + --platform gke \ --node-name cocoon-pool \ --subnet 172.20.100.0/24 ``` -This configures dnsmasq, CNI conflist, bridge, routes, and sysctl from cocoon-net's templates, and writes the pool state file. The existing alias IP range is preserved. By default, existing iptables rules are also preserved — pass `--manage-iptables` to let cocoon-net rewrite them. +This configures bridge, CNI conflist, and sysctl from cocoon-net's templates, and writes the pool state file. The existing alias IP range is preserved. By default, existing iptables rules are also preserved — pass `--manage-iptables` to let cocoon-net rewrite them. -After adopting, `cocoon-net status` and future re-runs of `adopt` work normally. Cloud-side teardown (removing the alias range) must still be done manually. +After adopting, run `cocoon-net daemon` to start DHCP. `cocoon-net status` and future re-runs of `adopt` work normally. Cloud-side teardown (removing the alias range) must still be done manually. ## Manual Steps (for reference) @@ -123,16 +127,7 @@ sysctl -w net.ipv4.conf.cni0.rp_filter=0 sysctl -w net.ipv4.conf.ens4.rp_filter=0 ``` -### 6. Host routes - -```bash -# One /32 route per VM IP pointing to cni0 -ip route replace 172.20.100.2/32 dev cni0 -ip route replace 172.20.100.3/32 dev cni0 -# ... etc -``` - -### 7. iptables +### 6. iptables ```bash # Allow VM traffic out via ens4 with MASQUERADE (internet access) @@ -141,33 +136,22 @@ iptables -t nat -A POSTROUTING -s 172.20.100.0/24 ! -o cni0 -j MASQUERADE iptables -A FORWARD -i cni0 -o cni0 -j ACCEPT ``` -### 8. dnsmasq +### 7. DHCP + +DHCP is provided by `cocoon-net daemon` (embedded server). No external DHCP server required. Host routes (/32) are managed dynamically on lease events. ```bash -mkdir -p /etc/dnsmasq-cni.d -cat > /etc/dnsmasq-cni.d/cni0.conf <<'EOF' -interface=cni0 -bind-interfaces -except-interface=lo -except-interface=ens4 -dhcp-range=172.20.100.2,172.20.100.141,255.255.255.0,24h -dhcp-option=option:router,172.20.100.1 -dhcp-option=option:dns-server,8.8.8.8,1.1.1.1 -dhcp-leasefile=/var/lib/misc/dnsmasq.leases -dhcp-authoritative -port=0 -log-dhcp -EOF -systemctl restart dnsmasq-cni +# Start the daemon (or use systemd unit) +cocoon-net daemon ``` -### 9. CNI conflist +### 8. CNI conflist ```bash -cat > /etc/cni/net.d/30-dnsmasq-dhcp.conflist <<'EOF' +cat > /etc/cni/net.d/30-cocoon-dhcp.conflist <<'EOF' { "cniVersion": "1.0.0", - "name": "dnsmasq-dhcp", + "name": "cocoon-dhcp", "plugins": [ { "type": "bridge", @@ -221,6 +205,6 @@ gcloud compute firewall-rules create allow-gke-master-to-vk \ | Symptom | Cause | Fix | |---|---|---| | VM has IP but not reachable | GCE guest agent local route | `ip route del local dev ens4 table local` | -| No DHCP lease | dnsmasq range doesn't match alias IPs | Re-run `cocoon-net init` | +| No DHCP lease | Daemon not running or pool mismatch | Check `cocoon-net daemon` logs | | kubectl exec/logs timeout | Firewall blocks port 10250 | Add firewall rule for GKE master CIDR | | `alias IP range overlaps` | Secondary range already assigned | Use same range name `cocoon-pods` | diff --git a/pool/pool.json b/docs/pool-example.json similarity index 100% rename from pool/pool.json rename to docs/pool-example.json diff --git a/docs/volcengine.md b/docs/volcengine.md index 220f3d4..b7fae53 100644 --- a/docs/volcengine.md +++ b/docs/volcengine.md @@ -1,6 +1,6 @@ # Volcengine VPC-Native Networking for Cocoon VM Nodes -Make cocoon Windows/Linux VM pods directly routable within the Volcengine (火山引擎) VPC — no overlay, no iptables DNAT, no kubectl port-forward required for inter-VPC access. +Make cocoon Windows/Linux VM pods directly routable within the Volcengine VPC — no overlay, no iptables DNAT, no kubectl port-forward required for inter-VPC access. ## Verified Result @@ -73,6 +73,7 @@ Volcengine VPC (172.20.0.0/16) ```bash sudo cocoon-net init \ + --platform volcengine \ --node-name cocoon-pool \ --subnet 172.20.100.0/24 \ --pool-size 140 \ @@ -85,23 +86,26 @@ This will: 3. Create 7 secondary ENIs in the VM subnet and attach them to the instance 4. Assign 20 secondary private IPs per ENI (140 total) 5. Bring up `eth1`–`eth7` interfaces -6. Configure `cni0` bridge, dnsmasq DHCP, host routes, iptables, sysctl -7. Write CNI conflist to `/etc/cni/net.d/30-dnsmasq-dhcp.conflist` +6. Configure `cni0` bridge, iptables, sysctl +7. Write CNI conflist to `/etc/cni/net.d/30-cocoon-dhcp.conflist` 8. Save pool state to `/var/lib/cocoon/net/pool.json` +After init, run `cocoon-net daemon` to start the embedded DHCP server. Host routes (/32) are added dynamically when VMs obtain leases. + ## Adopting existing nodes For EBM nodes that already have secondary ENIs and IPs provisioned by hand, use `adopt` to bring them under cocoon-net management without calling any Volcengine APIs: ```bash sudo cocoon-net adopt \ + --platform volcengine \ --node-name cocoon-pool \ --subnet 172.20.100.0/24 ``` -This configures dnsmasq, CNI conflist, bridge, routes, and sysctl from cocoon-net's templates, and writes the pool state file. The existing ENIs and secondary IPs are preserved. By default, existing iptables rules are also preserved — pass `--manage-iptables` to let cocoon-net rewrite them. +This configures bridge, CNI conflist, and sysctl from cocoon-net's templates, and writes the pool state file. The existing ENIs and secondary IPs are preserved. By default, existing iptables rules are also preserved — pass `--manage-iptables` to let cocoon-net rewrite them. -After adopting, `cocoon-net status` works normally. Cloud-side teardown (detaching/deleting ENIs) must still be done manually. +After adopting, run `cocoon-net daemon` to start DHCP. `cocoon-net status` works normally. Cloud-side teardown (detaching/deleting ENIs) must still be done manually. ## Manual Steps (for reference) @@ -193,39 +197,13 @@ for eni in d.get('Result', {}).get('NetworkInterfaceSets', []): " > /tmp/vpc-ips.txt ``` -### 5. Generate dnsmasq config with contiguous ranges +### 5. DHCP + +DHCP is provided by `cocoon-net daemon` (embedded server). No external DHCP server required. Host routes (/32) are managed dynamically on lease events. ```bash -python3 -c " -import ipaddress -ips = open('/tmp/vpc-ips.txt').read().strip().split() -ips.sort(key=lambda x: list(map(int, x.split('.')))) -ranges = [] -start = ips[0] -prev = ipaddress.IPv4Address(ips[0]) -for s in ips[1:]: - ip = ipaddress.IPv4Address(s) - if ip == prev + 1: - prev = ip - else: - ranges.append((start, str(prev))) - start = s - prev = ip -ranges.append((start, str(prev))) -print('interface=cni0') -print('bind-interfaces') -print('except-interface=lo') -print('except-interface=eth0') -for s, e in ranges: - print(f'dhcp-range={s},{e},255.255.255.0,24h') -print('dhcp-option=option:router,172.20.100.1') -print('dhcp-option=option:dns-server,8.8.8.8,1.1.1.1') -print('dhcp-leasefile=/var/lib/misc/dnsmasq.leases') -print('dhcp-authoritative') -print('port=0') -print('log-dhcp') -" > /etc/dnsmasq-cni.d/cni0.conf -systemctl restart dnsmasq-cni +# Start the daemon (or use systemd unit) +cocoon-net daemon ``` ### 6. Security group @@ -279,9 +257,9 @@ ve vpc AuthorizeSecurityGroupIngress \ | Symptom | Cause | Fix | |---|---|---| | Cross-host ping fails, same-host works | Security group blocks VPC internal | Add `172.20.0.0/16 all` ingress rule | -| No DHCP lease | dnsmasq range doesn't match secondary IPs | Re-export IPs, regenerate config | +| No DHCP lease | Daemon not running or pool mismatch | Check `cocoon-net daemon` logs | | VM has IP but not reachable cross-host | `ethX` interfaces DOWN | `ip link set ethX up` | -| `inconsistent DHCP range` in dnsmasq | Config has IPs from wrong subnet | Filter IPs by correct prefix | +| DHCP offers wrong subnet | Pool state has IPs from wrong subnet | Re-run `cocoon-net init` or `adopt` with correct `--subnet` | | `InsufficientIpInSubnet` on IP assign | Orphaned ENIs consuming IPs | Delete detached ENIs in the subnet | | Windows no DHCP, SAC stuck | Wrong cloud-hypervisor version | Use cocoon fork from cocoonstack/cloud-hypervisor | @@ -293,6 +271,7 @@ For new nodes: 2. Run on the new node: ```bash sudo cocoon-net init \ + --platform volcengine \ --node-name cocoon-pool-N \ --subnet 172.20.N.0/24 \ --pool-size 140 @@ -303,6 +282,7 @@ For existing hand-provisioned nodes, use `adopt` instead of `init`: ```bash sudo cocoon-net adopt \ + --platform volcengine \ --node-name cocoon-pool-N \ --subnet 172.20.N.0/24 ``` diff --git a/go.mod b/go.mod index 18f328f..8299ae2 100644 --- a/go.mod +++ b/go.mod @@ -3,8 +3,11 @@ module github.com/cocoonstack/cocoon-net go 1.25.0 require ( + github.com/coreos/go-iptables v0.8.0 + github.com/insomniacslk/dhcp v0.0.0-20260407060928-11b94ed970f2 github.com/projecteru2/core v0.0.0-20241016125006-ff909eefe04c github.com/spf13/cobra v1.10.2 + github.com/vishvananda/netlink v1.3.1 ) require ( @@ -16,15 +19,19 @@ require ( github.com/gogo/protobuf v1.3.2 // indirect github.com/golang/protobuf v1.5.4 // indirect github.com/inconshreveable/mousetrap v1.1.0 // indirect + github.com/josharian/native v1.1.0 // indirect github.com/kr/pretty v0.3.1 // indirect github.com/kr/text v0.2.0 // indirect github.com/mattn/go-colorable v0.1.13 // indirect github.com/mattn/go-isatty v0.0.18 // indirect github.com/mitchellh/mapstructure v1.5.0 // indirect + github.com/pierrec/lz4/v4 v4.1.14 // indirect github.com/pkg/errors v0.9.1 // indirect github.com/rogpeppe/go-internal v1.14.1 // indirect github.com/rs/zerolog v1.29.1 // indirect github.com/spf13/pflag v1.0.10 // indirect + github.com/u-root/uio v0.0.0-20230220225925-ffce2a382923 // indirect + github.com/vishvananda/netns v0.0.5 // indirect go.uber.org/zap v1.27.0 // indirect golang.org/x/exp v0.0.0-20230425010034-47ecfdc1ba53 // indirect golang.org/x/sys v0.39.0 // indirect diff --git a/go.sum b/go.sum index 4903f09..902f1e6 100644 --- a/go.sum +++ b/go.sum @@ -24,6 +24,8 @@ github.com/cockroachdb/redact v1.1.3/go.mod h1:BVNblN9mBWFyMyqK1k3AAiSxhvhfK2oOZ github.com/codegangsta/inject v0.0.0-20150114235600-33e0aa1cb7c0/go.mod h1:4Zcjuz89kmFXt9morQgcfYZAYZ5n8WHjt81YYWIwtTM= github.com/coreos/etcd v3.3.10+incompatible/go.mod h1:uF7uidLiAD3TWHmW31ZFd/JWoc32PjwdhPthX9715RE= github.com/coreos/go-etcd v2.0.0+incompatible/go.mod h1:Jez6KQU2B/sWsbdaef3ED8NzMklzPG4d5KIOhIy30Tk= +github.com/coreos/go-iptables v0.8.0 h1:MPc2P89IhuVpLI7ETL/2tx3XZ61VeICZjYqDEgNsPRc= +github.com/coreos/go-iptables v0.8.0/go.mod h1:Qe8Bv2Xik5FyTXwgIbLAnv2sWSBmvWdFETJConOQ//Q= github.com/coreos/go-semver v0.2.0/go.mod h1:nnelYz7RCh+5ahJtPPxZlU+153eP4D4r3EedlOD2RNk= github.com/coreos/go-systemd/v22 v22.5.0/go.mod h1:Y58oyj3AT4RCenI/lSvhwexgC+NSVTIJ3seZv2GcEnc= github.com/cpuguy83/go-md2man v1.0.10/go.mod h1:SmD6nW6nTyfqj6ABTjUi3V3JVMnlJmwcJI5acqYI6dE= @@ -104,11 +106,16 @@ github.com/imkira/go-interpol v1.1.0/go.mod h1:z0h2/2T3XF8kyEPpRgJ3kmNv+C43p+I/C github.com/inconshreveable/mousetrap v1.0.0/go.mod h1:PxqpIevigyE2G7u3NXJIT2ANytuPF1OarO4DADm73n8= github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8= github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw= +github.com/insomniacslk/dhcp v0.0.0-20260407060928-11b94ed970f2 h1:G3irkWwmpl0vH/nn83K2AHqLUZweC7XAONuwXy/w9Co= +github.com/insomniacslk/dhcp v0.0.0-20260407060928-11b94ed970f2/go.mod h1:qfvBmyDNp+/liLEYWRvqny/PEz9hGe2Dz833eXILSmo= github.com/iris-contrib/blackfriday v2.0.0+incompatible/go.mod h1:UzZ2bDEoaSGPbkg6SAB4att1aAwTmVIx/5gCVqeyUdI= github.com/iris-contrib/go.uuid v2.0.0+incompatible/go.mod h1:iz2lgM/1UnEf1kP0L/+fafWORmlnuysV2EMP8MW+qe0= github.com/iris-contrib/jade v1.1.3/go.mod h1:H/geBymxJhShH5kecoiOCSssPX7QWYH7UaeZTSWddIk= github.com/iris-contrib/pongo2 v0.0.1/go.mod h1:Ssh+00+3GAZqSQb30AvBRNxBx7rf0GqwkjqxNd0u65g= github.com/iris-contrib/schema v0.0.1/go.mod h1:urYA3uvUNG1TIIjOSCzHr9/LmbQo8LrOcOqfqxa4hXw= +github.com/josharian/native v1.0.1-0.20221213033349-c1e37c09b531/go.mod h1:7X/raswPFr05uY3HiLlYeyQntB6OO7E/d2Cu7qoaN2w= +github.com/josharian/native v1.1.0 h1:uuaP0hAbW7Y4l0ZRQ6C9zfb7Mg1mbFKry/xzDAfmtLA= +github.com/josharian/native v1.1.0/go.mod h1:7X/raswPFr05uY3HiLlYeyQntB6OO7E/d2Cu7qoaN2w= github.com/json-iterator/go v1.1.6/go.mod h1:+SdeFBvtyEkXs7REEP0seUULqWtbJapLOCVDaaPEHmU= github.com/json-iterator/go v1.1.9/go.mod h1:KdQUCv79m/52Kvf8AW2vK1V8akMuk1QjK/uOdHXbAo4= github.com/jtolds/gls v4.20.0+incompatible/go.mod h1:QJZ7F/aHp+rZTRtaJ1ow/lLfFfVYBRgL+9YlvaHOwJU= @@ -149,6 +156,10 @@ github.com/mattn/go-isatty v0.0.16/go.mod h1:kYGgaQfpe5nmfYZH+SKPsOc2e4SrIfOl2e/ github.com/mattn/go-isatty v0.0.18 h1:DOKFKCQ7FNG2L1rbrmstDN4QVRdS89Nkh85u68Uwp98= github.com/mattn/go-isatty v0.0.18/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y= github.com/mattn/goveralls v0.0.2/go.mod h1:8d1ZMHsd7fW6IRPKQh46F2WRpyib5/X4FOpevwGNQEw= +github.com/mdlayher/packet v1.1.2 h1:3Up1NG6LZrsgDVn6X4L9Ge/iyRyxFEFD9o6Pr3Q1nQY= +github.com/mdlayher/packet v1.1.2/go.mod h1:GEu1+n9sG5VtiRE4SydOmX5GTwyyYlteZiFU+x0kew4= +github.com/mdlayher/socket v0.4.1 h1:eM9y2/jlbs1M615oshPQOHZzj6R6wMT7bX5NPiQvn2U= +github.com/mdlayher/socket v0.4.1/go.mod h1:cAqeGjoufqdxWkD7DkpyS+wcefOtmu5OQ8KuoJGIReA= github.com/mediocregopher/radix/v3 v3.4.2/go.mod h1:8FL3F6UQRXHXIBSPUs5h0RybMF8i4n7wVopoX3x7Bv8= github.com/microcosm-cc/bluemonday v1.0.2/go.mod h1:iVP4YcDBq+n/5fb23BhYFvIMq/leAFZyRl6bYmGDlGc= github.com/mitchellh/go-homedir v1.1.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0= @@ -168,6 +179,8 @@ github.com/onsi/ginkgo v1.6.0/go.mod h1:lLunBs/Ym6LB5Z9jYTR76FiuTmxDTDusOGeTQH+W github.com/onsi/ginkgo v1.10.3/go.mod h1:lLunBs/Ym6LB5Z9jYTR76FiuTmxDTDusOGeTQH+WWjE= github.com/onsi/gomega v1.7.1/go.mod h1:XdKZgCCFLUoM/7CFJVPcG8C1xQ1AJ0vpAezJrB7JYyY= github.com/pelletier/go-toml v1.2.0/go.mod h1:5z9KED0ma1S8pY6P1sdut58dfprrGBbd/94hg7ilaic= +github.com/pierrec/lz4/v4 v4.1.14 h1:+fL8AQEZtz/ijeNnpduH0bROTu0O3NZAlPjQxGn8LwE= +github.com/pierrec/lz4/v4 v4.1.14/go.mod h1:gZWDp/Ze/IJXGXf23ltt2EXimqmTUXEy0GFuRQyBid4= github.com/pingcap/errors v0.11.4 h1:lFuQV/oaUMGcD2tqt+01ROSmJs75VG1ToEOkZIZ4nE4= github.com/pingcap/errors v0.11.4/go.mod h1:Oi8TUi2kEtXXLMJk9l1cGmz20kV3TaQ0usTwv5KuLY8= github.com/pkg/diff v0.0.0-20210226163009-20ebb0f2a09e/go.mod h1:pJLUxLENpZxwdsKMEsNbx1VGcRFpLqf3715MtcvvzbA= @@ -216,6 +229,8 @@ github.com/stretchr/testify v1.5.1/go.mod h1:5W2xD1RspED5o8YsWQXVCued0rvSQ+mT+I5 github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= github.com/stretchr/testify v1.8.4 h1:CcVxjf3Q8PM0mHUKJCdn+eZZtm5yQwehR5yeSVQQcUk= github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo= +github.com/u-root/uio v0.0.0-20230220225925-ffce2a382923 h1:tHNk7XK9GkmKUR6Gh8gVBKXc2MVSZ4G/NnWLtzw4gNA= +github.com/u-root/uio v0.0.0-20230220225925-ffce2a382923/go.mod h1:eLL9Nub3yfAho7qB0MzZizFhTU2QkLeoVsWdHtDW264= github.com/ugorji/go v1.1.4/go.mod h1:uQMGLiO92mf5W77hV/PUCpI3pbzQx3CRekS0kk+RGrc= github.com/ugorji/go v1.1.7/go.mod h1:kZn38zHttfInRq0xu/PH0az30d+z6vm202qpg1oXVMw= github.com/ugorji/go/codec v0.0.0-20181204163529-d75b2dcb6bc8/go.mod h1:VFNgLljTbGfSG7qAOspJ7OScBnGdDN/yBr0sguwnwf0= @@ -226,6 +241,10 @@ github.com/valyala/fasthttp v1.6.0/go.mod h1:FstJa9V+Pj9vQ7OJie2qMHdwemEDaDiSdBn github.com/valyala/fasttemplate v1.0.1/go.mod h1:UQGH1tvbgY+Nz5t2n7tXsz52dQxojPUpymEIMZ47gx8= github.com/valyala/fasttemplate v1.2.1/go.mod h1:KHLXt3tVN2HBp8eijSv/kGJopbvo7S+qRAEEKiv+SiQ= github.com/valyala/tcplisten v0.0.0-20161114210144-ceec8f93295a/go.mod h1:v3UYOV9WzVtRmSR+PDvWpU/qWl4Wa5LApYYX4ZtKbio= +github.com/vishvananda/netlink v1.3.1 h1:3AEMt62VKqz90r0tmNhog0r/PpWKmrEShJU0wJW6bV0= +github.com/vishvananda/netlink v1.3.1/go.mod h1:ARtKouGSTGchR8aMwmkzC0qiNPrrWO5JS/XMVl45+b4= +github.com/vishvananda/netns v0.0.5 h1:DfiHV+j8bA32MFM7bfEunvT8IAqQ/NzSJHtcmW5zdEY= +github.com/vishvananda/netns v0.0.5/go.mod h1:SpkAiCQRtJ6TvvxPnOSyH3BMl6unz3xZlaprSwhNNJM= github.com/xeipuuv/gojsonpointer v0.0.0-20180127040702-4e3ac2762d5f/go.mod h1:N2zxlSyiKSe5eX1tZViRH5QA0qijqEDrYZiPEAiq3wU= github.com/xeipuuv/gojsonreference v0.0.0-20180127040603-bd5ef7bd5415/go.mod h1:GwrjFmJcFw6At/Gs6z4yjiIwzuJ1/+UwLxMQDVQXShQ= github.com/xeipuuv/gojsonschema v1.2.0/go.mod h1:anYRn/JVcOK2ZgGU+IjEV4nwlhoK5sQluxsYJ78Id3Y= @@ -276,8 +295,8 @@ golang.org/x/net v0.0.0-20201021035429-f5854403a974/go.mod h1:sp8m0HH+o8qH0wwXwY golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg= golang.org/x/net v0.0.0-20210405180319-a5a99cb37ef4/go.mod h1:p54w0d4576C0XHj96bSt6lcn1PtDYWL6XObtHCRCNQM= golang.org/x/net v0.0.0-20211008194852-3b03d305991f/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y= -golang.org/x/net v0.19.0 h1:zTwKpTd2XuCqf8huc7Fo2iSy+4RHPd10s4KzeTnVr1c= -golang.org/x/net v0.19.0/go.mod h1:CfAk/cbD4CthTvqiEl8NpboMuiuOYsAr/7NOjZJtv1U= +golang.org/x/net v0.38.0 h1:vRMAPTMaeGqVhG5QyLJHqNDwecKTomGeqbnfZyKlBI8= +golang.org/x/net v0.38.0/go.mod h1:ivrbrMbzFq5J41QOQh0siUuly180yBYtLp+CKbEaFx8= golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U= golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20181108010431-42b317875d0f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= @@ -286,6 +305,8 @@ golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJ golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= +golang.org/x/sync v0.4.0 h1:zxkM55ReGkDlKSM+Fu41A+zmbZuaPVbGMzvvdUPznYQ= +golang.org/x/sync v0.4.0/go.mod h1:FU7BRWz2tNW+3quACPkgCx/L+uEAv1htQ0V83Z9Rj+Y= golang.org/x/sys v0.0.0-20180830151530-49385e6e1522/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= golang.org/x/sys v0.0.0-20180909124046-d0be0721c37e/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= golang.org/x/sys v0.0.0-20181205085412-a5c9d58dba9a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= @@ -307,8 +328,11 @@ golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c/go.mod h1:oPkhp1MJrh7nUepCBc golang.org/x/sys v0.0.0-20210927094055-39ccf1dd6fa6/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20211007075335-d3039528d8ac/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20220209214540-3681064d5158/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/sys v0.0.0-20220622161953-175b2fd9d664/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/sys v0.2.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/sys v0.10.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.39.0 h1:CvCKL8MeisomCi6qNZ+wbb0DN9E5AATixKsvNtMoMFk= golang.org/x/sys v0.39.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks= golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo= diff --git a/node/bridge.go b/node/bridge.go deleted file mode 100644 index f617253..0000000 --- a/node/bridge.go +++ /dev/null @@ -1,46 +0,0 @@ -package node - -import ( - "context" - "fmt" - "net" - "os/exec" - "strings" - - "github.com/projecteru2/core/log" -) - -const cni0Bridge = "cni0" - -func setupBridge(ctx context.Context, gatewayIP, subnetCIDR string) error { - logger := log.WithFunc("node.setupBridge") - - addCmd := exec.CommandContext(ctx, "ip", "link", "add", cni0Bridge, "type", "bridge") - out, err := addCmd.CombinedOutput() - if err != nil && !strings.Contains(string(out), "File exists") { - return fmt.Errorf("create bridge %s: %w: %s", cni0Bridge, err, out) - } - - _, ipNet, err := net.ParseCIDR(subnetCIDR) - if err != nil { - return fmt.Errorf("parse subnet cidr: %w", err) - } - ones, _ := ipNet.Mask.Size() - cidr := fmt.Sprintf("%s/%d", gatewayIP, ones) - - //nolint:gosec // ip args from trusted config - addrCmd := exec.CommandContext(ctx, "ip", "addr", "replace", cidr, "dev", cni0Bridge) - out, err = addrCmd.CombinedOutput() - if err != nil { - return fmt.Errorf("assign %s to %s: %w: %s", cidr, cni0Bridge, err, out) - } - - upCmd := exec.CommandContext(ctx, "ip", "link", "set", cni0Bridge, "up") - out, err = upCmd.CombinedOutput() - if err != nil { - return fmt.Errorf("bring up %s: %w: %s", cni0Bridge, err, out) - } - - logger.Infof(ctx, "bridge %s configured with gateway %s", cni0Bridge, cidr) - return nil -} diff --git a/node/bridge_linux.go b/node/bridge_linux.go new file mode 100644 index 0000000..403e6fd --- /dev/null +++ b/node/bridge_linux.go @@ -0,0 +1,58 @@ +//go:build linux + +package node + +import ( + "context" + "errors" + "fmt" + "net" + "syscall" + + "github.com/projecteru2/core/log" + "github.com/vishvananda/netlink" +) + +func setupBridge(ctx context.Context, gatewayIP, subnetCIDR string) error { + logger := log.WithFunc("node.setupBridge") + + br := &netlink.Bridge{ + LinkAttrs: netlink.LinkAttrs{Name: BridgeName}, + } + if err := netlink.LinkAdd(br); err != nil && !errors.Is(err, syscall.EEXIST) { + return fmt.Errorf("create bridge %s: %w", BridgeName, err) + } + + link, err := netlink.LinkByName(BridgeName) + if err != nil { + return fmt.Errorf("get bridge %s: %w", BridgeName, err) + } + + _, ipNet, err := net.ParseCIDR(subnetCIDR) + if err != nil { + return fmt.Errorf("parse subnet cidr: %w", err) + } + + parsed := net.ParseIP(gatewayIP) + if parsed == nil { + return fmt.Errorf("invalid gateway ip: %s", gatewayIP) + } + gwIP := parsed.To4() + if gwIP == nil { + return fmt.Errorf("gateway is not ipv4: %s", gatewayIP) + } + + addr := &netlink.Addr{ + IPNet: &net.IPNet{IP: gwIP, Mask: ipNet.Mask}, + } + if err := netlink.AddrReplace(link, addr); err != nil { + return fmt.Errorf("assign %s to %s: %w", addr.IPNet, BridgeName, err) + } + + if err := netlink.LinkSetUp(link); err != nil { + return fmt.Errorf("bring up %s: %w", BridgeName, err) + } + + logger.Infof(ctx, "bridge %s configured with gateway %s", BridgeName, addr.IPNet) + return nil +} diff --git a/node/bridge_stub.go b/node/bridge_stub.go new file mode 100644 index 0000000..580371b --- /dev/null +++ b/node/bridge_stub.go @@ -0,0 +1,13 @@ +//go:build !linux + +package node + +import ( + "context" + "errors" + "fmt" +) + +func setupBridge(_ context.Context, _, _ string) error { + return fmt.Errorf("bridge setup: %w", errors.ErrUnsupported) +} diff --git a/node/dnsmasq.go b/node/dnsmasq.go deleted file mode 100644 index 0d62ffb..0000000 --- a/node/dnsmasq.go +++ /dev/null @@ -1,124 +0,0 @@ -package node - -import ( - "context" - "fmt" - "os" - "os/exec" - "path/filepath" - "slices" - "strings" - - "github.com/projecteru2/core/log" - - "github.com/cocoonstack/cocoon-net/platform" -) - -const ( - dnsmasqConfDir = "/etc/dnsmasq-cni.d" - dnsmasqConfFile = "cni0.conf" - dnsmasqService = "dnsmasq-cni" - dhcpLeaseFile = "/var/lib/misc/dnsmasq.leases" -) - -// setupDNSMasq generates /etc/dnsmasq-cni.d/cni0.conf and restarts dnsmasq-cni. -func setupDNSMasq(ctx context.Context, cfg *Config) error { - logger := log.WithFunc("node.setupDNSMasq") - - if err := os.MkdirAll(dnsmasqConfDir, 0o750); err != nil { - return fmt.Errorf("create dnsmasq conf dir: %w", err) - } - - // Ensure lease file exists. - if err := os.MkdirAll(filepath.Dir(dhcpLeaseFile), 0o750); err != nil { - return fmt.Errorf("create lease dir: %w", err) - } - f, err := os.OpenFile(dhcpLeaseFile, os.O_CREATE|os.O_WRONLY, 0o644) //nolint:gosec // public lease file - if err != nil { - return fmt.Errorf("create lease file: %w", err) - } - _ = f.Close() - - conf, err := buildDNSMasqConf(cfg) - if err != nil { - return fmt.Errorf("build dnsmasq conf: %w", err) - } - confPath := filepath.Join(dnsmasqConfDir, dnsmasqConfFile) - - // Skip write + restart when config is unchanged. - if existing, readErr := os.ReadFile(confPath); readErr == nil && string(existing) == conf { //nolint:gosec // known path - logger.Info(ctx, "dnsmasq conf unchanged, skipping restart") - return nil - } - - if writeErr := os.WriteFile(confPath, []byte(conf), 0o644); writeErr != nil { //nolint:gosec // world-readable conf - return fmt.Errorf("write dnsmasq conf: %w", writeErr) - } - logger.Infof(ctx, "wrote dnsmasq conf to %s", confPath) - - restartCmd := exec.CommandContext(ctx, "systemctl", "restart", dnsmasqService) - out, restartErr := restartCmd.CombinedOutput() - if restartErr != nil { - return fmt.Errorf("restart %s: %w: %s", dnsmasqService, restartErr, out) - } - logger.Infof(ctx, "restarted %s", dnsmasqService) - return nil -} - -// buildDNSMasqConf generates a dnsmasq config from the Config. -// IPs are sorted numerically and grouped into contiguous ranges for efficiency. -func buildDNSMasqConf(cfg *Config) (string, error) { - _, netMask, err := platform.CIDRMask(cfg.SubnetCIDR) - if err != nil { - return "", err - } - - sorted := slices.Clone(cfg.IPs) - platform.SortIPs(sorted) - - ranges := groupContiguous(sorted) - - var sb strings.Builder - sb.WriteString("interface=cni0\n") - sb.WriteString("bind-interfaces\n") - sb.WriteString("except-interface=lo\n") - if cfg.PrimaryNIC != "" { - fmt.Fprintf(&sb, "except-interface=%s\n", cfg.PrimaryNIC) - } - - for _, r := range ranges { - fmt.Fprintf(&sb, "dhcp-range=%s,%s,%s,24h\n", r[0], r[1], netMask) - } - - fmt.Fprintf(&sb, "dhcp-option=option:router,%s\n", cfg.Gateway) - if len(cfg.DNSServers) > 0 { - fmt.Fprintf(&sb, "dhcp-option=option:dns-server,%s\n", strings.Join(cfg.DNSServers, ",")) - } - fmt.Fprintf(&sb, "dhcp-leasefile=%s\n", dhcpLeaseFile) - sb.WriteString("dhcp-authoritative\n") - sb.WriteString("port=0\n") - sb.WriteString("log-dhcp\n") - return sb.String(), nil -} - -// groupContiguous groups a sorted list of IP strings into [start,end] pairs. -func groupContiguous(ips []string) [][2]string { - if len(ips) == 0 { - return nil - } - var ranges [][2]string - start := ips[0] - prev := platform.IP4ToUint32(ips[0]) - for _, s := range ips[1:] { - cur := platform.IP4ToUint32(s) - if cur == prev+1 { - prev = cur - continue - } - ranges = append(ranges, [2]string{start, platform.Uint32ToIP4(prev)}) - start = s - prev = cur - } - ranges = append(ranges, [2]string{start, platform.Uint32ToIP4(prev)}) - return ranges -} diff --git a/node/iptables.go b/node/iptables.go deleted file mode 100644 index 480fd14..0000000 --- a/node/iptables.go +++ /dev/null @@ -1,54 +0,0 @@ -package node - -import ( - "context" - "fmt" - "os/exec" - "strings" - - "github.com/projecteru2/core/log" -) - -// setupIPTables installs FORWARD rules between secondary NICs and cni0, -// and a NAT MASQUERADE rule for outbound VM traffic. -func setupIPTables(ctx context.Context, subnetCIDR string, secondaryNICs []string) error { - logger := log.WithFunc("node.setupIPTables") - - for _, iface := range secondaryNICs { - if err := iptEnsure(ctx, "filter", "FORWARD", "-i", iface, "-o", cni0Bridge, "-j", "ACCEPT"); err != nil { - return fmt.Errorf("iptables FORWARD %s->cni0: %w", iface, err) - } - if err := iptEnsure(ctx, "filter", "FORWARD", "-i", cni0Bridge, "-o", iface, "-j", "ACCEPT"); err != nil { - return fmt.Errorf("iptables FORWARD cni0->%s: %w", iface, err) - } - } - - if err := iptEnsure(ctx, "filter", "FORWARD", "-i", cni0Bridge, "-o", cni0Bridge, "-j", "ACCEPT"); err != nil { - return fmt.Errorf("iptables FORWARD cni0<->cni0: %w", err) - } - - if err := iptEnsure(ctx, "nat", "POSTROUTING", "-s", subnetCIDR, "!", "-o", cni0Bridge, "-j", "MASQUERADE"); err != nil { - return fmt.Errorf("iptables NAT MASQUERADE: %w", err) - } - - logger.Infof(ctx, "iptables configured for subnet %s", subnetCIDR) - return nil -} - -// iptEnsure adds an iptables rule if it does not already exist. -func iptEnsure(ctx context.Context, table, chain string, args ...string) error { - checkArgs := append([]string{"-t", table, "-C", chain}, args...) - //nolint:gosec // args come from internal constants - checkCmd := exec.CommandContext(ctx, "iptables", checkArgs...) - if checkCmd.Run() == nil { - return nil - } - addArgs := append([]string{"-t", table, "-A", chain}, args...) - //nolint:gosec // args come from internal constants - addCmd := exec.CommandContext(ctx, "iptables", addArgs...) - out, err := addCmd.CombinedOutput() - if err != nil { - return fmt.Errorf("iptables -t %s -A %s %s: %w: %s", table, chain, strings.Join(args, " "), err, out) - } - return nil -} diff --git a/node/iptables_linux.go b/node/iptables_linux.go new file mode 100644 index 0000000..7c5b79f --- /dev/null +++ b/node/iptables_linux.go @@ -0,0 +1,54 @@ +//go:build linux + +package node + +import ( + "context" + "fmt" + + "github.com/coreos/go-iptables/iptables" + "github.com/projecteru2/core/log" +) + +// setupIPTables installs FORWARD rules between secondary NICs and the bridge, +// and a NAT MASQUERADE rule for outbound VM traffic. +func setupIPTables(ctx context.Context, subnetCIDR string, secondaryNICs []string) error { + logger := log.WithFunc("node.setupIPTables") + + ipt, err := iptables.New() + if err != nil { + return fmt.Errorf("init iptables: %w", err) + } + + for _, iface := range secondaryNICs { + if err := iptEnsure(ipt, "filter", "FORWARD", "-i", iface, "-o", BridgeName, "-j", "ACCEPT"); err != nil { + return fmt.Errorf("iptables FORWARD %s->%s: %w", iface, BridgeName, err) + } + if err := iptEnsure(ipt, "filter", "FORWARD", "-i", BridgeName, "-o", iface, "-j", "ACCEPT"); err != nil { + return fmt.Errorf("iptables FORWARD %s->%s: %w", BridgeName, iface, err) + } + } + + if err := iptEnsure(ipt, "filter", "FORWARD", "-i", BridgeName, "-o", BridgeName, "-j", "ACCEPT"); err != nil { + return fmt.Errorf("iptables FORWARD %s<->%s: %w", BridgeName, BridgeName, err) + } + + if err := iptEnsure(ipt, "nat", "POSTROUTING", "-s", subnetCIDR, "!", "-o", BridgeName, "-j", "MASQUERADE"); err != nil { + return fmt.Errorf("iptables NAT MASQUERADE: %w", err) + } + + logger.Infof(ctx, "iptables configured for subnet %s", subnetCIDR) + return nil +} + +// iptEnsure adds an iptables rule if it does not already exist. +func iptEnsure(ipt *iptables.IPTables, table, chain string, args ...string) error { + exists, err := ipt.Exists(table, chain, args...) + if err != nil { + return fmt.Errorf("check rule: %w", err) + } + if exists { + return nil + } + return ipt.Append(table, chain, args...) +} diff --git a/node/iptables_stub.go b/node/iptables_stub.go new file mode 100644 index 0000000..8f62c23 --- /dev/null +++ b/node/iptables_stub.go @@ -0,0 +1,13 @@ +//go:build !linux + +package node + +import ( + "context" + "errors" + "fmt" +) + +func setupIPTables(_ context.Context, _ string, _ []string) error { + return fmt.Errorf("iptables setup: %w", errors.ErrUnsupported) +} diff --git a/node/node.go b/node/node.go index dc25e37..5e88c19 100644 --- a/node/node.go +++ b/node/node.go @@ -3,7 +3,6 @@ package node import ( "context" "fmt" - "net" "os" "path/filepath" @@ -11,82 +10,53 @@ import ( ) const ( - maxSecondaryNICs = 7 + // BridgeName is the Linux bridge used for VM networking. + BridgeName = "cni0" - cniConfDir = "/etc/cni/net.d" - cniConfFile = "30-dnsmasq-dhcp.conflist" - cniConfContent = `{ - "cniVersion": "1.0.0", - "name": "dnsmasq-dhcp", - "plugins": [ - { - "type": "bridge", - "bridge": "cni0", - "isGateway": false, - "ipMasq": false, - "ipam": {} - } - ] -} -` + cniConfDir = "/etc/cni/net.d" + cniConfFile = "30-cocoon-dhcp.conflist" ) // Config holds parameters for node setup. type Config struct { - // Networking config - Gateway string - SubnetCIDR string - DNSServers []string - PrimaryNIC string - - // Resources - IPs []string - - // SkipIPTables omits the iptables FORWARD + NAT MASQUERADE rules. Set this - // when adopting an existing manually-provisioned node whose firewall rules - // were tuned by hand and where cocoon-net's MASQUERADE would change the - // outbound source IP visible to peers (e.g. GKE alias-IP routing already - // makes per-VM source IPs reachable via the host NIC, and a blanket - // MASQUERADE rewrite would mask them as the host's own IP). Other steps - // (sysctl, bridge, routes, dnsmasq, CNI conflist) still run. + Gateway string + SubnetCIDR string + PrimaryNIC string + SecondaryNICs []string // platform-provided (e.g. Volcengine eth1..eth7) + + // SkipIPTables omits the iptables FORWARD + NAT MASQUERADE rules. SkipIPTables bool } -// Setup configures all node networking components in order: -// 1. sysctl -// 2. cni0 bridge -// 3. host routes -// 4. iptables -// 5. dnsmasq -// 6. CNI conflist +// Setup configures host networking components (idempotent): +// 1. cni0 bridge (must exist before sysctl sets per-interface params) +// 2. sysctl (ip_forward, rp_filter) +// 3. iptables FORWARD + NAT +// 4. CNI conflist +// +// Host routes (/32) are NOT added here — they are managed dynamically +// by the DHCP server when leases are granted/released. func Setup(ctx context.Context, cfg *Config) error { logger := log.WithFunc("node.Setup") - secondaryNICs := detectSecondaryNICs() - logger.Infof(ctx, "detected secondary NICs: %v", secondaryNICs) - - if err := setupSysctl(ctx, cfg.PrimaryNIC, secondaryNICs); err != nil { - return fmt.Errorf("sysctl: %w", err) + if len(cfg.SecondaryNICs) > 0 { + logger.Infof(ctx, "secondary NICs: %v", cfg.SecondaryNICs) } if err := setupBridge(ctx, cfg.Gateway, cfg.SubnetCIDR); err != nil { return fmt.Errorf("bridge: %w", err) } - if err := setupRoutes(ctx, cfg.IPs); err != nil { - return fmt.Errorf("routes: %w", err) + if err := setupSysctl(ctx, cfg.PrimaryNIC, cfg.SecondaryNICs); err != nil { + return fmt.Errorf("sysctl: %w", err) } if cfg.SkipIPTables { logger.Info(ctx, "iptables setup skipped (SkipIPTables=true)") - } else if err := setupIPTables(ctx, cfg.SubnetCIDR, secondaryNICs); err != nil { + } else if err := setupIPTables(ctx, cfg.SubnetCIDR, cfg.SecondaryNICs); err != nil { return fmt.Errorf("iptables: %w", err) } - if err := setupDNSMasq(ctx, cfg); err != nil { - return fmt.Errorf("dnsmasq: %w", err) - } - if err := writeCNIConflist(ctx); err != nil { return fmt.Errorf("cni conflist: %w", err) } @@ -95,35 +65,38 @@ func Setup(ctx context.Context, cfg *Config) error { return nil } -// writeCNIConflist writes the dnsmasq-dhcp CNI conflist if content has changed. +// writeCNIConflist writes the cocoon-dhcp CNI conflist if content has changed. func writeCNIConflist(ctx context.Context) error { logger := log.WithFunc("node.writeCNIConflist") + content := fmt.Sprintf(`{ + "cniVersion": "1.0.0", + "name": "cocoon-dhcp", + "plugins": [ + { + "type": "bridge", + "bridge": %q, + "isGateway": false, + "ipMasq": false, + "ipam": {} + } + ] +} +`, BridgeName) + if err := os.MkdirAll(cniConfDir, 0o750); err != nil { return fmt.Errorf("create cni conf dir: %w", err) } confPath := filepath.Join(cniConfDir, cniConfFile) - if existing, err := os.ReadFile(confPath); err == nil && string(existing) == cniConfContent { //nolint:gosec // known path + if existing, err := os.ReadFile(confPath); err == nil && string(existing) == content { //nolint:gosec // known path logger.Info(ctx, "CNI conflist unchanged, skipping write") return nil } - if err := os.WriteFile(confPath, []byte(cniConfContent), 0o644); err != nil { //nolint:gosec // readable config + if err := os.WriteFile(confPath, []byte(content), 0o644); err != nil { //nolint:gosec // readable config return fmt.Errorf("write cni conflist: %w", err) } logger.Infof(ctx, "wrote CNI conflist to %s", confPath) return nil } - -// detectSecondaryNICs returns the list of secondary NIC names (eth1..ethN) that exist. -func detectSecondaryNICs() []string { - var nics []string - for i := 1; i <= maxSecondaryNICs; i++ { - name := fmt.Sprintf("eth%d", i) - if _, err := net.InterfaceByName(name); err == nil { - nics = append(nics, name) - } - } - return nics -} diff --git a/node/routes.go b/node/routes.go deleted file mode 100644 index 33f51c7..0000000 --- a/node/routes.go +++ /dev/null @@ -1,29 +0,0 @@ -package node - -import ( - "context" - "fmt" - "os/exec" - "strings" - - "github.com/projecteru2/core/log" -) - -// setupRoutes adds /32 host routes for each VM IP pointing to cni0 using ip -batch. -func setupRoutes(ctx context.Context, ips []string) error { - logger := log.WithFunc("node.setupRoutes") - - var batch strings.Builder - for _, ip := range ips { - fmt.Fprintf(&batch, "route replace %s/32 dev %s\n", ip, cni0Bridge) - } - - cmd := exec.CommandContext(ctx, "ip", "-batch", "-") - cmd.Stdin = strings.NewReader(batch.String()) - out, err := cmd.CombinedOutput() - if err != nil { - return fmt.Errorf("batch route add: %w: %s", err, out) - } - logger.Infof(ctx, "added %d host routes via %s", len(ips), cni0Bridge) - return nil -} diff --git a/platform/platform.go b/platform/platform.go index 28a6070..34a4ffd 100644 --- a/platform/platform.go +++ b/platform/platform.go @@ -1,6 +1,9 @@ package platform -import "context" +import ( + "context" + "fmt" +) const ( PlatformGKE = "gke" @@ -17,6 +20,21 @@ func DefaultNIC(platformName string) string { } } +// DefaultSecondaryNICs returns the expected secondary NIC names for a platform. +// GKE has no secondary NICs; Volcengine uses eth1..eth7 for ENIs. +func DefaultSecondaryNICs(platformName string) []string { + switch platformName { + case PlatformVolcengine: + nics := make([]string, 7) //nolint:mnd + for i := range nics { + nics[i] = fmt.Sprintf("eth%d", i+1) + } + return nics + default: + return nil + } +} + // CloudPlatform is the interface implemented by each cloud provider. type CloudPlatform interface { // Name returns the platform identifier ("gke", "volcengine"). @@ -41,11 +59,12 @@ type Config struct { // NetworkResult is returned by ProvisionNetwork. type NetworkResult struct { - Platform string - SubnetCIDR string - Gateway string - PrimaryNIC string - IPs []string + Platform string + SubnetCIDR string + Gateway string + PrimaryNIC string + SecondaryNICs []string // Volcengine: eth1..eth7; GKE: nil + IPs []string } // PoolStatus holds live status information from the cloud platform. diff --git a/platform/volcengine/volcengine.go b/platform/volcengine/volcengine.go index f472b5d..c474c53 100644 --- a/platform/volcengine/volcengine.go +++ b/platform/volcengine/volcengine.go @@ -18,7 +18,12 @@ import ( "github.com/cocoonstack/cocoon-net/platform" ) -var _ platform.CloudPlatform = (*Volcengine)(nil) +var ( + _ platform.CloudPlatform = (*Volcengine)(nil) + + envOnce sync.Once + envErr error +) const ( metadataBase = "http://100.96.0.96/latest/meta-data" @@ -115,12 +120,18 @@ func (v *Volcengine) ProvisionNetwork(ctx context.Context, cfg *platform.Config) platform.SortIPs(allIPs) + var secondaryNICs []string + for i := 1; i <= enisPerNode; i++ { + secondaryNICs = append(secondaryNICs, fmt.Sprintf("eth%d", i)) + } + return &platform.NetworkResult{ - Platform: v.Name(), - SubnetCIDR: cfg.SubnetCIDR, - Gateway: gateway, - IPs: allIPs, - PrimaryNIC: primaryNIC, + Platform: v.Name(), + SubnetCIDR: cfg.SubnetCIDR, + Gateway: gateway, + PrimaryNIC: primaryNIC, + SecondaryNICs: secondaryNICs, + IPs: allIPs, }, nil } @@ -420,11 +431,6 @@ func listENIs(ctx context.Context, instanceID string) ([]networkInterface, error return resp.Result.NetworkInterfaceSets, nil } -var ( - envOnce sync.Once - envErr error -) - // setupEnv reads Volcengine credentials from config file or env vars. // It is safe to call multiple times; the actual work runs at most once. func setupEnv() error { diff --git a/pool/pool.go b/pool/pool.go index 93cf27b..feeda33 100644 --- a/pool/pool.go +++ b/pool/pool.go @@ -15,17 +15,20 @@ const poolFileName = "pool.json" // State represents the pool state persisted to disk. type State struct { - Platform string `json:"platform"` - NodeName string `json:"nodeName"` - Subnet string `json:"subnet"` - Gateway string `json:"gateway"` - IPs []string `json:"ips"` - ENIIDs []string `json:"eniIDs,omitempty"` - SubnetID string `json:"subnetID,omitempty"` - UpdatedAt time.Time `json:"updatedAt"` + Platform string `json:"platform"` + NodeName string `json:"nodeName"` + Subnet string `json:"subnet"` + Gateway string `json:"gateway"` + PrimaryNIC string `json:"primaryNIC,omitempty"` // StateDir is used at runtime and not persisted. StateDir string `json:"-"` + + SecondaryNICs []string `json:"secondaryNICs,omitempty"` + IPs []string `json:"ips"` + ENIIDs []string `json:"eniIDs,omitempty"` + SubnetID string `json:"subnetID,omitempty"` + UpdatedAt time.Time `json:"updatedAt"` } // Save writes the pool state to StateDir/pool.json.