[BUG] AKS LB reconciliation strips manually-added frontend IPs but preserves outbound rules, causing dangling reference error #5634

@illia-haiduchenko


Describe the bug

When a manually-configured outbound rule exists on the AKS-managed load balancer (referencing a manually-added frontend IP configuration), any AKS operation that triggers LB reconciliation (az aks update, az aks nodepool update, etc.) reconstructs the LB PUT payload inconsistently:

- Manually-added frontend IP configurations are stripped (only IPs from loadBalancerProfile.outboundIPs are included).
- Manually-added outbound rules are preserved from the current LB state.

This creates a dangling reference: the outbound rule references a frontend IP that no longer exists in the PUT payload. Azure ARM rejects the PUT with InvalidResourceReference, and all nodepools transition to Failed state, blocking any further cluster operations.
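
To make the failure condition concrete, here is a minimal shell sketch of the consistency check that the reconciled payload fails. It is only an illustration, not how AKS builds the payload: `find_dangling_refs`, `aks-managed-frontend-ip`, and the sample lists are hypothetical, and short names stand in for full resource IDs.

```shell
# Sketch only: print each frontend IP config ID that an outbound rule
# references but that is missing from the frontend IPs kept in the payload.
# $1: newline-separated frontend IP config IDs present in the payload
# $2: newline-separated frontend IP config IDs referenced by outbound rules
find_dangling_refs() {
  kept=$1
  printf '%s\n' "$2" | while IFS= read -r ref; do
    [ -n "$ref" ] || continue
    # Exact, whole-line match against the kept list
    if ! printf '%s\n' "$kept" | grep -Fxq -- "$ref"; then
      printf '%s\n' "$ref"
    fi
  done
}

# Hypothetical sample: reconciliation keeps only the AKS-managed frontend IP,
# while the preserved outbound rule still points at custom-frontend-ip.
kept_frontends="aks-managed-frontend-ip"
rule_refs="aks-managed-frontend-ip
custom-frontend-ip"

dangling=$(find_dangling_refs "$kept_frontends" "$rule_refs")
echo "dangling: $dangling"   # prints "dangling: custom-frontend-ip"
```

Any non-empty result corresponds to the state ARM rejects. Against a live LB, the two lists could presumably be read with az network lb show and a --query over frontendIPConfigurations[].id and outboundRules[].frontendIPConfigurations[].id, but that inspects the current LB state rather than the payload AKS assembles internally.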

Additionally, AKS automatically places all nodepool VMs into aksOutboundBackendPool. When a VMSS is in multiple backend pools with different outbound rules, Azure LB selects the outbound rule nondeterministically (per Microsoft docs). This makes it impossible to reliably use a separate outbound rule for a specific nodepool without manually fixing backend pool membership after every reconciliation.

To Reproduce

1. Create an AKS cluster with managed VNet and standard load balancer.
2. Manually add a frontend IP configuration to the AKS-managed LB:

   az network lb frontend-ip create -g --lb-name kubernetes -n custom-frontend-ip --public-ip-address

3. Manually add an outbound rule referencing that frontend IP:

   az network lb outbound-rule create -g --lb-name kubernetes -n customOutboundRule --frontend-ip-configs custom-frontend-ip --address-pool aksOutboundBackendPool --protocol All

4. Run any AKS operation, e.g.:

   az aks update -g -n --load-balancer-idle-timeout 30

5. The operation fails with CreateOrUpdateLoadBalancerError because AKS stripped the frontend IP but kept the outbound rule.

Expected behavior

AKS should do one of the following:

- Preserve both manually-added frontend IPs and outbound rules.
- Strip both, i.e. also remove outbound rules whose frontend IPs are not managed by AKS.
- Provide a supported per-nodepool outbound IP configuration that works with managed VNets. The NAT Gateway approach requires custom subnets, which are incompatible with AKS managed VNets and fail with "Cannot use a custom subnet because agent pool is using a managed subnet".

Environment:

Kubernetes version: 1.31
Load Balancer SKU: Standard
Network plugin: azure
Azure CLI version: 2.77.0

Additional context

Error message:

(CreateOrUpdateLoadBalancerError) Create or update load balancer failed.
Resource .../frontendIPConfigurations/
referenced by resource .../outboundRules/ was not found.

Current workaround: We remove the manual LB resources (frontend IP + outbound rule) before each session of AKS operations, then restore them afterward. This requires manual intervention for every cluster update and is error-prone.
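
For reference, the workaround can be sketched roughly as the wrapper below. It is a sketch under assumptions, not a supported procedure: the variables `NODE_RG`, `PUBLIC_IP`, their default values, and the function names are hypothetical, and the create commands mirror the ones in the reproduction steps. The outbound rule is deleted before the frontend IP because it references it, and the frontend IP is restored first for the same reason.

```shell
# Hypothetical settings; replace with the real node resource group and public IP.
NODE_RG="${NODE_RG:-my-node-resource-group}"
PUBLIC_IP="${PUBLIC_IP:-my-custom-public-ip}"
LB_NAME="kubernetes"
FRONTEND_NAME="custom-frontend-ip"
RULE_NAME="customOutboundRule"

# Delete the outbound rule first, then the frontend IP it references.
remove_manual_lb_resources() {
  az network lb outbound-rule delete -g "$NODE_RG" --lb-name "$LB_NAME" -n "$RULE_NAME" &&
  az network lb frontend-ip delete -g "$NODE_RG" --lb-name "$LB_NAME" -n "$FRONTEND_NAME"
}

# Recreate the frontend IP first, then the outbound rule that points at it.
restore_manual_lb_resources() {
  az network lb frontend-ip create -g "$NODE_RG" --lb-name "$LB_NAME" \
    -n "$FRONTEND_NAME" --public-ip-address "$PUBLIC_IP" &&
  az network lb outbound-rule create -g "$NODE_RG" --lb-name "$LB_NAME" \
    -n "$RULE_NAME" --frontend-ip-configs "$FRONTEND_NAME" \
    --address-pool aksOutboundBackendPool --protocol All
}

# Run the given command with the manual LB resources removed, then restore
# them even if the command fails, and return the command's exit status.
with_manual_lb_removed() {
  remove_manual_lb_resources || return 1
  status=0
  "$@" || status=$?
  restore_manual_lb_resources || return 1
  return "$status"
}

# Example (hypothetical cluster names):
#   with_manual_lb_removed az aks update -g my-rg -n my-cluster --load-balancer-idle-timeout 30
```

The window between removal and restore still leaves the nodepool without its dedicated outbound IP, so this only automates the manual steps and does not remove the underlying problem.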

Use case: We need a dedicated outbound IP for a specific nodepool so that destination services can allowlist traffic from that nodepool separately.
