[BAC-492] Implement HPA transition step by juanariastn · Pull Request #68 · TiendaNube/eks-orb

juanariastn · 2026-02-12T21:58:37Z

Why? 🤔

When a service transitions from fixed spec.replicas to HPA-managed autoscaling, Kubernetes defaults the replica count to 1, causing partial downtime for services that cannot handle their workload with a single pod.
This happens because of how kubectl apply performs a three-way merge: it compares the previous manifest (stored in the last-applied-configuration annotation), the new desired manifest, and the live state. When spec.replicas was previously declared and is now omitted (because HPA should manage it), the three-way merge interprets this as an intentional deletion and removes the field — causing Kubernetes to default to 1 replica instead of preserving the current count.
Previous attempts to fix this — setting spec.replicas in the Helm template even when HPA is enabled, combined with ArgoCD ignoreDifferences — introduced a secondary issue: RespectIgnoreDifferences strips the field from the applied manifest during sync, which triggers the same three-way merge deletion, bringing replicas back to 1.

What? 📃

Helm chart microservices-v8uses the standard Helm pattern — spec.replicas is not set when autoscaling.enabled=true.
This PR adds a new pre-deploy step (argo_hpa_transition.sh) that runs before upserting the ArgoCD Application. When autoscaling.enabled: true is detected in the user's values file, the script:

Checks if the Rollout exists in the cluster
Reads the last-applied-configuration annotation from the Rollout
If spec.replicas is present, removes it using jq and patches the annotation back
This ensures the three-way merge sees "field was absent, still absent" and preserves the live replica count. The operation is idempotent — if the Rollout doesn't exist or spec.replicas is already absent, the step is a no-op.

This solution works analogously to https://github.com/max0ne/helm-omit-replicas-for-hpa

Additional Links 🔗

BAC-492

Summary by CodeRabbit

New Features
- Staging rollouts now enable autoscaling (min 2 / max 4, target CPU 80%) for more flexible capacity.
Chores
- Deployment pipeline adds a prepare-step to coordinate autoscaler handoffs during rollouts.
- Added tooling to clean deployment metadata before enabling autoscaling, preventing replica conflicts and ensuring smooth HPA transition.
- CI: minor environment variable value updated for staging deploy tests.

…c When Re-enabling HPA

…eplicas

Implement HPA transition preparation script and update ArgoCD deployment steps to clean spec.replicas from last-applied-configuration. This change prevents conflicts during HPA transitions by ensuring the correct replica count is maintained.

Refactor the HPA transition script to replace the ARGO_PARAMETERS_FILE variable with VALUES_FILE, aligning with the Helm values file usage. Update the ArgoCD deployment steps accordingly to ensure consistency in environment variable naming and improve clarity in the script's usage instructions.

coderabbitai · 2026-02-12T21:59:00Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Adds an autoscaling block to the staging values file, inserts a pipeline step "Prepare HPA transition" to run a new script, and adds a script that removes spec.replicas from a Rollout’s kubectl.kubernetes.io/last-applied-configuration annotation to prepare for HPA handoff.

Changes

Cohort / File(s)	Summary
CI values `.circleci/resources/values-test-rollout-staging.yml`	Adds `autoscaling` block: `enabled: true`, `minReplicas: 2`, `maxReplicas: 4`, `targetCPUUtilizationPercentage: 80`.
Pipeline `src/commands/argo-deploy-application.yml`	Inserts new step "Prepare HPA transition" (sets `RELEASE_NAME`, `NAMESPACE`) that runs `scripts/argo_hpa_transition.sh` before "Upsert Argo Application".
HPA transition script (new) `src/scripts/argo_hpa_transition.sh`	Adds script that validates `VALUES_FILE`, `RELEASE_NAME`, `NAMESPACE`; checks `autoscaling.enabled` via `yq` and skips if not enabled; verifies Rollout exists; removes `spec.replicas` from the `kubectl.kubernetes.io/last-applied-configuration` annotation using `jq` and updates the Rollout annotation; includes usage, colored output, and error handling.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Merge Conflict Detection	⚠️ Warning	❌ Merge conflicts detected (7 files): ⚔️ `.circleci/resources/values-test-rollout-staging.yml` (content) ⚔️ `.circleci/test-deploy.yml` (content) ⚔️ `src/commands/argo-deploy-application.yml` (content) ⚔️ `src/jobs/argo-deploy.yml` (content) ⚔️ `src/scripts/argo_app_previous_status.sh` (content) ⚔️ `src/scripts/argo_cli_common.sh` (content) ⚔️ `src/scripts/argo_rollout_status_common.sh` (content) These conflicts must be resolved before merging into `master`.	Resolve conflicts locally and push changes to this branch.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically identifies the main change: implementing an HPA transition step to fix the replica count issue during autoscaling migration.
Description check	✅ Passed	The description follows the template structure with well-filled Why and What sections, providing comprehensive context about the problem, solution, and implementation details.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch bac-492-hpa-transition

⚔️ Resolve merge conflicts (beta)

Auto-commit resolved conflicts to branch bac-492-hpa-transition
Create stacked PR with resolved conflicts
Post resolved changes as copyable diffs in a comment

No actionable comments were generated in the recent review. 🎉

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@src/scripts/argo_hpa_transition.sh`:
- Around line 48-55: Before invoking yq, explicitly verify the values file
exists and is readable (check values_file) and if not, print an error and exit
non‑zero; after running yq to set hpa_enabled, detect yq failure (non‑zero exit
status) instead of redirecting stderr to /dev/null and if yq fails print the yq
error and exit non‑zero so we don’t silently skip HPA handling; retain the
existing conditional that skips when hpa_enabled != "true", but ensure failures
are treated as errors (use exit 1) rather than returning success.

coderabbitai · 2026-02-12T22:03:33Z

src/scripts/argo_hpa_transition.sh

+  # Check if autoscaling is being enabled in this deploy
+  local hpa_enabled
+  hpa_enabled=$(yq e '.autoscaling.enabled' "$values_file" 2>/dev/null)
+
+  if [[ "$hpa_enabled" != "true" ]]; then
+    echo -e "${GREEN}HPA not enabled in this deploy, skipping.${NC}"
+    return 0
+  fi


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # First, let's see the file structure and how VALUES_FILE is initialized head -60 src/scripts/argo_hpa_transition.sh

Repository: TiendaNube/eks-orb

Length of output: 2361

🏁 Script executed:

#!/bin/bash # Continue reading the script to see what happens after the autoscaling check tail -n +56 src/scripts/argo_hpa_transition.sh | head -40

Repository: TiendaNube/eks-orb

Length of output: 1765

🏁 Script executed:

#!/bin/bash # Search for how this script is called or referenced in the repo rg "argo_hpa_transition" --type-list find . -type f -name "*.sh" -o -name "*.yml" -o -name "*.yaml" | head -20 rg "argo_hpa_transition|VALUES_FILE" -l

Repository: TiendaNube/eks-orb

Length of output: 1259

🏁 Script executed:

#!/bin/bash # Search for how argo_hpa_transition.sh is invoked rg "argo_hpa_transition" -B 3 -A 3

Repository: TiendaNube/eks-orb

Length of output: 603

🏁 Script executed:

#!/bin/bash # Get the full argo-deploy-application.yml to understand the context cat -n src/commands/argo-deploy-application.yml | head -100

Repository: TiendaNube/eks-orb

Length of output: 4332

🏁 Script executed:

#!/bin/bash # Continue reading to see how values files are handled cat -n src/commands/argo-deploy-application.yml | sed -n '100,200p'

Repository: TiendaNube/eks-orb

Length of output: 5003

🏁 Script executed:

#!/bin/bash # Check the test values files to see how autoscaling is configured echo "=== Test values files ===" cat -n .circleci/resources/values-test-rollout-*.yml

Repository: TiendaNube/eks-orb

Length of output: 1990

🏁 Script executed:

#!/bin/bash # Check the test-deploy.yml to see how the script is invoked with values files cat -n .circleci/test-deploy.yml

Repository: TiendaNube/eks-orb

Length of output: 15461

🏁 Script executed:

#!/bin/bash # Check if there are other error-handling patterns in the script we should note cat -n src/scripts/argo_hpa_transition.sh

Repository: TiendaNube/eks-orb

Length of output: 4581

Add file existence check and explicit error handling for yq failures.

If VALUES_FILE doesn't exist or is unreadable, the script silently skips HPA handling due to yq errors being redirected to /dev/null. This masks a real problem: if autoscaling is actually enabled, the replica cleanup (which prevents Kubernetes from defaulting to 1 replica during HPA transition) will be skipped, causing a replica reset. The script should validate the file exists before reading and exit with an error if yq fails instead of silently skipping.

Suggested hardening for file existence + yq failures

# Check if autoscaling is being enabled in this deploy local hpa_enabled - hpa_enabled=$(yq e '.autoscaling.enabled' "$values_file" 2>/dev/null) + if [[ ! -f "$values_file" ]]; then + echo -e "${RED}Error: values file '$values_file' not found.${NC}" + exit 2 + fi + if ! hpa_enabled=$(yq e '.autoscaling.enabled // "false"' "$values_file"); then + echo -e "${RED}Error: failed to read autoscaling.enabled from '$values_file'.${NC}" + exit 2 + fi

🤖 Prompt for AI Agents

In `@src/scripts/argo_hpa_transition.sh` around lines 48 - 55, Before invoking yq, explicitly verify the values file exists and is readable (check values_file) and if not, print an error and exit non‑zero; after running yq to set hpa_enabled, detect yq failure (non‑zero exit status) instead of redirecting stderr to /dev/null and if yq fails print the yq error and exit non‑zero so we don’t silently skip HPA handling; retain the existing conditional that skips when hpa_enabled != "true", but ensure failures are treated as errors (use exit 1) rather than returning success.

Remove VALUES_FILE from HPA transition script and update usage instructions. This change simplifies the script by eliminating the need for the values file variable, ensuring clarity in the required environment variables for deployment.

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@src/scripts/argo_hpa_transition.sh`:
- Around line 49-63: The script currently swallows all kubectl errors; update
the rollout existence check and the last-applied fetch to use --ignore-not-found
and stop masking failures: call kubectl get rollout "$release_name" -n
"$namespace" --ignore-not-found and check its exit code (treat non-zero as a
real error and fail fast, only treat zero+empty output as "not found"), and for
last_applied run kubectl get rollout "$release_name" -n "$namespace" -o
jsonpath='{.metadata.annotations.kubectl\.kubernetes\.io/last-applied-configuration}'
--ignore-not-found without the "|| echo ''" fallback so kubectl errors surface;
keep the existing empty-string check on last_applied to skip when legitimately
absent.

coderabbitai · 2026-02-13T16:05:13Z

src/scripts/argo_hpa_transition.sh

+  # Check if the Rollout exists (might not on first deploy)
+  if ! kubectl get rollout "$release_name" -n "$namespace" > /dev/null 2>&1; then
+    echo -e "${GREEN}Rollout '$release_name' not found (likely first deploy), skipping.${NC}"
+    return 0
+  fi
+
+  # Get the last-applied-configuration annotation
+  local last_applied
+  last_applied=$(kubectl get rollout "$release_name" -n "$namespace" \
+    -o jsonpath='{.metadata.annotations.kubectl\.kubernetes\.io/last-applied-configuration}' 2>/dev/null || echo "")
+
+  if [[ -z "$last_applied" ]]; then
+    echo -e "${GREEN}No last-applied-configuration annotation found, skipping.${NC}"
+    return 0
+  fi


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if the file exists and read the relevant lines fd "argo_hpa_transition.sh" -x wc -l {}

Repository: TiendaNube/eks-orb

Length of output: 101

🏁 Script executed:

# Read the file to see the actual code context cat -n src/scripts/argo_hpa_transition.sh | sed -n '40,75p'

Repository: TiendaNube/eks-orb

Length of output: 1749

🏁 Script executed:

# Check for kubectl usage patterns in the script to understand context rg "kubectl" src/scripts/argo_hpa_transition.sh -A 2 -B 2

Repository: TiendaNube/eks-orb

Length of output: 1315

🏁 Script executed:

# Verify kubectl --ignore-not-found flag is standard kubectl get --help 2>/dev/null | grep -A 2 "ignore-not-found" || echo "Checking kubectl documentation"

Repository: TiendaNube/eks-orb

Length of output: 92

🌐 Web query:

kubectl --ignore-not-found flag usage bash error handling

💡 Result:

--ignore-not-found is meant to make “object doesn’t exist” not be a fatal error (useful with set -e / CI scripts), while still failing on real errors.

What it does (exit code behavior)

kubectl get ... --ignore-not-found: if the object isn’t found, it prints nothing and returns exit code 0. (kubernetes.io)

kubectl delete ... --ignore-not-found: if the object isn’t found, it treats that as a successful delete (and returns exit code 0). Other errors still fail. (kubernetes.io)

Bash patterns

Idempotent delete (safe with set -e):

set -euo pipefail kubectl delete configmap my-cm -n my-ns --ignore-not-found

Idempotent get (don’t fail if absent):

set -euo pipefail kubectl get deploy my-deploy -n my-ns --ignore-not-found >/dev/null

If you need to branch on existence (without --ignore-not-found):

if kubectl get deploy my-deploy -n my-ns >/dev/null 2>&1; then echo "exists" else echo "missing (or another error)" fi

If you tell me which command you’re using (get, delete, wait, etc.) and whether set -e is on, I can suggest the most robust pattern for that exact case.

Citations:

1: https://kubernetes.io/docs/reference/kubectl/kubectl-cmds/?utm_source=openai

2: https://kubernetes.io/docs/reference/kubectl/kubectl-cmds/?utm_source=openai

Distinguish kubectl "not found" from real errors to prevent silent failures.

Lines 50 and 57–58 suppress all kubectl failures (RBAC, network, auth, parsing errors) and treat them as "not found," causing the script to silently skip cleanup. This bypasses the protection mechanism and reintroduces the three-way merge conflict risk if kubectl is misconfigured or inaccessible.

Use --ignore-not-found to distinguish true "not found" (exit 0) from real errors, and fail fast on actual kubectl failures:

Suggested fix

- if ! kubectl get rollout "$release_name" -n "$namespace" > /dev/null 2>&1; then - echo -e "${GREEN}Rollout '$release_name' not found (likely first deploy), skipping.${NC}" - return 0 - fi + local rollout_name + if ! rollout_name=$(kubectl get rollout "$release_name" -n "$namespace" --ignore-not-found -o name); then + echo -e "${RED}Failed to query rollout '$release_name' in namespace '$namespace'.${NC}" + return 1 + fi + if [[ -z "$rollout_name" ]]; then + echo -e "${GREEN}Rollout '$release_name' not found (likely first deploy), skipping.${NC}" + return 0 + fi @@ - last_applied=$(kubectl get rollout "$release_name" -n "$namespace" \ - -o jsonpath='{.metadata.annotations.kubectl\.kubernetes\.io/last-applied-configuration}' 2>/dev/null || echo "") + if ! last_applied=$(kubectl get rollout "$release_name" -n "$namespace" \ + -o jsonpath='{.metadata.annotations.kubectl\.kubernetes\.io/last-applied-configuration}'); then + echo -e "${RED}Failed to read last-applied-configuration annotation.${NC}" + return 1 + fi

🤖 Prompt for AI Agents

In `@src/scripts/argo_hpa_transition.sh` around lines 49 - 63, The script currently swallows all kubectl errors; update the rollout existence check and the last-applied fetch to use --ignore-not-found and stop masking failures: call kubectl get rollout "$release_name" -n "$namespace" --ignore-not-found and check its exit code (treat non-zero as a real error and fail fast, only treat zero+empty output as "not found"), and for last_applied run kubectl get rollout "$release_name" -n "$namespace" -o jsonpath='{.metadata.annotations.kubectl\.kubernetes\.io/last-applied-configuration}' --ignore-not-found without the "|| echo ''" fallback so kubectl errors surface; keep the existing empty-string check on last_applied to skip when legitimately absent.

churma and others added 9 commits January 7, 2026 18:49

BAC-492 - Prevent Rollout From Losing spec.replicas During ArgoCD Syn…

aa3e862

…c When Re-enabling HPA

Merge branch 'master' into bac-492-prevent-rollout-from-losing-spec-r…

6b5ad65

…eplicas

Implement HPA transition step

8537bc1

Implement HPA transition preparation script and update ArgoCD deployment steps to clean spec.replicas from last-applied-configuration. This change prevents conflicts during HPA transitions by ensuring the correct replica count is maintained.

Test replicas

dff877d

Test replicaCount 3

38a3779

Test hpa

3005439

Test replicaCount 3

a518d38

Test hpa

609c922

juanariastn requested review from cacciaresi and churma February 12, 2026 21:58

juanariastn self-assigned this Feb 12, 2026

juanariastn added enhancement New feature or request canary labels Feb 12, 2026

juanariastn mentioned this pull request Feb 12, 2026

BAC-492 - Prevent Rollout From Losing spec.replicas During ArgoCD Syn… #62

Closed

coderabbitai bot reviewed Feb 12, 2026

View reviewed changes

juanariastn added 2 commits February 13, 2026 13:00

Test replicaCount 3

8be3ede

coderabbitai bot reviewed Feb 13, 2026

View reviewed changes

juanariastn added 2 commits February 13, 2026 14:39

Test hpa

6b88c89

Test env var

c871b15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BAC-492] Implement HPA transition step#68

[BAC-492] Implement HPA transition step#68
juanariastn wants to merge 13 commits intomasterfrom
bac-492-hpa-transition

juanariastn commented Feb 12, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 12, 2026 •

edited

Loading

Reviews paused

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 12, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

juanariastn commented Feb 12, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why? 🤔

What? 📃

Additional Links 🔗

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 13, 2026

Choose a reason for hiding this comment

What it does (exit code behavior)

Bash patterns

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

juanariastn commented Feb 12, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 12, 2026 •

edited

Loading