Skip to content

feat: Improve startup time with init containers#363

Open
adityathebe wants to merge 1 commit into
mainfrom
feat/improve-installation-duration
Open

feat: Improve startup time with init containers#363
adityathebe wants to merge 1 commit into
mainfrom
feat/improve-installation-duration

Conversation

@adityathebe

@adityathebe adityathebe commented Mar 24, 2026

Copy link
Copy Markdown
Member

We have a complex dependency between the pods installed by this chart.

postgres: kratos, config-db, canary-checker
kratos: mission-control

On first installation, it takes time for the db and kratos to come up.
We want to avoid running into the exponential backoff before the
dependencies are even ready

Summary by CodeRabbit

  • Chores
    • Enhanced deployment startup reliability by adding initialization checks that wait for service dependencies to be available before starting main containers.
    • Added health checks for mission-control service and PostgreSQL database connectivity during pod initialization.
    • Configured Kratos authentication provider to wait for service readiness with 15-minute timeout.

We have a complex dependency between the pods installed by this chart.

postgres: kratos, config-db, canary-checker
kratos: mission-control

On first installation, it takes time for the db and kratos to come up.
We want to avoid running into the exponential backoff before the
dependencies are even ready
@coderabbitai

coderabbitai Bot commented Mar 24, 2026

Copy link
Copy Markdown

Walkthrough

Changes add startup dependency checks via init containers in Kubernetes deployments. The deployment template conditionally includes a wait-for-kratos init container when Kratos authentication is configured. Configuration values enable HTTP and TCP health checks for database and service dependencies before main containers start.

Changes

Cohort / File(s) Summary
Init Container Configuration
chart/templates/deployment.yaml, chart/values.yaml
Adds conditional init container to deployment for Kratos TCP health check, and configuration blocks for HTTP health checks (mission-control) and TCP checks (postgres) with 15-minute timeouts.
Dependency Management
chart/test/go.mod
Promotes github.com/chromedp/cdproto from indirect to direct dependency.
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: adding init containers to the deployment to improve startup behavior by waiting for dependencies, as detailed in the objectives.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/improve-installation-duration
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/improve-installation-duration

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
chart/values.yaml (2)

410-422: Init container image doesn't follow the global registry pattern.

The wait4x image is hardcoded to docker.io/atkrad/wait4x:3.6, whereas other images in this chart use .Values.global.imageRegistry. This could cause issues in air-gapped or restricted environments where docker.io is not accessible.

Consider making this configurable or documenting the external dependency.


445-457: Duplicated init container configuration.

This block is identical to the canary-checker init container (lines 410-422). If more components need this pattern, consider extracting a reusable template or helper, or documenting the pattern for maintainability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chart/values.yaml` around lines 445 - 457, The initContainer block for
wait-for-mission-control is duplicated with the canary-checker init container;
remove duplication by extracting the shared initContainer spec into a reusable
definition (e.g., a Helm helper template or a YAML anchor) and reference it from
both places instead of copying the block; specifically, create a reusable
snippet for the wait4x/http check and replace the concrete initContainer entries
for "wait-for-mission-control" and "canary-checker" with references to that
snippet (or call the helper), keeping the unique name fields (name:
wait-for-mission-control, name: canary-checker) while sharing the common
image/command configuration.
chart/templates/deployment.yaml (1)

35-46: Use HTTP health check with Kratos health endpoint for more robust readiness detection.

The init container uses a TCP check for kratos-public:80, but Kratos provides dedicated health endpoints. A TCP connection succeeding doesn't guarantee the service is fully ready to handle requests. Kratos's /health/ready endpoint provides more robust readiness detection by verifying both HTTP server availability and database connectivity.

💡 Suggested improvement using HTTP health check
       {{- if and (eq .Values.authProvider "kratos") .Values.kratos.enabled }}
       initContainers:
         - name: wait-for-kratos
           image: docker.io/atkrad/wait4x:3.6
           imagePullPolicy: IfNotPresent
           command:
             - wait4x
-            - tcp
-            - kratos-public:80
+            - http
+            - http://kratos-public:80/health/ready
+            - --expect-status-code
+            - "200"
             - -t
             - 15m
       {{- end }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chart/templates/deployment.yaml` around lines 35 - 46, The initContainer
wait-for-kratos currently uses a TCP check which can report ready before Kratos
is fully initialized; update the initContainer named wait-for-kratos to use an
HTTP health probe against Kratos's readiness endpoint (e.g., /health/ready on
kratos-public:80) by switching the wait4x command from tcp to http and adding
the appropriate path and expected status options so the container only proceeds
when the HTTP readiness check succeeds; modify the initContainers block in
deployment.yaml (look for initContainers and the wait-for-kratos container
definition) to use wait4x http targeting /health/ready.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@chart/templates/deployment.yaml`:
- Around line 35-46: The initContainer wait-for-kratos currently uses a TCP
check which can report ready before Kratos is fully initialized; update the
initContainer named wait-for-kratos to use an HTTP health probe against Kratos's
readiness endpoint (e.g., /health/ready on kratos-public:80) by switching the
wait4x command from tcp to http and adding the appropriate path and expected
status options so the container only proceeds when the HTTP readiness check
succeeds; modify the initContainers block in deployment.yaml (look for
initContainers and the wait-for-kratos container definition) to use wait4x http
targeting /health/ready.

In `@chart/values.yaml`:
- Around line 445-457: The initContainer block for wait-for-mission-control is
duplicated with the canary-checker init container; remove duplication by
extracting the shared initContainer spec into a reusable definition (e.g., a
Helm helper template or a YAML anchor) and reference it from both places instead
of copying the block; specifically, create a reusable snippet for the
wait4x/http check and replace the concrete initContainer entries for
"wait-for-mission-control" and "canary-checker" with references to that snippet
(or call the helper), keeping the unique name fields (name:
wait-for-mission-control, name: canary-checker) while sharing the common
image/command configuration.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 886d66d0-3169-4806-a5bd-5189fea23e87

📥 Commits

Reviewing files that changed from the base of the PR and between e80802e and f70b96d.

⛔ Files ignored due to path filters (1)
  • chart/test/go.sum is excluded by !**/*.sum
📒 Files selected for processing (3)
  • chart/templates/deployment.yaml
  • chart/test/go.mod
  • chart/values.yaml

{{- if and (eq .Values.authProvider "kratos") .Values.kratos.enabled }}
initContainers:
- name: wait-for-kratos
image: docker.io/atkrad/wait4x:3.6

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this introduces 2 reliability issues:

  1. Docker rate limiting will cause the instance to fail to start
  2. A kratos failure will prevent mission-control starting i.e. moving it into the critical path

Maybe we should just copy the kraros seed scripts and run them ourselves?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants