feat: Improve startup time with init containers#363
Conversation
We have a complex dependency between the pods installed by this chart. postgres: kratos, config-db, canary-checker kratos: mission-control On first installation, it takes time for the db and kratos to come up. We want to avoid running into the exponential backoff before the dependencies are even ready
WalkthroughChanges add startup dependency checks via init containers in Kubernetes deployments. The deployment template conditionally includes a wait-for-kratos init container when Kratos authentication is configured. Configuration values enable HTTP and TCP health checks for database and service dependencies before main containers start. Changes
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
chart/values.yaml (2)
410-422: Init container image doesn't follow the global registry pattern.The
wait4ximage is hardcoded todocker.io/atkrad/wait4x:3.6, whereas other images in this chart use.Values.global.imageRegistry. This could cause issues in air-gapped or restricted environments wheredocker.iois not accessible.Consider making this configurable or documenting the external dependency.
445-457: Duplicated init container configuration.This block is identical to the
canary-checkerinit container (lines 410-422). If more components need this pattern, consider extracting a reusable template or helper, or documenting the pattern for maintainability.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@chart/values.yaml` around lines 445 - 457, The initContainer block for wait-for-mission-control is duplicated with the canary-checker init container; remove duplication by extracting the shared initContainer spec into a reusable definition (e.g., a Helm helper template or a YAML anchor) and reference it from both places instead of copying the block; specifically, create a reusable snippet for the wait4x/http check and replace the concrete initContainer entries for "wait-for-mission-control" and "canary-checker" with references to that snippet (or call the helper), keeping the unique name fields (name: wait-for-mission-control, name: canary-checker) while sharing the common image/command configuration.chart/templates/deployment.yaml (1)
35-46: Use HTTP health check with Kratos health endpoint for more robust readiness detection.The init container uses a TCP check for
kratos-public:80, but Kratos provides dedicated health endpoints. A TCP connection succeeding doesn't guarantee the service is fully ready to handle requests. Kratos's/health/readyendpoint provides more robust readiness detection by verifying both HTTP server availability and database connectivity.💡 Suggested improvement using HTTP health check
{{- if and (eq .Values.authProvider "kratos") .Values.kratos.enabled }} initContainers: - name: wait-for-kratos image: docker.io/atkrad/wait4x:3.6 imagePullPolicy: IfNotPresent command: - wait4x - - tcp - - kratos-public:80 + - http + - http://kratos-public:80/health/ready + - --expect-status-code + - "200" - -t - 15m {{- end }}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@chart/templates/deployment.yaml` around lines 35 - 46, The initContainer wait-for-kratos currently uses a TCP check which can report ready before Kratos is fully initialized; update the initContainer named wait-for-kratos to use an HTTP health probe against Kratos's readiness endpoint (e.g., /health/ready on kratos-public:80) by switching the wait4x command from tcp to http and adding the appropriate path and expected status options so the container only proceeds when the HTTP readiness check succeeds; modify the initContainers block in deployment.yaml (look for initContainers and the wait-for-kratos container definition) to use wait4x http targeting /health/ready.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@chart/templates/deployment.yaml`:
- Around line 35-46: The initContainer wait-for-kratos currently uses a TCP
check which can report ready before Kratos is fully initialized; update the
initContainer named wait-for-kratos to use an HTTP health probe against Kratos's
readiness endpoint (e.g., /health/ready on kratos-public:80) by switching the
wait4x command from tcp to http and adding the appropriate path and expected
status options so the container only proceeds when the HTTP readiness check
succeeds; modify the initContainers block in deployment.yaml (look for
initContainers and the wait-for-kratos container definition) to use wait4x http
targeting /health/ready.
In `@chart/values.yaml`:
- Around line 445-457: The initContainer block for wait-for-mission-control is
duplicated with the canary-checker init container; remove duplication by
extracting the shared initContainer spec into a reusable definition (e.g., a
Helm helper template or a YAML anchor) and reference it from both places instead
of copying the block; specifically, create a reusable snippet for the
wait4x/http check and replace the concrete initContainer entries for
"wait-for-mission-control" and "canary-checker" with references to that snippet
(or call the helper), keeping the unique name fields (name:
wait-for-mission-control, name: canary-checker) while sharing the common
image/command configuration.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 886d66d0-3169-4806-a5bd-5189fea23e87
⛔ Files ignored due to path filters (1)
chart/test/go.sumis excluded by!**/*.sum
📒 Files selected for processing (3)
chart/templates/deployment.yamlchart/test/go.modchart/values.yaml
| {{- if and (eq .Values.authProvider "kratos") .Values.kratos.enabled }} | ||
| initContainers: | ||
| - name: wait-for-kratos | ||
| image: docker.io/atkrad/wait4x:3.6 |
There was a problem hiding this comment.
I think this introduces 2 reliability issues:
- Docker rate limiting will cause the instance to fail to start
- A kratos failure will prevent mission-control starting i.e. moving it into the critical path
Maybe we should just copy the kraros seed scripts and run them ourselves?
We have a complex dependency between the pods installed by this chart.
postgres: kratos, config-db, canary-checker
kratos: mission-control
On first installation, it takes time for the db and kratos to come up.
We want to avoid running into the exponential backoff before the
dependencies are even ready
Summary by CodeRabbit