fix: add webhook readiness check to hub-agent #407
ytimocin wants to merge 1 commit into kubefleet-dev:main
    # Wait for hub-agent deployment to be ready (includes webhook)
    echo "Waiting for hub-agent deployment to be ready..."
    kubectl rollout status deployment/hub-agent -n fleet-system --timeout=2m
You can do this with the `helm install` options `--wait` and `--timeout`, so that you don't need a separate command.
        klog.ErrorS(err, "unable to setup webhook configurations in apiserver")
        return err
    }
    w.ready.Store(true)
I think this does not solve the issue: #404
You are observing

    Internal error occurred: failed calling webhook "fleet.membercluster.validating":
    failed to call webhook: Post "https://fleetwebhook.fleet-system.svc:9443/validate-cluster.kubernetes-fleet.io-v1beta1-membercluster?timeout=5s":
    dial tcp 10.96.27.105:9443: connect: connection refused

This means that the webhook was registered and the k8s API server knows about it. Otherwise the API server would not know that there is a webhook "fleet.membercluster.validating" to call. The connection is refused because the webhook server itself is not accepting connections yet.
I searched a bit, and it seems that the webhook server's started checker (a built-in health checker) is one way to do it:
https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/webhook#DefaultServer.StartedChecker
    wh := mgr.GetWebhookServer()
    wh.StartedChecker()

Signed-off-by: Yetkin Timocin <ytimocin@gmail.com>
Description of your changes
This pull request introduces a readiness check for the webhook component in the hub agent, ensuring that the pod is not marked as ready until the webhook configurations have been successfully created in the API server. This improves the reliability of the deployment by preventing premature readiness signaling. Additional unit tests are added to verify the new readiness logic, and the e2e setup script is updated to wait for the deployment to become ready.
Webhook readiness improvements:
- `SetupWebhook` in `cmd/hubagent/main.go` now returns the webhook `Config` object and registers a readiness check using its `ReadinessChecker` method, ensuring the pod only becomes ready after webhook configuration is complete.
- A `ready` flag was added to the `Config` struct in `pkg/webhook/webhook.go`. It is set to `true` after successful webhook configuration, and the new `ReadinessChecker` method exposes it as a health check.

Testing enhancements:
- Tests in `pkg/webhook/webhook_test.go` verify the readiness checker logic, including concurrent access scenarios.

Deployment reliability:
- `test/e2e/setup.sh` now explicitly waits for the `hub-agent` deployment (including the webhook) to become ready before proceeding.

Test code refactoring:
- Test code in `pkg/webhook/webhook_test.go` was refactored to use pointers, improving test code consistency.

Fixes #404
I have:
- Run `make reviewable` to ensure this PR is ready for review.

How has this code been tested
By running the e2e tests with and without the changes.
Special notes for your reviewer
N/A