fix: add webhook readiness check to hub-agent #407

Open
ytimocin wants to merge 1 commit into kubefleet-dev:main from ytimocin:fix/webhook-readiness-check

ytimocin (Contributor) commented Jan 8, 2026

Description of your changes

This pull request introduces a readiness check for the webhook component in the hub agent, ensuring that the pod is not marked as ready until the webhook configurations have been successfully created in the API server. This improves the reliability of the deployment by preventing premature readiness signaling. Additional unit tests are added to verify the new readiness logic, and the e2e setup script is updated to wait for the deployment to become ready.

Webhook readiness improvements:

  • Modified SetupWebhook in cmd/hubagent/main.go to return the webhook Config object and register a readiness check using its ReadinessChecker method, ensuring the pod only becomes ready after webhook configuration is complete.
  • Added an atomic ready flag to the Config struct in pkg/webhook/webhook.go, set to true after successful webhook configuration, and implemented the ReadinessChecker method to expose this as a health check.
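
The readiness gating described in the two bullets above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the PR's actual code: the local `Checker` type is assumed to mirror the shape of a controller-runtime health check (`func(*http.Request) error`), and the `Start` method with its `configure` callback is a hypothetical stand-in for creating the webhook configurations in the API server.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"sync/atomic"
)

// Checker mirrors the assumed shape of a controller-runtime health check.
// It is defined locally so the sketch stays self-contained.
type Checker func(req *http.Request) error

// Config is a simplified stand-in for the webhook Config in pkg/webhook.
type Config struct {
	ready atomic.Bool // flipped to true once configuration succeeds
}

// Start runs the hypothetical configure step and marks the webhook ready
// only if that step succeeds.
func (c *Config) Start(configure func() error) error {
	if err := configure(); err != nil {
		return err
	}
	c.ready.Store(true)
	return nil
}

// ReadinessChecker returns a check that fails until Start has succeeded.
func (c *Config) ReadinessChecker() Checker {
	return func(_ *http.Request) error {
		if !c.ready.Load() {
			return errors.New("webhook not ready")
		}
		return nil
	}
}

func main() {
	cfg := &Config{}
	check := cfg.ReadinessChecker()
	fmt.Println("before start:", check(nil))
	_ = cfg.Start(func() error { return nil })
	fmt.Println("after start:", check(nil))
}
```

Note that a flag of this kind only tracks whether the configuration step finished; it says nothing about whether the webhook server itself is accepting connections.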

Testing enhancements:

  • Added unit tests in pkg/webhook/webhook_test.go to verify the readiness checker logic, including concurrent access scenarios.
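
A concurrent-access check of the kind mentioned above might look something like the following sketch. It is an assumption about the general approach, not a copy of the tests in pkg/webhook/webhook_test.go: `readyFlag` and `observeConcurrently` are hypothetical names, and the sketch simply has several goroutines poll an atomic flag while another goroutine sets it.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// readyFlag is a hypothetical stand-in for the atomic ready field on Config.
type readyFlag struct {
	v atomic.Bool
}

// observeConcurrently starts `readers` goroutines that poll the flag while
// another goroutine sets it, and returns how many readers observed it as set.
func observeConcurrently(f *readyFlag, readers int) int {
	var wg sync.WaitGroup
	var seen atomic.Int64

	// Writer: flips the flag exactly once.
	wg.Add(1)
	go func() {
		defer wg.Done()
		f.v.Store(true)
	}()

	// Readers: poll until the store becomes visible.
	for i := 0; i < readers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for !f.v.Load() {
				runtime.Gosched() // yield so the writer can make progress
			}
			seen.Add(1)
		}()
	}

	wg.Wait()
	return int(seen.Load())
}

func main() {
	fmt.Println("readers that saw ready:", observeConcurrently(&readyFlag{}, 8))
}
```

Running such a test with `go test -race` is the usual way to confirm that the flag is accessed without data races.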

Deployment reliability:

  • Updated test/e2e/setup.sh to explicitly wait for the hub-agent deployment (including the webhook) to become ready before proceeding.

Test code refactoring:

  • Changed test case maps in pkg/webhook/webhook_test.go to use pointers, improving test code consistency.

Fixes #404

I have:

  • Run make reviewable to ensure this PR is ready for review.

How has this code been tested

By running the e2e tests with and without the changes.

Special notes for your reviewer

N/A

codecov bot commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 71.42857% with 2 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
pkg/webhook/webhook.go | 71.42% | 2 Missing ⚠️

ytimocin force-pushed the fix/webhook-readiness-check branch 2 times, most recently from ac913aa to cfd8a71, on January 9, 2026 19:29

# Wait for hub-agent deployment to be ready (includes webhook)
echo "Waiting for hub-agent deployment to be ready..."
kubectl rollout status deployment/hub-agent -n fleet-system --timeout=2m
Review comment (Member):

You can do this with the helm install options --wait and --timeout, so that you don't need a separate command.

klog.ErrorS(err, "unable to setup webhook configurations in apiserver")
return err
}
w.ready.Store(true)
Review comment by weng271190436 (Member), Jan 14, 2026:

I think this does not solve the issue: #404

You are observing:

Internal error occurred: failed calling webhook "fleet.membercluster.validating":
failed to call webhook: Post "https://fleetwebhook.fleet-system.svc:9443/validate-cluster.kubernetes-fleet.io-v1beta1-membercluster?timeout=5s":
dial tcp 10.96.27.105:9443: connect: connection refused

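This means that the webhook was registered and the Kubernetes API server knows about it; otherwise the API server wouldn't know that there is a webhook "fleet.membercluster.validating" to call. The "connection refused" error suggests instead that the webhook server was not yet accepting connections on port 9443.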

I searched a bit, and it seems that the webhook server's started checker (a built-in health checker) is one way to do it:

https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/webhook#DefaultServer.StartedChecker

wh := mgr.GetWebhookServer()
wh.StartedChecker()
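
To illustrate why a check tied to the server itself addresses the "connection refused" symptom, here is a minimal standard-library sketch. It does not reproduce controller-runtime's StartedChecker; `dialChecker`, `listenLocal`, and `closedPortAddr` are hypothetical helpers, and the sketch only demonstrates the general idea of a readiness check that passes once something accepts TCP connections on the webhook address.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// dialChecker returns a readiness check that passes only when something
// accepts TCP connections on addr. Hypothetical helper for illustration.
func dialChecker(addr string) func(*http.Request) error {
	return func(_ *http.Request) error {
		conn, err := net.DialTimeout("tcp", addr, time.Second)
		if err != nil {
			return fmt.Errorf("webhook server not reachable: %w", err)
		}
		return conn.Close()
	}
}

// listenLocal starts a TCP listener on an ephemeral local port.
func listenLocal() (net.Listener, error) {
	return net.Listen("tcp", "127.0.0.1:0")
}

// closedPortAddr returns a local address that nothing is listening on,
// obtained by binding an ephemeral port and releasing it again.
func closedPortAddr() (string, error) {
	ln, err := listenLocal()
	if err != nil {
		return "", err
	}
	addr := ln.Addr().String()
	return addr, ln.Close()
}

func main() {
	// No listener: the check fails, much like the API server's call did.
	addr, err := closedPortAddr()
	if err != nil {
		panic(err)
	}
	fmt.Println("no listener:", dialChecker(addr)(nil))

	// Listener up: the same kind of check now passes.
	ln, err := listenLocal()
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	fmt.Println("listening:", dialChecker(ln.Addr().String())(nil))
}
```

In the hub agent itself, the checker returned by StartedChecker would presumably be registered with the manager's readiness endpoint (for example via AddReadyzCheck) rather than hand-rolled like this.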

Signed-off-by: Yetkin Timocin <ytimocin@gmail.com>
ytimocin force-pushed the fix/webhook-readiness-check branch from cfd8a71 to 04d37d7, on January 14, 2026 17:19
Development

Successfully merging this pull request may close these issues.

[BUG] E2E tests fail with webhook connection refused during MemberCluster creation