
fix: certificate renewal #3194

Open
claudiolor wants to merge 6 commits into master from clo/fix-certificate-renewal



@claudiolor
Contributor

@claudiolor claudiolor commented Feb 9, 2026

Description

This PR fixes the broken certificate renewal flow for both ResourceSlice and Tenant resources, and closes #3173 and #3172.

Motivation

The previous certificate renewal flow was fundamentally broken:

  • Previous approach: The consumer cluster would request certificate renewal by creating a Renew resource, which was propagated by the CRD replicator to the provider cluster. The provider was then responsible for generating the new certificate.

  • Critical bugs:

    • Due to a bug, no new certificate was actually created; instead, the existing certificate was reused, even after it had expired.
    • A bug in the VirtualKubelet and the API server ping prevented them from handling certificate renewal: these components did not automatically switch to the new certificate once it was updated.
  • Propagation issues: although the new certificate could be propagated back to the consumer cluster for ResourceSlices, there was no mechanism to write it back for Tenant resources.

  • Resource management problems: the way Renew resources were used caused conflicts and required garbage collection of old Renew resources, adding unnecessary complexity.

This PR addresses these issues by changing the responsibility model: the provider cluster now renews the certificates it has issued when it detects that they are approaching expiration (at 2/3 of their lifetime). We now maintain a single Renew resource per tenant, containing the most recent certificate, which eliminates the need for garbage collection.

Changes

ResourceSlice Certificate Renewal

The remote ResourceSlice controller manager now handles certificate renewal. When a certificate reaches 2/3 of its lifetime, the controller re-signs the CSR. The CR replicator then propagates the change back to the remote (consumer) cluster, where the local ResourceSlice controller completes the renewal process.

Tenant Certificate Renewal

The Tenant resource requires a different approach since the CR replicator doesn't handle it. The renewal flow works as follows:

  1. The Tenant controller renews the certificate
  2. The remote renewer controller updates the Renew resource status
  3. This status is propagated back to the local cluster
  4. The local renewer controller updates the Identity resource with the new certificate
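Step 4 can be sketched as an idempotent copy. The types renewStatus and identity below are hypothetical simplifications (the real Renew and Identity resources carry more fields); the only point illustrated is that the local renewer writes the Identity when the propagated certificate actually differs, so repeated reconciliations do nothing.

```go
package main

import "bytes"

// renewStatus is a hypothetical stand-in for the propagated Renew status.
type renewStatus struct {
	Certificate []byte
}

// identity is a hypothetical stand-in for the Identity resource; Updates
// merely counts writes so the idempotency is observable.
type identity struct {
	Certificate []byte
	Updates     int
}

// syncIdentityFromRenew copies the certificate from the Renew status into the
// Identity. It returns true only when a write was needed; an empty status or
// an unchanged certificate leaves the Identity untouched.
func syncIdentityFromRenew(st renewStatus, id *identity) bool {
	if len(st.Certificate) == 0 || bytes.Equal(st.Certificate, id.Certificate) {
		return false
	}
	id.Certificate = append([]byte(nil), st.Certificate...) // defensive copy
	id.Updates++
	return true
}
```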

Consumer-Side Renewal Prevention

For now, we've disabled the ability to request certificate renewal from the consumer side via the liqo.io/renew annotation on the Identity resource.

VirtualKubelet and API Server Ping Fixes

Fixed certificate renewal handling in the VirtualKubelet and API server ping components to ensure they properly adopt and use new certificates once updated.

Testing

E2E tests have been added to verify that certificates are renewed and correctly propagated across clusters. To enable testing within a reasonable timeframe, we use Kyverno to inject a policy that patches CSRs to have a 10-minute certificate duration. This ensures certificates are renewed during the E2E test run, allowing us to validate that operations continue smoothly throughout the renewal process.

@adamjensenbot
Collaborator

Hi @claudiolor. Thanks for your PR!

I am @adamjensenbot.
You can interact with me by issuing a slash command in the first line of a comment.
Currently, I understand the following commands:

  • /rebase: Rebase this PR onto the master branch (you can add the option test=true to launch the tests when the rebase operation is completed)
  • /merge: Merge this PR into the master branch
  • /build: Build Liqo components
  • /test: Launch the E2E and Unit tests
  • /hold, /unhold: Add/remove the hold label to prevent merging with /merge

Make sure this PR appears in the Liqo changelog by adding one of the following labels:

  • feat: 🚀 New Feature
  • fix: 🐛 Bug Fix
  • refactor: 🧹 Code Refactoring
  • docs: 📝 Documentation
  • style: 💄 Code Style
  • perf: 🐎 Performance Improvement
  • test: ✅ Tests
  • chore: 🚚 Dependencies Management
  • build: 📦 Builds Management
  • ci: 👷 CI/CD
  • revert: ⏪ Reverts Previous Changes

@github-actions github-actions bot added the fix Fixes a bug in the codebase. label Feb 9, 2026
@adamjensenbot
Collaborator

The generated artifacts appear to be out-of-date.

Please ensure you are using the correct version of the generators (e.g., controller-gen) and re-run:

make generate
Here is an excerpt of the diff:
diff --git a/pkg/liqo-controller-manager/authentication/remoterenwer-controller/remoterenewer_controller.go b/pkg/liqo-controller-manager/authentication/remoterenwer-controller/remoterenewer_controller.go
index 71173f3..1c22784 100644
--- a/pkg/liqo-controller-manager/authentication/remoterenwer-controller/remoterenewer_controller.go
+++ b/pkg/liqo-controller-manager/authentication/remoterenwer-controller/remoterenewer_controller.go
@@ -33,8 +33,8 @@ import (
 	authv1beta1 "github.com/liqotech/liqo/apis/authentication/v1beta1"
 	"github.com/liqotech/liqo/internal/crdReplicator/reflection"
 	"github.com/liqotech/liqo/pkg/consts"
-	tenantnamespace "github.com/liqotech/liqo/pkg/tenantNamespace"
 	"github.com/liqotech/liqo/pkg/liqo-controller-manager/authentication/utils"
+	tenantnamespace "github.com/liqotech/liqo/pkg/tenantNamespace"
 	"github.com/liqotech/liqo/pkg/utils/events"
 	"github.com/liqotech/liqo/pkg/utils/getters"
 )

@claudiolor claudiolor force-pushed the clo/fix-certificate-renewal branch 2 times, most recently from c0503ab to a7451d9 on February 9, 2026 12:13
@claudiolor
Contributor Author

/build

@claudiolor claudiolor force-pushed the clo/fix-certificate-renewal branch from a7451d9 to b3fc91f on February 15, 2026 15:59
@claudiolor claudiolor force-pushed the clo/fix-certificate-renewal branch from b3fc91f to 0236d77 on February 15, 2026 16:14
@github-actions github-actions bot added the chore Miscellaneous tasks that don't modify application logic (e.g., updating dependencies or build tasks) label Feb 15, 2026
@adamjensenbot
Collaborator

The go.mod and/or go.sum files appear not to be correctly tidied.

Please rerun go mod tidy to fix the issues.

@adamjensenbot
Collaborator

The generated artifacts appear to be out-of-date.

Please ensure you are using the correct version of the generators (e.g., controller-gen) and re-run:

make generate
Here is an excerpt of the diff:

@claudiolor claudiolor force-pushed the clo/fix-certificate-renewal branch from 0236d77 to 03ceab6 on February 15, 2026 16:21
@claudiolor
Contributor Author

/build

@adamjensenbot
Collaborator

The generated artifacts appear to be out-of-date.

Please ensure you are using the correct version of the generators (e.g., controller-gen) and re-run:

make generate
Here is an excerpt of the diff:
diff --git a/pkg/liqo-controller-manager/authentication/utils/identity.go b/pkg/liqo-controller-manager/authentication/utils/identity.go
index ac529e0..239e08a 100644
--- a/pkg/liqo-controller-manager/authentication/utils/identity.go
+++ b/pkg/liqo-controller-manager/authentication/utils/identity.go
@@ -88,4 +88,3 @@ func ShouldRenewCertificate(pemCert []byte) (bool, time.Duration, error) {
 
 	return true, requeueIn, nil
 }
-

@claudiolor claudiolor force-pushed the clo/fix-certificate-renewal branch from 03ceab6 to 6b1fd35 on February 15, 2026 19:39
@github-actions github-actions bot added the feat Adds a new feature to the codebase label Feb 15, 2026
@adamjensenbot
Collaborator

The generated artifacts appear to be out-of-date.

Please ensure you are using the correct version of the generators (e.g., controller-gen) and re-run:

make generate
Here is an excerpt of the diff:
diff --git a/pkg/liqo-controller-manager/authentication/utils/identity.go b/pkg/liqo-controller-manager/authentication/utils/identity.go
index ac529e0..239e08a 100644
--- a/pkg/liqo-controller-manager/authentication/utils/identity.go
+++ b/pkg/liqo-controller-manager/authentication/utils/identity.go
@@ -88,4 +88,3 @@ func ShouldRenewCertificate(pemCert []byte) (bool, time.Duration, error) {
 
 	return true, requeueIn, nil
 }
-

@claudiolor
Contributor Author

/build

@claudiolor
Contributor Author

/build

@adamjensenbot
Collaborator

The generated artifacts appear to be out-of-date.

Please ensure you are using the correct version of the generators (e.g., controller-gen) and re-run:

make generate
Here is an excerpt of the diff:
diff --git a/pkg/liqo-controller-manager/authentication/utils/identity.go b/pkg/liqo-controller-manager/authentication/utils/identity.go
index ac529e0..239e08a 100644
--- a/pkg/liqo-controller-manager/authentication/utils/identity.go
+++ b/pkg/liqo-controller-manager/authentication/utils/identity.go
@@ -88,4 +88,3 @@ func ShouldRenewCertificate(pemCert []byte) (bool, time.Duration, error) {
 
 	return true, requeueIn, nil
 }
-

@claudiolor claudiolor force-pushed the clo/fix-certificate-renewal branch from 42777af to 2da9890 on February 16, 2026 17:49
@claudiolor
Contributor Author

/test

@claudiolor claudiolor force-pushed the clo/fix-certificate-renewal branch from 45e231b to 8d4201b on February 17, 2026 19:11
@claudiolor
Contributor Author

/build

@claudiolor
Contributor Author

/test

This patch changes the certificate renewal flow, as the previous one was
broken.
- ResourceSlices: it makes the remote ResourceSlice controller manager
  the renewer: when the certificate reaches 2/3 of its lifetime, it
  re-signs the CSR so that the CR replicator can propagate the change
  back to the remote cluster and let the local ResourceSlice controller
  handle the renewal.
- Tenant: here the issue is different. Since the CR replicator does not
  handle the Tenant resource, the tenant controller renews the
  certificate and the remote renewer controller updates the Renew
  resource status, which is propagated back to the local cluster.
  At this point, the local renewer controller can update the Identity
  resource with the new certificate.

Labels

  • chore: Miscellaneous tasks that don't modify application logic (e.g., updating dependencies or build tasks)
  • feat: Adds a new feature to the codebase
  • fix: Fixes a bug in the codebase
  • size/XXL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rslice certificate not update when to renew

2 participants