Verify resource deletion before removing finalizer#229
Verify resource deletion before removing finalizer#229a-hilaly wants to merge 1 commit intoaws-controllers-k8s:mainfrom
Conversation
Some AWS APIs return success from Delete immediately while the resource transitions through a deleting state. This change ensures the reconciler verifies the resource is fully deleted (ReadOne returns NotFound) before removing the finalizer and allowing the Kubernetes CR to be deleted. When a resource still exists after Delete, the reconciler sets ACK.ResourceSynced to False with reason "Deleting" and requeues after 15 seconds. This prevents orphaned AWS resources when the CR is deleted before AWS finishes cleaning up.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: a-hilaly The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| // is fully gone (ReadOne returns NotFound) before allowing the CR to be | ||
| // deleted from Kubernetes. |
There was a problem hiding this comment.
This should be good enough for most resources, but there could be cases where the AWS resource is still returned by ReadOne, but with a state indicating that deletion was successful. One example that comes to mind is JobRuns from emr-containers. We likely need to added a check for a "deleted" state similar to our sycned:when check.
There was a problem hiding this comment.
We'll also need to consider how this is rolled out since we'll need to have those checks in place before applying this to avoid these kinds of resources becoming stuck during deletion.
There was a problem hiding this comment.
I'm aware of those edge cases and willing to make a code-gen patch to support that. I just wanted to get a feel on this change, and alignment that this is something we want to support in ACK.
| _, verifyErr := rm.ReadOne(ctx, current) | ||
| rlog.Exit("rm.ReadOne (verify deletion)", verifyErr) | ||
| if verifyErr == nil { | ||
| // Resource still exists, requeue to wait for deletion to complete |
There was a problem hiding this comment.
On the second reconcile we may need to protect against sdkDelete returning an error if the AWS API returns a validation error when called on a deleting resource.
There was a problem hiding this comment.
Yeah we'll probably need a both a IsDeleted and IsDeleting
|
@a-hilaly: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Some AWS APIs return success from Delete immediately while the resource
transitions through a deleting state. This change ensures the reconciler
verifies the resource is fully deleted (ReadOne returns NotFound) before
removing the finalizer and allowing the Kubernetes CR to be deleted.
When a resource still exists after Delete, the reconciler sets
ACK.ResourceSynced to False with reason "Deleting" and requeues after
15 seconds. This prevents orphaned AWS resources when the CR is deleted
before AWS finishes cleaning up.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.