HDDS-15004. Stabilize TestReconContainerEndpoint#testContainerEndpointForOBSBucket #10116
Open
arunsarin85 wants to merge 3 commits into apache:master
devmadhuu (Contributor) reviewed on Apr 27, 2026 and left a comment:
Thanks @arunsarin85 for the patch. Kindly find comments.
      cluster.shutdown();
    }
  } finally {
    ContainerKeyMapperHelper.clearSharedContainerCountMap();
If closing the client throws, this still clears the map, but the cluster shutdown may be skipped. That is not good resource handling. Can IOUtils.closeQuietly help?
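One way to read this suggestion, as a minimal sketch and not the patch's actual code: close each resource in its own guarded step so that a failure in one cannot skip the others. The `closeQuietly` method below is a local, hypothetical helper written for illustration (it is not the Hadoop/Ozone `IOUtils` method), and the client, cluster, and map-clearing steps are stand-ins that only record that they ran.

```java
import java.util.ArrayList;
import java.util.List;

public class TeardownSketch {

  /** Hypothetical stand-in for a closeQuietly helper: closes each resource, swallowing failures. */
  public static void closeQuietly(AutoCloseable... resources) {
    for (AutoCloseable resource : resources) {
      if (resource == null) {
        continue;
      }
      try {
        resource.close();
      } catch (Exception e) {
        // Swallowed on purpose: teardown should continue with the next resource.
      }
    }
  }

  /** Runs the teardown and returns the steps that actually executed, in order. */
  public static List<String> tearDown(boolean clientCloseFails) {
    List<String> steps = new ArrayList<>();
    // Stand-in for the Ozone client; optionally fails on close.
    AutoCloseable client = () -> {
      steps.add("client.close");
      if (clientCloseFails) {
        throw new IllegalStateException("simulated client close failure");
      }
    };
    // Stand-in for the mini cluster shutdown.
    AutoCloseable cluster = () -> steps.add("cluster.shutdown");
    try {
      closeQuietly(client, cluster);
    } finally {
      // Stand-in for ContainerKeyMapperHelper.clearSharedContainerCountMap().
      steps.add("clearSharedContainerCountMap");
    }
    return steps;
  }

  public static void main(String[] args) {
    System.out.println(tearDown(true));
  }
}
```

With the simulated client failure enabled, all three steps still run in order, which is the property the comment asks for. The trade-off is that swallowed exceptions are invisible unless the helper logs them.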
    GenericTestUtils.waitFor(completableFuture::isDone, 100, 30000);
    completableFuture.join();
    // The buffer can be empty while tasks still finish processing a dequeued batch.
    Thread.sleep(2000);
arunsarin85 (Contributor, Author) replied:
@devmadhuu Thanks for the review. I have added a patch for the above changes and triggered the flaky-test-check.
What changes were proposed in this pull request?
TestReconContainerEndpoint#testContainerEndpointForOBSBucket was failing intermittently with AssertionFailedError: expected: <1> but was: <0> on KeysResponse#getTotalCount() (Recon’s container-key index had no entry yet, or the wrong container was queried).
Please describe your PR in detail:
This change stabilizes the integration test without altering production code:
Reset container-key mapper static state
ContainerKeyMapperHelper keeps JVM-wide static state (initialization flag, shared count maps, active task counter). After testContainerEndpointForFSOLayout runs, that state could still reflect the previous cluster and break mapper behavior for the next test method. The test now calls ContainerKeyMapperHelper.clearSharedContainerCountMap() before each test method (@BeforeEach) and again in a finally block in @AfterEach, so cleanup runs even if shutdown throws.
Surface failures from the async “buffer empty” wait
The test waited on completableFuture::isDone after waitForEventBufferEmpty but never checked how the future completed. A future that completes exceptionally also reports isDone() as true, so if the async runnable failed, the test would simply continue. It now calls completableFuture.join() after the wait so that failures propagate.
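The difference between `isDone()` and `join()` can be shown with plain `java.util.concurrent`, independent of Ozone. This is a minimal sketch: the class and method names are made up for illustration, and the failing runnable only stands in for a failed buffer wait.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class JoinSurfacesFailure {

  /** Returns a future whose async body fails, standing in for a failed buffer wait. */
  public static CompletableFuture<Void> failingWait() {
    return CompletableFuture.runAsync(() -> {
      throw new IllegalStateException("simulated wait failure");
    });
  }

  /** Waits until the future is done, then reports what join() surfaces. */
  public static String describe(CompletableFuture<Void> future) {
    while (!future.isDone()) {
      // isDone() becomes true for normal and exceptional completion alike.
      Thread.yield();
    }
    try {
      future.join();
      return "completed normally";
    } catch (CompletionException e) {
      // join() wraps the original exception as the cause.
      return "failed: " + e.getCause().getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(describe(failingWait()));
    System.out.println(describe(CompletableFuture.runAsync(() -> { })));
  }
}
```

In both cases the polling loop exits; only the `join()` call distinguishes the failed run from the successful one.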
Short settle time after the buffer wait
The OM event queue can be empty while a batch is still being processed (events are dequeued before task processing finishes). A two-second sleep after join() gives in-flight container-key updates time to land before assertions.
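The race described here can be reproduced with a generic queue and a single worker. The sketch below is an assumption-laden model, not Recon's event buffer: the class name, the queue, and the latches are all hypothetical and exist only to make the ordering deterministic.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class EmptyQueueRaceSketch {

  /** Observes the queue while one dequeued event is still being processed. */
  public static String run() {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    AtomicInteger processed = new AtomicInteger();
    CountDownLatch dequeued = new CountDownLatch(1);
    CountDownLatch mayFinish = new CountDownLatch(1);

    Thread worker = new Thread(() -> {
      try {
        queue.take();               // The event leaves the queue first...
        dequeued.countDown();
        mayFinish.await();          // ...while its processing is still in flight.
        processed.incrementAndGet();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    worker.start();
    queue.add("event");

    try {
      dequeued.await();
      // An "is the queue empty?" check at this point passes too early.
      boolean emptyMid = queue.isEmpty();
      int processedMid = processed.get();
      mayFinish.countDown();
      worker.join();
      return "queueEmpty=" + emptyMid
          + " processedSoFar=" + processedMid
          + " processedAtEnd=" + processed.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

The queue reports empty while the processed count is still zero, which is the window the two-second sleep is meant to cover. A fixed sleep narrows that window but does not close it, so a wait on an explicit completion signal would be stricter where one is available.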
Resolve the container ID from OM
testContainerEndpointForOBSBucket no longer assumes container 1L. It uses OmKeyArgs + OzoneManager#lookupKey to read the real container ID from the key’s block locations (getContainerIdForKey helper).
The FSO test uses the same buffer wait / join() / sleep pattern so both methods behave consistently after OM sync.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-15004
How was this patch tested?
https://github.com/arunsarin85/ozone/actions/runs/24855010484
https://github.com/arunsarin85/ozone/actions/runs/24855051641