HDDS-15004. Stabilize TestReconContainerEndpoint#testContainerEndpointForOBSBucket #10116

Open
arunsarin85 wants to merge 3 commits into apache:master from arunsarin85:HDDS-15004

@arunsarin85
Contributor

What changes were proposed in this pull request?

TestReconContainerEndpoint#testContainerEndpointForOBSBucket was failing intermittently with AssertionFailedError: expected: <1> but was: <0> on KeysResponse#getTotalCount() (Recon’s container-key index had no entry yet, or the wrong container was queried).

Please describe your PR in detail:
This change stabilizes the integration test without altering production code:

  • Reset container-key mapper static state
    ContainerKeyMapperHelper keeps JVM-wide static state (initialization flag, shared count maps, active task counter). After testContainerEndpointForFSOLayout runs, that state could still reflect the previous cluster and break mapper behavior for the next method. The test now calls ContainerKeyMapperHelper.clearSharedContainerCountMap() at the start of each run (@BeforeEach) and again in a finally block in @AfterEach so cleanup runs even if shutdown throws.

  • Surface failures from the async “buffer empty” wait
    The test waited on completableFuture::isDone after waitForEventBufferEmpty but never checked how the future completed. If the async runnable failed, the future would still report “done” and the test would continue. It now calls completableFuture.join() after the wait so failures propagate.

  • Short settle time after the buffer wait
    The OM event queue can be empty while a batch is still being processed (events are dequeued before task processing finishes). A two-second sleep after join() gives in-flight container-key updates time to land before assertions.

  • Resolve the container ID from OM
    testContainerEndpointForOBSBucket no longer assumes container 1L. It uses OmKeyArgs + OzoneManager#lookupKey to read the real container ID from the key’s block locations (getContainerIdForKey helper).

The FSO test uses the same buffer wait / join() / sleep pattern so both methods behave consistently after OM sync.
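
The difference between waiting on isDone and calling join() can be reproduced with the JDK alone. The sketch below is not the test's code: the class name, method name, and exception message are made up for illustration, and the future is failed by hand rather than by waitForEventBufferEmpty.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

final class JoinSurfacesFailure {

  /** Hypothetical stand-in for the async buffer wait: a future that has already failed. */
  static CompletableFuture<Void> failedWait() {
    CompletableFuture<Void> future = new CompletableFuture<>();
    future.completeExceptionally(new IllegalStateException("buffer wait failed"));
    return future;
  }

  public static void main(String[] args) {
    CompletableFuture<Void> future = failedWait();

    // isDone() is true for normal, exceptional, and cancelled completion alike,
    // so a wait on isDone() alone cannot tell success from failure.
    System.out.println("isDone = " + future.isDone());
    System.out.println("isCompletedExceptionally = " + future.isCompletedExceptionally());

    try {
      // join() rethrows the failure wrapped in an unchecked CompletionException.
      future.join();
    } catch (CompletionException e) {
      System.out.println("join() rethrew: " + e.getCause());
    }
  }
}
```

In a JUnit test the try/catch would be omitted, so the CompletionException fails the test at the point of the wait instead of at a later assertion.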

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15004

How was this patch tested?

https://github.com/arunsarin85/ozone/actions/runs/24855010484
https://github.com/arunsarin85/ozone/actions/runs/24855051641

Contributor

@devmadhuu devmadhuu left a comment

Thanks @arunsarin85 for the patch. Kindly find comments.

Can we fix it here as well?

cluster.shutdown();
}
} finally {
ContainerKeyMapperHelper.clearSharedContainerCountMap();

If closing the client throws, this will still clear the map, but the cluster shutdown may be skipped. That is not very good resource handling. Can IOUtils.closeQuietly help?
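
A minimal sketch of the teardown ordering the comment asks for, using only the JDK. Everything here is illustrative: the closeQuietly helper is a local stand-in written for this example (not the project's IOUtils), the string steps stand in for the real cluster shutdown and the static-state reset, and the actual @AfterEach body is not shown in this thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class TeardownOrder {

  /** Local stand-in: closes a resource and records, rather than rethrows, any failure. */
  static void closeQuietly(AutoCloseable resource, List<String> steps) {
    if (resource == null) {
      return;
    }
    try {
      resource.close();
      steps.add("client closed");
    } catch (Exception e) {
      steps.add("suppressed: " + e.getMessage());
    }
  }

  /** Runs the three cleanup steps and returns the ones that actually executed. */
  static List<String> tearDown(AutoCloseable client) {
    List<String> steps = new ArrayList<>();
    try {
      // A failing close no longer prevents the shutdown step from running.
      closeQuietly(client, steps);
      steps.add("cluster shutdown");      // placeholder for the real shutdown call
    } finally {
      // Runs even if the shutdown step throws.
      steps.add("static state reset");    // placeholder for the static-state reset
    }
    return steps;
  }

  public static void main(String[] args) {
    AutoCloseable failingClient = () -> {
      throw new IOException("close failed");
    };
    System.out.println(tearDown(failingClient));
  }
}
```

With a client whose close() throws, all three steps still run, which is the property the review comment is after.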

GenericTestUtils.waitFor(completableFuture::isDone, 100, 30000);
completableFuture.join();
// The buffer can be empty while tasks still finish processing a dequeued batch.
Thread.sleep(2000);

Let's avoid Thread.sleep.
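
One common alternative to a fixed sleep is to poll for the condition the assertion depends on, with a timeout. The sketch below is a JDK-only illustration of that idea, similar in spirit to the GenericTestUtils.waitFor call quoted above; the PollUntil class and its waitFor signature are hypothetical, and which condition the Recon test should poll (for example, the key count reaching the expected value) is an assumption, not something settled in this thread.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class PollUntil {

  /** Re-checks the condition every intervalMs until it holds or timeoutMs elapses. */
  static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws Exception {
    // Simulated index that gets its entry some time after the queue looks empty.
    AtomicInteger indexedKeys = new AtomicInteger(0);
    Thread worker = new Thread(() -> {
      try {
        Thread.sleep(150);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      indexedKeys.set(1);
    });
    worker.start();

    // Returns as soon as the entry is visible; fails loudly if it never appears.
    waitFor(() -> indexedKeys.get() == 1, 10, 5_000);
    System.out.println("indexed keys = " + indexedKeys.get());
    worker.join();
  }
}
```

Compared with a fixed two-second sleep, this returns as soon as the state is visible on a fast machine and gives a clear timeout error on a slow one.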

@arunsarin85
Contributor Author

@devmadhuu Thanks for the review. I have pushed a patch addressing the comments above and triggered the flaky-test-check:
https://github.com/arunsarin85/ozone/actions/runs/25127529917
https://github.com/arunsarin85/ozone/actions/runs/25127529917/attempts/1

@arunsarin85 arunsarin85 requested a review from devmadhuu April 29, 2026 19:26