[KIP-932] Fix for green build on CI #5352
[KIP-932] Fix for green build on CI #5352Ankith L (Ankith-Confluent) wants to merge 58 commits intodev_kip-932_queues-for-kafkafrom
Conversation
|
🎉 All Contributor License Agreements have been signed. Ready to merge. |
a777d8a to
7a23d48
Compare
…borted messages tests
…ating member active state logging
…it and ensure proper segment rolls
…on for all tests with SASL
…n share group tests
…lookup and management
… disable until fixes are applied
…re-enabling tests
… for share group fetch mock
… consumer test job name
213b104 to
e007663
Compare
…/m1 share consumer tests
…n for OSX arm64/m1
Kaushik Raina (k-raina)
left a comment
There was a problem hiding this comment.
First level of reviews
.semaphore/semaphore.yml
Outdated
| commands: | ||
| - '[[ -z $DOCKERHUB_APIKEY ]] || docker login --username $DOCKERHUB_USER --password $DOCKERHUB_APIKEY' | ||
| jobs: | ||
| - name: 'Build and run share consumer tests (0170+)' |
There was a problem hiding this comment.
(0170+)
Will this semaphore not run all tests? Any reason why we are only using semaphore jobs to run share consumer tests seperately?
There was a problem hiding this comment.
It will only run tests >= 170
This was decided to run the share consumer tests on every commit, we dont want to run all the tests for every commit since that will increase the cost/time required for green build.
There was a problem hiding this comment.
Should tests be renamed with 170x numbering?
There was a problem hiding this comment.
This was decided to have this change in a separate PR.
.semaphore/semaphore.yml
Outdated
| commands: | ||
| - '[[ -z $DOCKERHUB_APIKEY ]] || docker login --username $DOCKERHUB_USER --password $DOCKERHUB_APIKEY' | ||
| jobs: | ||
| - name: 'Build and run share consumer tests (0170+)' |
There was a problem hiding this comment.
| - name: 'Build and run share consumer tests (0170+)' | |
| - name: 'Build and run share consumer tests' |
.semaphore/semaphore.yml
Outdated
| - export JAVA_HOME=$(/usr/libexec/java_home -v 21) | ||
| - python3 -m pip install tests/trivup/trivup-0.14.0.tar.gz jsoncomment | ||
| - cache restore trivup-kafka-osx-${KAFKA_VERSION}-${CACHE_TAG} | ||
| - ./configure --install-deps --source-deps-only --enable-devel |
There was a problem hiding this comment.
Why is --enable-devel needed?
There was a problem hiding this comment.
I had added it for testing purposes, I will remove it
.semaphore/semaphore.yml
Outdated
| - ./configure --install-deps --source-deps-only --enable-devel | ||
| - make -j all | ||
| - make -j -C tests build | ||
| - (cd tests && python3 -m trivup.clusters.KafkaCluster --kraft |
There was a problem hiding this comment.
Some semaphore blocks are using run-share-consumer-tests.sh and some or not. Why this inconsistency ?
There was a problem hiding this comment.
For mac I wad using trivup directly, I will change it to use run-share-consumer-tests.sh
src/rdkafka_mock_sharegrp.c
Outdated
| continue; | ||
|
|
||
| state->state = RD_KAFKA_MOCK_SGRP_RECORD_AVAILABLE; | ||
| /* Per KIP-932: if delivery count has reached |
There was a problem hiding this comment.
KIP-932 Maybe remove for rest of files in PR?
| "--config cleanup.policy=compact " | ||
| "--config segment.ms=10000 " | ||
| "--config segment.bytes=10000 " | ||
| "--config segment.ms=500 " |
There was a problem hiding this comment.
Could you please mention in PR summary details, reason for these changes? Will help in review and future context
There was a problem hiding this comment.
Sure
| test_admin_create_topic(rk, topic, 1, 1, | ||
| (const char *[]) {"max.message.bytes", "10000", | ||
| "segment.bytes", "20000", | ||
| "segment.bytes", "1048576", |
There was a problem hiding this comment.
Was there any change in AK or Clients which requires increase in segment bytes?
There was a problem hiding this comment.
Yes AK 4.2 requires segment.bytes to be 1MB minimum
|
|
||
| TEST_CALL_ERR__(rd_kafka_assignment(c1, &c1_assignment)); | ||
| TEST_ASSERT(c1_assignment->cnt == 3, | ||
| cnt = wait_assignment_count(share_c1, 3, 15000); |
There was a problem hiding this comment.
cnt = wait_assignment_count(share_c1, 3, 15000);
In other calls we use 10000 as timeout, any reason why we use 15000 here?
There was a problem hiding this comment.
I had added it while it was getting stuck, I will revert
tests/test.c
Outdated
| _TEST(0155_share_group_heartbeat_mock, TEST_F_LOCAL), | ||
| _TEST(0156_share_group_fetch_mock, TEST_F_LOCAL), | ||
| _TEST(0157_share_group_ack_mock, TEST_F_LOCAL), | ||
| _TEST(0156_share_group_fetch_mock, TEST_F_MANUAL), |
There was a problem hiding this comment.
share_group_fetch_mock Maybe change to share_consumer_fetch_mock?
There was a problem hiding this comment.
Sure
…andling functions
- Updated test file names from `share_group_fetch_mock` to `share_consumer_fetch_mock` - Updated test file names from `share_group_ack_mock` to `share_consumer_ack_mock` - Adjusted corresponding test declarations in `test.c` to reflect the new naming convention
Fix mock broker share session management: add per-broker session lookup by NodeId, correct epoch mismatch handling to return INVALID_SHARE_SESSION_EPOCH, and add proper ShareFetch/ShareAcknowledge validation (epoch-0 ack rejection, epoch=-1 topic add/forget rejection). Update Kafka version to 4.2.0, add CI pipeline for share consumer integration tests (0170/0171), fix compaction test segment.bytes for Kafka 4.1+.
As mentioned above, segment.bytes=10000 fails on Kafka 4.x due to the 1MB minimum. segment.ms was also reduced from 10s to 500ms to compensate — since segments are now much larger (1MB vs 10KB), they won't fill up as quickly from small messages, so we need the time-based roll to trigger sooner to keep the test fast.
The compaction test works by filling a segment with enough data to force a segment roll, which is what triggers compaction. With the old 10KB segment.bytes, tiny 1KB messages were enough to fill a segment. Now that segment.bytes is 1MB, the same small messages would require ~1000 of them to fill a segment. Instead, 60KB messages were used so that ~20 messages (~1.2MB) exceed the 1MB segment limit and force a roll, keeping the test behaviour (and runtime) roughly the same.