Skip to content

[KIP-932] Fix for green build on CI #5352

Open
Ankith L (Ankith-Confluent) wants to merge 58 commits intodev_kip-932_queues-for-kafkafrom
dev_kip-932_ci_integration_tests_fixes
Open

[KIP-932] Fix for green build on CI #5352
Ankith L (Ankith-Confluent) wants to merge 58 commits intodev_kip-932_queues-for-kafkafrom
dev_kip-932_ci_integration_tests_fixes

Conversation

@Ankith-Confluent
Copy link
Member

@Ankith-Confluent Ankith L (Ankith-Confluent) commented Mar 9, 2026

Fix mock broker share session management: add per-broker session lookup by NodeId, correct epoch mismatch handling to return INVALID_SHARE_SESSION_EPOCH, and add proper ShareFetch/ShareAcknowledge validation (epoch-0 ack rejection, epoch=-1 topic add/forget rejection). Update Kafka version to 4.2.0, add CI pipeline for share consumer integration tests (0170/0171), fix compaction test segment.bytes for Kafka 4.1+.

  1. segment.bytes=10000 → 1048576 and segment.ms=10000 → 500

As mentioned above, segment.bytes=10000 fails on Kafka 4.x due to the 1MB minimum. segment.ms was also reduced from 10s to 500ms to compensate — since segments are now much larger (1MB vs 10KB), they won't fill up as quickly from small messages, so we need the time-based roll to trigger sooner to keep the test fast.

  1. Message size increased from 1000/1024 bytes → 60000 bytes (60KB)

The compaction test works by filling a segment with enough data to force a segment roll, which is what triggers compaction. With the old 10KB segment.bytes, tiny 1KB messages were enough to fill a segment. Now that segment.bytes is 1MB, the same small messages would require ~1000 of them to fill a segment. Instead, 60KB messages were used so that ~20 messages (~1.2MB) exceed the 1MB segment limit and force a roll, keeping the test behaviour (and runtime) roughly the same.

@confluent-cla-assistant
Copy link

🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

@airlock-confluentinc airlock-confluentinc bot force-pushed the dev_kip-932_ci_integration_tests_fixes branch 3 times, most recently from a777d8a to 7a23d48 Compare March 13, 2026 16:17
@Ankith-Confluent Ankith L (Ankith-Confluent) marked this pull request as ready for review March 13, 2026 17:14
@Ankith-Confluent Ankith L (Ankith-Confluent) requested a review from a team as a code owner March 13, 2026 17:14
@Ankith-Confluent Ankith L (Ankith-Confluent) changed the title Increase segment.bytes configuration to 1MB in compaction and fetch a… [KIP-932] Fix for green build on CI Mar 14, 2026
@airlock-confluentinc airlock-confluentinc bot force-pushed the dev_kip-932_ci_integration_tests_fixes branch from 213b104 to e007663 Compare March 17, 2026 08:32
Copy link
Member

@k-raina Kaushik Raina (k-raina) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First level of reviews

commands:
- '[[ -z $DOCKERHUB_APIKEY ]] || docker login --username $DOCKERHUB_USER --password $DOCKERHUB_APIKEY'
jobs:
- name: 'Build and run share consumer tests (0170+)'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(0170+)
Will this semaphore not run all tests? Any reason why we are only using semaphore jobs to run share consumer tests seperately?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will only run tests >= 170

This was decided to run the share consumer tests on every commit, we dont want to run all the tests for every commit since that will increase the cost/time required for green build.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should tests be renamed with 170x numbering?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was decided to have this change in a separate PR.

commands:
- '[[ -z $DOCKERHUB_APIKEY ]] || docker login --username $DOCKERHUB_USER --password $DOCKERHUB_APIKEY'
jobs:
- name: 'Build and run share consumer tests (0170+)'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- name: 'Build and run share consumer tests (0170+)'
- name: 'Build and run share consumer tests'

- export JAVA_HOME=$(/usr/libexec/java_home -v 21)
- python3 -m pip install tests/trivup/trivup-0.14.0.tar.gz jsoncomment
- cache restore trivup-kafka-osx-${KAFKA_VERSION}-${CACHE_TAG}
- ./configure --install-deps --source-deps-only --enable-devel

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is --enable-devel needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had added it for testing purposes, I will remove it

- ./configure --install-deps --source-deps-only --enable-devel
- make -j all
- make -j -C tests build
- (cd tests && python3 -m trivup.clusters.KafkaCluster --kraft

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some semaphore blocks are using run-share-consumer-tests.sh and some or not. Why this inconsistency ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For mac I wad using trivup directly, I will change it to use run-share-consumer-tests.sh

continue;

state->state = RD_KAFKA_MOCK_SGRP_RECORD_AVAILABLE;
/* Per KIP-932: if delivery count has reached

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KIP-932 Maybe remove for rest of files in PR?

"--config cleanup.policy=compact "
"--config segment.ms=10000 "
"--config segment.bytes=10000 "
"--config segment.ms=500 "

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please mention in PR summary details, reason for these changes? Will help in review and future context

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

test_admin_create_topic(rk, topic, 1, 1,
(const char *[]) {"max.message.bytes", "10000",
"segment.bytes", "20000",
"segment.bytes", "1048576",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there any change in AK or Clients which requires increase in segment bytes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes AK 4.2 requires segment.bytes to be 1MB minimum


TEST_CALL_ERR__(rd_kafka_assignment(c1, &c1_assignment));
TEST_ASSERT(c1_assignment->cnt == 3,
cnt = wait_assignment_count(share_c1, 3, 15000);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

   cnt = wait_assignment_count(share_c1, 3, 15000);

In other calls we use 10000 as timeout, any reason why we use 15000 here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had added it while it was getting stuck, I will revert

tests/test.c Outdated
_TEST(0155_share_group_heartbeat_mock, TEST_F_LOCAL),
_TEST(0156_share_group_fetch_mock, TEST_F_LOCAL),
_TEST(0157_share_group_ack_mock, TEST_F_LOCAL),
_TEST(0156_share_group_fetch_mock, TEST_F_MANUAL),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

share_group_fetch_mock Maybe change to share_consumer_fetch_mock?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

- Updated test file names from `share_group_fetch_mock` to `share_consumer_fetch_mock`
- Updated test file names from `share_group_ack_mock` to `share_consumer_ack_mock`
- Adjusted corresponding test declarations in `test.c` to reflect the new naming convention
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants