Add support for multi-process port sharing with CIBIR. by ProjectsByJackHe · Pull Request #5798 · microsoft/msquic

ProjectsByJackHe · 2026-02-18T02:02:06Z

Description

The XDP datapath can be configured to intercept packets based on QUIC Connection ID instead of local port.
This behavior existed in MsQuic but was not heavily exercised until recently.
One issue was that MsQuic always attempted to reserve UDP / TCP sockets for each application server process.
But for multiple server processes that may want to share a single port, we would run into port collision errors.
This PR adds support for CIBIR across multiple processes on the same port and document the behavior

Testing

A new DataPathTest was added.

Documentation

Settings.md

codecov · 2026-02-18T03:05:57Z

Codecov Report

❌ Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 85.07%. Comparing base (ed14762) to head (2f06dad).
⚠️ Report is 46 commits behind head on main.

Files with missing lines	Patch %	Lines
src/core/connection.c	66.66%	1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (66.66%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5798      +/-   ##
==========================================
- Coverage   85.99%   85.07%   -0.93%     
==========================================
  Files          60       60              
  Lines       18729    18732       +3     
==========================================
- Hits        16106    15936     -170     
- Misses       2623     2796     +173

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

src/platform/datapath_raw_socket.c

src/platform/datapath_winuser.c

src/platform/platform_internal.h

docs/Settings.md

guhetier

In general, I am concerned that we keep adding incremental exceptions to the port reservation logic to solve the next issue, but without having a clear design goal.
This code will be hard to maintain and confusing for apps as there is no simple rule about what can be done with ports.

I think we need to take the time soon to come up with a clear story about when can port be shared and when they can't + document it + check we implement it.

docs/Settings.md

src/platform/platform_internal.h

src/platform/datapath_winuser.c

ProjectsByJackHe · 2026-02-19T00:33:14Z

In general, I am concerned that we keep adding incremental exceptions to the port reservation logic to solve the next issue, but without having a clear design goal. This code will be hard to maintain and confusing for apps as there is no simple rule about what can be done with ports.

I think we need to take the time soon to come up with a clear story about when can port be shared and when they can't + document it + check we implement it.

I agree. I added a CIBIR.md.

ProjectsByJackHe · 2026-02-21T02:06:47Z

In general, I am concerned that we keep adding incremental exceptions to the port reservation logic to solve the next issue, but without having a clear design goal. This code will be hard to maintain and confusing for apps as there is no simple rule about what can be done with ports.

I think we need to take the time soon to come up with a clear story about when can port be shared and when they can't + document it + check we implement it.

please see the updated XDP.md and CIBIR.md

nibanks · 2026-03-11T11:39:01Z

Maybe we should actually go through the rest of the standardization process for https://datatracker.ietf.org/doc/draft-banks-quic-cibir/ since you're still using it....

guhetier · 2026-03-11T16:02:21Z

Maybe we should actually go through the rest of the standardization process for https://datatracker.ietf.org/doc/draft-banks-quic-cibir/ since you're still using it....

Do you have more context about it? I am interested in understanding whether we should work on standardizing CIBIR or if another solution was adopted by other deployments that we should align with.

nibanks · 2026-03-11T23:04:57Z

Maybe we should actually go through the rest of the standardization process for https://datatracker.ietf.org/doc/draft-banks-quic-cibir/ since you're still using it....

Do you have more context about it? I am interested in understanding whether we should work on standardizing CIBIR or if another solution was adopted by other deployments that we should align with.

I know of no other solution/proposal that achieves what CIBIR does. We can chat more on Discord about the process.

docs/images/XDP-arch.png

docs/CIBIR.md

src/core/connection.c

src/platform/datapath_winuser.c

mtfriesen · 2026-03-16T20:57:57Z

src/platform/datapath_winuser.c

+                            "[data][%p] CIBIR detected,  %s",
+                            Socket,
+                            "ignoring port collision by assuming some \
+                            other MsQuic CIBIR process has reserved the OS port. \


It's too bad we don't have a way to test if MsQuic with CIBIR is the other endpoint - unless we do?

Could we create a very simple (cooperative, not adversarial) protocol to send a UDP packet to the port and check if the response seems MsQuic-CIBIR-like? I suppose this would introduce a path where the other process has exited, and then we have to retry binding...potentially forever.

I think your suggestion is: when a port collision occurs and CIBIR is enabled, instead of assuming the
other process that previously reserved the port is also a CIBIR process, we should first do a quick test to
verify that?

If that is indeed the suggestion I would like to offer a little pushback because that seems a bit overkill.
I think it's reasonable to expect apps that enable CIBIR to know what they are doing.

Moving this discussion offline...

Reactivating: my request is more generally to consider whether we can apply any heuristic beyond "it's probably CIBIR" before potentially stealing an OS port.

A concrete suggestion is to follow the existing port reservation model here, but add support for persistent reservations, which can span processes, rather than ephemeral reservations, which are constrained to one process.

@mtfriesen Maybe you can give some more details about how you see things?
Do you mean enforcing a permanent port reservation to exist? Or trying to use a potentially existing one and ignore the failure if it doesn't exist? Something else?
I am not very familiar with permanent port reservations (from the doc, I am somewhat confused about what the token is meant to help with).

To be super clear, while I provided one concrete suggestion, my request is to demonstrate clear reasoning for whatever we choose to do:

Describe the problem (potential port stealing) and reason about its likelihood and impact. Random, hard-to-reproduce failures on 2% of prod machines would be frustrating, as would tests or dev environments randomly failing.

Come up with some possible solutions, including existing APIs that might help, or anything else we can implement with reasonable upfront and maintenance cost.

Evaluate whether the costs/risks of any of the solutions in (2) would be enough to offset eliminating the expected reliability issues in (1).

It's not obvious to me, after quite a few years in networking at Microsoft, that customers/deployments can be trusted to "know what they're doing" and some rather popular concepts (containers) exist precisely because it's hard to know what you're doing at scale.

src/core/listener.c

docs/XDP.md

src/platform/datapath_winuser.c

guhetier · 2026-03-18T17:18:11Z

src/inc/quic_datapath.h

+    _In_ uint8_t Length
+    )
+{
+    uint64_t Value = 0;


please assert that Length <= 8

src/platform/datapath_winuser.c

guhetier · 2026-03-20T00:59:59Z

src/platform/datapath_winuser.c

+                            // In the case of a port collision while cibir is enabled,
+                            // MsQuic assumes another MsQuic cibir process has done the
+                            // job of reserving the OS ports via socket creation/bind.
+                            // At least that process can fall back to using the OS stack if


That comment concerns me: if CIBIR is enabled, we need XDP to honor it. Falling back to OS sockets seems problematic to me, we don't honor an option the app set without reporting it.

... except if you consider that CIBIR can also be used for dispatching between different host, with a load balancer dispatching instead of XDP. But that should not be a transparent fallback.

guhetier · 2026-03-20T01:08:39Z

src/platform/datapath_winuser.c

nit: Consider not logging an error when we are actually fine because CIBIR is in use. It will avoid confusion when looking at logs, "fake errors" are misleading.

Agreed, this falls into my general concern that we don't have clear and robust error detection and error handling here. Our behavior should be clearly defined, with the ability to fill out a matrix of possible actions by MsQuic and non-MsQuic (and CIBIR and non-CIBIR) processes on the system, and the matrix should have reasonable observabvle behavior in each entry.

guhetier · 2026-03-20T02:14:49Z

In my opinion, this entire code path is becoming a maintenance problem (independently from this specific PR). Reviewing changes in a function that long and with multiple level of loops, tests and jump is not reasonable. We need to refactor it, with intentional design.

We cannot expect a user to understand all the special cases and subtly different behavior that have emerged from this code.

mtfriesen · 2026-03-20T12:02:27Z

In my opinion, this entire code path is becoming a maintenance problem (independently from this specific PR). Reviewing changes in a function that long and with multiple level of loops, tests and jump is not reasonable. We need to refactor it, with intentional design.

We cannot expect a user to understand all the special cases and subtly different behavior that have emerged from this code.

This comment has been made multiple times, and while I agree there is probably room for improvement, as we've discussed, port pools rich with features tend to become intrinsically complicated when properly implemented.

guhetier · 2026-03-20T16:09:03Z

Complex and complicated are different. Portpool are complex, but what we expose to our users or the way we implement it doesn't have to be complicated and confusing. A good implementation can abstract the complexity and present a clear behavior, this is not what we are doing now.

ProjectsByJackHe added 2 commits February 17, 2026 16:45

add a comment

ee68434

update code to account for cibir

68a7afe

ProjectsByJackHe requested a review from a team as a code owner February 18, 2026 02:02

clog fixes

382fba7