Acceptor drops pipelined frames, killing connections and losing link credit#330
Open
Indomitable wants to merge 2 commits intominghuaw:mainfrom
Open
Acceptor drops pipelined frames, killing connections and losing link credit#330Indomitable wants to merge 2 commits intominghuaw:mainfrom
Indomitable wants to merge 2 commits intominghuaw:mainfrom
Conversation
When a remote peer (e.g. the Azure Service Bus .NET SDK) pipelines Begin+Attach frames without waiting for the server's Begin response, the Attach frames arrive before the session is accepted and its relay is registered in session_by_incoming_channel. This causes forward_to_session to return NotFound, killing the connection. Fix: in the listener connection's on_incoming_begin handler, eagerly allocate the session relay and register it for the incoming channel. The pre-allocated incoming_rx and outgoing_channel are passed through IncomingSession so that accept_incoming_session reuses them instead of allocating a second relay.
When a remote peer pipelines Flow frames immediately after Attach (before the listener application has called accept_incoming_attach), the link handle is not yet registered in link_by_input_handle. Session::on_incoming_flow_inner returns UnattachedHandle, which escalates to an error that kills the session engine. This is particularly problematic for the initial credit grant (e.g. link_credit=50) that the remote receiver sends in its first Flow. If this Flow is dropped, the corresponding sender link starts with zero credit and blocks forever. Fix: 1. Override on_incoming_flow in ListenerSession to catch UnattachedHandle and buffer the LinkFlow in pending_link_flows (keyed by InputHandle) instead of propagating the error. 2. Override on_incoming_transfer similarly — drop pipelined transfers for unregistered handles rather than killing the session. 3. Override allocate_incoming_link to replay any buffered LinkFlow frames synchronously after the relay is inserted. For Sender relays, update flow_state and notify waiters so credit is available immediately. For Receiver relays, call on_incoming_flow. 4. Make Producer.state pub(crate) so allocate_incoming_link (which is sync) can access the flow state directly for replay.
Owner
|
Thanks for the PR. Sorry I've been busy for the past months. I'll try to take a look this week. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I'm building an Azure Service Bus emulator using
fe2o3-amqpv0.14.0 as the AMQP server. The Azure .NET SDK (Azure.Messaging.ServiceBus7.20.1) pipelines its AMQP frames pretty aggressively — it sendsBegin+Attach+Flowback-to-back without waiting for server responses between them. This is allowed by the AMQP 1.0 spec (Section 2.5.1), but it triggers three related problems in the acceptor code.First connections usually work fine. Second and subsequent connections fail nondeterministically — the server closes them with
amqp:not-found, or the CBS authentication hangs forever. On the .NET side this shows up asTaskCanceledExceptionor connection timeouts.Problem 1:
forward_to_sessioncan't find the session relayWhen the client opens a remotely-initiated session,
ListenerConnection::on_incoming_beginsends anIncomingSessionto thesession_listenerchannel for user code to accept. The relay only gets registered insession_by_incoming_channellater, whenon_outgoing_beginprocesses the server's responseBegin.But if the client pipelines an
Attachright after itsBegin, the connection engine tries to route it viaforward_to_sessionbefore the relay exists:This kills the entire connection.
Problem 2: Pipelined
Flowcrashes the session engineAfter fixing problem 1 (so pipelined frames actually reach the session), the next thing that breaks is
Flowhandling. TheAttachgets correctly buffered in thelink_listenerchannel, but theFlowthat follows it references a link handle that hasn't been registered yet —accept_incoming_attachhasn't been called.on_incoming_flow_innerlooks uplink_by_input_handle, doesn't find it, and returnsUnattachedHandle. This escalates throughon_error→end_session, killing the session engine. The connection engine then gets aSendErrorbecause the session's channel dropped.Same thing happens with pipelined
Transferframes.Problem 3: Dropping the
Flowloses the initial credit grantEven after catching
UnattachedHandlegracefully (instead of crashing), just dropping theFlowcauses a subtler problem. That pipelinedFlowtypically carries the initial credit grant from the remote receiver (link_credit=50). The sender link on the server side starts with zero credit and waits for aFlowto grant some. If the only credit-grantingFlowwas dropped, the sender blocks forever onflow_state.consume(1).In my case this manifested as the CBS response path deadlocking — the CBS receiver task would get the
put-tokenrequest fine, butsend_response()would hang because the CBS sender link had no credit.In
on_incoming_begin, allocate the session relay and register it insession_by_incoming_channelimmediately, before handing off to user code. Pass the pre-allocatedincoming_rxandoutgoing_channelthroughIncomingSessionsoaccept_incoming_sessionreuses them.Override
on_incoming_flowinListenerSessionto catchUnattachedHandle— instead of propagating the error, buffer theLinkFlowin aHashMap<InputHandle, Vec<LinkFlow>>. Do the same foron_incoming_transfer(drop rather than crash). Then overrideallocate_incoming_linkto replay buffered flows after the relay is inserted intolink_by_input_handle, so credit is available immediately. This requires makingProducer.statepub(crate)for synchronous access to the flow state during replay.