-
Notifications
You must be signed in to change notification settings - Fork 5.2k
http: fix potential FD leak when zombie streams prevent connection close #42859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I am chasing a case where a client sends POST messages on a HTTP1.1 connection, but is being systematically denied. The connection is immediately reused by the client. In some case we seem to trigger a racy case where the number of open file descriptor explodes. When a stream becomes a zombie (waiting for codec encode completion) and the connection is in DrainState::Closing, the connection is not closed because checkForDeferredClose() is not called after the zombie stream was finally destroyed. This commonly occurs with HTTP/1.1 connections when: - ext_authz (or similar filter) denies a request early, sending a 403 response before the full request body is received - The connection is (potentially) marked DrainState::Closing - The stream becomes a zombie waiting for the codec to finish encoding - When onCodecEncodeComplete() fires, doDeferredStreamDestroy() was called but checkForDeferredClose() was not, leaving the connection open In scenarios where clients try to reuse the same HTTP/1.1 connection and requests are systematically denied (e.g., by ext_authz), this causes rapid FD exhaustion as each denied request leaves an orphaned connection. The fix adds checkForDeferredClose() calls after zombie stream destruction in both onCodecEncodeComplete() and onCodecLowLevelReset(). Signed-off-by: William Dauchy <[email protected]>
|
@KBaichoo, Kevin, do you want also to take a look for this? |
botengyao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this, and left a high level question.
/wait
495f026 to
c96a9be
Compare
Signed-off-by: William Dauchy <[email protected]>
|
/retest transients |
botengyao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/wait
KBaichoo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, looks good to me
/wait
Signed-off-by: William Dauchy <[email protected]>
|
/retest transients |
botengyao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix!! LGTM, module 2 nits
/wait
| // This tests the fix for a potential connection leak where zombie streams | ||
| // were not triggering connection close when drain_state_ was Closing. | ||
| TEST_F(HttpConnectionManagerImplTest, ZombieStreamClosesConnectionOnCodecComplete) { | ||
| // Use HTTP/1.1 and set max_requests_per_connection to 1 to trigger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the test, can we explicitly set the runtime guard for true and false.
| // closed. doEndStream() call that created the zombie may have set | ||
| // drain_state_ to Closing, but checkForDeferredClose() couldn't close the | ||
| // connection at that time because streams_ wasn't empty yet. | ||
| if (Runtime::runtimeFeatureEnabled( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we latch the runtime value as well to improve efficiency?
Signed-off-by: William Dauchy <[email protected]>
botengyao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks for the fix!
Will merge it after the release early next week.
|
/retest transients |
Commit Message:
I am chasing a case where a client sends POST messages on a HTTP1.1 connection, but is being systematically denied. The connection is immediately reused by the client. In some case we seem to trigger a racy case where the number of open file descriptor explodes.
When a stream becomes a zombie (waiting for codec encode completion) and the connection is in DrainState::Closing, the connection is not closed because checkForDeferredClose() is not called after the zombie stream was finally destroyed.
This commonly occurs with HTTP/1.1 connections when:
In scenarios where clients try to reuse the same HTTP/1.1 connection and requests are systematically denied (e.g., by ext_authz), this causes rapid FD exhaustion as each denied request leaves an orphaned connection.
The fix adds checkForDeferredClose() calls after zombie stream destruction in both onCodecEncodeComplete() and onCodecLowLevelReset().
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]