Fix assertion crash in processIOThreadsReadDone when DONT_PARSE client has stale argc#3245
Fix assertion crash in processIOThreadsReadDone when DONT_PARSE client has stale argc#3245aradz44 wants to merge 4 commits intovalkey-io:unstablefrom
Conversation
Signed-off-by: Arad Zilberstein <aradz@amazon.com>
9e7310f to
0eb19b4
Compare
| } else if (!strcasecmp(objectGetVal(c->argv[1]), "client-enforce-reply-list") && c->argc == 3) { | ||
| server.debug_client_enforce_reply_list = atoi(objectGetVal(c->argv[2])); | ||
| addReply(c, shared.ok); | ||
| } else if (!strcasecmp(objectGetVal(c->argv[1]), "set-close-after-reply") && c->argc == 3) { |
There was a problem hiding this comment.
I think what we wanted is a debug extended CLIENT KILL abilities right?
maybe it would be helpful to have a debug command: DEBUG CLOSE-ASAP (close async) and DEBUG CLOSE-AFTER-REPLY ?
src/debug.c
Outdated
| if (!target) { addReplyError(c, "No such client"); return; } | ||
| target->flag.close_after_reply = 1; |
There was a problem hiding this comment.
Presumably this would be reproducible after doing ACL LOAD with the current user getting invalidated. In that case we call close_after_reply. That seems like a real potential crash? If we can trigger this with a real known workload, I would feel a bit better about it than something synthetic.
There was a problem hiding this comment.
Agree. better to fix and test REAL issue rather than a synthetic one. Good point about the ACL LOAD!
There was a problem hiding this comment.
close_after_reply is only set by a client on itself by what I saw , but a client can't have both stale argc and close_after_reply because
setting close_after_reply requires executing a command, which calls resetClient and zeros argc, and once it's set. For external paths like ACL LOAD, the code uses freeClientOrCloseLater which sets close_asap (not close_after_reply),
and close_asap is already guarded by trySendReadToIOThreads.
If you think of another way to trigger it I can check it.
Signed-off-by: Arad Zilberstein <aradz@amazon.com>
Signed-off-by: Arad Zilberstein <aradz@amazon.com>
|
Sorry for the |
Found while reading the code — I have not yet been able to confirm whether this path can materialize in existing real-world scenarios.
Bug
When an IO thread reads from a client with READ_FLAGS_DONT_PARSE set (because canParseCommand returns 0), processIOThreadsReadDone skips handleParseResults but still checks c->argc > 0 and sets pending_command = 1. However, argc may be stale from a previous incomplete pipelined command parse — it was set by consumeCommandQueue when handleParseResults returned PARSE_NEEDMORE, and was never cleared.
This leads to processCommand being called with parsed_cmd == NULL and without READ_FLAGS_COMMAND_NOT_FOUND, triggering the assertion:
==> server.c:4177 'c->read_flags & READ_FLAGS_COMMAND_NOT_FOUND' is not true
Crash sequence
Fix
In processIOThreadsReadDone, when DONT_PARSE is set and argc > 0, skip the client. The argc is stale from a previous incomplete parse — the IO thread didn't parse anything new, so there is no valid command to execute.
Test
Added DEBUG SET-CLOSE-AFTER-REPLY subcommand and a test that reproduces the crash deterministically: