libeventlog: wait for pending commits to complete#7271
libeventlog: wait for pending commits to complete#7271mergify[bot] merged 5 commits intoflux-framework:masterfrom
Conversation
538c437 to
6758039
Compare
| /* If pending list is not empty, we need to wait until all | ||
| * pending batches have completed before returning. | ||
| * Accomplish this by creating a new batch and committing this | ||
| * "empty batch". While this may be inefficient (sending an | ||
| * RPC that does nothing), this ensures that everything else | ||
| * in this API works just as though a non-empty-batch was | ||
| * committed. | ||
| * | ||
| * N.B. eventlog_batch_get() places this new batch on the | ||
| * pending list. | ||
| */ | ||
|
|
||
| if (!(batch = eventlog_batch_get (ev))) | ||
| return NULL; | ||
| if (batchp) | ||
| *batchp = batch; | ||
| ev->current = NULL; |
There was a problem hiding this comment.
This is a good catch!
I'm not sure how often an empty batch is actually committed. If it is a significant amount, perhaps we should explore other solutions that don't send an empty RPC.
For example, append empty batches to the pending list without sending an RPC. When non-empty batches complete, check if the next batch in the list and "complete" that batch as well. This would require saving the future to the eventlog batch object.
If empty batches are very rare, then the current solution here is simpler and is probably fine.
There was a problem hiding this comment.
I'm not sure how often an empty batch is actually committed. If it is a significant amount, perhaps we should explore other solutions that don't send an empty RPC.
It should be quite rare. If you take a look at #6581, the max "ops" that the kvs saw (within some sampling window) was 11K.
Looking at corona right now, the max is 21 (not a typo, not 21K) :-) Tuolomne it's 29K.
Perhaps a minor bump up to .... 8K or 16K may be ok?
(Edit: I just realized, my comment covered a bit of a this one and follow on #7291)
For example, append empty batches to the pending list without sending an RPC. When non-empty batches complete, check if the next batch in the list and "complete" that batch as well. This would require saving the future to the eventlog batch object.
If empty batches are very rare, then the current solution here is simpler and is probably fine.
I originally implemented this much like you describe. Just creating a dummy future with flux_future_create() and manually fulfilling it as needed.
The fallout from this was eventlog_flush() and its call to flux_future_wait_for(). Since there was no init function to the future, the internal "now" rpc was destined to hang forever. Which is why it lead me down the path of sending an empty batch RPC (thus my comment of preserving all other aspects of the API).
grondo
left a comment
There was a problem hiding this comment.
LGTM! Minor comments inline.
src/common/libeventlog/eventlogger.c
Outdated
| */ | ||
| if ((f = flux_future_create (NULL, NULL))) { | ||
| flux_future_set_reactor (f, flux_get_reactor (ev->h)); | ||
| flux_future_set_flux (f, ev->h); |
There was a problem hiding this comment.
Commit message typo: "Set the flux handle via flux_future_get_flux()" should be "flux_future_set_flux()"
| void *arg; | ||
| }; | ||
|
|
||
| static struct eventlog_batch * eventlog_batch_get (struct eventlogger *ev); |
There was a problem hiding this comment.
Commit message title: libevent: should be libeventlog:
src/common/libeventlog/eventlogger.c
Outdated
| if (batchp) | ||
| *batchp = batch; | ||
| ev->current = NULL; | ||
| /* fallthrough */ |
There was a problem hiding this comment.
Suggestion: expand this comment to describe what happens to the empty batch on fallthrough, e.g.:
/* fallthrough: empty batch is now pending and will serialize behind previous batches */Problem: Modern flux-core coding style is to place each function parameter on a different line if the line length exceeds 80 chars. This is violated in libeventlog. Fix the unmodern coding style.
Problem: In the libeventlog eventlogger, an "empty batch" is immediately fulfilled with a new future that has a reactor set with flux_future_set_reactor(). This can lead to downstream problems because often callers get the reactor via the flux handle and flux_future_get_flux(). Set the flux handle via flux_future_set_flux() instead.
Problem: The naming of some of the internal functions gets a little confusing. Solution: Rename some functions as follows: commit_batch -> commit - this is the internal wrapper around eventlogger_commit() eventlogger_commit_batch -> commit_batch - this does the actual commit of a batch, and it is not a public function so remove the "eventlogger_" prefix. commit_cb -> commit_batch_cb - this is the callback from the new commit_batch()
Problem: The commit_batch() function in the eventlogger handles a special case when the batch is NULL. The commit_batch() function should only deal with the specific case of committing a batch to the KVS. Move the special case handling from commit_batch() to commit().
Problem: Both eventlogger_commit() and eventlogger_flush() assume all prior pending commits have completed. This is not a problem under normal conditions, as there is always some "current" event data in the eventlogger that needs to be written out, and commit/flush can wait for that commit to be completed. However, under some circumstances there may not be any "current" data to be written out. This can lead to a race as code returns a fulfilled future, assuming all prior pending commits have completed. Its worth nothing there is no current use of eventlogger_commit() that could lead to a race. Races with eventlogger_flush() are theoretically possible in the shell. Solution: When there are no "current" events to be written out and there are pending commits waiting to be completed, wait for the most recent pending commit to complete before fulfilling the future. We accomplish this by creating a new "empty" batch of events and commit that empty set of events. The future created by this "empty" batch can be waited on. Fixes flux-framework#7257
6758039 to
91738ce
Compare
|
thanks! will set MWP. |
|
Oh, I noticed the PR title has |
Merge Queue Status✅ The pull request has been merged at 91738ce This pull request spent 6 seconds in the queue, with no time running CI. Required conditions to merge
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7271 +/- ##
==========================================
- Coverage 83.66% 83.65% -0.01%
==========================================
Files 559 559
Lines 93260 93267 +7
==========================================
- Hits 78025 78022 -3
- Misses 15235 15245 +10
🚀 New features to boost your workflow:
|
Problem: Both eventlogger_commit() and eventlogger_flush() assume all prior pending commits have completed. This is not a problem under normal conditions, as there is always some "current" event data in the eventlogger that needs to be written out, and commit/flush can wait for that commit to be completed.
However, under some circumstances there may not be any "current" data to be written out. This can lead to a race as code returns a fulfilled future, assuming all prior pending commits have completed.
Its worth noting there is no current use of eventlogger_commit() that could lead to a race. Races with eventlogger_flush() are
theoretically possible in the shell.
Solution: When there are no "current" events to be written out and there are pending commits waiting to be completed, wait for the most recent pending commit to complete before fulfilling the future. We accomplish this by creating a new "empty" batch of events and commit that empty set of events. The future created by this "empty" batch can be waited on.
Fixes #7257