kvs: support configuration of max operations count #6581
chu11 wants to merge 3 commits into flux-framework:master from
8868cd6 to d245914
d245914 to 07ed829
Re-pushed, removing all fence-related parts of this PR, since KVS fence support is now gone.
07ed829 to 7b1939d
Re-pushed, removing all the fence stuff that is no longer relevant b/c KVS fence support was removed.
a651b3a to 5ed7562
6bcbcfe to 3706f06
From a small set of transaction stats, the max number of ops in a transaction is 398, which means my default cap of 64K seems more than fine. I'll still monitor stats until release time, but I'm removing WIP for now so this can be reviewed.

Saw a max of 11K today. 64K would still be fine, but upping the cap to 128K would perhaps be a bit safer.
3706f06 to a8e4ffa
Before we do this, we should probably investigate why we are seeing these high op-count commits. I think stdout events are batched by this code, which triggers the commit of a batch of events based on a timer. Maybe it should also have a high water mark on the operation count and/or the cumulative size of the events.

https://github.com/flux-framework/flux-core/blob/master/src/common/libeventlog/eventlogger.c

If we can make sure flux-core code is well behaved, then it makes sense to me to impose a limit to avoid regressions and bad behavior by framework projects, but I should think we could set it much lower than 128K.

We should also provide some way for API users to know what the limit is. For example, the above code could then set its high water mark accordingly. Maybe we could even just make it a constant in

Note that this PR, in the title and a few other places, uses "transaction" where "operation" is meant. IMHO the title should be "kvs: limit the number of operations per commit".
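The high water mark idea above can be sketched in plain C. This is only an illustration, not the eventlogger code: `struct batch`, `batch_append` and `batch_reset` are hypothetical names, and the actual batching in eventlogger.c is timer driven and structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one pending batch of events. */
struct batch {
    size_t op_count;    /* operations appended so far */
    size_t byte_count;  /* cumulative size of appended events */
    size_t max_ops;     /* high water mark on operation count */
    size_t max_bytes;   /* high water mark on cumulative size */
};

/* Account for one appended event of 'len' bytes.  Returns true if the
 * batch has reached either high water mark and should be committed
 * now instead of waiting for the batch timer.
 */
static bool batch_append (struct batch *b, size_t len)
{
    assert (b != NULL);
    b->op_count++;
    b->byte_count += len;
    return b->op_count >= b->max_ops || b->byte_count >= b->max_bytes;
}

/* Clear the counters after the batch has been committed. */
static void batch_reset (struct batch *b)
{
    assert (b != NULL);
    b->op_count = 0;
    b->byte_count = 0;
}
```

With `max_ops` set at or below the KVS limit, a batch would be flushed before it could ever be rejected, which is why the limit needs to be discoverable by API users.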
a8e4ffa to a93874b
Good point. I'll write up an issue to investigate this as well. However, this PR was initially developed to defend against a denial of service, so I think it is worthwhile to have the max independent of that investigation. Right now, a rogue user (or misbehaving code) could create a KVS transaction with like a bajillion operations in it.
Problem: A KVS denial of service is possible because there is no maximum on the number of operations a user can submit in a KVS transaction. For example, a KVS transaction with billions of operations would lead to a severe degradation in KVS performance.

Support a new KVS configuration "transaction-max-ops" that rejects any KVS transaction whose operation count exceeds a maximum. The default maximum is 16384.

Fixes flux-framework#6572
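A minimal sketch of the check this commit describes, assuming the limit is compared against the operation count when the request arrives. `txn_check_op_count` and `TXN_MAX_OPS_DEFAULT` are hypothetical names, not taken from the patch; the E2BIG errno and the 16384 default come from this PR.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Default from the commit message; the macro name is hypothetical. */
#define TXN_MAX_OPS_DEFAULT 16384

/* Hypothetical helper: return 0 if a transaction with 'op_count'
 * operations is within 'max_ops', otherwise return -1 with errno set
 * to E2BIG.  A count exactly equal to the maximum is accepted.
 */
static int txn_check_op_count (size_t op_count, size_t max_ops)
{
    assert (max_ops > 0);
    if (op_count > max_ops) {
        errno = E2BIG;
        return -1;
    }
    return 0;
}
```

The check is a single comparison, so it can run before any per-operation work is done on an oversized request.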
Problem: The new kvs transaction-max-ops configuration option is not documented.

Add documentation to flux-config-kvs(5).

Problem: There is no test coverage for the new kvs transaction-max-ops configuration.

Add coverage in t1005-kvs-security.t.
a93874b to bfb5d59
With #7291 merged, I came by to update this old PR. The default cap is now 16384, matching the default cap in the

As a side note, I looked around and the largest "max ops" I saw in the wild was 29K on tuolomne. The presumption is that it comes from the eventlogger, which will handle it. Before, the max I saw was 11K. We could bump to 32K if we want to be super safe.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #6581 +/- ##
==========================================
- Coverage 83.56% 83.56% -0.01%
==========================================
Files 562 562
Lines 93616 93638 +22
==========================================
+ Hits 78231 78244 +13
- Misses 15385 15394 +9
Per discussion in #6125, denial-of-service attacks could be made against the KVS by very very large KVS transactions.
Support two configurations for capping the number of operations submitted by users: one for each individual transaction made by a caller, and one for the combined total of operations from a fence.
For the time being, I made the default 64K for the transaction cap and 1M for the fence cap.
I made this WIP only b/c those defaults may be tweaked depending on what stats we get from the prior PR #6556. I would like to merge only after we gather a bit of data, although I'd be quite shocked if we have to adjust the defaults. Edit: Or alternately, if we'd like to just get the code in, we could default the max to LLONG_MAX and lower the default at a later time.
The only other thought is that I decided to return the errno E2BIG when a max cap is exceeded. It's possible there is a better errno for this; I picked it b/c I thought "ehhh that's not bad".
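For a sense of what E2BIG means on the caller side, here is a sketch of a client that splits its work into commits no larger than a known cap. Everything here is hypothetical: `commit_f`, `commit_in_chunks` and `fake_commit` are illustration names, and `fake_commit` only stands in for a real KVS commit call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical commit callback: commits 'op_count' operations and
 * returns 0 on success or -1 with errno set on failure.
 */
typedef int (*commit_f) (size_t op_count, void *arg);

/* Stand-in for the server side: rejects any commit whose operation
 * count exceeds the cap pointed to by 'arg'.
 */
static int fake_commit (size_t op_count, void *arg)
{
    size_t cap = *(size_t *)arg;
    if (op_count > cap) {
        errno = E2BIG;
        return -1;
    }
    return 0;
}

/* Commit 'total_ops' operations in chunks of at most 'chunk_max'.
 * Returns the number of commits made, or -1 with errno left as set
 * by the failing commit.
 */
static long commit_in_chunks (size_t total_ops,
                              size_t chunk_max,
                              commit_f commit,
                              void *arg)
{
    long commits = 0;

    assert (chunk_max > 0 && commit != NULL);
    while (total_ops > 0) {
        size_t n = total_ops < chunk_max ? total_ops : chunk_max;
        if (commit (n, arg) < 0)
            return -1;
        total_ops -= n;
        commits++;
    }
    return commits;
}
```

Splitting gives up the atomicity of a single transaction, so it only suits callers such as log appenders that do not need all operations applied together.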