Stream debug_traceBlock* responses directly to avoid OOM on large blocks #9848
Draft
daniellehrner wants to merge 3 commits into hyperledger:main from
Signed-off-by: daniellehrner <daniel.lehrner@consensys.net>
PR description
Converts `debug_traceBlockByNumber`, `debug_traceBlockByHash`, and `debug_traceBlock` from accumulate-then-serialize to stream-as-you-go. Previously, these methods built the entire JSON response in memory (via TransactionTrace + DebugTraceTransactionResult), which OOMs on blocks with many transactions or complex traces. Now, structLogs are written directly to the HTTP/WebSocket output stream during EVM execution.
Infrastructure changes:
Trace-specific changes
opcode step
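The stream-as-you-go approach described above can be illustrated with a minimal sketch. This is not the Besu implementation (which is Java); it is a Python illustration under stated assumptions: `fake_struct_logs` is a hypothetical stand-in for the per-opcode steps emitted during EVM execution, and `stream_trace` is a hypothetical writer. It shows each step being serialized to the output as soon as it is produced, with the summary fields written only after the last step.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def fake_struct_logs(n: int) -> Iterator[dict]:
    """Hypothetical stand-in for opcode steps produced during execution."""
    for pc in range(n):
        yield {"pc": pc, "op": "PUSH1", "gas": 100 - pc, "depth": 1}


def stream_trace(
    out: TextIO,
    struct_logs: Iterable[dict],
    gas: int,
    failed: bool,
    return_value: str,
) -> None:
    """Write one trace result to `out` without collecting the steps in a list."""
    out.write('{"structLogs":[')
    for i, step in enumerate(struct_logs):
        if i:
            out.write(",")
        # Each step is written immediately and can then be garbage collected.
        out.write(json.dumps(step, separators=(",", ":")))
    out.write("]")
    # Summary fields come last: in a real tracer they are only known
    # once execution has finished.
    out.write(',"gas":' + json.dumps(gas))
    out.write(',"failed":' + json.dumps(failed))
    out.write(',"returnValue":' + json.dumps(return_value))
    out.write("}")


# StringIO stands in for the HTTP/WebSocket output stream.
buffer = io.StringIO()
stream_trace(buffer, fake_struct_logs(3), gas=21000, failed=False, return_value="")
result = json.loads(buffer.getvalue())
```

Because the steps are consumed from an iterator, peak memory in this sketch is bounded by a single step rather than by the whole trace.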
Breaking changes
JSON field ordering changed for debug_traceBlock* with OPCODE_TRACER (default tracer):
Before:
After:
gas, failed, and returnValue now appear after structLogs because they're only known after execution completes. JSON-RPC clients that parse by key name (standard) are unaffected. Clients that depend on field ordering will break.
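To make the compatibility claim concrete, here is a small Python sketch of a client that parses by key name. Both payloads are hypothetical: the exact old field order is an assumption (only "summary fields before structLogs" versus "after structLogs" is taken from the description), and the values are invented for illustration.

```python
import json

# Hypothetical payloads; only the relative position of structLogs is from the PR text.
old_order = '{"gas":21000,"failed":false,"returnValue":"","structLogs":[]}'
new_order = '{"structLogs":[],"gas":21000,"failed":false,"returnValue":""}'

parsed_old = json.loads(old_order)
parsed_new = json.loads(new_order)

# A client that looks fields up by name sees identical content.
same_content = parsed_old == parsed_new

# A client that relies on key position sees a difference.
same_order = list(parsed_old) == list(parsed_new)
```

Here `same_content` is true while `same_order` is false, which is the distinction between unaffected and broken clients.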
Batch requests containing streaming methods now return {"error": {"code": -32600, "message": "Invalid request"}} for those methods instead of crashing the batch. This is new behavior, but I am unsure whether tracing more than one block in parallel would have been possible without OOM anyway.
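The batch behavior can be sketched as follows. This is an illustrative Python model, not Besu code: `handle_batch`, `echo_dispatch`, and the `STREAMING_METHODS` set are hypothetical names, and including `jsonrpc` and `id` in the error response is an assumption based on the usual JSON-RPC 2.0 response shape. Only the error code and message come from the description.

```python
from typing import Callable, List

# Methods that stream their response and therefore cannot be part of a batch.
STREAMING_METHODS = {
    "debug_traceBlock",
    "debug_traceBlockByNumber",
    "debug_traceBlockByHash",
}


def handle_batch(requests: List[dict], dispatch: Callable[[dict], dict]) -> List[dict]:
    """Answer each request; reject streaming methods instead of failing the batch."""
    responses = []
    for request in requests:
        if request.get("method") in STREAMING_METHODS:
            responses.append(
                {
                    "jsonrpc": "2.0",
                    "id": request.get("id"),
                    "error": {"code": -32600, "message": "Invalid request"},
                }
            )
        else:
            responses.append(dispatch(request))
    return responses


def echo_dispatch(request: dict) -> dict:
    """Hypothetical handler for non-streaming methods."""
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": "ok"}


batch = [
    {"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []},
    {"jsonrpc": "2.0", "id": 2, "method": "debug_traceBlockByNumber", "params": ["latest"]},
]
responses = handle_batch(batch, echo_dispatch)
```

The non-streaming request is still answered normally, so one streaming method in a batch does not affect the other entries.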
Performance tests
I ran the following script to trace 10 recent blocks in a row to compare the current implementation against this PR. The script was:
On the feature node, which includes this PR, we got:
On the control node, which is `main`, we got:
The control node crashed during the execution, as can be seen from the 0 bytes returned for the bottom 3 blocks.
TTFB (time to first byte) is only a few ms on the feature node and several seconds on the control node, showing that the streaming works correctly and starts to send data almost immediately.
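For readers unfamiliar with the metric, the sketch below shows one way TTFB and total time can be measured over a chunked response. It is not the script used for the tests above; `measure`, `streaming_response`, and `buffered_response` are hypothetical, and the two generators simulate a server that sends data as it is produced versus one that builds everything first.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure(chunks: Iterable[bytes]) -> Tuple[float, float, int]:
    """Return (ttfb_seconds, total_seconds, total_bytes) for a chunked response."""
    start = time.perf_counter()
    ttfb = None
    total_bytes = 0
    for chunk in chunks:
        if ttfb is None and chunk:
            # First non-empty chunk marks the time to first byte.
            ttfb = time.perf_counter() - start
        total_bytes += len(chunk)
    total = time.perf_counter() - start
    return (ttfb if ttfb is not None else total), total, total_bytes


def streaming_response() -> Iterator[bytes]:
    """Simulated server that sends each piece as soon as it is ready."""
    for _ in range(3):
        yield b"x" * 10
        time.sleep(0.01)


def buffered_response() -> Iterator[bytes]:
    """Simulated server that builds the whole body before sending anything."""
    time.sleep(0.03)
    yield b"x" * 30


stream_ttfb, stream_total, stream_bytes = measure(streaming_response())
buffer_ttfb, buffer_total, buffer_bytes = measure(buffered_response())
```

In this simulation both responses carry the same number of bytes and take a similar total time, but the streaming one delivers its first byte almost immediately, which mirrors the pattern reported above.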
The total response time varies between the two without a clear winner.
During the tests we saw the following memory consumption:
We see the expected spikes on the control node in GC time and general memory consumption. As the memory consumption increases by several GBs, we eventually run into an OOM error, which crashes the node.
On the feature node GC time increases a bit, but the general memory consumption stays relatively flat, as expected because the streaming only keeps very little data in memory before writing it to the socket and deleting it right away.
Fixed Issue(s)
Thanks for sending a pull request! Have you done the following?
`doc-change-required` label to this PR if updates are required.
Locally, you can run these tests to catch failures early:
./gradlew spotlessApply
./gradlew build
./gradlew acceptanceTest
./gradlew integrationTest
./gradlew ethereum:referenceTests:referenceTests