Fix memory corruption/JVM crash in RediSearch by copying pooled ByteBuffers#3664
YoHanKi wants to merge 5 commits into redis:main
Conversation
…uffer

- Update EncodedComplexOutput to store a copy of ByteBuffer instead of a read-only view
- Add integration test for FT.SEARCH with large JSON payloads to verify stability
Thank you @YoHanKi
atakavci
left a comment
hey @YoHanKi, thanks for your contribution!
looks good to me.
cc: @a-TODO-rov
Use copy.put(source) to align with other Output implementations and avoid unnecessary duplication.
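The suggested pattern can be sketched in isolation as follows. This is a minimal illustration of copying a (possibly pooled) buffer with `put(source)` rather than keeping a view; the class and method names here are hypothetical and not the actual `EncodedComplexOutput` source:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CopyBufferSketch {

    // Copy the readable bytes of a (possibly pooled) source buffer into a
    // fresh heap buffer. put(source) drains source's remaining bytes into
    // the copy, matching the style of the other Output implementations.
    static ByteBuffer copyOf(ByteBuffer source) {
        ByteBuffer copy = ByteBuffer.allocate(source.remaining());
        copy.put(source); // advances source's position to its limit
        copy.flip();      // make the copy readable from position 0
        return copy;
    }

    public static void main(String[] args) {
        ByteBuffer pooled = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
        ByteBuffer copy = copyOf(pooled);

        // Mutating the original buffer no longer affects the stored copy.
        pooled.put(0, (byte) 'X');
        System.out.println(StandardCharsets.UTF_8.decode(copy)); // hello
    }
}
```

Because the copy owns its own backing array, later reuse of the source buffer's memory cannot corrupt the stored value.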
a-TODO-rov
left a comment
Well, I ran my reproducer and it seems the memory corruption still exists... I didn't have time to analyze further.
Reproducer
```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import org.opentest4j.AssertionFailedError;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import io.lettuce.core.ClientOptions;
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.json.JsonPath;
import io.lettuce.core.protocol.DecodeBufferPolicies;
import io.lettuce.core.protocol.ProtocolVersion;
// note: the package of the RediSearch argument types below is assumed
// from the recent Lettuce RediSearch support; adjust to your version
import io.lettuce.core.search.arguments.CreateArgs;
import io.lettuce.core.search.arguments.NumericFieldArgs;
import io.lettuce.core.search.arguments.SearchArgs;
import io.lettuce.core.search.arguments.SortByArgs;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class RediSearchBugProver {

    private static final Logger log = LoggerFactory.getLogger(RediSearchBugProver.class);

    public static void main(String[] args) {
        // System.setProperty("org.slf4j.simpleLogger.log.io.lettuce.core.protocol", "trace");
        RedisURI uri = RedisURI.Builder.redis("localhost", 6379)
                .withTimeout(Duration.ofSeconds(30))
                .build();
        try (var client = RedisClient.create(uri)) {
            client.setOptions(ClientOptions.builder()
                    // increase it drastically just to be sure discardBytesRatio is not the cause
                    .decodeBufferPolicy(DecodeBufferPolicies.ratio(Integer.MAX_VALUE / 2.0f))
                    // ft.search result on RESP2 is an array, on RESP3 a map. So error is different
                    .protocolVersion(ProtocolVersion.RESP3)
                    .build());
            try (var connection = client.connect()) {
                runTest(connection, String.valueOf(System.currentTimeMillis()));
            }
        }
    }

    private static void runTest(StatefulRedisConnection<String, String> con, String space) {
        var prefix = "test-" + space + ":";
        var index = "idx-" + space;
        log.info("===> running with {}, {}", prefix, index);
        con.sync().ftCreate(
                index,
                CreateArgs.<String, String>builder()
                        .on(CreateArgs.TargetType.JSON)
                        .withPrefix(prefix)
                        .build(),
                List.of(NumericFieldArgs.<String>builder().name("pos").build()));
        var searchArgs = SearchArgs.<String, String>builder()
                .sortBy(SortByArgs.<String>builder().attribute("pos").build())
                .limit(0, 10_000)
                .build();
        var expected = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            var latest = """
                    {"pos":%d,"ts":%d,"large":"just here to make the response larger to some great extend and overflow the buffers"}
                    """.formatted(i, System.currentTimeMillis()).trim();
            con.sync().jsonSet(prefix + i, JsonPath.ROOT_PATH, latest);
            expected.add(latest);
            if (i >= 924) {
                log.info("=== search {}", i);
                var searchReply = con.sync().ftSearch(index, "*", searchArgs);
                // with RESP3 this simply returns 0 as the map keys in Resp3SearchResultsParser are not found
                assertEquals(expected.size(), searchReply.getCount());
                for (int t = 1; t <= expected.size(); t++) {
                    var fields = searchReply.getResults().get(t - 1).getFields();
                    try {
                        assertEquals(expected.get(t - 1), fields.get("$"));
                    } catch (AssertionFailedError e) {
                        // with RESP2 this shows strange fields instead of the expected '$={"pos":..."'
                        log.info("Fields at pos {}: {}", t - 1, fields);
                        throw new AssertionFailedError("On loop " + i + ": " + e.getMessage(), e);
                    }
                }
            }
        }
    }
}
```

Error
```
[main] WARN io.netty.resolver.dns.DnsServerAddressStreamProviders - Can not find io.netty.resolver.dns.macos.MacOSDnsServerAddressStreamProvider in the classpath, fallback to system defaults. This may result in incorrect DNS resolutions on MacOS. Check whether you have a dependency on 'io.netty:netty-resolver-dns-native-macos'
[main] INFO RediSearchBugProver - ===> running with test-1771235215479:, idx-1771235215479
[main] INFO RediSearchBugProver - === search 924
[main] INFO RediSearchBugProver - Fields at pos 0: {$={"pos":12,"ts":1771235215499,"large":"just here to make the response larger to some great extend and overflow the buffers"}}
Exception in thread "main" org.opentest4j.AssertionFailedError: On loop 924: expected: <{"pos":1,"ts":1771235215484,"large":"just here to make the response larger to some great extend and overflow the buffers"}> but was: <{"pos":12,"ts":1771235215499,"large":"just here to make the response larger to some great extend and overflow the buffers"}>
	at RediSearchBugProver.runTest(RediSearchBugProver.java:83)
	at RediSearchBugProver.main(RediSearchBugProver.java:40)
Caused by: org.opentest4j.AssertionFailedError: expected: <{"pos":1,"ts":1771235215484,"large":"just here to make the response larger to some great extend and overflow the buffers"}> but was: <{"pos":12,"ts":1771235215499,"large":"just here to make the response larger to some great extend and overflow the buffers"}>
	at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:158)
	at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:139)
	at org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:201)
	at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:184)
	at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:179)
	at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1188)
	at RediSearchBugProver.runTest(RediSearchBugProver.java:79)
	... 1 more
```
@YoHanKi I also tried tightening your integration test, similar to the reproducer, but it seems it cannot be reproduced with an integration test. Probably some environment factors are involved in the problem, such as the Java version. I didn't have time to investigate further.
@a-TODO-rov — thank you again for taking the time to test and for sharing the crash log. After applying the copy() change, I haven’t been able to reproduce the remaining corruption on my side. I ran the full regression suite — including a test that mirrors your reproducer as closely as I can (small ~95-byte payloads, sortBy("pos"), incremental insert + search) — across the following matrix on Linux (x86-64, Docker):
Tests covered:
For reference, the unfixed version (using asReadOnlyBuffer()) triggers a JVM SIGSEGV crash across all combinations above. Update — additional Linux environment verification:
On the other hand, the unfixed version (asReadOnlyBuffer()) crashes with SIGSEGV in jbyte_disjoint_arraycopy_avx3 on all five environments above — consistent with what you reported. Since I can’t reproduce a failure with the copy() fix across any of these combinations, I’m concerned I might be missing a condition that triggers a different code path. If you don’t mind, could you share a bit more about your environment and how you ran the reproducer? Even small details may help narrow this down:
One more thing: I noticed you mentioned near the end of PR #3664 that a very similar approach didn’t resolve the issue. I re-tested with the same code structure and observed:
If there's any subtle difference in your local patch or the exact reproduction steps (even something that seems minor), I'd really appreciate pointers; I want to make sure I'm not overlooking anything. Thanks again: any additional details you can share would be very helpful for validating the fix and ensuring we cover the real root cause.
Make sure that:

- You use the mvn formatter:format target. Don't submit any formatting related changes.

Description
This PR fixes a memory-safety issue in RediSearch response parsing that can lead to data corruption or JVM crashes (SIGSEGV) under Netty pooled buffer reuse scenarios.
Problem
EncodedComplexOutput stored read-only views of Netty pooled ByteBuffers. Because pooled buffers can be reclaimed and reused before parsing completes, the stored views may later point to mutated or freed memory, causing data corruption or JVM crashes (SIGSEGV).
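The aliasing hazard can be demonstrated with plain NIO heap buffers (a standalone sketch; Netty's pooled buffers behave analogously when the pool reclaims and reuses the underlying memory for a later reply):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ReadOnlyViewHazard {

    public static void main(String[] args) {
        // Stand-in for a slice of a pooled Netty buffer holding one reply.
        ByteBuffer pooled = ByteBuffer.wrap("{\"pos\":1}".getBytes(StandardCharsets.UTF_8));

        // What a read-only view stores: a ByteBuffer that shares the SAME
        // backing memory as the pooled buffer, not an independent copy.
        ByteBuffer view = pooled.asReadOnlyBuffer();

        // Simulate the pool reusing that memory for a different reply.
        pooled.clear();
        pooled.put("{\"pos\":2}".getBytes(StandardCharsets.UTF_8));

        // The previously "stored" view now silently reads the new data.
        System.out.println(StandardCharsets.UTF_8.decode(view)); // {"pos":2}
    }
}
```

With heap buffers the result is silent corruption, as above; with pooled direct buffers the reclaimed memory may also be unmapped, which is where the SIGSEGV comes from.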
Solution
Update EncodedComplexOutput to copy the ByteBuffer contents into a heap-allocated buffer before storing it.

Verification
- Added testSearchWithLargeJsonPayloads to RediSearchIntegrationTests.
- Verified against Netty 4.2.4.Final with the PARANOID leak detection level.

Changes
Split from #3624