
ProtobufDeserializer crashes with TypeError: Cannot read properties of undefined (reading nestedMessages) on out-of-range message indexes #467

@mikelese

Description

ProtobufDeserializer.deserialize() crashes with an unhelpful TypeError when message index resolution encounters out-of-range values:

```
TypeError: Cannot read properties of undefined (reading 'nestedMessages')
    at ProtobufDeserializer.toNestedMessageDesc (.../serde/protobuf.js:371:48)
    at ProtobufDeserializer.toMessageDescFromIndexes (.../serde/protobuf.js:364:21)
    at ProtobufDeserializer.deserialize (.../serde/protobuf.js:307:34)
```

There are two related problems:

  1. No bounds checking in toMessageDescFromIndexes / toNestedMessageDesc: When fd.messages[index] returns undefined (index out of range), the code passes undefined to toNestedMessageDesc, which crashes accessing .nestedMessages on undefined.

  2. readMessageIndexes reads protobuf payload bytes as message indexes when indexes are absent: When a producer does not include message index bytes in the wire format (valid for single-message schemas — the first message is the implicit default), readMessageIndexes reads the first byte of the protobuf payload as the varint count, producing a non-zero garbage value. It then reads more payload bytes as index varints, generating nonsensical out-of-range indexes that crash the deserializer.

This is a separate bug from #455 (file-level enums). Both can occur on the same codebase — #455 affects schemas with file-level enums, while this one affects any protobuf topic where the producer omits message index bytes.
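For context, the Confluent wire format bytes that precede the protobuf payload can be sketched as follows. This is a minimal illustration, not the library's code; `buildFrame` and `zigzagVarint` are hypothetical helper names. The shorthand for the default message (a lone `0x00` byte for index array `[0]`) matches the `count == 0` branch in `readMessageIndexes` quoted below.

```javascript
// Confluent wire format for protobuf values:
//   [0x00]              magic byte
//   [4 bytes]           schema ID, big-endian
//   [varints]           message indexes: count, then each index, zigzag-encoded
//                       (the single index [0] is shorthand-encoded as one 0x00 byte)
//   [protobuf payload]

// Hypothetical helper: zigzag-encode an int as a varint.
function zigzagVarint(n) {
  let v = (n << 1) ^ (n >> 31); // zigzag: 0→0, -1→1, 1→2, -2→3, 2→4, ...
  const out = [];
  while (v > 0x7f) { out.push((v & 0x7f) | 0x80); v >>>= 7; }
  out.push(v);
  return Buffer.from(out);
}

// Hypothetical helper: assemble a full frame.
function buildFrame(schemaId, msgIndexes, payload) {
  const header = Buffer.alloc(5);
  header.writeUInt8(0x00, 0);
  header.writeInt32BE(schemaId, 1);
  const indexBytes =
    msgIndexes.length === 1 && msgIndexes[0] === 0
      ? Buffer.from([0x00]) // shorthand for the default (first) message
      : Buffer.concat([zigzagVarint(msgIndexes.length), ...msgIndexes.map(zigzagVarint)]);
  return Buffer.concat([header, indexBytes, payload]);
}
```

The bug arises when a producer emits the magic byte and schema ID but neither the `0x00` shorthand nor a full index array, so the bytes immediately after the schema ID are already protobuf payload.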

Environment

  • @confluentinc/schemaregistry: 1.8.2
  • @confluentinc/kafka-javascript: 1.8.2
  • Node: 22.15.0
  • OS: macOS (darwin arm64)

Minimal reproduction

```javascript
// Standard consumer usage — nothing unusual.
import { SchemaRegistryClient } from '@confluentinc/schemaregistry';
import { ProtobufDeserializer } from '@confluentinc/schemaregistry/serde/protobuf';

const client = new SchemaRegistryClient({
  baseURLs: ['https://schema-registry.example.com'],
  basicAuthCredentials: { credentialsSource: 'USER_INFO', userInfo: 'user:pass' },
});
const deserializer = new ProtobufDeserializer(client, 1 /* VALUE */, {});

// Inside Kafka consumer eachMessage handler:
const decoded = await deserializer.deserialize(topic, message.value);
// => TypeError: Cannot read properties of undefined (reading 'nestedMessages')
```

The schema is a single-message proto (one top-level message type). The message is produced by an upstream Confluent producer and read directly from Kafka — no custom wire format manipulation.

What happens internally: The producer omits the message index array for single-message schemas. readMessageIndexes reads the first protobuf payload byte (0x0a) as the varint count, zigzag-decodes it to 5, then reads 5 more payload bytes as indexes — producing [-3, 52, -51, 54, 54]. These garbage values crash toMessageDescFromIndexes.

Root cause

1. readMessageIndexes reads payload bytes as indexes when indexes are absent

```javascript
// serde.js — SchemaId.readMessageIndexes
readMessageIndexes(payload) {
    const bw = new BufferWrapper(payload);
    const count = bw.readVarInt();   // ← reads first byte of protobuf payload if indexes are absent
    if (count == 0) {
        return [1, [0]];
    }
    const msgIndexes = [];
    for (let i = 0; i < count; i++) {
        msgIndexes.push(bw.readVarInt());  // ← reads more payload bytes as indexes
    }
    return [bw.pos, msgIndexes];
}
```

When the producer omits the message index array, the slice passed to readMessageIndexes starts at the protobuf payload. For example, a MyMessage { name: "hello" } payload starts with 0x0a (field 1, wire type 2). readVarInt() zigzag-decodes this as 5, so it reads 5 more varints from the payload — producing garbage indexes.
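The misread is easy to reproduce in isolation. The reader below mirrors the zigzag varint decoding that `readVarInt` performs; it is a standalone sketch, not the library's implementation.

```javascript
// Minimal zigzag varint reader (sketch of what readVarInt does conceptually).
function readZigzagVarint(buf, pos) {
  let raw = 0, shift = 0, b;
  do {
    b = buf[pos++];
    raw |= (b & 0x7f) << shift;
    shift += 7;
  } while (b & 0x80);
  const value = (raw >>> 1) ^ -(raw & 1); // zigzag decode
  return [value, pos];
}

// First bytes of MyMessage { name: "hello" }: tag 0x0a = (field 1 << 3) | wire type 2,
// then the string length, then "hello".
const payload = Buffer.from([0x0a, 0x05, 0x68, 0x65, 0x6c, 0x6c, 0x6f]);
const [count] = readZigzagVarint(payload, 0);
console.log(count); // 5 — misread as "5 message indexes follow"
```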

2. toMessageDescFromIndexes crashes on undefined — no bounds checking

```javascript
// protobuf.js — ProtobufDeserializer
toMessageDescFromIndexes(fd, msgIndexes) {
    let index = msgIndexes[0];
    if (msgIndexes.length === 1) {
        return fd.messages[index];       // ← undefined if index >= fd.messages.length
    }
    return this.toNestedMessageDesc(fd.messages[index], msgIndexes.slice(1));
}

toNestedMessageDesc(parent, msgIndexes) {
    let index = msgIndexes[0];
    if (msgIndexes.length === 1) {
        return parent.nestedMessages[index];   // ← TypeError if parent is undefined
    }
    return this.toNestedMessageDesc(parent.nestedMessages[index], msgIndexes.slice(1));
}
```

Expected behavior

  1. Bounds checking: toMessageDescFromIndexes and toNestedMessageDesc should validate that indexes are within range and throw a descriptive error (e.g., "message index 15 out of range, schema has 1 top-level message(s)").

  2. Graceful fallback for absent indexes: readMessageIndexes should handle the case where message indexes are absent for single-message schemas. For example, if the parsed count exceeds a reasonable threshold or doesn't match the schema's message count, default to [0].
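A bounds-checked version of the index resolution could look like the sketch below. It is written against the snippets quoted above, not a tested patch against the library.

```javascript
// Sketch: same resolution logic as the library, with explicit range checks
// so bad indexes fail with a descriptive error instead of a TypeError.
function toMessageDescFromIndexes(fd, msgIndexes) {
  const index = msgIndexes[0];
  if (index < 0 || index >= fd.messages.length) {
    throw new Error(
      `message index ${index} out of range, schema has ` +
      `${fd.messages.length} top-level message(s)`);
  }
  if (msgIndexes.length === 1) {
    return fd.messages[index];
  }
  return toNestedMessageDesc(fd.messages[index], msgIndexes.slice(1));
}

function toNestedMessageDesc(parent, msgIndexes) {
  const index = msgIndexes[0];
  if (index < 0 || index >= parent.nestedMessages.length) {
    throw new Error(
      `nested message index ${index} out of range, parent has ` +
      `${parent.nestedMessages.length} nested message(s)`);
  }
  if (msgIndexes.length === 1) {
    return parent.nestedMessages[index];
  }
  return toNestedMessageDesc(parent.nestedMessages[index], msgIndexes.slice(1));
}
```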

Impact

This crashes deserialization for any protobuf topic where the producer omits message index bytes for single-message schemas. The TypeError: Cannot read properties of undefined stack trace gives no indication of the actual problem, making it extremely difficult to debug without reading the library source.

We hit this in QA on a billing pipeline consuming from a topic written by an upstream Confluent producer. The workaround was replacing ProtobufDeserializer entirely with a custom deserializer that parses the Confluent wire format manually.
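The workaround can be sketched roughly as follows. The `stripConfluentHeader` helper is a hypothetical name; it assumes single-message schemas, and it relies on the fact that a valid protobuf payload never starts with byte `0x00` (field number 0 is invalid), so a `0x00` at that position can only be the shorthand index marker.

```javascript
// Hypothetical workaround: strip the Confluent header manually, then decode
// the remaining bytes with a generated protobuf class (e.g. via protobufjs).
function stripConfluentHeader(buf) {
  if (buf[0] !== 0x00) throw new Error('missing Confluent magic byte');
  const schemaId = buf.readInt32BE(1);
  let pos = 5;
  // Single-message shorthand: a lone 0x00 byte encodes the index array [0].
  if (buf[pos] === 0x00) pos += 1;
  // If the producer omitted indexes entirely, pos already points at the payload.
  return { schemaId, payload: buf.subarray(pos) };
}
```

This sidesteps `readMessageIndexes` entirely, which is acceptable only because our schemas have a single top-level message.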

Note: The same class of bug exists in confluent-kafka-go — the Go client panics with runtime error: index out of range [-8] under the same conditions.
