fix: encoder silent row drop, negative/non-decimal array lengths, trailing delimiter, colon re-search#302
Conversation
β¦iling delimiter, and colon re-search
Five bugs fixed across encoder and decoder:
1. encoders.ts - encodeArrayOfArraysAsListItemsLines silently dropped
non-primitive sub-arrays. The declared header count said [N] but fewer
rows were emitted, producing structurally invalid TOON output that then
failed on decode with a confusing count mismatch. Fixed by falling back
to encodeMixedArrayAsListItemsLines for non-primitive sub-arrays.
2. parser.ts - parseBracketSegment accepted negative array lengths because
Number.parseInt("-5") is not NaN. Added length < 0 guard.
3. parser.ts - parseBracketSegment accepted hex (0x10 β 16) and float
(3.7 β 3) lengths via parseInt's permissive behaviour. Added a
/^-?\d+$/ pre-check to require a clean decimal integer string.
4. parser.ts - parseDelimitedValues produced a spurious empty final element
on trailing delimiters (e.g. "a,b,"). The post-loop condition
`values.length > 0` fired even when valueBuffer was empty, appending "".
Fixed to only push when valueBuffer.trim() is non-empty or no values
have been collected yet (empty-input case).
5. decoders.ts - decodeKeyValueSync and decodeKeyValueAsync discarded the
`end` return value from parseKeyToken and re-searched for the colon via
indexOf. This currently works by accident (escape sequences only expand
length) but is fragile. Fixed to use `end` directly in both paths.
Thanks for the bug hunt. Going through each in order: Bug 1 (encoder silent row drop) β real bug. The Bug 2 (negative array length) β real bug, fix is correct. Please add a fixture in Bug 3 (hex/float array length) β real bug, regex pre-check is correct. Same ask: one fixture per case (hex, float). The subsequent Bug 4 (trailing delimiter) β real bug. The fix subtly changes behavior for single-value inputs with trailing whitespace ( Bug 5 (colon re-search) β provably-equivalent refactor, not a bugfix. Happy to take it as cleanup if you reframe the title/description; calling it a fix is misleading. Tests for all four real bugs are the gating item. |
Summary
Five bugs found via code review of the encoder and decoder, fixed with passing tests (466 tests pass).
Bug 1 β Encoder data loss: non-primitive sub-arrays silently dropped (HIGH)
encodeArrayOfArraysAsListItemsLinesinencoders.tsonly emitted list items for sub-arrays that passisArrayOfPrimitives. Sub-arrays containing objects or nested arrays were skipped silently. The header declared[N]rows but fewer were emitted, producing structurally invalid TOON that then failed on decode with a confusing count mismatch.Fix: added an
elsebranch that falls back toencodeMixedArrayAsListItemsLinesfor non-primitive sub-arrays.Bug 2 β Negative array lengths accepted (MEDIUM)
parseBracketSegmentinparser.tschecked onlyNumber.isNaN(length).Number.parseInt("-5", 10)returns-5, notNaN, so negative lengths passed through. The decoder loopedwhile (rowCount < -5), consuming zero rows, andassertExpectedCount(0, -5, ...)produced a confusing error.Fix: added
length < 0to the rejection condition.Bug 3 β Hex and float array lengths silently accepted (MEDIUM)
parseInt("0x10")β16,parseInt("3.7")β3. Neither is a valid TOON array length but both passed theisNaNcheck.Fix: added a
/^-?\d+$/pre-check beforeparseIntto require a clean decimal integer string.Bug 4 β Trailing delimiter produces spurious empty value (MEDIUM)
parseDelimitedValues("a,b,", ",")returned["a", "b", ""]instead of["a", "b"]. After the loop, the conditionif (valueBuffer || values.length > 0)pushed an emptyvalueBufferbecausevalues.length > 0was true. In strict mode this caused a spurious field-count assertion failure; in non-strict mode it silently added an extra empty field.Fix: changed the post-loop condition to
if (valueBuffer.trim() || values.length === 0).Bug 5 β Colon position re-derived instead of using
parseKeyToken'send(LOW-MEDIUM)decodeKeyValueSyncanddecodeKeyValueAsyncboth discardedendfromparseKeyTokenand re-searched for the colon viacontent.indexOf(COLON, key.length). This works today because all current escape sequences expand length, sokey.lengthnever overshoots the separator. It will silently break if a future escape sequence maps a multi-character source to a shorter decoded key.Fix: destructure
endfromparseKeyTokenand usecontent.slice(end).trim()in both paths.Test plan
pnpm --filter @toon-format/toon testβ 466 passed, 0 failed