Add double-flush reset-dictionary feature.#300
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a “double-FLUSH” dictionary reset mechanism (and an append mode built on it) so new compressed segments can be appended to an existing stream without persisting compressor state, with seamless handling by updated decompressors across implementations.
Changes:
- Add
dictionary_reset+appendconfiguration flags, plus compressor-side APIs to emit the double-FLUSH reset signal and reinitialize state. - Update decompressors (C + Python + WASM + Cython) to support 2-byte headers (
more_header+ reserved byte) and to detect double-FLUSH for dictionary reinit. - Add extensive cross-implementation test coverage and update the specification + language/library docs.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
wasm/src/tamp.js |
Adds WASM wrapper support for dictionary_reset/append, implements resetDictionary(), and adjusts JS header parsing to allow 2-byte headers. |
wasm/Makefile |
Exports the new WASM symbol tamp_compressor_reset_dictionary. |
tests/test_compressor_decompressor.py |
Adds Python-level tests covering reset, multiple resets, and append mode roundtrips. |
tamp/decompressor.py |
Implements double-FLUSH detection and dictionary reinit in the pure-Python decompressor; supports second header byte (reserved). |
tamp/ctamp.pxd |
Extends Cython declarations to include dictionary_reset, append, and the reset function. |
tamp/compressor.py |
Implements pure-Python reset_dictionary(), adds dictionary_reset/append, and ensures trailing FLUSH behavior for append compatibility. |
tamp/_c_src/tamp/decompressor.h |
Adds internal state for incremental 2-byte header handling and double-FLUSH detection. |
tamp/_c_src/tamp/decompressor.c |
Implements 2-byte header reading, incremental implicit header parsing, and double-FLUSH-triggered dictionary reset. |
tamp/_c_src/tamp/compressor.h |
Documents and declares tamp_compressor_reset_dictionary() and append-mode init semantics. |
tamp/_c_src/tamp/compressor.c |
Implements append-mode init, flush behavior changes for dictionary_reset, and reset_dictionary logic. |
tamp/_c_src/tamp/common.h |
Adds dictionary_reset and append fields to TampConf. |
tamp/_c_decompressor.pyx |
Updates Cython decompressor to read the header incrementally until complete. |
tamp/_c_compressor.pyx |
Adds dictionary_reset/append, implements reset_dictionary(), and ensures close emits trailing FLUSH when needed. |
mpy_bindings/bindings_compressor.py |
Exposes dictionary_reset/append and reset_dictionary() in the MicroPython Python-level wrapper. |
mpy_bindings/bindings.c |
Extends MicroPython native bindings for dictionary_reset/append and adds a reset_dictionary method. |
docs/source/specification.rst |
Defines header byte 2 as reserved and specifies double-FLUSH dictionary reset semantics. |
docs/source/javascript.rst |
Documents JS usage of mid-stream dictionary reset. |
docs/source/c_library.rst |
Documents the C API for dictionary reset and append mode, and updated FLUSH behavior. |
ctests/test_runner.c |
Registers new C tests for dictionary reset behavior. |
ctests/test_compressor.c |
Adds C tests for reset roundtrip, required flag behavior, and small-buffer retry scenarios. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const pollResult = this.module.ccall( | ||
| 'tamp_compressor_poll', | ||
| 'number', | ||
| ['number', 'number', 'number', 'number'], | ||
| [this.compressorPtr, outputPtr + chunkOutputWritten, CHUNK_SIZE - chunkOutputWritten, pollOutputSizePtr] | ||
| ); | ||
|
|
||
| if (pollResult !== 0) { | ||
| if (pollResult !== TAMP_OK) { | ||
| throwOnError(pollResult, 'Compression poll'); | ||
| } |
There was a problem hiding this comment.
tamp_compressor_poll can return positive non-OK statuses like TAMP_OUTPUT_FULL (see C API), but this branch only calls throwOnError, which ignores non-negative values. If the output buffer has <3 bytes remaining, poll may return TAMP_OUTPUT_FULL and this loop will continue without making progress (risking an infinite loop). Handle TAMP_OUTPUT_FULL explicitly (e.g., stop polling when remaining output space is too small, flush the current output chunk, and continue with a fresh buffer).
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
wasm/src/tamp.js:14
- Only
TAMP_OKis defined as a named status code, but later code still uses magic numbers for other non-error statuses (e.g.result === 2forTAMP_INPUT_EXHAUSTED). Consider definingTAMP_OUTPUT_FULLandTAMP_INPUT_EXHAUSTEDalongsideTAMP_OKand updating comparisons to use them for consistency and readability.
// Status codes from C library
const TAMP_OK = 0;
const TAMP_ERROR = -1;
const TAMP_EXCESS_BITS = -2;
const TAMP_INVALID_CONF = -3;
const TAMP_OOB = -4;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| uint8_t conf_dictionary_reset : 1; // Stream may contain double-FLUSH dictionary resets (from header byte 1 bit [0] | ||
| // / more_header) |
There was a problem hiding this comment.
The conf_dictionary_reset comment is missing a closing parenthesis, which makes the header harder to read. Please fix the comment formatting (and consider keeping the entire parenthetical on one line if possible).
| uint8_t conf_dictionary_reset : 1; // Stream may contain double-FLUSH dictionary resets (from header byte 1 bit [0] | |
| // / more_header) | |
| uint8_t conf_dictionary_reset : 1; // Stream may contain double-FLUSH dictionary resets (from header byte 1 bit [0] / more_header) |
| When ``more_header`` is set, a FLUSH is **always** emitted (even when byte-aligned) | ||
| to support append mode (see `Dictionary Reset (Double-FLUSH)`_). | ||
| In the worst case (1 pending bit), a FLUSH symbol (9 bits) and 6 padding bits are |
There was a problem hiding this comment.
This note says a FLUSH is always emitted when more_header is set, but the preceding bullet list still states “If the output is already byte-aligned, no action is performed.” Please reconcile/clarify the flush behavior so the spec is unambiguous (e.g., make the “no action” case conditional on more_header being unset).
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary
more_headerheader bit to signal dictionary-reset-capable streams. Old decompressors (<2.1.0) reject these streams at the header, preventing silent corruption.conf.appendmode: writes a FLUSH instead of a header, allowing a new compressor to resume an existing stream after reboot.Protocol
more_header(header byte 1, bit 0) now impliesdictionary_resetmore_headerstreams to support append detectionAPI
TampConf.dictionary_reset/TampConf.appendconfig fieldstamp_compressor_reset_dictionary()(C) /Compressor.reset_dictionary()(Python/JS)more_headeris set