Add double-flush reset-dictionary feature. #300

Merged
BrianPugh merged 19 commits into main from
reset-dictionary
Mar 26, 2026

Conversation

Owner

@BrianPugh BrianPugh commented Mar 25, 2026

Summary

  • Add dictionary reset via double-FLUSH token sequence, enabling append-to-stream without retaining prior compressor state. Both sides re-initialize the window and continue with a fresh dictionary.
  • Uses the previously reserved more_header bit in the header to mark streams that may contain dictionary resets. Old decompressors (<2.1.0) reject these streams at the header, preventing silent corruption.
  • Add conf.append mode: writes a FLUSH instead of a header, allowing a new compressor to resume an existing stream after reboot.
  • Implemented across all bindings: C, Python, Cython, MicroPython, and JavaScript/WASM.

Protocol

  • more_header (header byte 1, bit 0) now implies dictionary_reset
  • Header byte 2 added (all bits reserved, must be zero)
  • Two consecutive FLUSH tokens signal dictionary re-initialization
  • FLUSH is always emitted (even when byte-aligned) in more_header streams to support append detection
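
A minimal sketch of the header rules listed above, for illustration only. It assumes that "bit 0" means the least-significant bit and that header byte 1 is the first byte of the stream. The function name parse_header_extension is hypothetical and is not part of the tamp API. The real decompressors read the header incrementally, whereas this sketch expects the bytes to be available up front.

```python
from typing import Tuple


def parse_header_extension(stream: bytes) -> Tuple[bool, int]:
    """Return (dictionary_reset, header_length) for the start of a stream.

    Hypothetical helper: it only models the more_header bit and the
    reserved second header byte described in the Protocol list.
    """
    if not stream:
        raise ValueError("stream is empty; header byte 1 is missing")

    # Header byte 1, bit 0 (assumed to be the least-significant bit).
    more_header = bool(stream[0] & 0x01)
    if not more_header:
        # Legacy single-byte header: no dictionary resets in this stream.
        return False, 1

    if len(stream) < 2:
        raise ValueError("more_header is set but header byte 2 is missing")
    if stream[1] != 0:
        # All bits of header byte 2 are reserved and must be zero.
        raise ValueError("header byte 2 has reserved bits set")

    # more_header implies dictionary_reset.
    return True, 2
```

For example, a first byte of 0x58 yields a one-byte header without dictionary reset, while 0x59 followed by 0x00 yields a two-byte header with dictionary reset enabled.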

API

  • TampConf.dictionary_reset / TampConf.append config fields
  • tamp_compressor_reset_dictionary() (C) / Compressor.reset_dictionary() (Python/JS)
  • Decompressor automatically handles double-FLUSH when more_header is set
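
The automatic double-FLUSH handling can be modelled at the token level. The toy decoder below is a sketch, not the tamp implementation: tokens are plain Python values instead of bit patterns, the window is an empty bytearray rather than a fixed-size pre-initialized buffer, and the names ToyDecoder and FLUSH are hypothetical. It also assumes that a stream written in append mode begins with a FLUSH that lands directly after the trailing FLUSH of the previous session, which is how we read the summary. Behaviour for three or more consecutive FLUSH tokens is not described in this PR, so the sketch simply treats every second one as a reset.

```python
from typing import Iterable, Union

FLUSH = object()  # stand-in for the FLUSH token; hypothetical representation


class ToyDecoder:
    """Token-level model of double-FLUSH dictionary reset (illustrative only)."""

    def __init__(self, dictionary_reset: bool = True) -> None:
        self.dictionary_reset = dictionary_reset  # mirrors the more_header bit
        self.window = bytearray()  # toy dictionary; the real one is fixed-size
        self.output = bytearray()
        self.resets = 0
        self._previous_was_flush = False

    def feed(self, tokens: Iterable[Union[int, object]]) -> None:
        for token in tokens:
            if token is FLUSH:
                if self._previous_was_flush and self.dictionary_reset:
                    # Second consecutive FLUSH: start over with a fresh window.
                    self.window = bytearray()
                    self.resets += 1
                    self._previous_was_flush = False
                else:
                    self._previous_was_flush = True
                continue
            # Any other token is treated as a literal byte value.
            self._previous_was_flush = False
            self.window.append(token)
            self.output.append(token)


# First session ends with a FLUSH; the appended session starts with one.
session_1 = [*b"boot 1", FLUSH]
session_2 = [FLUSH, *b"boot 2", FLUSH]

decoder = ToyDecoder()
decoder.feed(session_1 + session_2)
```

After feeding both sessions, decoder.output holds the concatenated payload, decoder.resets is 1, and decoder.window contains only bytes written after the reset.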

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a “double-FLUSH” dictionary reset mechanism (and an append mode built on it) so new compressed segments can be appended to an existing stream without persisting compressor state, with seamless handling by updated decompressors across implementations.

Changes:

  • Add dictionary_reset + append configuration flags, plus compressor-side APIs to emit the double-FLUSH reset signal and reinitialize state.
  • Update decompressors (C + Python + WASM + Cython) to support 2-byte headers (more_header + reserved byte) and to detect double-FLUSH for dictionary reinit.
  • Add extensive cross-implementation test coverage and update the specification + language/library docs.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 7 comments.

File | Description
wasm/src/tamp.js | Adds WASM wrapper support for dictionary_reset/append, implements resetDictionary(), and adjusts JS header parsing to allow 2-byte headers.
wasm/Makefile | Exports the new WASM symbol tamp_compressor_reset_dictionary.
tests/test_compressor_decompressor.py | Adds Python-level tests covering reset, multiple resets, and append mode roundtrips.
tamp/decompressor.py | Implements double-FLUSH detection and dictionary reinit in the pure-Python decompressor; supports second header byte (reserved).
tamp/ctamp.pxd | Extends Cython declarations to include dictionary_reset, append, and the reset function.
tamp/compressor.py | Implements pure-Python reset_dictionary(), adds dictionary_reset/append, and ensures trailing FLUSH behavior for append compatibility.
tamp/_c_src/tamp/decompressor.h | Adds internal state for incremental 2-byte header handling and double-FLUSH detection.
tamp/_c_src/tamp/decompressor.c | Implements 2-byte header reading, incremental implicit header parsing, and double-FLUSH-triggered dictionary reset.
tamp/_c_src/tamp/compressor.h | Documents and declares tamp_compressor_reset_dictionary() and append-mode init semantics.
tamp/_c_src/tamp/compressor.c | Implements append-mode init, flush behavior changes for dictionary_reset, and reset_dictionary logic.
tamp/_c_src/tamp/common.h | Adds dictionary_reset and append fields to TampConf.
tamp/_c_decompressor.pyx | Updates Cython decompressor to read the header incrementally until complete.
tamp/_c_compressor.pyx | Adds dictionary_reset/append, implements reset_dictionary(), and ensures close emits trailing FLUSH when needed.
mpy_bindings/bindings_compressor.py | Exposes dictionary_reset/append and reset_dictionary() in the MicroPython Python-level wrapper.
mpy_bindings/bindings.c | Extends MicroPython native bindings for dictionary_reset/append and adds a reset_dictionary method.
docs/source/specification.rst | Defines header byte 2 as reserved and specifies double-FLUSH dictionary reset semantics.
docs/source/javascript.rst | Documents JS usage of mid-stream dictionary reset.
docs/source/c_library.rst | Documents the C API for dictionary reset and append mode, and updated FLUSH behavior.
ctests/test_runner.c | Registers new C tests for dictionary reset behavior.
ctests/test_compressor.c | Adds C tests for reset roundtrip, required flag behavior, and small-buffer retry scenarios.

Comment thread tamp/_c_src/tamp/common.h Outdated
Comment thread tamp/_c_src/tamp/decompressor.h Outdated
Comment thread tamp/ctamp.pxd Outdated
Comment thread mpy_bindings/bindings.c
Comment thread wasm/src/tamp.js
Comment thread wasm/src/tamp.js
Comment thread wasm/src/tamp.js
BrianPugh and others added 2 commits March 25, 2026 13:41
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.


Comment thread wasm/src/tamp.js
Comment on lines 318 to 327
  const pollResult = this.module.ccall(
    'tamp_compressor_poll',
    'number',
    ['number', 'number', 'number', 'number'],
    [this.compressorPtr, outputPtr + chunkOutputWritten, CHUNK_SIZE - chunkOutputWritten, pollOutputSizePtr]
  );

- if (pollResult !== 0) {
+ if (pollResult !== TAMP_OK) {
    throwOnError(pollResult, 'Compression poll');
  }

Copilot AI Mar 25, 2026

tamp_compressor_poll can return positive non-OK statuses like TAMP_OUTPUT_FULL (see C API), but this branch only calls throwOnError, which ignores non-negative values. If the output buffer has <3 bytes remaining, poll may return TAMP_OUTPUT_FULL and this loop will continue without making progress (risking an infinite loop). Handle TAMP_OUTPUT_FULL explicitly (e.g., stop polling when remaining output space is too small, flush the current output chunk, and continue with a fresh buffer).
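
The suggested handling can be sketched as a loop that treats TAMP_OUTPUT_FULL as "hand off the current chunk and continue with a fresh buffer". This is a language-neutral model written in Python, not the code in wasm/src/tamp.js. FakeCompressor and drain are hypothetical names, the numeric value 1 for TAMP_OUTPUT_FULL is an assumption (the comment only says it is positive), and the 3-byte minimum is taken from the "<3 bytes remaining" remark above.

```python
from typing import List, Tuple

TAMP_OK = 0
TAMP_OUTPUT_FULL = 1  # assumed value; the thread only states it is positive
MIN_POLL_SPACE = 3    # poll is said to need at least 3 free output bytes


class FakeCompressor:
    """Hypothetical stand-in for tamp_compressor_poll; not the real binding."""

    def __init__(self, pending: bytes) -> None:
        self.pending = pending

    def poll(self, space: int) -> Tuple[int, bytes]:
        if space < MIN_POLL_SPACE:
            # No progress is possible until the caller provides more room.
            return TAMP_OUTPUT_FULL, b""
        out = self.pending[:MIN_POLL_SPACE]
        self.pending = self.pending[MIN_POLL_SPACE:]
        return TAMP_OK, out


def drain(compressor: FakeCompressor, chunk_size: int) -> List[bytes]:
    """Poll until nothing is pending, rotating output chunks when full."""
    if chunk_size < MIN_POLL_SPACE:
        raise ValueError("chunk_size is too small for poll to ever succeed")
    chunks: List[bytes] = []
    current = bytearray()
    while compressor.pending:
        status, out = compressor.poll(chunk_size - len(current))
        if status < 0:
            raise RuntimeError(f"Compression poll failed with code {status}")
        if status == TAMP_OUTPUT_FULL:
            # Hand off the filled chunk and retry with a fresh buffer
            # instead of looping without making progress.
            chunks.append(bytes(current))
            current = bytearray()
            continue
        current += out
    if current:
        chunks.append(bytes(current))
    return chunks
```

The guard on chunk_size matters: without it, a buffer smaller than the minimum poll space would report TAMP_OUTPUT_FULL on an empty chunk forever.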

Comment thread tamp/ctamp.pxd Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.


Comment thread wasm/src/tamp.js
Comment thread tamp/_c_src/tamp/decompressor.c
Comment thread docs/source/javascript.rst
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

wasm/src/tamp.js:14

  • Only TAMP_OK is defined as a named status code, but later code still uses magic numbers for other non-error statuses (e.g. result === 2 for TAMP_INPUT_EXHAUSTED). Consider defining TAMP_OUTPUT_FULL and TAMP_INPUT_EXHAUSTED alongside TAMP_OK and updating comparisons to use them for consistency and readability.
// Status codes from C library
const TAMP_OK = 0;
const TAMP_ERROR = -1;
const TAMP_EXCESS_BITS = -2;
const TAMP_INVALID_CONF = -3;
const TAMP_OOB = -4;
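
The same idea, naming every status code instead of comparing against magic numbers, can be sketched as an enumeration. This is an illustration in Python rather than a patch for tamp.js. The negative codes and the value 2 for TAMP_INPUT_EXHAUSTED come from this thread, the value 1 for TAMP_OUTPUT_FULL is an assumption, and TampStatus, is_error and describe are hypothetical names.

```python
from enum import IntEnum


class TampStatus(IntEnum):
    """Named status codes; negative values are errors."""

    OK = 0
    OUTPUT_FULL = 1      # assumed value; not stated in this thread
    INPUT_EXHAUSTED = 2  # matches the "result === 2" comparison noted above
    ERROR = -1
    EXCESS_BITS = -2
    INVALID_CONF = -3
    OOB = -4


def is_error(code: int) -> bool:
    """Non-negative codes are informational, mirroring throwOnError's behavior."""
    return code < 0


def describe(code: int) -> str:
    """Return the symbolic name for a status code, or a marker if unknown."""
    try:
        return TampStatus(code).name
    except ValueError:
        return f"UNKNOWN({code})"
```

With named members, a comparison such as result == TampStatus.INPUT_EXHAUSTED documents itself in a way that result == 2 does not.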


Comment on lines +44 to +45
uint8_t conf_dictionary_reset : 1; // Stream may contain double-FLUSH dictionary resets (from header byte 1 bit [0]
// / more_header)

Copilot AI Mar 25, 2026

The conf_dictionary_reset comment is missing a closing parenthesis, which makes the header harder to read. Please fix the comment formatting (and consider keeping the entire parenthetical on one line if possible).

Suggested change
- uint8_t conf_dictionary_reset : 1; // Stream may contain double-FLUSH dictionary resets (from header byte 1 bit [0]
- // / more_header)
+ uint8_t conf_dictionary_reset : 1; // Stream may contain double-FLUSH dictionary resets (from header byte 1 bit [0] / more_header)

Comment on lines +297 to 299
When ``more_header`` is set, a FLUSH is **always** emitted (even when byte-aligned)
to support append mode (see `Dictionary Reset (Double-FLUSH)`_).
In the worst case (1 pending bit), a FLUSH symbol (9 bits) and 6 padding bits are

Copilot AI Mar 25, 2026

This note says a FLUSH is always emitted when more_header is set, but the preceding bullet list still states “If the output is already byte-aligned, no action is performed.” Please reconcile/clarify the flush behavior so the spec is unambiguous (e.g., make the “no action” case conditional on more_header being unset).
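
One way to state the reconciled rule precisely is as a small function of the pending bit count and the more_header flag. The sketch below is derived only from the text quoted in this thread (a 9-bit FLUSH symbol, padding to the next byte boundary, and "no action" when byte-aligned without more_header). The name flush_cost is hypothetical, and the real compressor may differ in details not quoted here.

```python
from typing import Tuple

FLUSH_SYMBOL_BITS = 9  # size of the FLUSH symbol, as quoted from the spec


def flush_cost(pending_bits: int, more_header: bool) -> Tuple[int, int]:
    """Return (flush_symbol_bits, padding_bits) written by a flush.

    pending_bits is the number of bits already sitting in the partially
    filled output byte (0 means the output is byte-aligned).
    """
    if not 0 <= pending_bits < 8:
        raise ValueError("pending_bits must be in the range 0..7")
    if pending_bits == 0 and not more_header:
        # Legacy streams: already byte-aligned, so nothing is written.
        return 0, 0
    # Otherwise a FLUSH symbol is written, then zero-padding up to the
    # next byte boundary.
    used = pending_bits + FLUSH_SYMBOL_BITS
    padding = -used % 8
    return FLUSH_SYMBOL_BITS, padding
```

For one pending bit this gives a 9-bit symbol plus 6 padding bits, matching the worst case quoted above; for a byte-aligned more_header stream it gives 9 symbol bits plus 7 padding bits, which is exactly two bytes.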

Comment thread tamp/decompressor.py Outdated
Comment thread tamp/_c_src/tamp/compressor.c
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated no new comments.


@BrianPugh BrianPugh merged commit 3ba7755 into main Mar 26, 2026
86 checks passed
@BrianPugh BrianPugh deleted the reset-dictionary branch March 26, 2026 22:12
2 participants