Fix custom WrappingFunction not respected by Tokenizer #299

Draft
Copilot wants to merge 6 commits into master from copilot/add-regression-test-wrapping-function

Copilot AI commented Feb 15, 2026

The Tokenizer.findWrapPointsWithFunction method calculated wrap point positions incorrectly. It assumed that the lengths of the segments returned by wrappingFunction.getLines() add up to the length of the original text, which is not true when the splitting removes delimiters.

Changes

  • Fixed Tokenizer.findWrapPointsWithFunction(): Now searches for each segment in the original text using indexOf() with progressive search positions, then places wrap points at the start of subsequent segments to correctly handle delimiters

  • Added regression test: testCustomWrappingFunctionWithHashDelimiter() verifies custom wrapping functions are respected
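The progressive indexOf() search described above can be sketched roughly as follows. This is a minimal standalone illustration, not the code from this PR: the class name WrapPointSketch and the method findWrapPoints are hypothetical, and the real Tokenizer works with its own token types rather than a plain list of indices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Boxable.
class WrapPointSketch {

    /**
     * Returns the indices in text at which a wrap may occur, given the
     * segments produced by a splitting function. Each segment is located
     * left to right, so characters dropped by the splitter (for example
     * delimiters) do not shift the positions of later segments.
     */
    static List<Integer> findWrapPoints(String text, String[] segments) {
        List<Integer> wrapPoints = new ArrayList<>();
        int searchFrom = 0;
        boolean first = true;
        for (String segment : segments) {
            if (segment.isEmpty()) {
                continue; // nothing to locate
            }
            int start = text.indexOf(segment, searchFrom);
            if (start < 0) {
                continue; // segment not found in the remaining text; skip it
            }
            if (!first) {
                // A wrap point sits at the start of every segment after the first.
                wrapPoints.add(start);
            }
            first = false;
            searchFrom = start + segment.length();
        }
        return wrapPoints;
    }
}
```

With this approach a splitter that drops the delimiter, such as text.split("#"), and one that keeps it, such as text.split("(?<=#)"), should yield the same wrap positions for the same input.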

Example

A custom wrapping function that splits on #:

WrappingFunction customFunc = text -> text.split("(?<=#)");
List<Token> tokens = Tokenizer.tokenize("path/to/file#another/path#third", customFunc);
// Now correctly produces 3 TEXT tokens split at '#', not at '/'

Previously, wrap points were calculated incorrectly by subtracting segment lengths from the end, causing the tokenizer to ignore the custom function and apply default wrapping behavior.
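To see why subtracting segment lengths from the end can go wrong, here is a small illustrative model. It is an assumption about the general shape of such a calculation, not the previous Boxable code; BackwardOffsetDemo and offsetsFromEnd are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative model only; not the previous Tokenizer implementation.
class BackwardOffsetDemo {

    /**
     * Walks the segments from last to first and subtracts each length from a
     * running offset that starts at the end of the text. The result is only
     * correct when the segment lengths add up to text.length().
     */
    static List<Integer> offsetsFromEnd(String text, String[] segments) {
        List<Integer> offsets = new ArrayList<>();
        int position = text.length();
        for (int i = segments.length - 1; i > 0; i--) {
            position -= segments[i].length();
            offsets.add(position);
        }
        Collections.reverse(offsets);
        return offsets;
    }
}
```

For the text "a/b#c/d#e" the segments "c/d" and "e" start at indices 4 and 8. A splitter that keeps the '#' gives segment lengths that sum to the full text length, so the backward walk lands on 4 and 8. A splitter that drops the '#' loses one character per delimiter, and the backward walk yields 5 and 8 instead, placing the first offset on the '/' inside "c/d".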

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • s3.amazonaws.com
    • Triggering command: /usr/lib/jvm/temurin-17-jdk-amd64/bin/java /usr/lib/jvm/temurin-17-jdk-amd64/bin/java -jar /home/REDACTED/work/boxable/boxable/target/surefire/surefirebooter-20260215053922123_3.jar /home/REDACTED/work/boxable/boxable/target/surefire 2026-02-15T05-39-22_022-jvmRun1 surefire-20260215053922123_1tmp surefire_0-20260215053922123_2tmp (dns block)
    • Triggering command: /usr/lib/jvm/temurin-17-jdk-amd64/bin/java /usr/lib/jvm/temurin-17-jdk-amd64/bin/java -jar /home/REDACTED/work/boxable/boxable/target/surefire/surefirebooter-20260215053943000_3.jar /home/REDACTED/work/boxable/boxable/target/surefire 2026-02-15T05-39-42_905-jvmRun1 surefire-20260215053943000_1tmp surefire_0-20260215053943000_2tmp (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

Create a regression unit test for issue #298 in dhorions/boxable: setWrappingFunction not working.

Context:

  • Issue URL: setWrappingFunction not working #298
  • Reporter sets a custom WrappingFunction on a Cell that splits on #, but Boxable still wraps text using the default wrap strategy (e.g., wrapping around / characters) when rendering/drawing.
  • They are using pdfbox 3.0.6 and boxable 1.8.2.

Goal:

  • Add a focused regression test that fails on the buggy behavior and passes once the bug is fixed.

Repo details / conventions to follow:

  • Tests live under src/test/java and use JUnit4 (org.junit.Test, org.junit.Assert).
  • There are existing tokenizer-related unit tests in src/test/java/be/quodlibet/boxable/text/TokenizerTest.java.

What to implement:

  1. Add a new unit test class under src/test/java/be/quodlibet/boxable/text/ (or extend TokenizerTest) that verifies Tokenizer.tokenize(text, wrappingFunction) respects the provided custom WrappingFunction by inserting POSSIBLE_WRAP_POINT tokens only at the boundaries implied by the function.
    • Use a custom function that splits on # and returns segments.
    • Use an input containing many / characters plus # separators (similar to the issue) and assert:
      • The number of TokenType.TEXT segments equals the expected split count.
      • None of the returned TokenType.TEXT segments have been further split on / (i.e., the segments should still contain long runs of /).
    • Keep assertions deterministic and independent of PDF rendering.
  2. (Optional but nice) Add a lightweight integration-style test that sets Cell.setWrappingFunction(...) and ensures cell.getWrappingFunction() returns a function that behaves as expected; avoid heavy PDF text extraction.
  3. Ensure tests are stable across environments (no external network calls; avoid depending on fonts in resources).
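The two assertions requested in step 1 could be expressed with helpers along these lines. This is a rendering-free sketch under stated assumptions: SegmentSplitter is a hypothetical stand-in for Boxable's WrappingFunction, and the observed list stands for the text of the TokenType.TEXT tokens that a real test would collect from Tokenizer.tokenize(text, wrappingFunction).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helpers for illustration; a real test would use JUnit4 assertions.
class WrappingFunctionChecks {

    /** Stand-in for a wrapping function: text in, segments out. */
    interface SegmentSplitter {
        String[] getLines(String text);
    }

    /** Splits after every '#', keeping the '#' at the end of each segment. */
    static final SegmentSplitter HASH_SPLITTER = text -> text.split("(?<=#)");

    /** True when the observed text segments equal the splitter's output, in order. */
    static boolean matchesSplitter(List<String> observed, String input, SegmentSplitter splitter) {
        return observed.equals(Arrays.asList(splitter.getLines(input)));
    }

    /**
     * True when no observed segment starts or ends with '/'. This only makes
     * sense for inputs where '/' never sits next to a '#' or at either end.
     */
    static boolean notBrokenAtSlash(List<String> observed) {
        for (String segment : observed) {
            if (segment.startsWith("/") || segment.endsWith("/")) {
                return false;
            }
        }
        return true;
    }
}
```

Both checks work on plain strings, so they stay deterministic and need neither fonts nor PDF output.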

Deliverables:

Also:

  • If the test reveals a clear bug in production code and a minimal fix is obvious, propose the fix in the PR (but prioritize landing the regression test).

This pull request was created from Copilot chat.


Copilot AI and others added 4 commits February 15, 2026 05:39
…working

Co-authored-by: quodlibetbv <220320055+quodlibetbv@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add regression test for setWrappingFunction issue #298" to "Fix custom WrappingFunction not respected by Tokenizer" Feb 15, 2026
Copilot AI requested a review from quodlibetbv February 15, 2026 05:47