fix: handle URLs with balanced parentheses in URL detection #152

Open
ericjypark wants to merge 2 commits into coder:main from ericjypark:fix/url-parens-detection

@ericjypark

Problem

URLs containing parentheses — such as Wikipedia links like https://en.wikipedia.org/wiki/Rust_(programming_language) — are incorrectly truncated. The URL regex character class excludes ( and ), so the match stops at the first parenthesis. Additionally, TRAILING_PUNCTUATION unconditionally strips ), breaking URLs where parentheses are part of the path.
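The truncation can be reproduced with a small sketch. The pattern below is a hypothetical stand-in, not the project's actual URL regex; it only assumes, as described above, that the character class excludes `(` and `)`.

```typescript
// Hypothetical pattern for illustration only; the project's real regex may differ.
// The character class excludes "(" and ")", so matching stops at the first parenthesis.
const NO_PARENS_URL = /https?:\/\/[^\s()<>"']+/;

// Returns the first URL-like match in the text, or null if there is none.
function firstUrl(text: string, pattern: RegExp): string | null {
  const match = text.match(pattern);
  return match ? match[0] : null;
}

const sample =
  "See https://en.wikipedia.org/wiki/Rust_(programming_language) for details";

// The match ends just before "(", so the detected link is
// "https://en.wikipedia.org/wiki/Rust_" rather than the full URL.
const truncated = firstUrl(sample, NO_PARENS_URL);
```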

Fix

  • Add () to the URL regex character class so parentheses are captured
  • Remove ) from TRAILING_PUNCTUATION to preserve balanced parens
  • Add a balanced-paren stripping pass: only strip trailing ) when the URL has more close-parens than open-parens (handles the common case of a URL wrapped in prose parentheses like (see https://...))
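The three changes above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `URL_WITH_PARENS`, `TRAILING_PUNCT`, `trimUrl`, and `detectUrls` are hypothetical names, the exact character classes are assumptions, and the loop that alternates between punctuation and paren stripping is one possible way to handle inputs such as `(see https://...).`.

```typescript
// Hypothetical constants; the PR's real regex and punctuation set may differ.
// Parentheses are allowed inside the URL character class.
const URL_WITH_PARENS = /https?:\/\/[^\s<>"']+/g;
// Trailing punctuation to strip. Note that ")" is intentionally absent.
const TRAILING_PUNCT = /[.,;:!?]+$/;

// Counts occurrences of a single character in a string.
function countChar(s: string, ch: string): number {
  return s.split(ch).length - 1;
}

// Strips trailing punctuation, and strips a trailing ")" only while the URL
// has more close-parens than open-parens.
function trimUrl(raw: string): string {
  let url = raw;
  for (;;) {
    const before = url;
    url = url.replace(TRAILING_PUNCT, "");
    const endsWithClose = url.charAt(url.length - 1) === ")";
    if (endsWithClose && countChar(url, ")") > countChar(url, "(")) {
      url = url.slice(0, -1);
    }
    if (url === before) {
      return url; // nothing changed in this pass, so we are done
    }
  }
}

// Finds all URLs in the text and trims each one.
function detectUrls(text: string): string[] {
  const matches = text.match(URL_WITH_PARENS) || [];
  return matches.map((m) => trimUrl(m));
}
```

Under these assumptions, `detectUrls("(see https://example.com)")` yields `https://example.com`, while a Wikipedia-style URL with balanced parentheses is returned unchanged.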

Test Plan

  • bun test lib/url-detection.test.ts — 24/24 pass (4 new tests for parens handling)
  • biome check . — clean
  • tsc --noEmit — clean
  • Existing test 'strips trailing parenthesis' with (see https://example.com) still passes

URLs containing parentheses (e.g., Wikipedia links like
https://en.wikipedia.org/wiki/Rust_(programming_language)) were
truncated at the first `(` because the URL regex character class
excluded parentheses, and TRAILING_PUNCTUATION unconditionally
stripped `)`.

- Add `()` to URL_REGEX character class
- Remove `)` from TRAILING_PUNCTUATION to avoid stripping balanced parens
- Add balanced-paren stripping: only strip trailing `)` when unbalanced
- Add tests for Wikipedia URLs, nested parens, and wrapped URLs
diegosouzapw added a commit to diegosouzapw/ghostty-web that referenced this pull request May 23, 2026
URLs containing parentheses — such as Wikipedia links like
https://en.wikipedia.org/wiki/Rust_(programming_language) — were
incorrectly truncated. The URL regex character class excluded `(` and
`)`, so the match stopped at the first parenthesis. Additionally,
TRAILING_PUNCTUATION unconditionally stripped `)`, breaking URLs where
parentheses are part of the path.

Fix:
- Add `()` to the URL regex character class so parentheses are captured
- Remove `)` from TRAILING_PUNCTUATION to preserve balanced parens
- Add a balanced-paren stripping pass: only strip trailing `)` when the
  URL has more closes than opens (e.g. URL wrapped in surrounding parens)

Adds four unit tests covering Wikipedia paths, wrapped URLs, multiple
parenthesized path segments, and nested parentheses.


Inspired-by: coder#152

Co-authored-by: eric-jy-park <2019147551@yonsei.ac.kr>
@diegosouzapw

Hi @ericjypark! 👋

Your work on this PR inspired a commit in my fork diegosouzapw/ghostty-web.
I ported the URL detection fix for balanced parentheses and added you as co-author in the corresponding commit — thank you for the contribution!

I'm working on OmniRoute, a project that provides free access to LLM models, and I'm planning to use ghostty-web as the terminal component there. Your work is part of what makes that possible. 🙏

Feel free to check it out — contributions and feedback are very welcome!

