Fix HTML entity encoding/decoding in markdown conversion #7565

Open
bdbch wants to merge 12 commits into main from claude/fix-tiptap-issue-7539-NyhyF

Conversation

@bdbch (Member) commented Mar 5, 2026

Changes Overview

This PR fixes HTML entity handling in markdown parsing and serialization to ensure proper roundtripping of special characters like <, >, and &. Previously, these characters were not being properly encoded/decoded, causing data loss or corruption during markdown conversion.

Implementation Approach

  1. Added HTML entity decoding during parsing: When parsing markdown tokens, HTML entities (&lt;, &gt;, &quot;, &amp;) are now decoded to their literal character equivalents so they display correctly in the editor.

  2. Added HTML entity encoding during serialization: When serializing editor content back to markdown, special characters are encoded to their HTML entity equivalents to ensure safe roundtripping.

  3. Preserved literal characters in code contexts: Code blocks and inline code marks are excluded from entity encoding since they should preserve literal <, >, and & characters without escaping.

  4. Implemented proper encoding order:

    • Decoding: &lt;, &gt;, &quot; are decoded first, then &amp; last to handle doubly-encoded sequences correctly (e.g., &amp;lt; → &lt;)
    • Encoding: & is encoded first to avoid double-encoding (e.g., < → &lt;, not &amp;lt;)
  5. Added Code extension to test setup: The Code extension was added to the test configuration to support inline code mark testing.
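
The ordering described in step 4 can be sketched as follows. This is a minimal illustration and not the code from the PR: the function names match the utilities named in the review summary (decodeHtmlEntities, encodeHtmlEntities), but the bodies are assumptions. The sketch also leaves " unencoded, as the later commits in this conversation describe.

```typescript
// Sketch only: illustrates the ordering rule, not the PR's actual source.

function decodeHtmlEntities(text: string): string {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    // `&amp;` is decoded last so that `&amp;lt;` becomes `&lt;`, not `<`.
    .replace(/&amp;/g, '&')
}

function encodeHtmlEntities(text: string): string {
  return text
    // `&` is encoded first so the `&` of a freshly written `&lt;`
    // is not encoded a second time.
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}
```

With this ordering, decoding `&amp;lt;` gives the literal text `&lt;`, and encoding that text gives `&amp;lt;` again, so a doubly-encoded sequence survives a roundtrip.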

Testing Done

Comprehensive test suite added covering:

  • Basic entity decoding (&lt;, &gt;, &amp;)
  • Basic entity encoding during serialization
  • Roundtrip conversion (parse → serialize → parse)
  • Doubly-encoded entities (&amp;lt;)
  • Preservation of literal characters in code blocks and inline code
  • Special case handling for &nbsp; in empty paragraphs

All tests pass and verify correct behavior across parsing, serialization, and roundtripping scenarios.

Verification Steps

  1. Run the test suite: npm test -- packages/markdown/__tests__/conversion.spec.ts
  2. Verify that all new tests in the "HTML character escaping (issue #7539)" describe block pass
  3. Test roundtripping: Parse markdown with entities → serialize back → verify entities are preserved
  4. Verify code blocks and inline code preserve literal <, >, & characters without encoding
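
The behavior checked in step 4 can be pictured with a small sketch. Everything here is an assumption made for illustration: the node and mark shapes are simplified, the set of code-like type names is hard-coded rather than collected from extensions, and the signature of encodeTextForMarkdown is hypothetical (the name is borrowed from a commit message in this conversation, where it is a private method on MarkdownManager).

```typescript
// Hypothetical minimal shapes; real editor JSON carries more fields.
interface Mark { type: string }
interface TextNode { type: 'text'; text: string; marks?: Mark[] }

// Assumed here to be a fixed set; the PR describes building it from
// extensions whose spec sets `code: true`.
const codeTypeNames = new Set<string>(['code', 'codeBlock'])

// Same ordering as in the PR description: `&` first, then `<` and `>`.
function encodeHtmlEntities(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

// Returns the text unchanged in code contexts, entity-encoded otherwise.
function encodeTextForMarkdown(node: TextNode, parentType?: string): string {
  const inCodeBlock = parentType !== undefined && codeTypeNames.has(parentType)
  const hasCodeMark = (node.marks ?? []).some(mark => codeTypeNames.has(mark.type))
  return inCodeBlock || hasCodeMark ? node.text : encodeHtmlEntities(node.text)
}
```

Looking up type names in a set, rather than comparing against the literal strings 'code' and 'codeBlock', is what would let custom or renamed code-like extensions be skipped as well.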

Additional Notes

The implementation handles edge cases like doubly-encoded entities and preserves the special behavior of &nbsp; for empty paragraphs. The entity encoding/decoding logic is centralized in utility functions for consistency across the codebase.

Checklist

  • I have created a changeset for this PR if necessary.
  • My changes do not break the library.
  • I have added tests where applicable.
  • I have followed the project guidelines.
  • I have fixed any lint issues.

Related Issues

Fixes #7539

Copilot AI review requested due to automatic review settings March 5, 2026 09:51

netlify bot commented Mar 5, 2026

Deploy Preview for tiptap-embed ready!

Name Link
🔨 Latest commit 3774f9e
🔍 Latest deploy log https://app.netlify.com/projects/tiptap-embed/deploys/69b9815abdaa750007d1146d
😎 Deploy Preview https://deploy-preview-7565--tiptap-embed.netlify.app

changeset-bot bot commented Mar 5, 2026

🦋 Changeset detected

Latest commit: 3774f9e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 72 packages
Name Type
@tiptap/core Patch
@tiptap/markdown Patch
@tiptap/extension-audio Patch
@tiptap/extension-blockquote Patch
@tiptap/extension-bold Patch
@tiptap/extension-bubble-menu Patch
@tiptap/extension-code-block-lowlight Patch
@tiptap/extension-code-block Patch
@tiptap/extension-code Patch
@tiptap/extension-collaboration-caret Patch
@tiptap/extension-collaboration Patch
@tiptap/extension-details Patch
@tiptap/extension-document Patch
@tiptap/extension-drag-handle Patch
@tiptap/extension-emoji Patch
@tiptap/extension-file-handler Patch
@tiptap/extension-floating-menu Patch
@tiptap/extension-hard-break Patch
@tiptap/extension-heading Patch
@tiptap/extension-highlight Patch
@tiptap/extension-horizontal-rule Patch
@tiptap/extension-image Patch
@tiptap/extension-invisible-characters Patch
@tiptap/extension-italic Patch
@tiptap/extension-link Patch
@tiptap/extension-list Patch
@tiptap/extension-mathematics Patch
@tiptap/extension-mention Patch
@tiptap/extension-node-range Patch
@tiptap/extension-paragraph Patch
@tiptap/extension-strike Patch
@tiptap/extension-subscript Patch
@tiptap/extension-superscript Patch
@tiptap/extension-table-of-contents Patch
@tiptap/extension-table Patch
@tiptap/extension-text-align Patch
@tiptap/extension-text-style Patch
@tiptap/extension-text Patch
@tiptap/extension-twitch Patch
@tiptap/extension-typography Patch
@tiptap/extension-underline Patch
@tiptap/extension-unique-id Patch
@tiptap/extension-youtube Patch
@tiptap/extensions Patch
@tiptap/html Patch
@tiptap/react Patch
@tiptap/starter-kit Patch
@tiptap/static-renderer Patch
@tiptap/suggestion Patch
@tiptap/vue-2 Patch
@tiptap/vue-3 Patch
@tiptap/extension-drag-handle-react Patch
@tiptap/extension-drag-handle-vue-2 Patch
@tiptap/extension-drag-handle-vue-3 Patch
@tiptap/extension-bullet-list Patch
@tiptap/extension-ordered-list Patch
@tiptap/extension-list-item Patch
@tiptap/extension-list-keymap Patch
@tiptap/extension-task-item Patch
@tiptap/extension-task-list Patch
@tiptap/extension-table-cell Patch
@tiptap/extension-table-header Patch
@tiptap/extension-table-row Patch
@tiptap/extension-color Patch
@tiptap/extension-font-family Patch
@tiptap/extension-character-count Patch
@tiptap/extension-dropcursor Patch
@tiptap/extension-focus Patch
@tiptap/extension-gapcursor Patch
@tiptap/extension-history Patch
@tiptap/extension-placeholder Patch
@tiptap/pm Patch

Copilot AI (Contributor) left a comment

Pull request overview

Fixes HTML entity handling in the Markdown conversion pipeline to ensure safe/consistent roundtripping of special characters (notably <, >, &) between markdown ↔︎ editor JSON, with explicit exceptions for code contexts.

Changes:

  • Added decodeHtmlEntities / encodeHtmlEntities utilities to @tiptap/core and re-exported them from @tiptap/markdown utils for compatibility.
  • Decoded entities when parsing markdown text tokens and encoded special characters when serializing text nodes back to markdown (skipping code blocks / inline code).
  • Added regression tests covering decoding, encoding, roundtrips, doubly-encoded sequences, and &nbsp; empty-paragraph behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| packages/markdown/src/utils.ts | Re-exports core html entity utilities to keep local utils imports stable. |
| packages/markdown/src/MarkdownManager.ts | Applies entity decoding during parsing and entity encoding during serialization (with code-context exclusions). |
| packages/markdown/__tests__/conversion.spec.ts | Adds targeted tests for entity decode/encode + roundtrip behavior; includes Code mark in setup. |
| packages/extension-text/src/text.ts | Decodes common entities when parsing markdown text tokens into text nodes. |
| packages/core/src/utilities/index.ts | Exposes the new html entity utility module via core utilities exports. |
| packages/core/src/utilities/htmlEntities.ts | Implements encode/decode helpers with ordering to handle doubly-encoded sequences. |

pkg-pr-new bot commented Mar 5, 2026

@tiptap/extension-character-count

npm i https://pkg.pr.new/@tiptap/extension-character-count@7565

@tiptap/extension-focus

npm i https://pkg.pr.new/@tiptap/extension-focus@7565

@tiptap/extension-dropcursor

npm i https://pkg.pr.new/@tiptap/extension-dropcursor@7565

@tiptap/extension-gapcursor

npm i https://pkg.pr.new/@tiptap/extension-gapcursor@7565

@tiptap/extension-history

npm i https://pkg.pr.new/@tiptap/extension-history@7565

@tiptap/extension-list-item

npm i https://pkg.pr.new/@tiptap/extension-list-item@7565

@tiptap/extension-list-keymap

npm i https://pkg.pr.new/@tiptap/extension-list-keymap@7565

@tiptap/extension-placeholder

npm i https://pkg.pr.new/@tiptap/extension-placeholder@7565

@tiptap/extension-table-header

npm i https://pkg.pr.new/@tiptap/extension-table-header@7565

@tiptap/extension-table-cell

npm i https://pkg.pr.new/@tiptap/extension-table-cell@7565

@tiptap/extension-table-row

npm i https://pkg.pr.new/@tiptap/extension-table-row@7565

@tiptap/extension-task-item

npm i https://pkg.pr.new/@tiptap/extension-task-item@7565

@tiptap/extension-task-list

npm i https://pkg.pr.new/@tiptap/extension-task-list@7565

@tiptap/core

npm i https://pkg.pr.new/@tiptap/core@7565

@tiptap/extension-blockquote

npm i https://pkg.pr.new/@tiptap/extension-blockquote@7565

@tiptap/extension-bold

npm i https://pkg.pr.new/@tiptap/extension-bold@7565

@tiptap/extension-audio

npm i https://pkg.pr.new/@tiptap/extension-audio@7565

@tiptap/extension-bullet-list

npm i https://pkg.pr.new/@tiptap/extension-bullet-list@7565

@tiptap/extension-bubble-menu

npm i https://pkg.pr.new/@tiptap/extension-bubble-menu@7565

@tiptap/extension-code

npm i https://pkg.pr.new/@tiptap/extension-code@7565

@tiptap/extension-code-block-lowlight

npm i https://pkg.pr.new/@tiptap/extension-code-block-lowlight@7565

@tiptap/extension-code-block

npm i https://pkg.pr.new/@tiptap/extension-code-block@7565

@tiptap/extension-collaboration

npm i https://pkg.pr.new/@tiptap/extension-collaboration@7565

@tiptap/extension-collaboration-caret

npm i https://pkg.pr.new/@tiptap/extension-collaboration-caret@7565

@tiptap/extension-color

npm i https://pkg.pr.new/@tiptap/extension-color@7565

@tiptap/extension-details

npm i https://pkg.pr.new/@tiptap/extension-details@7565

@tiptap/extension-document

npm i https://pkg.pr.new/@tiptap/extension-document@7565

@tiptap/extension-drag-handle

npm i https://pkg.pr.new/@tiptap/extension-drag-handle@7565

@tiptap/extension-drag-handle-react

npm i https://pkg.pr.new/@tiptap/extension-drag-handle-react@7565

@tiptap/extension-drag-handle-vue-3

npm i https://pkg.pr.new/@tiptap/extension-drag-handle-vue-3@7565

@tiptap/extension-drag-handle-vue-2

npm i https://pkg.pr.new/@tiptap/extension-drag-handle-vue-2@7565

@tiptap/extension-floating-menu

npm i https://pkg.pr.new/@tiptap/extension-floating-menu@7565

@tiptap/extension-file-handler

npm i https://pkg.pr.new/@tiptap/extension-file-handler@7565

@tiptap/extension-emoji

npm i https://pkg.pr.new/@tiptap/extension-emoji@7565

@tiptap/extension-hard-break

npm i https://pkg.pr.new/@tiptap/extension-hard-break@7565

@tiptap/extension-font-family

npm i https://pkg.pr.new/@tiptap/extension-font-family@7565

@tiptap/extension-heading

npm i https://pkg.pr.new/@tiptap/extension-heading@7565

@tiptap/extension-highlight

npm i https://pkg.pr.new/@tiptap/extension-highlight@7565

@tiptap/extension-horizontal-rule

npm i https://pkg.pr.new/@tiptap/extension-horizontal-rule@7565

@tiptap/extension-image

npm i https://pkg.pr.new/@tiptap/extension-image@7565

@tiptap/extension-invisible-characters

npm i https://pkg.pr.new/@tiptap/extension-invisible-characters@7565

@tiptap/extension-italic

npm i https://pkg.pr.new/@tiptap/extension-italic@7565

@tiptap/extension-link

npm i https://pkg.pr.new/@tiptap/extension-link@7565

@tiptap/extension-list

npm i https://pkg.pr.new/@tiptap/extension-list@7565

@tiptap/extension-mathematics

npm i https://pkg.pr.new/@tiptap/extension-mathematics@7565

@tiptap/extension-mention

npm i https://pkg.pr.new/@tiptap/extension-mention@7565

@tiptap/extension-node-range

npm i https://pkg.pr.new/@tiptap/extension-node-range@7565

@tiptap/extension-ordered-list

npm i https://pkg.pr.new/@tiptap/extension-ordered-list@7565

@tiptap/extension-strike

npm i https://pkg.pr.new/@tiptap/extension-strike@7565

@tiptap/extension-subscript

npm i https://pkg.pr.new/@tiptap/extension-subscript@7565

@tiptap/extension-paragraph

npm i https://pkg.pr.new/@tiptap/extension-paragraph@7565

@tiptap/extension-superscript

npm i https://pkg.pr.new/@tiptap/extension-superscript@7565

@tiptap/extension-table

npm i https://pkg.pr.new/@tiptap/extension-table@7565

@tiptap/extension-table-of-contents

npm i https://pkg.pr.new/@tiptap/extension-table-of-contents@7565

@tiptap/extension-text

npm i https://pkg.pr.new/@tiptap/extension-text@7565

@tiptap/extension-text-align

npm i https://pkg.pr.new/@tiptap/extension-text-align@7565

@tiptap/extension-text-style

npm i https://pkg.pr.new/@tiptap/extension-text-style@7565

@tiptap/extension-typography

npm i https://pkg.pr.new/@tiptap/extension-typography@7565

@tiptap/extension-twitch

npm i https://pkg.pr.new/@tiptap/extension-twitch@7565

@tiptap/extension-underline

npm i https://pkg.pr.new/@tiptap/extension-underline@7565

@tiptap/extension-unique-id

npm i https://pkg.pr.new/@tiptap/extension-unique-id@7565

@tiptap/extension-youtube

npm i https://pkg.pr.new/@tiptap/extension-youtube@7565

@tiptap/extensions

npm i https://pkg.pr.new/@tiptap/extensions@7565

@tiptap/markdown

npm i https://pkg.pr.new/@tiptap/markdown@7565

@tiptap/html

npm i https://pkg.pr.new/@tiptap/html@7565

@tiptap/react

npm i https://pkg.pr.new/@tiptap/react@7565

@tiptap/starter-kit

npm i https://pkg.pr.new/@tiptap/starter-kit@7565

@tiptap/pm

npm i https://pkg.pr.new/@tiptap/pm@7565

@tiptap/static-renderer

npm i https://pkg.pr.new/@tiptap/static-renderer@7565

@tiptap/suggestion

npm i https://pkg.pr.new/@tiptap/suggestion@7565

@tiptap/vue-3

npm i https://pkg.pr.new/@tiptap/vue-3@7565

@tiptap/vue-2

npm i https://pkg.pr.new/@tiptap/vue-2@7565

commit: bcf7ff1

Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

claude added 8 commits March 17, 2026 14:14

…ndtrip (#7539)

Decode HTML entities (&lt;, &gt;, &amp;, &quot;) to literal characters
during markdown parsing so the editor displays them correctly, and
re-encode them during serialization so they survive markdown roundtrips.
Code blocks and inline code are excluded from encoding to preserve
literal characters in code contexts.

https://claude.ai/code/session_01BhDQNLqwkb5XMwHqiRA9Mz

…cation

Move decodeHtmlEntities and encodeHtmlEntities from markdown/utils.ts
and extension-text into a shared @tiptap/core utility. The markdown
package re-exports from core for backward compatibility. Also removes
the issue number reference from the test description.

https://claude.ai/code/session_01BhDQNLqwkb5XMwHqiRA9Mz

encodeHtmlEntities now encodes `"` → `&quot;` to match the
`&quot;` → `"` decoding already present in decodeHtmlEntities.
Also adds roundtrip tests for the encode/decode pair.

https://claude.ai/code/session_01BhDQNLqwkb5XMwHqiRA9Mz

Text nodes with a `code` mark should preserve literal characters
like `<`, `>`, and `&` rather than encoding them to HTML entities.
This mirrors the existing code-mark check in renderNodesWithMarkBoundaries.

https://claude.ai/code/session_01BhDQNLqwkb5XMwHqiRA9Mz

Add parse, serialize, and roundtrip tests for &quot; ⇄ " to match
the existing coverage for &lt;, &gt;, and &amp;.

https://claude.ai/code/session_01BhDQNLqwkb5XMwHqiRA9Mz

Double quotes are ordinary markdown characters and don't need entity
encoding. Keep decoding &quot; → " (markdown-it may emit it) but
don't encode " back — this avoids mangling quoted text in serialized
markdown.

https://claude.ai/code/session_01BhDQNLqwkb5XMwHqiRA9Mz

@bdbch force-pushed the claude/fix-tiptap-issue-7539-NyhyF branch from dd4e4ff to f73da0f March 17, 2026 14:17
@bdbch requested a review from Copilot March 17, 2026 14:28

Instead of checking `parentNode?.type === 'codeBlock'` and
`mark.type === 'code'`, build a set of code-like extension names
from the `code: true` spec property at registration time. This
respects custom extensions that set `code: true` and won't break
if users rename the built-in code/codeBlock node types.

https://claude.ai/code/session_01BhDQNLqwkb5XMwHqiRA9Mz

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

claude added 2 commits March 17, 2026 14:44

… case

- Fix changeset to clarify that " is decoded but not re-encoded
- Add regression test proving &amp;nbsp; roundtrips correctly and is
  not misinterpreted as an empty paragraph marker

https://claude.ai/code/session_01BhDQNLqwkb5XMwHqiRA9Mz

…ports

- Remove decodeHtmlEntities from extension-text (dead code; MarkdownManager
  already decodes text tokens before the extension handler is reached)
- Drop @tiptap/extension-text from changeset since it has no behavioral change
- Import decode/encode utilities directly from @tiptap/core in MarkdownManager
  instead of re-exporting through markdown/utils.ts
- Replace [...currentMarks.keys()].some() with node.marks check to avoid
  unnecessary array allocation

https://claude.ai/code/session_01BhDQNLqwkb5XMwHqiRA9Mz

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

- Replace "markdown-it" with "the markdown tokenizer" in htmlEntities.ts
  JSDoc since this repo uses marked, not markdown-it.
- Extract isInsideCode detection + entity encoding into a shared private
  `encodeTextForMarkdown` method on MarkdownManager, deduplicating the
  logic between renderNodeToMarkdown and renderNodesWithMarkBoundaries.

https://claude.ai/code/session_01BhDQNLqwkb5XMwHqiRA9Mz

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Development

Successfully merging this pull request may close these issues.

MarkdownManager not handling escaped HTML characters correctly

3 participants