OneNote: Fix nested list to-do items being corrupted on import #543

Open
jamescqcampbell wants to merge 3 commits into obsidianmd:master from jamescqcampbell:master

@jamescqcampbell

Fixes two related bugs causing nested list to-do items to be corrupted on import.

My issue

I used the importer to import a notebook containing several nested ordered lists of tasks. I expected the nesting to be flattened at worst (I know OneNote does not always include this data in its HTML output), but the tasks were not present in the output at all.

From my test file:
[Image: Screenshot 2026-04-21 at 22 00 58]

The imported Obsidian file looked like:
[Image: Screenshot 2026-04-21 at 22 02 20]

The first commit fixes this issue, the second adds tests for that fix, and the third adds regression tests against a comprehensive test file.

Please consider merging at least the first commit, which fixes my problem. I am more than happy to refactor as necessary.

I couldn't find any existing tests, so I wrote these to confirm that my own problem was solved and that existing functionality was not affected. They are not essential for this PR and can be dropped if you do not want to add testing to this repo.

Problems

`convertTags` handled `data-tag="to-do"` by injecting a literal `- [ ] ` into
the element's `innerHTML`. Turndown then escaped those characters, producing
`\- \[ \] Task text` instead of a task-list item. This affected flat
`<p data-tag="to-do">` paragraphs and to-do items inside nested `<ol>`/`<ul>`
lists, where the result was `1. \- \[ \] Outer` instead of `- [ ]`.
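
The escaping can be reproduced with a small stand-in for the converter's escaping step. The `escapeMarkdown` helper below is a simplified approximation written for illustration (it is not Turndown's actual code and covers only a few rules), but it shows why marker text injected into the HTML does not survive as a task-list item:

```typescript
// Simplified stand-in for the escaping an HTML-to-Markdown converter applies
// to text content. Hypothetical helper: NOT Turndown's real implementation.
const escapeRules: Array<[RegExp, string]> = [
  [/\\/g, "\\\\"], // backslashes first, so later escapes are not doubled
  [/^-/g, "\\-"],  // a leading "-" would otherwise start a list item
  [/\[/g, "\\["],  // brackets would otherwise start a link or a checkbox
  [/\]/g, "\\]"],
];

function escapeMarkdown(text: string): string {
  return escapeRules.reduce(
    (acc, [pattern, replacement]) => acc.replace(pattern, replacement),
    text,
  );
}

// Text content as it looked after the old convertTags injected the marker.
const injected = "- [ ] Task text";
console.log(escapeMarkdown(injected)); // \- \[ \] Task text
```

Because the marker is ordinary text by the time conversion runs, the converter has no way to tell it apart from user content that happens to look like Markdown.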

A secondary bug: the injection broke `removeExtraListItemParagraphs`
(introduced in #363), which relies on `li > p:first-child`. After injection
the paragraph was no longer the first child, causing extra blank lines between
nested list levels.

Fix

`convertTags`: for to-do elements inside a `<li>`, insert
`<input type="checkbox">` as the first child of the `<li>` so the GFM
turndown rule produces `- [ ]` / `- [x]` natively. For top-level
`<p data-tag="to-do">` elements (which the OneNote Graph API always emits
flat regardless of visual indentation), replace the paragraph with
`<ul><li><input type="checkbox">…</li></ul>`.
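
The two cases can be sketched over a minimal element model. This is an illustration only: the PR's code operates on real DOM nodes, and the `El` type, the helper names, and the `to-do:completed` tag value used for checked items are assumptions made for this sketch.

```typescript
// Minimal element model for illustration; the real code works on DOM nodes.
interface El {
  tag: string;
  attrs: Record<string, string>;
  children: Array<El | string>;
}

const el = (tag: string, attrs: Record<string, string> = {}, ...children: Array<El | string>): El =>
  ({ tag, attrs, children });

const isTodo = (node: El): boolean => (node.attrs["data-tag"] ?? "").startsWith("to-do");

// Assumption: completed to-dos carry data-tag="to-do:completed".
function checkboxFor(todo: El): El {
  const done = (todo.attrs["data-tag"] ?? "").startsWith("to-do:completed");
  return el("input", done ? { type: "checkbox", checked: "" } : { type: "checkbox" });
}

function convertTodos(parent: El): void {
  const out: Array<El | string> = [];
  for (const child of parent.children) {
    if (typeof child === "string") { out.push(child); continue; }
    convertTodos(child); // depth-first, so nested lists are handled too
    if (!isTodo(child)) { out.push(child); continue; }
    if (parent.tag === "li") {
      // Inside a list item: the checkbox becomes the first child of the <li>.
      out.unshift(checkboxFor(child));
      out.push(child);
    } else if (child.tag === "p") {
      // Flat paragraph: replace it with a single-item list holding a checkbox.
      out.push(el("ul", {}, el("li", {}, checkboxFor(child), ...child.children)));
    } else {
      out.push(child);
    }
  }
  parent.children = out;
}

// Serialise the model back to HTML so the result can be inspected.
function toHtml(node: El | string): string {
  if (typeof node === "string") return node;
  const attrs = Object.entries(node.attrs)
    .map(([k, v]) => (v === "" ? ` ${k}` : ` ${k}="${v}"`))
    .join("");
  if (node.tag === "input") return `<input${attrs}>`;
  return `<${node.tag}${attrs}>${node.children.map(toHtml).join("")}</${node.tag}>`;
}

const body = el("body", {},
  el("ol", {}, el("li", {}, el("p", { "data-tag": "to-do" }, "Outer"))),
  el("p", { "data-tag": "to-do:completed" }, "Flat"),
);
convertTodos(body);
console.log(toHtml(body));
```

In the sketch the nested item keeps its surrounding `<ol>`, so the list structure still carries the nesting depth, while the flat paragraph becomes its own single-item `<ul>`.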

`removeExtraListItemParagraphs`: selector extended from `li > p:first-child`
to also match `li > input:first-child + p`.
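
In terms of a list item's element children, the extended selector amounts to the check below. This sketch uses a plain array of tag names rather than real DOM nodes, and `paragraphToUnwrap` is a hypothetical helper name chosen for illustration:

```typescript
// Returns the index of the <p> that the selector
//   li > p:first-child, li > input:first-child + p
// would match, or -1 if neither branch applies.
// `childTags` lists the tag names of an <li>'s element children, in order.
function paragraphToUnwrap(childTags: string[]): number {
  if (childTags[0] === "p") return 0; // li > p:first-child
  if (childTags[0] === "input" && childTags[1] === "p") return 1; // li > input:first-child + p
  return -1;
}

console.log(paragraphToUnwrap(["p", "ol"]));          // 0
console.log(paragraphToUnwrap(["input", "p", "ol"])); // 1
console.log(paragraphToUnwrap(["span", "p"]));        // -1
```

The second branch is what keeps the paragraph wrapper removable once a checkbox sits in front of it.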

Testing

Commits 2 and 3 add vitest + jsdom. DOM-transformation functions are extracted
into onenote-converter.ts for testability; onenote.ts is refactored to
delegate its five pipeline methods to onenote-converter.ts so that the
tested code and production code stay in sync.

  • Commit 2: 14 tests against onenote-indented-example.html (fixture from
    the author's personal OneNote, 2026-04-18), covering all five affected cases:
    indented numbered lists, indented unordered lists, numbered to-do lists,
    unordered to-do lists, and flat to-do paragraphs. All pass.
  • Commit 3: 25 regression tests against onenote.html, covering all
    element types handled before this change. All pass. Not covered: convertMathML, escapeTextNodes, and the MathML error-fallback path. Statement coverage: 87%.

Source OneNote page for onenote-indented-example.html as it renders in OneNote.
[Image: Screenshot 2026-04-21 at 22 06 45]

Source OneNote page for onenote.html as it renders in OneNote. NB: I don't want to leak my OneNote userId, so I'd prefer to avoid sharing a PDF.
[Image: Screenshot 2026-04-21 at 22 09 22]

convertTags was injecting literal "- [ ] " text into element innerHTML
for data-tag="to-do" elements. turndown then escaped the brackets,
producing "\- \[ \] " instead of a real task-list item.

For to-do elements inside a <li>, insert <input type="checkbox"> as the
first child of the <li> so the GFM turndown rule emits "- [ ] " /
"- [x] " correctly. The surrounding <ol>/<ul> structure already encodes
the nesting depth, so indentation is preserved for free.

For top-level <p data-tag="to-do"> elements, replace the paragraph with
<ul><li><input type="checkbox">…</li></ul>. The OneNote Graph API does
not encode visual indentation for these elements — they appear as flat
<p> tags regardless of how they look in the UI.

removeExtraListItemParagraphs extended from `li > p:first-child` to
also match `li > input:first-child + p` so the marginless paragraph
wrapper is still removed after convertTags inserts a checkbox before it.

Set up vitest + jsdom as the test framework (no existing test runner in
this project). The DOM-transformation functions used in processFile are
extracted into onenote-converter.ts so they can be exercised without the
full Obsidian runtime; onenote.ts retains its own inline implementations
and is unchanged by this commit.

Tests cover the five cases in onenote-indented-example.html:
  1. Indented numbered list — inner items indented at correct depth
  2. Indented unordered list — inner items indented at correct depth
  3. Indented numbered todos — "[ ]" at each nesting level, correctly indented
  4. Indented unordered todos — "[ ]" at each nesting level, correctly indented
  5. Flat todos — four <p data-tag="to-do"> elements each become a top-level
     "- [ ]" task-list item with no indentation (the API provides none)

Covers every element type in onenote.html: plain paragraphs,
bold/italic/strikethrough, highlights, task-list items (checked and
unchecked), unordered and ordered lists, fenced code blocks, inline
code, citations, H1–H3 headings, internal and external links, GFM
tables, and note tags.