[Prod DB Integration] HTML to Markdown Conversion Utility #68

@Enniwhere

Description:
Implement a utility function to convert article HTML content to Markdown format for RAG processing, preserving links and structure.

Acceptance criteria:

  • Function html_to_markdown(html: str) -> str converts HTML to Markdown
  • Preserves internal article links (e.g., )
  • Preserves external links and basic formatting (bold, italic, lists, paragraphs)
  • Strips unnecessary HTML attributes while maintaining semantic structure
  • Handles edge cases: empty strings, malformed HTML, special characters
  • Unit tests with sample article HTML from lex.dk JSON responses, covering:
      • Lists
      • Tables
      • Special characters
      • Text boxes
      • LaTeX formulae
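Test fixtures could be read from saved lex.dk JSON responses. This is a minimal sketch that assumes each response is stored as its own `.json` file and that the article HTML sits under a top-level key named `xhtml_body`. That field name is a guess and must be checked against a real response. `load_article_html` and `iter_fixtures` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple

# ASSUMPTION: name of the field holding the article HTML in a lex.dk JSON
# response. "xhtml_body" is a guess; verify it against a real saved response.
HTML_FIELD = "xhtml_body"


def load_article_html(raw_json: str, field: str = HTML_FIELD) -> str:
    """Return the article HTML from one saved JSON response (hypothetical helper)."""
    data = json.loads(raw_json)
    html = data.get(field) if isinstance(data, dict) else None
    if not isinstance(html, str):
        # Fail loudly so a wrong field name shows up in the test run.
        raise KeyError(f"no string field {field!r} in response")
    return html


def iter_fixtures(directory: Path, field: str = HTML_FIELD) -> Iterator[Tuple[str, str]]:
    """Yield (fixture name, article HTML) for each *.json file, in sorted order."""
    for path in sorted(directory.glob("*.json")):
        yield path.stem, load_article_html(path.read_text(encoding="utf-8"), field)
```

In a pytest suite, the pairs from `iter_fixtures` could feed `pytest.mark.parametrize` to give one test case per saved article. The fixture directory layout is also an assumption.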

Technical details:
Use the markdownify library for HTML→Markdown conversion.
