Open
Description:
Implement a utility function to convert article HTML content to Markdown format for RAG processing, preserving links and structure.
Acceptance criteria:
- Function html_to_markdown(html: str) -> str converts HTML to Markdown
- Preserves internal article links (e.g., )
- Preserves external links and basic formatting (bold, italic, lists, paragraphs)
- Strips unnecessary HTML attributes while maintaining semantic structure
- Handles edge cases: empty strings, malformed HTML, special characters
- Unit tests with sample article HTML from lex.dk JSON responses, covering:
  - Lists
  - Tables
  - Special characters
  - Text boxes
  - LaTeX formulae
Technical details:
Use markdownify library for HTML→Markdown conversion
Design:
Optional details on design for context.