pdf2markdown

Here are 9 public repositories matching this topic...

PaddlePaddle / PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

ocr pdf-parser kie document-translation rag chineseocr ai4science pp-ocr document-parsing pp-structure pdf-extractor-rag pdf2markdown paddleocr-vl

Updated Feb 26, 2026
Python

PaddlePaddle / PaddleX

All-in-One Development Tool based on PaddlePaddle

ocr time-series deployment speech-recognition classification segmentation object-detection ai-pipelines layout-detection formula-recognition pp-chatocr pdf2markdown

Updated Feb 26, 2026
Python

MarkPDFdown / markpdfdown

A high-quality PDF-to-Markdown tool based on large language model visual recognition.

markdown pdf pdf-converter llm pdf2md pdf2markdown pdf-markdown

Updated Jan 25, 2026
Python

AdemBoukhris457 / Doctra

📄🔍 Parse, extract, and analyze documents with ease 📄🔍

python ocr ai gemini openai extract-data document-analysis image-restoration vlm pdf-parser pdf2markdown documentparsing

Updated Nov 29, 2025
Jupyter Notebook

OpenDCAI / Flash-MinerU

Ray-based accelerator for the MinerU VLM inference pipeline. Lightweight, low-intrusion PDF → Markdown processing for multi-GPU environments, including Chinese domestic compute hardware.

pdf parallel-computing distributed-computing ray multi-gpu pdf-parsing document-ai llm-inference mineru pdf2markdown