CUDA OOM with small PDF file #50
Open
Question
I am trying to convert a small PDF file (in Vietnamese) to Markdown. It works perfectly in the web version but fails when I switch to a local install with GPU (on Google Colab).
Environment
- docstrange version: 1.1.8
- GPU: Tesla T4 / 14.74 GiB VRAM
- CUDA version: 12.6
- PyTorch version: 2.9.0+cu126
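For anyone comparing setups, the versions listed above can be collected from a notebook cell. This is a minimal sketch, assuming only that `docstrange` and `torch` are installed as pip distributions under those names; `package_version` and `collect_environment` are illustrative helper names, not part of docstrange.

```python
import importlib.metadata as md
import platform


def package_version(name: str) -> str:
    """Installed version of a pip distribution, or 'not installed'."""
    try:
        return md.version(name)
    except md.PackageNotFoundError:
        return "not installed"


def collect_environment() -> dict:
    """Hypothetical helper: gather the fields reported in the issue."""
    info = {
        "python": platform.python_version(),
        "docstrange": package_version("docstrange"),
        "torch": package_version("torch"),
    }
    try:
        import torch
    except ImportError:
        # No PyTorch: nothing more to report about CUDA or the GPU.
        return info
    # torch.version.cuda is None on CPU-only builds.
    info["cuda"] = torch.version.cuda or "none"
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        info["gpu"] = props.name
        info["vram_gib"] = round(props.total_memory / 1024**3, 2)
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

On a Colab T4 runtime this should print fields comparable to the list above; on a machine without a GPU the `gpu` and `vram_gib` entries are simply omitted.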
Description
I get a CUDA out-of-memory error when processing a small PDF file (801 KB, 17 pages).
Error Message
ERROR:docstrange.pipeline.nanonets_processor:Nanonets OCR extraction failed:
CUDA out of memory. Tried to allocate 122.64 GiB. GPU 0 has a total capacity
of 14.74 GiB of which 8.54 GiB is free.
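The figures in this message are worth reading closely: the single requested block (122.64 GiB) is larger than the GPU's total capacity (14.74 GiB), not just its free memory (8.54 GiB), so freeing memory elsewhere on the card cannot satisfy it. A small stdlib-only sketch that extracts these numbers from the log text, assuming the message keeps the wording shown above and reports sizes in GiB; `parse_oom` is a hypothetical helper name:

```python
import re

# Matches the wording of the message above; only GiB units are handled.
OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<requested>[\d.]+) GiB\. "
    r"GPU (?P<device>\d+) has a total capacity of (?P<total>[\d.]+) GiB "
    r"of which (?P<free>[\d.]+) GiB is free"
)


def parse_oom(message: str) -> dict:
    """Pull device index and sizes (GiB) out of a CUDA OOM message."""
    # Collapse the line wrapping seen in logs so the regex sees one line.
    flat = " ".join(message.split())
    match = OOM_PATTERN.search(flat)
    if match is None:
        raise ValueError("not a recognised CUDA OOM message")
    return {
        "device": int(match["device"]),
        "requested_gib": float(match["requested"]),
        "total_gib": float(match["total"]),
        "free_gib": float(match["free"]),
    }


message = """CUDA out of memory. Tried to allocate 122.64 GiB. GPU 0 has a total capacity
of 14.74 GiB of which 8.54 GiB is free."""

oom = parse_oom(message)
ratio = oom["requested_gib"] / oom["total_gib"]
print(f"requested {oom['requested_gib']} GiB = {ratio:.1f}x the card's total capacity")
```

For this log the ratio comes out at about 8.3x, which points at an unexpectedly large tensor rather than fragmentation or memory held by other processes.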
PDF Details
- File size: 801 KB
- Pages: 17
- Page dimensions: 1675 x 2353 pixels
- Type: Scanned images
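Each scanned page is 1675 x 2353 = 3,941,275 pixels (about 3.9 megapixels). If the OCR model's memory use grows with input resolution (an assumption; this report does not confirm it), one way to test that is to downscale the pages before extraction and see whether the error goes away. The sketch below only computes target dimensions that keep the aspect ratio within a pixel budget; `fit_to_pixel_budget` is a hypothetical helper, and the 1,000,000-pixel budget is an arbitrary example value.

```python
from __future__ import annotations

import math


def fit_to_pixel_budget(width: int, height: int, max_pixels: int) -> tuple[int, int]:
    """Return (w, h) with roughly the same aspect ratio and w * h <= max_pixels.

    Images already within the budget are returned unchanged.
    """
    if width * height <= max_pixels:
        return width, height
    # Scale both sides by the same factor so the area shrinks to the budget.
    scale = math.sqrt(max_pixels / (width * height))
    # Floor so rounding never pushes the result over the budget.
    new_w = max(1, math.floor(width * scale))
    new_h = max(1, math.floor(height * scale))
    return new_w, new_h


# Page size from the PDF details above, with an example budget of 1 MP.
w, h = fit_to_pixel_budget(1675, 2353, max_pixels=1_000_000)
print(w, h, w * h)
```

With Pillow, the resulting size could be passed to `Image.resize((w, h))` on each rasterized page before rebuilding the PDF; whether docstrange exposes its own resolution setting is not confirmed here.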
Code to Reproduce
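# Colab cells; "!" runs a shell command from the notebook
!apt-get install -y poppler-utils  # system PDF rendering tools
!pip install docstrange -q

from docstrange import DocumentExtractor

# gpu=True runs extraction locally on the Colab GPU (Tesla T4)
extractor = DocumentExtractor(gpu=True)
result = extractor.extract("sample.pdf")  # the 801 KB, 17-page scanned PDF
markdown = result.extract_markdown()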