perf: lazy import pdfminer in pdf.py by ESultanik · Pull Request #3442 · trailofbits/polyfile

ESultanik · 2026-01-20T21:14:23Z

Summary

Defer importing the pdf module (and pdfminer) until a PDF file is actually matched
Use a lazy parser wrapper (_LazyPDFParser) that registers immediately but imports pdf.py only on first use
The pdfminer library imports cryptography and other heavy modules, adding ~0.5s to import time

Implementation

Add _LazyPDFParser class in __init__.py that wraps the actual PDF parser
Remove @register_parser decorator from pdf.py (registration is now done lazily in __init__.py)
Parser is registered at import time, but the actual pdf module import is deferred
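The wrapper pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `PARSERS` registry, the `LazyParser` class, and its constructor arguments are hypothetical names, and the standard-library `json` module stands in for the expensive pdf module so the example runs anywhere.

```python
import importlib

# Hypothetical registry; stands in for whatever table a
# @register_parser-style decorator would normally populate.
PARSERS = {}


class LazyParser:
    """Callable placeholder that imports the real parser on first use."""

    def __init__(self, module_name, attr_name):
        self._module_name = module_name
        self._attr_name = attr_name
        self._impl = None  # resolved lazily

    def _resolve(self):
        # Import the (potentially heavy) module only once, on demand.
        if self._impl is None:
            module = importlib.import_module(self._module_name)
            self._impl = getattr(module, self._attr_name)
        return self._impl

    def __call__(self, *args, **kwargs):
        return self._resolve()(*args, **kwargs)


# Registration is cheap: nothing is imported here.
# "json"/"loads" are stand-ins for the real module and parser function.
PARSERS["application/example"] = LazyParser("json", "loads")

# The import happens on the first call only.
parsed = PARSERS["application/example"]('{"a": 1}')
```

The trade-off is that the import cost moves to the first matching file instead of disappearing, which pays off when most inputs never reach that parser.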

Performance Results

Metric                      Before  After  Improvement
Import time                 527ms   380ms  28% faster
pdfminer loaded at import   Yes     No     Deferred
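The PR does not say how the timings were taken. One way to reproduce a comparable number is to time a fresh interpreter importing the package, as in the sketch below; `time_import` and `percent_faster` are hypothetical helper names, and the measured time includes interpreter startup. For a per-module breakdown, CPython's `python -X importtime -c "import polyfile"` is an alternative.

```python
import subprocess
import sys
import time


def time_import(module_name, runs=5):
    """Best wall-clock seconds to start Python and import module_name."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # A fresh process per run so earlier imports cannot be reused.
        subprocess.run(
            [sys.executable, "-c", f"import {module_name}"], check=True
        )
        best = min(best, time.perf_counter() - start)
    return best


def percent_faster(before_ms, after_ms):
    """Relative reduction in time, as a percentage of the 'before' value."""
    return 100.0 * (before_ms - after_ms) / before_ms


# Using the figures from the table: (527 - 380) / 527 is roughly 28%.
improvement = percent_faster(527, 380)
```

Taking the minimum over several runs reduces noise from disk caching and other processes.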

Test plan

Run pytest tests/test_magic.py tests/test_pdf.py - same test results as baseline
Verify pdfminer is not loaded after import polyfile
Verify pdfminer is loaded when matching PDF files
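The "not loaded after import" check can be automated by inspecting `sys.modules` in a fresh interpreter. The helper below is a sketch with a hypothetical name (`loaded_after_import`); it is demonstrated on standard-library modules so it runs without polyfile installed.

```python
import subprocess
import sys


def loaded_after_import(package, candidate):
    """Import `package` in a fresh interpreter and report whether
    `candidate` ended up in sys.modules as a side effect."""
    code = (
        "import importlib, sys; "
        f"importlib.import_module({package!r}); "
        f"print({candidate!r} in sys.modules)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        check=True,
        capture_output=True,
        text=True,
    )
    # Use the last line in case the package prints during import.
    return result.stdout.strip().splitlines()[-1] == "True"


# Stand-in demonstration with stdlib modules:
eager = loaded_after_import("json", "json.decoder")

# For this PR, the analogous check would be (requires polyfile installed;
# expected to be False after the change, per the table above):
#   loaded_after_import("polyfile", "pdfminer")
```

Running the check in a subprocess matters because the current process may already have imported the module for unrelated reasons.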

🤖 Generated with Claude Code

Defer importing the pdf module (and pdfminer) until a PDF file is actually matched. This is done via a lazy parser wrapper that registers immediately but only imports the actual pdf module on first use.

The pdfminer library imports many submodules (cryptography, etc.) which adds ~0.5s to import time. Most files aren't PDFs, so deferring this import improves startup time for the common case.

Performance improvement:
- pdfminer no longer loaded at import time
- Import time reduced by ~28% (measured 527ms → 380ms in cached runs)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

ESultanik force-pushed the perf/lazy-pdfminer-import branch 2 times, most recently from 9a6022b to 795e25a on January 20, 2026 22:02

ESultanik force-pushed the perf/lazy-pdfminer-import branch from 795e25a to 8db8471 on January 20, 2026 22:15

ESultanik merged commit f8d1e2b into master Jan 20, 2026
10 checks passed

ESultanik deleted the perf/lazy-pdfminer-import branch January 20, 2026 22:21

