PageIndex, but the work happens in Postgres.
Parse PDF/Markdown to structure JSON, run tree search and helpers—all from SQL.
Extension v1.0 · schema pageindex · upstream PageIndex
Running in production? Read docs/install.md before you blame the extension for MuPDF path drama.[1]
CREATE EXTENSION pg_pageindex;
SELECT pageindex.build_pdf('/tmp/paper.pdf');
-- jsonb: document tree / structure for downstream SQL
flowchart LR
A[(PDF / MD path)] --> B[pageindex.build_*]
B --> C{{jsonb document}}
C --> D[pageindex.tree_search]
D --> E[(hits / subtree)]
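The first hop of the flow above (path in, jsonb out) can also be driven from a shell script. The sketch below only builds the SQL text for the `pageindex.build_pdf` call shown earlier, doubling any single quotes in the path so it stays a valid SQL string literal. The helper name `build_pdf_sql` is hypothetical and not part of the extension.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with pg_pageindex): print a
# "SELECT pageindex.build_pdf('<path>');" statement for a given path.
# Single quotes in the path are doubled, which is how SQL string
# literals escape them.
build_pdf_sql() {
    escaped=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "SELECT pageindex.build_pdf('%s');\n" "$escaped"
}
```

One way to use it, assuming `psql` can reach a database where the extension is installed (the database name `mydb` is a placeholder): `build_pdf_sql /tmp/paper.pdf | psql -At -d mydb > paper.json`. The `-A` and `-t` flags ask psql for unaligned, tuples-only output, so the file holds just the returned JSON.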
Note
build_* reads files as the database server OS user, not your laptop login. Paths must exist on the server filesystem.
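Since the file is opened by the server process, a quick check on the database host can save a confusing error later. The sketch below is a hypothetical helper, not something the extension ships: it tests whether a path is a readable regular file, either for the current user or, via `sudo`, for another account. The account name `postgres` in the usage line is an assumption; use whatever OS user runs your server.

```shell
#!/bin/sh
# Hypothetical helper: run this ON the database server.
# server_can_read PATH [USER]
# Returns 0 if PATH is a regular file readable by USER
# (default: the current user), non-zero otherwise.
server_can_read() {
    scr_path=$1
    scr_user=${2:-}
    if [ -z "$scr_user" ] || [ "$scr_user" = "$(id -un)" ]; then
        # Check with the current user's own permissions.
        [ -f "$scr_path" ] && [ -r "$scr_path" ]
    else
        # Check as another account; -n makes sudo fail instead of
        # prompting for a password.
        sudo -n -u "$scr_user" test -f "$scr_path" -a -r "$scr_path"
    fi
}
```

Usage, under the same assumptions: `server_can_read /tmp/paper.pdf postgres && echo "server can read it"`.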
Full write-ups live under docs/ (all lowercase filenames—docs/readme.md is the map):
| File | Covers |
| --- | --- |
| docs/install.md | Build, make install, runtime MuPDF, CREATE EXTENSION |
| docs/api.md | Every pageindex.* function, grouped |
| docs/architecture.md | C ↔ Go bridge, .so layout, failure modes |
| docs/development.md | CI, make bridge-bump, local loop |
| docs/contributing.md | PR hygiene, changelog, PGXN |
| docs/security.md | Private disclosure, scope |
| docs/conduct.md | Code of Conduct pointer |
| docs/packaging.md | META.json, releases |
| docs/history.md | Link to CHANGELOG.md |
Too lazy to click—give me the shell bits
git clone https://github.com/neurondb/pg_pageindex.git
cd pg_pageindex
make
sudo make install # or: sudo make install PG_CONFIG=/wherever/pg_config
Then run CREATE EXTENSION in psql. If the bridge cannot load MuPDF, the backend tends to die loudly; see docs/install.md.
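Before running CREATE EXTENSION, it can help to confirm that the install landed where this PostgreSQL looks and that the shared object's dependencies resolve. The sketch below is a rough check under stated assumptions: the control file is expected in the standard `extension/` directory under `pg_config --sharedir`, `ldd` is available (Linux), and the path you pass to `missing_libs` is the bridge's `.so` file, whose actual name and location this README does not state. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not part of pg_pageindex.

# ext_control_path PG_CONFIG EXTNAME
# Print where PostgreSQL's default layout expects the control file.
ext_control_path() {
    ecp_pgc=${1:-pg_config}
    ecp_ext=$2
    printf '%s/extension/%s.control\n' "$("$ecp_pgc" --sharedir)" "$ecp_ext"
}

# missing_libs SOFILE
# Print the dependencies of SOFILE that the dynamic linker cannot
# resolve; prints nothing when all of them are found.
missing_libs() {
    ldd "$1" 2>/dev/null | grep 'not found' || true
}
```

For example, `ls "$(ext_control_path pg_config pg_pageindex)"` should list the control file after a successful install. If `missing_libs` prints a MuPDF library for the bridge's shared object, docs/install.md is the place to look.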
- License: LICENSE
- Changelog: CHANGELOG.md
- Authors: AUTHORS
- CoC: CODE_OF_CONDUCT.md
- Contributing (stub → full doc): CONTRIBUTING.md
- PGXN: META.json · docs/packaging.md
Footnotes
1. The bridge still needs MuPDF shared objects at runtime even though the Go side is built with -tags=nocgo. Details: docs/install.md.