This parser converts PBZ statements to CSV.
- Python 3.9+ (stdlib only)
- `pdftotext` from Poppler
Install Poppler:
- macOS: `brew install poppler`
- Ubuntu/Debian: `sudo apt-get install poppler-utils`
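
Since the parser depends on the external `pdftotext` binary, a quick preflight check can catch a missing install early. The sketch below is illustrative only: the helper name `pdf_to_text` is hypothetical, and the use of the `-layout` flag is an assumption about how the parser invokes Poppler, not something confirmed by this README.

```python
import shutil
import subprocess


def pdf_to_text(pdf_path, binary="pdftotext"):
    """Return the text of a PDF using Poppler's pdftotext (hypothetical helper)."""
    exe = shutil.which(binary)
    if exe is None:
        # Fail with a clear message instead of a bare FileNotFoundError.
        raise RuntimeError(f"{binary} not found on PATH; install Poppler first")
    # Assumption: -layout keeps the column layout; "-" sends output to stdout.
    result = subprocess.run(
        [exe, "-layout", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

If the binary is missing, the call raises `RuntimeError` before any PDF is touched.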
    python3 -m venv .venv
    source .venv/bin/activate

    python parser.py data/pdfs/*.pdf \
      --out out/pbz_statements.csv \
      --validation-out out/pbz_validation.csv

Notes:
- Output includes `raw_text_block` for audit and `tx_fingerprint` for idempotent imports.
- Dedupe by `tx_fingerprint` is on by default. Use `--no-dedupe` to keep all rows.
- `raw_text_block` contains embedded newlines and will be CSV-quoted.
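
The notes above can be sketched roughly as follows. This is a minimal illustration, not the parser's actual code: the fields hashed into the fingerprint (`date`, `amount`, `description`) and the helper names are assumptions chosen for the example, and the real `tx_fingerprint` may be built differently.

```python
import csv
import hashlib
import io

# Assumption: hypothetical column set; only raw_text_block and
# tx_fingerprint are named in this README.
FIELDS = ["date", "amount", "description", "raw_text_block", "tx_fingerprint"]


def tx_fingerprint(row):
    """Stable hash of the fields assumed to identify a transaction."""
    key = "|".join(row[k].strip() for k in ("date", "amount", "description"))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def dedupe(rows):
    """Keep the first row for each fingerprint, so re-imports are idempotent."""
    seen = set()
    unique = []
    for row in rows:
        fp = tx_fingerprint(row)
        if fp in seen:
            continue
        seen.add(fp)
        unique.append({**row, "tx_fingerprint": fp})
    return unique


def to_csv(rows):
    """Serialize rows; the csv module quotes fields with embedded newlines."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because the multi-line `raw_text_block` is quoted, one transaction can span several physical lines in the file. Read it back with a real CSV reader instead of splitting on newlines.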
Requires Bun 1.0+ and pdftotext (Poppler).
    bun parser.js data/pdfs/*.pdf \
      --out out/pbz_statements_js.csv \
      --validation-out out/pbz_validation_js.csv

Run tests (compares JS output to the sample CSVs in `out/`):

    bun test

Install dependencies:

    bun install

Build the static app:

    bun run build

The output is written to `dist/`. Host that folder on any static provider.
Notes:
- The browser app uses PDF.js to rebuild a fixed-width layout before parsing.
- If columns drift, tune the line grouping and x-position tolerance in `src/app.js`.
- UI strings live in `src/strings.json` (Croatian by default).
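
To make the two tuning knobs concrete, here is a rough sketch of the general idea of rebuilding a fixed-width line from positioned text fragments. It is written in Python for consistency with the CLI parser and does not mirror the code in `src/app.js`. Every name and default here (`group_lines`, `snap_x`, `render_line`, `y_tol`, `x_tol`, `char_width`) is hypothetical.

```python
def group_lines(items, y_tol=2.0):
    """Group (x, y, text) fragments into lines whose y values are within y_tol."""
    lines = []
    # PDF y grows upward, so sort top of page first.
    for x, y, text in sorted(items, key=lambda it: (-it[1], it[0])):
        if lines and abs(lines[-1]["y"] - y) <= y_tol:
            lines[-1]["items"].append((x, text))
        else:
            lines.append({"y": y, "items": [(x, text)]})
    return lines


def snap_x(x, column_starts, x_tol=3.0):
    """Snap x to a known column start if it drifted by at most x_tol."""
    for start in column_starts:
        if abs(x - start) <= x_tol:
            return start
    return x


def render_line(line_items, column_starts, x_tol=3.0, char_width=5.0):
    """Place fragments at character columns derived from their snapped x."""
    out = ""
    for x, text in sorted(line_items):
        col = round(snap_x(x, column_starts, x_tol) / char_width)
        if out and col <= len(out):
            col = len(out) + 1  # keep at least one space between fragments
        out = out.ljust(col) + text
    return out
```

In this sketch, a larger `y_tol` merges fragments that sit slightly above or below each other into one line, and a larger `x_tol` pulls drifting fragments back onto their column. Setting either too high merges rows or columns that should stay separate.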