bitfalls/pbzsucks

PBZ PDF to CSV parser

This parser converts PBZ account statements (PDF) to CSV.

Requirements

  • Python 3.9+ (stdlib only)
  • pdftotext from Poppler

Install Poppler:

  • macOS: brew install poppler
  • Ubuntu/Debian: sudo apt-get install poppler-utils
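This README does not say which pdftotext options parser.py passes. The sketch below assumes layout mode (`-layout`, which keeps column positions) and UTF-8 output, and is only an illustration of calling pdftotext from stdlib Python. `build_pdftotext_cmd` and `extract_text` are hypothetical helper names:

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path: str) -> list[str]:
    """Build a pdftotext command (assumed options) that prints text to stdout.

    -layout keeps the physical column layout, -enc UTF-8 preserves Croatian
    characters, and "-" as the output file means "write to stdout".
    """
    return ["pdftotext", "-layout", "-enc", "UTF-8", pdf_path, "-"]


def extract_text(pdf_path: str) -> str:
    """Run pdftotext on one PDF and return its text; raises if it fails."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH; install Poppler first")
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Checking with `shutil.which` first turns a missing Poppler install into a clear error instead of a `FileNotFoundError` from `subprocess`.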

Python setup

python3 -m venv .venv
source .venv/bin/activate

No pip install step is needed; the parser uses only the standard library.

Usage

python parser.py data/pdfs/*.pdf \
  --out out/pbz_statements.csv \
  --validation-out out/pbz_validation.csv

Notes:

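  • Each output row includes raw_text_block (the statement text the row was parsed from, for auditing) and tx_fingerprint (a per-transaction key, so repeated imports do not create duplicates).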
  • Dedupe by tx_fingerprint is on by default. Use --no-dedupe to keep all rows.
  • raw_text_block contains embedded newlines and will be CSV-quoted.
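One way to use tx_fingerprint for idempotent imports is to make it a primary key in the target database. The sketch below is not part of this repository: `import_csv`, the `transactions` table, and its two-column schema are hypothetical. It also shows reading the CSV with `newline=""`, which lets the csv module handle the quoted newlines inside raw_text_block. It opens the file as `utf-8-sig` in case the CSV starts with a BOM (an assumption; plain UTF-8 reads the same way):

```python
import csv
import sqlite3


def import_csv(csv_path: str, db: sqlite3.Connection) -> int:
    """Load parser output into SQLite and return how many new rows were added.

    Hypothetical schema: only tx_fingerprint and raw_text_block are stored.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS transactions ("
        " tx_fingerprint TEXT PRIMARY KEY,"
        " raw_text_block TEXT)"
    )
    before = db.total_changes
    # newline="" is required so quoted fields containing newlines stay intact.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        rows = [(r["tx_fingerprint"], r["raw_text_block"]) for r in csv.DictReader(f)]
    # INSERT OR IGNORE skips fingerprints already in the table, so importing
    # the same or an overlapping CSV again adds nothing twice.
    db.executemany("INSERT OR IGNORE INTO transactions VALUES (?, ?)", rows)
    db.commit()
    return db.total_changes - before
```

Because duplicates are ignored at insert time, this also keeps only the first row per fingerprint when the CSV was written with --no-dedupe.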

JavaScript parser (Bun)

Requires Bun 1.0+ and pdftotext (Poppler).

bun parser.js data/pdfs/*.pdf \
  --out out/pbz_statements_js.csv \
  --validation-out out/pbz_validation_js.csv

Run the tests (they compare the JS output to the sample CSVs in out/):

bun test
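If you expect the Python and JS parsers to produce the same rows, a small stdlib script can report cell-level differences between out/pbz_statements.csv and out/pbz_statements_js.csv. This is a hypothetical helper, not the bun test suite; `read_rows` and `diff_csv` are illustrative names, and rows are compared by position:

```python
import csv


def read_rows(path: str) -> list[dict[str, str]]:
    """Read a CSV into dicts; newline="" keeps multi-line quoted fields intact."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def diff_csv(expected_path: str, actual_path: str) -> list[str]:
    """Return human-readable differences between two CSVs (empty list if equal)."""
    expected = read_rows(expected_path)
    actual = read_rows(actual_path)
    problems: list[str] = []
    if len(expected) != len(actual):
        problems.append(f"row count: expected {len(expected)}, got {len(actual)}")
    # Compare row by row over the union of columns, so a missing column shows up too.
    for i, (e, a) in enumerate(zip(expected, actual), start=1):
        for col in sorted(set(e) | set(a)):
            if e.get(col) != a.get(col):
                problems.append(f"row {i}, column {col!r}: {e.get(col)!r} != {a.get(col)!r}")
    return problems
```

For example, `for line in diff_csv("out/pbz_statements.csv", "out/pbz_statements_js.csv"): print(line)` prints nothing when the files agree.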

Static app (Bun build)

Install dependencies:

bun install

Build the static app:

bun run build

The build output is written to dist/. Host that folder on any static hosting provider.
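To preview the build locally before deploying, Python's built-in HTTP server can serve dist/. This is a minimal sketch for local testing only, not a production host; `make_preview_server` is a hypothetical name, and it assumes the build emits an index.html at the top of dist/:

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_preview_server(directory: str = "dist", port: int = 8000) -> ThreadingHTTPServer:
    """Create a localhost-only server that serves `directory` as static files."""
    # The `directory` argument makes the handler serve from dist/ instead of the CWD.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Call `make_preview_server().serve_forever()` and open http://127.0.0.1:8000/ in a browser. Binding to 127.0.0.1 keeps the preview off the local network.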

Notes:

  • The browser app uses PDF.js to rebuild a fixed-width layout before parsing.
  • If columns drift, tune the line grouping and x-position tolerance in src/app.js.
  • UI strings live in src/strings.json (Croatian by default).
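The idea of rebuilding a fixed-width layout from positioned text can be sketched independently of the browser code. The Python below is a language-agnostic illustration, not the logic in src/app.js. `TextItem`, `group_lines`, `to_fixed_width`, `y_tolerance` and `char_width` are all hypothetical. With PDF.js, x and y would typically come from each text item's `transform[4]` and `transform[5]`:

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    """A positioned text fragment (hypothetical stand-in for a PDF.js text item)."""
    x: float  # left edge in PDF units
    y: float  # baseline in PDF units; larger values are higher on the page
    text: str


def group_lines(items: list[TextItem], y_tolerance: float = 2.0) -> list[list[TextItem]]:
    """Group items whose baseline is within y_tolerance of the line's first item."""
    lines: list[list[TextItem]] = []
    # Sort top-to-bottom (PDF y grows upward), then left-to-right.
    for item in sorted(items, key=lambda it: (-it.y, it.x)):
        if lines and abs(lines[-1][0].y - item.y) <= y_tolerance:
            lines[-1].append(item)
        else:
            lines.append([item])
    return [sorted(line, key=lambda it: it.x) for line in lines]


def to_fixed_width(items: list[TextItem], char_width: float = 5.0,
                   y_tolerance: float = 2.0) -> str:
    """Render items as monospaced text, placing each at column round(x / char_width)."""
    out: list[str] = []
    for line in group_lines(items, y_tolerance):
        buf = ""
        for item in line:
            col = round(item.x / char_width)
            if len(buf) < col:
                buf += " " * (col - len(buf))  # pad up to the item's column
            elif buf:
                buf += " "  # column already taken: keep fragments separated
            buf += item.text
        out.append(buf)
    return "\n".join(out)
```

A larger `y_tolerance` merges slightly misaligned fragments into one line; `char_width` controls how x positions map to text columns, so these are the two values to adjust when columns drift.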
