CLI Reference

Tmob edited this page Jan 28, 2026 · 2 revisions

Kiri OCR provides a command-line interface (CLI) through the kiri-ocr command. Each subcommand below is listed with its arguments and default values.

predict

Run OCR inference on a document image.

kiri-ocr predict IMAGE_PATH [OPTIONS]

Arguments:

| Argument | Description | Default |
| --- | --- | --- |
| `image` | Path to the input image file (required). | - |
| `--mode` | Detection mode: `lines` or `words`. | `lines` |
| `--model` | Path to model file or Hugging Face repo ID. | `mrrtmob/kiri-ocr` |
| `--padding` | Padding (pixels) around detected text boxes. | `10` |
| `--output`, `-o` | Directory to save results. | `output` |
| `--no-render` | Skip generation of visual reports (images/HTML). | `False` |
| `--device` | Compute device: `cpu` or `cuda`. | `cpu` |
| `--verbose`, `-v` | Enable verbose logging. | `False` |
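
When predict is called from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of Kiri OCR: the helper name `build_predict_cmd` and the file name `page.png` are hypothetical, and only the flags listed above are used. The command is executed only if `kiri-ocr` is found on the PATH and the placeholder image exists.

```python
import shutil
import subprocess
from pathlib import Path


def build_predict_cmd(image, mode="lines", output="output", device="cpu",
                      padding=10, render=True, verbose=False):
    """Assemble a `kiri-ocr predict` argument list from the documented options.

    Hypothetical helper for illustration; defaults mirror the documented ones.
    """
    if mode not in ("lines", "words"):
        raise ValueError("mode must be 'lines' or 'words'")
    if device not in ("cpu", "cuda"):
        raise ValueError("device must be 'cpu' or 'cuda'")
    cmd = [
        "kiri-ocr", "predict", str(image),
        "--mode", mode,
        "--padding", str(padding),
        "--output", str(output),
        "--device", device,
    ]
    if not render:
        cmd.append("--no-render")  # skip the image/HTML reports
    if verbose:
        cmd.append("--verbose")
    return cmd


if __name__ == "__main__":
    image = Path("page.png")  # placeholder input image
    cmd = build_predict_cmd(image, mode="words", render=False)
    print(" ".join(cmd))
    # Run only when the CLI is installed and the placeholder image exists.
    if shutil.which("kiri-ocr") and image.exists():
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems with paths that contain spaces.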

train

Train the Transformer-based recognition model.

kiri-ocr train [OPTIONS]

Key Arguments:

| Argument | Description | Default |
| --- | --- | --- |
| `--train-labels` | Path to training labels file (one `image_path \t label` pair per line, tab-separated). | - |
| `--val-labels` | Path to validation labels file. | - |
| `--hf-dataset` | Hugging Face dataset ID (e.g., `mrrtmob/km_en_image_line`). | - |
| `--output-dir` | Directory to save model checkpoints. | `models` |
| `--epochs` | Number of training epochs. | `100` |
| `--batch-size` | Batch size. | `32` |
| `--height` | Input image height. | `48` |
| `--width` | Input image width. | `640` |
| `--lr` | Learning rate. | `0.0003` |
| `--device` | Compute device: `cuda` or `cpu`. | `cuda` |
| `--resume` | Resume from the latest checkpoint in the output directory. | `False` |
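
The labels file format (`image_path \t label`) is easy to get wrong when labels contain stray tabs or newlines. The sketch below writes and re-reads such a file. It assumes UTF-8 encoding and exactly one sample per line, neither of which is stated above; `write_labels` and `read_labels` are hypothetical helper names, not Kiri OCR functions.

```python
import os
import tempfile


def write_labels(pairs, labels_path):
    """Write (image_path, label) pairs, one tab-separated pair per line."""
    with open(labels_path, "w", encoding="utf-8", newline="") as f:
        for image_path, label in pairs:
            # A tab or newline inside a field would corrupt the line format.
            if "\t" in image_path or "\t" in label or "\n" in label:
                raise ValueError(f"invalid tab/newline in sample {image_path!r}")
            f.write(f"{image_path}\t{label}\n")


def read_labels(labels_path):
    """Parse a labels file back into a list of (image_path, label) tuples."""
    pairs = []
    with open(labels_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines
            image_path, sep, label = line.partition("\t")
            if not sep:
                raise ValueError(f"line {line_no}: missing tab separator")
            pairs.append((image_path, label))
    return pairs


if __name__ == "__main__":
    # Placeholder samples; the image paths do not need to exist for this check.
    samples = [("images/0001.png", "hello world"), ("images/0002.png", "សួស្តី")]
    path = os.path.join(tempfile.mkdtemp(), "train_labels.txt")
    write_labels(samples, path)
    print(read_labels(path) == samples)
```

Round-tripping the file this way is a cheap check to run before starting a long training job.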

Model Architecture Arguments:

| Argument | Description | Default |
| --- | --- | --- |
| `--encoder-dim` | Encoder hidden dimension. | `256` |
| `--encoder-heads` | Encoder attention heads. | `8` |
| `--encoder-layers` | Number of encoder layers. | `4` |
| `--encoder-ffn-dim` | Encoder feedforward dimension. | `1024` |
| `--decoder-dim` | Decoder hidden dimension. | `256` |
| `--decoder-heads` | Decoder attention heads. | `8` |
| `--decoder-layers` | Number of decoder layers. | `3` |
| `--decoder-ffn-dim` | Decoder feedforward dimension. | `1024` |
| `--dropout` | Dropout rate. | `0.15` |
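
In standard multi-head attention the hidden dimension is split evenly across the heads, so the dimension is normally required to be a multiple of the head count. Whether Kiri OCR enforces this is an assumption here, but checking it before overriding the defaults costs nothing. A small sketch (the function name `head_dim` is hypothetical):

```python
def head_dim(model_dim, num_heads):
    """Return the per-head size, or raise if the split is uneven."""
    if num_heads <= 0:
        raise ValueError("num_heads must be positive")
    if model_dim % num_heads != 0:
        raise ValueError(
            f"dimension {model_dim} is not divisible by {num_heads} heads"
        )
    return model_dim // num_heads


if __name__ == "__main__":
    # Documented defaults: (hidden dimension, attention heads)
    defaults = {"encoder": (256, 8), "decoder": (256, 8)}
    for name, (dim, heads) in defaults.items():
        print(f"{name}: {heads} heads x {head_dim(dim, heads)} dims per head")
```

With the defaults, both the encoder and the decoder use 32 dimensions per head.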

See Training Guide for full details and examples.


generate

Generate synthetic training data (images of text lines).

kiri-ocr generate [OPTIONS]

Arguments:

| Argument | Description | Default |
| --- | --- | --- |
| `--train-file`, `-t` | Input text file (one line per sample). | Required |
| `--output`, `-o` | Output directory for images/labels. | `data` |
| `--fonts-dir` | Directory containing .ttf fonts. | `fonts` |
| `--augment` | Number of augmented versions per line. | `1` |
| `--random-augment` | Apply random noise, rotation, blur. | `False` |
| `--height` | Output image height. | `32` |
| `--width` | Output image width. | `512` |
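
Since the input text file holds one sample per line, raw corpora usually need a cleanup pass first. The sketch below normalizes whitespace and drops blank and duplicate lines. Deduplication is a choice made for this example, not a documented requirement, and `prepare_text_file` is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path


def prepare_text_file(raw_text, out_path):
    """Write unique, non-empty, whitespace-normalized lines; return the count."""
    seen = set()
    lines = []
    for raw_line in raw_text.splitlines():
        line = " ".join(raw_line.split())  # collapse runs of whitespace
        if line and line not in seen:
            seen.add(line)
            lines.append(line)
    text = "".join(line + "\n" for line in lines)
    Path(out_path).write_text(text, encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    raw = "hello   world\n\nhello world\n  second line \n"
    out = os.path.join(tempfile.mkdtemp(), "corpus.txt")
    print(prepare_text_file(raw, out), "lines written to", out)
```

The resulting file can then be passed to `--train-file`.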

init-config

Create a default YAML configuration file for training.

kiri-ocr init-config -o config.yaml

generate-detector

Generate a dataset for training the text detector (CRAFT).

kiri-ocr generate-detector [OPTIONS]

Arguments:

| Argument | Description | Default |
| --- | --- | --- |
| `--text-file` | Source text file. | Required |
| `--fonts-dir` | Directory of fonts. | `fonts` |
| `--output` | Output directory. | `detector_dataset` |
| `--num-train` | Number of training images. | `800` |
| `--num-val` | Number of validation images. | `200` |

train-detector

Train the text detector model.

kiri-ocr train-detector [OPTIONS]

Arguments:

| Argument | Description | Default |
| --- | --- | --- |
| `--data-yaml` | Path to dataset YAML config. | `detector_dataset/data.yaml` |
| `--epochs` | Number of epochs. | `100` |
| `--batch-size` | Batch size. | `16` |
| `--image-size` | Image size for training. | `640` |
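
The defaults of generate-detector (`--output detector_dataset`) and train-detector (`--data-yaml detector_dataset/data.yaml`) line up, which suggests that the generator writes `data.yaml` into its output directory. That is an inference from the defaults, not something stated here. Under that assumption, the two steps can be chained with a consistent path, as in this sketch (`detector_pipeline` and `corpus.txt` are hypothetical names):

```python
from pathlib import Path


def detector_pipeline(text_file, dataset_dir="detector_dataset",
                      fonts_dir="fonts", num_train=800, num_val=200,
                      epochs=100, batch_size=16, image_size=640):
    """Build argument lists for generate-detector and train-detector."""
    generate = [
        "kiri-ocr", "generate-detector",
        "--text-file", str(text_file),
        "--fonts-dir", str(fonts_dir),
        "--output", str(dataset_dir),
        "--num-train", str(num_train),
        "--num-val", str(num_val),
    ]
    # Assumption: data.yaml sits at the root of the generated dataset.
    data_yaml = Path(dataset_dir) / "data.yaml"
    train = [
        "kiri-ocr", "train-detector",
        "--data-yaml", data_yaml.as_posix(),
        "--epochs", str(epochs),
        "--batch-size", str(batch_size),
        "--image-size", str(image_size),
    ]
    return generate, train


if __name__ == "__main__":
    # Print the two commands in the order they would be run.
    for cmd in detector_pipeline("corpus.txt", dataset_dir="my_detector_data"):
        print(" ".join(cmd))
```

Deriving the `--data-yaml` path from the dataset directory keeps the two commands in sync when a non-default output location is used.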
