CLI Reference

Kiri OCR provides a command-line interface (CLI), invoked through the `kiri-ocr` command.

`predict`

Run OCR inference on a document image.

kiri-ocr predict IMAGE_PATH [OPTIONS]

Arguments:

| Argument | Description | Default |
| --- | --- | --- |
| `image` | Path to the input image file (required). | - |
| `--mode` | Detection mode: `lines` or `words`. | `lines` |
| `--model` | Path to model file or Hugging Face repo ID. | `mrrtmob/kiri-ocr` |
| `--padding` | Padding (pixels) around detected text boxes. | `10` |
| `--output`, `-o` | Directory to save results. | `output` |
| `--no-render` | Skip generation of visual reports (images/HTML). | `False` |
| `--device` | Compute device: `cpu` or `cuda`. | `cpu` |
| `--verbose`, `-v` | Enable verbose logging. | `False` |
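
When `predict` is driven from a script, it can help to assemble the argument list in one place. The sketch below uses only the options listed in the table above; `build_predict_command` is a hypothetical helper, not part of Kiri OCR, and it assumes that `--no-render` and `--verbose` are on/off switches that take no value.

```python
import shlex


def build_predict_command(
    image_path,
    mode="lines",
    model="mrrtmob/kiri-ocr",
    padding=10,
    output="output",
    no_render=False,
    device="cpu",
    verbose=False,
):
    """Return the argv list for a `kiri-ocr predict` call (hypothetical helper)."""
    if mode not in ("lines", "words"):
        raise ValueError(f"mode must be 'lines' or 'words', got {mode!r}")
    if device not in ("cpu", "cuda"):
        raise ValueError(f"device must be 'cpu' or 'cuda', got {device!r}")
    cmd = [
        "kiri-ocr", "predict", str(image_path),
        "--mode", mode,
        "--model", model,
        "--padding", str(padding),
        "--output", output,
        "--device", device,
    ]
    # Assumption: these two options are switches that take no value.
    if no_render:
        cmd.append("--no-render")
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_predict_command("scan.png", mode="words", device="cuda", no_render=True)
print(shlex.join(cmd))
```

With Kiri OCR installed, the returned list can be passed to `subprocess.run(cmd, check=True)`.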

`train`

Train the Transformer-based recognition model.

kiri-ocr train [OPTIONS]

Key Arguments:

| Argument | Description | Default |
| --- | --- | --- |
| `--train-labels` | Path to the training labels file (tab-separated `image_path` and `label`). | - |
| `--val-labels` | Path to the validation labels file. | - |
| `--hf-dataset` | Hugging Face dataset ID (e.g., `mrrtmob/km_en_image_line`). | - |
| `--output-dir` | Directory to save model checkpoints. | `models` |
| `--epochs` | Number of training epochs. | `100` |
| `--batch-size` | Batch size. | `32` |
| `--height` | Input image height. | `48` |
| `--width` | Input image width. | `640` |
| `--lr` | Learning rate. | `0.0003` |
| `--device` | Compute device: `cuda` or `cpu`. | `cuda` |
| `--resume` | Resume from the latest checkpoint in the output directory. | `False` |
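
The labels file passed to `--train-labels` pairs an image path with its text, separated by a tab. A minimal sketch of writing and reading such a file follows. It assumes one sample per line, UTF-8 encoding, and no header row, none of which is confirmed here; `write_labels` and `read_labels` are hypothetical helpers.

```python
import tempfile
from pathlib import Path


def write_labels(samples, labels_path):
    """Write (image_path, text) pairs, one tab-separated pair per line."""
    lines = []
    for image_path, text in samples:
        # A tab or line break inside either field would break the two-column layout.
        if any(ch in image_path or ch in text for ch in "\t\r\n"):
            raise ValueError(f"field contains a tab or line break: {image_path!r}")
        lines.append(f"{image_path}\t{text}")
    Path(labels_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_labels(labels_path):
    """Read the file back into (image_path, text) pairs, skipping blank lines."""
    content = Path(labels_path).read_text(encoding="utf-8")
    return [tuple(line.split("\t", 1)) for line in content.split("\n") if line.strip()]


samples = [("images/0001.png", "Hello world"), ("images/0002.png", "សួស្តី")]
with tempfile.TemporaryDirectory() as tmp:
    labels_file = Path(tmp) / "train_labels.txt"
    write_labels(samples, labels_file)
    # Round-trip check: what was written is what is read back.
    assert read_labels(labels_file) == samples
```

Rejecting tabs and line breaks inside a field keeps every line parseable as exactly two columns.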

Model Architecture Arguments:

| Argument | Description | Default |
| --- | --- | --- |
| `--encoder-dim` | Encoder hidden dimension. | `256` |
| `--encoder-heads` | Encoder attention heads. | `8` |
| `--encoder-layers` | Number of encoder layers. | `4` |
| `--encoder-ffn-dim` | Encoder feedforward dimension. | `1024` |
| `--decoder-dim` | Decoder hidden dimension. | `256` |
| `--decoder-heads` | Decoder attention heads. | `8` |
| `--decoder-layers` | Number of decoder layers. | `3` |
| `--decoder-ffn-dim` | Decoder feedforward dimension. | `1024` |
| `--dropout` | Dropout rate. | `0.15` |

See the Training Guide for full details and examples.
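
When changing the architecture arguments, note that standard multi-head attention splits the hidden dimension evenly across heads, so the dimension must be divisible by the head count. The sketch below checks that constraint for the default values. It assumes Kiri OCR follows this common convention, which this page does not confirm; `head_dim` is a hypothetical helper.

```python
def head_dim(model_dim, num_heads):
    """Return the per-head dimension, or raise if the split is uneven."""
    if num_heads <= 0:
        raise ValueError("num_heads must be positive")
    if model_dim % num_heads != 0:
        raise ValueError(
            f"hidden dimension {model_dim} is not divisible by {num_heads} heads"
        )
    return model_dim // num_heads


# Default values from the table: (hidden dimension, attention heads).
defaults = {"encoder": (256, 8), "decoder": (256, 8)}
for name, (dim, heads) in defaults.items():
    print(f"{name}: {head_dim(dim, heads)} dimensions per head")
```

Under that assumption, a pair such as `--encoder-dim 250 --encoder-heads 8` would be rejected, while `--encoder-dim 512 --encoder-heads 8` gives 64 dimensions per head.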

`generate`

Generate synthetic training data (images of text lines).

kiri-ocr generate [OPTIONS]

Arguments:

| Argument | Description | Default |
| --- | --- | --- |
| `--train-file`, `-t` | Input text file (one line per sample). | Required |
| `--output`, `-o` | Output directory for images/labels. | `data` |
| `--fonts-dir` | Directory containing `.ttf` fonts. | `fonts` |
| `--augment` | Number of augmented versions per line. | `1` |
| `--random-augment` | Apply random noise, rotation, blur. | `False` |
| `--height` | Output image height. | `32` |
| `--width` | Output image width. | `512` |
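
Because `--train-file` expects one line per sample, raw text usually needs a small cleanup pass first. The following sketch trims whitespace and drops empty lines before assembling a `generate` call from the options above. It assumes UTF-8 input and that blank lines are unwanted; `prepare_train_file` is a hypothetical helper.

```python
import shlex
import tempfile
from pathlib import Path


def prepare_train_file(raw_text, out_path):
    """Write one trimmed, non-empty sample per line and return the sample count."""
    samples = [line.strip() for line in raw_text.split("\n")]
    samples = [s for s in samples if s]
    Path(out_path).write_text("\n".join(samples) + "\n", encoding="utf-8")
    return len(samples)


raw = "Hello world\n\n  Kiri OCR  \nសួស្តី\n"
with tempfile.TemporaryDirectory() as tmp:
    train_file = Path(tmp) / "corpus.txt"
    count = prepare_train_file(raw, train_file)
    # Only options documented in the table are used here.
    cmd = [
        "kiri-ocr", "generate",
        "--train-file", str(train_file),
        "--output", "data",
        "--fonts-dir", "fonts",
        "--augment", "1",
    ]
    print(f"{count} samples ready")
    print(shlex.join(cmd))
```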

`init-config`

Create a default YAML configuration file for training.

kiri-ocr init-config -o config.yaml

`generate-detector`

Generate a dataset for training the text detector (CRAFT).

kiri-ocr generate-detector [OPTIONS]

Arguments:

| Argument | Description | Default |
| --- | --- | --- |
| `--text-file` | Source text file. | Required |
| `--fonts-dir` | Directory of fonts. | `fonts` |
| `--output` | Output directory. | `detector_dataset` |
| `--num-train` | Number of training images. | `800` |
| `--num-val` | Number of validation images. | `200` |
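
The default `--num-train` and `--num-val` values amount to an 80/20 split of 1,000 images. To scale the dataset while keeping a chosen ratio, the two counts can be derived from a total, as in this sketch. `split_counts` is a hypothetical helper, and whether 80/20 is the best ratio for the detector is not stated here.

```python
def split_counts(total_images, val_fraction=0.2):
    """Return (num_train, num_val) for a total image count and validation share."""
    if total_images < 2:
        raise ValueError("need at least two images to form both splits")
    if not 0 < val_fraction < 1:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    # Clamp so that each split always receives at least one image.
    num_val = min(max(1, round(total_images * val_fraction)), total_images - 1)
    return total_images - num_val, num_val


num_train, num_val = split_counts(1000)
print(f"--num-train {num_train} --num-val {num_val}")
```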

`train-detector`

Train the text detector model.

kiri-ocr train-detector [OPTIONS]

Arguments:

| Argument | Description | Default |
| --- | --- | --- |
| `--data-yaml` | Path to dataset YAML config. | `detector_dataset/data.yaml` |
| `--epochs` | Number of epochs. | `100` |
| `--batch-size` | Batch size. | `16` |
| `--image-size` | Image size for training. | `640` |
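
The default `--data-yaml` path sits inside the default `generate-detector` output directory, which suggests the two commands are meant to be chained. The sketch below builds both argument lists so the paths stay in sync. It assumes `generate-detector` writes a `data.yaml` file into its `--output` directory, which the defaults imply but this page does not state; `detector_pipeline` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def detector_pipeline(text_file, output="detector_dataset",
                      epochs=100, batch_size=16, image_size=640):
    """Return (generate_cmd, train_cmd) argv lists for the detector workflow."""
    # Assumption: generate-detector places data.yaml inside its output directory.
    data_yaml = PurePosixPath(output) / "data.yaml"
    generate_cmd = [
        "kiri-ocr", "generate-detector",
        "--text-file", text_file,
        "--output", output,
    ]
    train_cmd = [
        "kiri-ocr", "train-detector",
        "--data-yaml", str(data_yaml),
        "--epochs", str(epochs),
        "--batch-size", str(batch_size),
        "--image-size", str(image_size),
    ]
    return generate_cmd, train_cmd


generate_cmd, train_cmd = detector_pipeline("corpus.txt")
print(" ".join(generate_cmd))
print(" ".join(train_cmd))
```

Deriving the YAML path from the output directory means that changing `--output` in one place cannot leave `train-detector` pointing at a stale dataset.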
