# CLI Reference
Tmob edited this page Jan 28, 2026
Kiri OCR provides a comprehensive Command Line Interface (CLI) accessed via the `kiri-ocr` command.
## kiri-ocr predict

Run OCR inference on a document image.

```bash
kiri-ocr predict [IMAGE_PATH] [OPTIONS]
```

**Arguments:**

| Argument | Description | Default |
|---|---|---|
| `image` | Path to the input image file (required). | - |
| `--mode` | Detection mode: `lines` or `words`. | `lines` |
| `--model` | Path to model file or Hugging Face repo ID. | `mrrtmob/kiri-ocr` |
| `--padding` | Padding (pixels) around detected text boxes. | `10` |
| `--output`, `-o` | Directory to save results. | `output` |
| `--no-render` | Skip generation of visual reports (images/HTML). | `False` |
| `--device` | Compute device: `cpu` or `cuda`. | `cpu` |
| `--verbose`, `-v` | Enable verbose logging. | `False` |
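For example, a typical invocation might look like the following (the image path and output directory are illustrative):

```bash
# Run word-level detection on a scanned page with GPU inference,
# saving results to ./results.
kiri-ocr predict scans/page_01.png --mode words --device cuda -o results
```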
## kiri-ocr train

Train the Transformer-based recognition model.

```bash
kiri-ocr train [OPTIONS]
```

**Key Arguments:**

| Argument | Description | Default |
|---|---|---|
| `--train-labels` | Path to training labels file (`image_path \t label`). | - |
| `--val-labels` | Path to validation labels file. | - |
| `--hf-dataset` | Hugging Face dataset ID (e.g., `mrrtmob/km_en_image_line`). | - |
| `--output-dir` | Directory to save model checkpoints. | `models` |
| `--epochs` | Number of training epochs. | `100` |
| `--batch-size` | Batch size. | `32` |
| `--height` | Input image height. | `48` |
| `--width` | Input image width. | `640` |
| `--lr` | Learning rate. | `0.0003` |
| `--device` | `cuda` or `cpu`. | `cuda` |
| `--resume` | Resume from the latest checkpoint in the output directory. | `False` |
**Model Architecture Arguments:**

| Argument | Description | Default |
|---|---|---|
| `--encoder-dim` | Encoder hidden dimension. | `256` |
| `--encoder-heads` | Encoder attention heads. | `8` |
| `--encoder-layers` | Number of encoder layers. | `4` |
| `--encoder-ffn-dim` | Encoder feedforward dimension. | `1024` |
| `--decoder-dim` | Decoder hidden dimension. | `256` |
| `--decoder-heads` | Decoder attention heads. | `8` |
| `--decoder-layers` | Number of decoder layers. | `3` |
| `--decoder-ffn-dim` | Decoder feedforward dimension. | `1024` |
| `--dropout` | Dropout rate. | `0.15` |
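The `--train-labels` and `--val-labels` files pair each image path with its transcription, separated by a tab, one sample per line. A minimal sketch of producing such a file (the image paths and labels below are illustrative, not part of any shipped dataset):

```python
# Write a labels file in the "image_path \t label" format expected by
# --train-labels / --val-labels. Paths and transcriptions are examples only.
samples = [
    ("data/images/line_0001.png", "Hello world"),
    ("data/images/line_0002.png", "Kiri OCR"),
]

with open("train_labels.txt", "w", encoding="utf-8") as f:
    for path, label in samples:
        f.write(f"{path}\t{label}\n")
```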
See Training Guide for full details and examples.
## kiri-ocr generate

Generate synthetic training data (images of text lines).

```bash
kiri-ocr generate [OPTIONS]
```

**Arguments:**

| Argument | Description | Default |
|---|---|---|
| `--train-file`, `-t` | Input text file (one line per sample). | Required |
| `--output`, `-o` | Output directory for images/labels. | `data` |
| `--fonts-dir` | Directory containing `.ttf` fonts. | `fonts` |
| `--augment` | Number of augmented versions per line. | `1` |
| `--random-augment` | Apply random noise, rotation, and blur. | `False` |
| `--height` | Output image height. | `32` |
| `--width` | Output image width. | `512` |
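For example, the following invocation renders each line of an input corpus as a synthetic training image (the file name `corpus.txt` is illustrative):

```bash
# Render each line of corpus.txt, producing two randomly augmented
# variants per line in the ./data output directory.
kiri-ocr generate -t corpus.txt -o data --augment 2 --random-augment
```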
## kiri-ocr init-config

Create a default YAML configuration file for training.

```bash
kiri-ocr init-config -o config.yaml
```

## kiri-ocr generate-detector

Generate a dataset for training the text detector (CRAFT).
```bash
kiri-ocr generate-detector [OPTIONS]
```

**Arguments:**

| Argument | Description | Default |
|---|---|---|
| `--text-file` | Source text file. | Required |
| `--fonts-dir` | Directory of fonts. | `fonts` |
| `--output` | Output directory. | `detector_dataset` |
| `--num-train` | Number of training images. | `800` |
| `--num-val` | Number of validation images. | `200` |
## kiri-ocr train-detector

Train the text detector model.

```bash
kiri-ocr train-detector [OPTIONS]
```

**Arguments:**

| Argument | Description | Default |
|---|---|---|
| `--data-yaml` | Path to dataset YAML config. | `detector_dataset/data.yaml` |
| `--epochs` | Number of epochs. | `100` |
| `--batch-size` | Batch size. | `16` |
| `--image-size` | Image size for training. | `640` |
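The detector commands are typically used together: generate a synthetic detection dataset, then train on its YAML config. A sketch of that workflow (the `corpus.txt` file name is illustrative; the other paths are the documented defaults):

```bash
# 1. Synthesize a detection dataset from a source text file.
kiri-ocr generate-detector --text-file corpus.txt --output detector_dataset

# 2. Train the detector on the generated dataset's YAML config.
kiri-ocr train-detector --data-yaml detector_dataset/data.yaml --epochs 100
```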
© 2026 Kiri OCR. Released under the Apache 2.0 License.