AttributeError when loading OCR model - kiri-ocr version incompatibility #9

@mrrtmob

Description

AttributeError: 'dict' object has no attribute 'UNK_TOKEN'

Error Details

Full Traceback:

Traceback (most recent call last):
  File "/home/user/app/app.py", line 49, in process_image
    ocr = load_ocr(device="cuda")
  File "/home/user/app/app.py", line 20, in load_ocr
    return OCR(
  File "/usr/local/lib/python3.10/site-packages/kiri_ocr/core.py", line 77, in __init__
    self._load_model(model_path, charset_path)
  File "/usr/local/lib/python3.10/site-packages/kiri_ocr/core.py", line 162, in _load_model
    self._load_transformer_model(checkpoint, model_path)
  File "/usr/local/lib/python3.10/site-packages/kiri_ocr/core.py", line 222, in _load_transformer_model
    self.transformer_tok = CharTokenizer(vocab_path, self.transformer_cfg)
  File "/usr/local/lib/python3.10/site-packages/kiri_ocr/model_transformer.py", line 76, in __init__
    if cfg.UNK_TOKEN not in vocab_raw:
AttributeError: 'dict' object has no attribute 'UNK_TOKEN'

Root Cause

The updated model on Hugging Face uses a newer configuration format that older versions of kiri-ocr cannot read. CharTokenizer.__init__ accesses cfg.UNK_TOKEN as an attribute (model_transformer.py, line 76), but with the updated model cfg arrives as a plain dictionary, which has no such attribute.
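The failure mode can be reproduced without kiri-ocr at all. The snippet below is a minimal sketch, not the library's code: `unk_in_vocab`, `vocab_raw`, and the `"<unk>"` token value are hypothetical stand-ins chosen to mirror the `cfg.UNK_TOKEN` check in the traceback. It shows that attribute access fails on a dict but works on an object carrying the same field.

```python
from types import SimpleNamespace

# Hypothetical vocabulary; the real vocab contents are not shown in this issue.
vocab_raw = {"<unk>": 0, "a": 1}


def unk_in_vocab(cfg, vocab):
    """Mimic the attribute-style access seen in the traceback (cfg.UNK_TOKEN)."""
    return cfg.UNK_TOKEN in vocab


# Config as a plain dict, which is what the tokenizer receives in this bug.
cfg_dict = {"UNK_TOKEN": "<unk>"}

# Same data exposed as attributes, which is what the attribute access expects.
cfg_obj = SimpleNamespace(**cfg_dict)

error_message = None
try:
    unk_in_vocab(cfg_dict, vocab_raw)
except AttributeError as exc:
    # Same message as in the issue title.
    error_message = str(exc)

print(error_message)
print(unk_in_vocab(cfg_obj, vocab_raw))
```

Wrapping the dict this way only illustrates the mismatch; it is not a suggested workaround, since the supported fix is upgrading the package.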

Solution

Upgrade kiri-ocr to version 0.2.0 or later, which supports the updated model configuration format.

Steps to Reproduce

  1. Use the updated model from Hugging Face
  2. Attempt to load OCR with load_ocr(device="cuda")
  3. Error occurs during CharTokenizer initialization

Fix

Update requirements.txt or your dependency file:

- kiri-ocr==0.1.x
+ kiri-ocr>=0.2.0

Then reinstall dependencies:

pip install --upgrade kiri-ocr
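To confirm the upgrade took effect in the running environment, a small startup check along these lines may help. This is a sketch under assumptions: the helper names (`parse_release`, `meets_minimum`, `installed_kiri_ocr_version`) are hypothetical, the distribution name `kiri-ocr` is taken from the pip command above, and the version parsing is deliberately simple (first three numeric components only, pre-release suffixes ignored).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum version named in this issue.
MIN_VERSION = (0, 2, 0)


def parse_release(version_string):
    """Return the first three numeric components of a version as a tuple."""
    parts = []
    for piece in version_string.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version_string, minimum=MIN_VERSION):
    """True if version_string is at least the required minimum."""
    return parse_release(version_string) >= minimum


def installed_kiri_ocr_version():
    """Return the installed kiri-ocr version string, or None if not installed."""
    try:
        return version("kiri-ocr")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_kiri_ocr_version()
    if found is None:
        print("kiri-ocr is not installed")
    elif meets_minimum(found):
        print(f"kiri-ocr {found} is new enough")
    else:
        print(f"kiri-ocr {found} is too old; run: pip install --upgrade kiri-ocr")
```

Running a check like this before constructing the OCR object turns the opaque AttributeError into an explicit message about the installed version.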

Environment

  • Python: 3.10
  • Current kiri-ocr version: < 0.2.0
  • Required kiri-ocr version: >= 0.2.0

Additional Notes

This incompatibility stems from the model update on Hugging Face. Anyone loading the latest models should upgrade to kiri-ocr >= 0.2.0.
