
OpenOCR: A general OCR system with accuracy and efficiency

⚡[Quick Start] [Model] [ModelScope Demo] [Hugging Face Demo] [Local Demo] [PaddleOCR Implementation]

We propose strategies to comprehensively enhance CTC-based STR models and develop a novel CTC-based method, SVTRv2. SVTRv2 outperforms previous attention-based STR methods in accuracy while retaining the advantages of CTC, such as fast inference and robust recognition of long text. These features make SVTRv2 particularly well suited for practical applications. Building on SVTRv2, we train a practical version of the model from scratch on publicly available Chinese and English datasets. Combined with a detection model, this forms OpenOCR, a general OCR system with accuracy and efficiency. Compared with the PP-OCRv4 baseline on the OCR competition leaderboard, OpenOCR (mobile) achieves a 4.5% accuracy improvement while maintaining a similar inference speed on an NVIDIA 1080Ti GPU.

| Model | Config | E2E Metric | Downloading |
| --- | --- | --- | --- |
| PP-OCRv4 | - | 62.77% | PaddleOCR Model List |
| SVTRv2 (Rec Server) | configs/rec/svtrv2/svtrv2_ch.yml | 68.81% | Google Drive, GitHub Released |
| RepSVTR (Mobile) | Rec: configs/rec/svtrv2/repsvtr_ch.yml; Det: configs/det/dbnet/repvit_db.yml | 67.22% | Rec: Google Drive, GitHub Released; Det: Google Drive, GitHub Released |

Quick Start

Note: OpenOCR supports inference with both the ONNX and Torch frameworks, and the dependency environments of the two are isolated: when using ONNX for inference there is no need to install Torch, and vice versa.

Installation

# Install from PyPI (recommended)
pip install openocr-python==0.1.5

# Or install from source
git clone https://github.com/Topdu/OpenOCR.git
cd OpenOCR
python build_package.py
pip install ./build/dist/openocr_python-*.whl

1. Text Detection + Recognition (OCR)

End-to-end OCR for Chinese/English text detection and recognition:

# Basic usage
openocr --task ocr --input_path path/to/img

# With visualization
openocr --task ocr --input_path path/to/img --is_vis

# Process directory with custom output
openocr --task ocr --input_path ./images --output_path ./results --is_vis

# Use server mode (higher accuracy)
pip install torch torchvision
openocr --task ocr --input_path path/to/img --mode server --backend torch

2. Text Detection Only

Detect text regions without recognition:

# Basic detection
openocr --task det --input_path path/to/img

# With visualization
openocr --task det --input_path path/to/img --is_vis

# Use polygon detection (more accurate for curved text)
openocr --task det --input_path path/to/img --det_box_type poly

3. Text Recognition Only

Recognize text from cropped word/line images:

# Basic recognition
openocr --task rec --input_path path/to/img

# Use server mode (higher accuracy)
pip install torch torchvision
openocr --task rec --input_path path/to/img --mode server --backend torch

# Batch processing
openocr --task rec --input_path ./word_images --rec_batch_num 16

Local Demo

Launch Gradio web interface for OCR tasks:

pip install gradio
openocr --task launch_openocr_demo --server_port 7862 --share

Python API Usage

1. OCR Task

from openocr import OpenOCR

# Initialize OCR engine
ocr = OpenOCR(mode='mobile', backend='onnx')

# Process single image
results, time_dicts = ocr(
    image_path='path/to/image.jpg',
    save_dir='./output',
    is_visualize=True
)

# Access results
for result in results:
    for line in result:
        print(f"Text: {line['text']}, Score: {line['score']}")
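The loop above iterates a list of per-image results, each a list of line dicts with 'text' and 'score' keys. As a minimal sketch of downstream post-processing under that assumed structure, the hypothetical helper below keeps only lines whose confidence clears a threshold; adjust it if your OpenOCR version returns a different shape.

```python
# Hypothetical post-processing helper: collects high-confidence lines from the
# nested results structure shown above (a list of per-image lists of
# {'text': ..., 'score': ...} dicts). The field names are assumed from the
# loop above, not taken from OpenOCR's API reference.

def filter_lines(results, min_score=0.5):
    """Return (text, score) pairs whose confidence meets the threshold."""
    kept = []
    for result in results:      # one entry per image
        for line in result:     # one entry per recognized text line
            if line['score'] >= min_score:
                kept.append((line['text'], line['score']))
    return kept

# Example with mock results in the assumed format:
mock = [[{'text': 'Hello', 'score': 0.98}, {'text': '??', 'score': 0.21}]]
print(filter_lines(mock))  # [('Hello', 0.98)]
```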

2. Detection Task

from openocr import OpenOCR

# Initialize detector
detector = OpenOCR(task='det')

# Detect text regions
results = detector(image_path='path/to/image.jpg')

# Access detection boxes
boxes = results[0]['boxes']
print(f"Found {len(boxes)} text regions")
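A common next step is cropping each detected region for recognition or inspection. Assuming each box is a polygon given as a list of [x, y] corner points (check the actual shape of results[0]['boxes'] in your version), a small helper can derive the axis-aligned rectangle that PIL's Image.crop expects:

```python
# Hypothetical helper: convert a detected polygon box into an axis-aligned
# rectangle suitable for cropping. The 4-point [[x, y], ...] box format is
# an assumption about OpenOCR's detection output, not a documented contract.

def box_to_rect(box):
    """Return (left, top, right, bottom) enclosing a list of [x, y] points."""
    xs = [p[0] for p in box]
    ys = [p[1] for p in box]
    return (min(xs), min(ys), max(xs), max(ys))

# Example: a slightly rotated quadrilateral
quad = [[10, 12], [118, 10], [120, 40], [12, 42]]
print(box_to_rect(quad))  # (10, 10, 120, 42)
```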

3. Recognition Task

from openocr import OpenOCR

# Initialize recognizer
recognizer = OpenOCR(task='rec', mode='server', backend='torch')  # requires: pip install torch torchvision

# Recognize text
results = recognizer(image_path='path/to/word.jpg')

# Access recognition result
text = results[0]['text']
score = results[0]['score']
print(f"Text: {text}, Confidence: {score}")

Get Started with Source

Dependencies:

  • PyTorch version >= 1.13.0
  • Python version >= 3.7
conda create -n openocr python==3.8
conda activate openocr
# install gpu version torch
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# or cpu version
conda install pytorch torchvision torchaudio cpuonly -c pytorch

git clone https://github.com/Topdu/OpenOCR.git
cd OpenOCR
pip install -r requirements.txt
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_det_repvit_ch.pth
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_repsvtr_ch.pth
# Rec Server model
# wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_svtrv2_ch.pth

Usage:

# OpenOCR system: Det + Rec model
python tools/infer_e2e.py --img_path=/path/img_fold  # or /path/img_file
# Det model
python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.infer_img=/path/img_fold  # or /path/img_file
# Rec model
python tools/infer_rec.py --c ./configs/rec/svtrv2/repsvtr_ch.yml --o Global.infer_img=/path/img_fold  # or /path/img_file

Fine-tuning on a Custom dataset

Refer to Finetuning Det and Finetuning Rec.

Exporting to ONNX Engine

Export ONNX model

pip install onnx
python tools/toonnx.py --c configs/rec/svtrv2/repsvtr_ch.yml --o Global.device=cpu
python tools/toonnx.py --c configs/det/dbnet/repvit_db.yml --o Global.device=cpu

The det ONNX model is saved to ./output/det_repsvtr_db/export_det/det_model.onnx, and the rec ONNX model to ./output/rec/repsvtr_ch/export_rec/rec_model.onnx.

Inference with ONNXRuntime

pip install onnxruntime
# OpenOCR system: Det + Rec model
python tools/infer_e2e.py --img_path=/path/img_fold --backend=onnx --device=cpu --onnx_det_model_path=./output/det_repsvtr_db/export_det/det_model.onnx --onnx_rec_model_path=./output/rec/repsvtr_ch/export_rec/rec_model.onnx  # --img_path may also be a single image file
# Det model
python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.backend=onnx Global.device=cpu Global.infer_img=/path/img_fold Global.onnx_model_path=./output/det_repsvtr_db/export_det/det_model.onnx
# Rec model
python tools/infer_rec.py --c ./configs/rec/svtrv2/repsvtr_ch.yml --o Global.backend=onnx Global.device=cpu Global.infer_img=/path/img_fold Global.onnx_model_path=./output/rec/repsvtr_ch/export_rec/rec_model.onnx

Results Showcase

Detection results

Recognition results

Det + Rec System results

Detection Model Performance

In the examples provided, OpenOCR's detection model generates bounding boxes that are generally more comprehensive and better aligned with the boundaries of text instances compared to PP-OCRv4. In addition, OpenOCR excels in distinguishing separate text instances, avoiding errors such as merging two distinct text instances into one or splitting a single instance into multiple parts. This indicates superior handling of semantic completeness and spatial understanding, making it particularly effective for complex layouts.

Recognition Model Generalization

OpenOCR's recognition model demonstrates enhanced generalization capabilities when compared to PP-OCRv4. It performs exceptionally well in recognizing text under difficult conditions, such as:

  • Artistic or stylized fonts.
  • Handwritten text.
  • Blurry or low-resolution images.
  • Incomplete or occluded text.

Remarkably, the OpenOCR mobile recognition model delivers results comparable to the larger and more resource-intensive PP-OCRv4 server model. This highlights OpenOCR's efficiency and accuracy, making it a versatile solution across different hardware platforms.

System used in Real-World Scenarios

As shown in Det + Rec System results, OpenOCR demonstrates outstanding performance in practical scenarios, including documents, tables, invoices, and similar contexts. This underscores its potential as a general-purpose OCR system. It is capable of adapting to diverse use cases with high accuracy and reliability.
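For documents, tables, and invoices, downstream consumers usually want the recognized lines in reading order rather than detection order. The sketch below sorts lines top-to-bottom and then left-to-right, with a tolerance so lines on roughly the same row group together; the {'box': [[x, y], ...], 'text': ...} record format is hypothetical and should be adapted to the actual OpenOCR output.

```python
# Minimal reading-order sketch: sort recognized lines top-to-bottom, then
# left-to-right. Tops within row_tol of each other are quantized to the same
# bucket so lines on the same visual row stay adjacent. The record format
# here is an assumption for illustration, not OpenOCR's documented output.

def reading_order(lines, row_tol=10):
    def top(line):
        return min(p[1] for p in line['box'])

    def left(line):
        return min(p[0] for p in line['box'])

    # Quantize the top coordinate so near-equal rows compare as equal.
    return sorted(lines, key=lambda l: (round(top(l) / row_tol), left(l)))

lines = [
    {'box': [[200, 11], [300, 11], [300, 30], [200, 30]], 'text': 'right'},
    {'box': [[10, 50], [90, 50], [90, 70], [10, 70]], 'text': 'below'},
    {'box': [[10, 9], [100, 9], [100, 30], [10, 30]], 'text': 'left'},
]
print([l['text'] for l in reading_order(lines)])  # ['left', 'right', 'below']
```

A fixed row tolerance is a simplification; production systems often derive it from the median line height instead.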

Citation

If you find our method useful for your research, please cite:

@inproceedings{Du2025SVTRv2,
      title={SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition},
      author={Yongkun Du and Zhineng Chen and Hongtao Xie and Caiyan Jia and Yu-Gang Jiang},
      booktitle={ICCV},
      year={2025},
      pages={20147-20156}
}