[API] Add DeepOCR pipeline API provider#1473
[API] Add DeepOCR pipeline API provider#1473leejooan wants to merge 2 commits intoopen-compass:mainfrom
Conversation
Adds `DeepOCRAPI`, an OpenAI-compatible wrapper for the DeepOCR pipeline. Credentials are configured via environment variables `DEEPOCR_API_BASE` and `DEEPOCR_API_KEY`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Hi, I’ve emailed the OpenCompass team (opencompass@pjlab.org.cn) with the environment If you need any additional setup or have questions, please feel free to reach out We look forward to your review. |
vlmeval/api/deepocr_api.py
Outdated
|
|
||
| def __init__( | ||
| self, | ||
| model: str = "gpt-4-1106-vision-preview", |
There was a problem hiding this comment.
The model name is 'gpt-4-1106-vision-preview'? better to use your own name, becasue it's related to the result file name.
There was a problem hiding this comment.
The model name is 'gpt-4-1106-vision-preview'? better to use your own name, becasue it's related to the result file name.
Thanks for the feedback! I've updated the default model name from
"gpt-4-1106-vision-preview" to "deepocr" in the latest commit.
The previous name was only a placeholder to indicate OpenAI-compatible
format support. Using "deepocr" is more appropriate as the actual
model identifier and will align with the generated result/output names.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Add support for DeepOCR pipeline so VLMEvalKit can run evaluations
using DeepOCR's document processing pipeline via an OpenAI-compatible
chat completions API.
The DeepOCR pipeline combines deep document OCR with a large
vision-language model, making it especially strong on document
understanding and text recognition tasks.
OCRBench result: 91.7 / 100
Changes
vlmeval/api/deepocr_api.py: NewDeepOCRAPIclass.Uses
DEEPOCR_API_BASEwith BearerDEEPOCR_API_KEY.vlmeval/api/__init__.py: ExportDeepOCRAPI.vlmeval/config.py: New model entry:DEEPOCR