
docs: supplement format_block_content image path behavior explanation #17817

Open
TingquanGao wants to merge 1 commit into PaddlePaddle:main from TingquanGao:fix_format_block_content_doc
@TingquanGao
Collaborator

Summary

Supplement the documentation for the format_block_content parameter in the PaddleOCR-VL and PP-StructureV3 pipeline docs (both Chinese and English) to clarify how it affects image paths.

Previously, the description said only that the parameter "controls whether to format block_content as Markdown" and did not mention its key side effect on image-type blocks. This caused the confusion reported in #17143.

Clarification added:

  • When format_block_content=True: the block_content of an image-type block includes image path information (e.g. <img src="..." />).
  • When format_block_content=False (default): the block_content of an image-type block contains only the OCR-recognized text, with no image path.
  • To get image paths in the JSON output, set format_block_content=True.
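
As a rough illustration of the difference described above, the following sketch pulls image paths out of a block_content string. The helper name extract_image_paths and both sample strings are hypothetical; the exact markup PaddleOCR emits may differ from the <img src="..." /> shape assumed here.

```python
import re
from typing import List

# Captures the src attribute of an <img> tag, e.g. <img src="imgs/fig.jpg" />.
# Assumption: image paths appear as HTML <img> tags inside block_content.
IMG_SRC_RE = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_image_paths(block_content: str) -> List[str]:
    """Return every image path referenced by an <img> tag in block_content."""
    return IMG_SRC_RE.findall(block_content)


# Hypothetical block_content values, not taken from real pipeline output.
formatted = '<div><img src="imgs/example.jpg" alt="Image" /></div>'  # as with format_block_content=True
plain = "Text recognized inside the image"                           # as with format_block_content=False

print(extract_image_paths(formatted))
print(extract_image_paths(plain))
```

With the default setting the helper finds nothing to extract, which matches the symptom reported in #17143.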

Changes

  • docs/version3.x/pipeline_usage/PaddleOCR-VL.md: Supplement format_block_content description (5 locations)
  • docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md: Supplement format_block_content description (5 locations)
  • docs/version3.x/pipeline_usage/PP-StructureV3.md: Supplement format_block_content description (4 locations)
  • docs/version3.x/pipeline_usage/PP-StructureV3.en.md: Supplement format_block_content description (4 locations)

Test plan

  • Verify JSON output contains image paths when format_block_content=True
  • Verify JSON output does NOT contain image paths when format_block_content=False
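
The two checks above could be scripted roughly as follows. This is a minimal sketch, not the project's test: the helper names are hypothetical, the sample dictionaries stand in for JSON files saved by the pipeline, and the keys parsing_res_list, block_label, and block_content are assumed for illustration. The walker deliberately ignores the schema and scans every string value for an <img tag.

```python
import json
import tempfile
from pathlib import Path


def contains_img_tag(node) -> bool:
    """Recursively check whether any string in a parsed JSON value has an <img tag."""
    if isinstance(node, str):
        return "<img" in node.lower()
    if isinstance(node, dict):
        return any(contains_img_tag(value) for value in node.values())
    if isinstance(node, list):
        return any(contains_img_tag(item) for item in node)
    return False  # numbers, booleans, and null cannot hold a path


def json_file_has_image_paths(path) -> bool:
    """Load a JSON file and report whether it references any image via <img>."""
    with open(path, encoding="utf-8") as f:
        return contains_img_tag(json.load(f))


# Hypothetical stand-ins for results saved with each setting.
sample_true = {"parsing_res_list": [
    {"block_label": "image", "block_content": '<img src="imgs/example.jpg" />'}]}
sample_false = {"parsing_res_list": [
    {"block_label": "image", "block_content": "Text recognized inside the image"}]}

with tempfile.TemporaryDirectory() as tmp:
    for name, sample in (("formatted.json", sample_true), ("plain.json", sample_false)):
        path = Path(tmp) / name
        path.write_text(json.dumps(sample), encoding="utf-8")
        print(name, json_file_has_image_paths(path))
```

To run this against real output, point json_file_has_image_paths at the JSON files produced with each setting instead of the temporary samples.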

Closes #17143

🤖 Generated with Claude Code

Supplement docs for format_block_content parameter in PaddleOCR-VL
and PP-StructureV3 pipelines to clarify that image-type block's
block_content includes image path (e.g. <img src="..." />) only when
format_block_content=True; when False (default) only OCR text is
included without image paths.

Fixes PaddlePaddle#17143

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@paddle-bot

paddle-bot bot commented Mar 16, 2026

Thanks for your contribution!

Development

Successfully merging this pull request may close these issues.

Image blocks in the JSON data parsed by PaddleOCR-VL have no image path