UN-3639 [FEAT] Add VLM/multimodal LLM support for per-prompt vision modes#2131
UN-3639 [FEAT] Add VLM/multimodal LLM support for per-prompt vision modes#2131harini-venkataraman wants to merge 5 commits into
Conversation
…odes Add opt-in vision capability to the answer-step LLM context, enabling VLM/multimodal models to receive page images alongside extracted text. SDK (Phase 1 + 3): - Rasteriser module: PDF pages to preprocessed PNG bytes via pypdfium2 - complete_vision() method with full complete() parity (retry, extract_json, usage recording, error wrapping) - Vision message builder: OpenAI-style multimodal messages with mode-aware ordering (spatial_helper vs source_of_truth) - VisionMode/ExtractionInputs constants and derive_vision_mode() pure function Backend (Phase 2 + 5 + 6): - Prompt model: extraction_inputs and source_of_truth fields with migration - Profile model: nullable vector_store/embedding/x2text FKs with app-layer validation and migration - Registry export: vision fields in per-prompt output dicts, single-pass export-time override to force text mode - Cache key discriminator: vision signature prevents cache collisions between text-only and vision runs Workers (Phase 4 + 8): - Answer path branching: vision mode derivation, rasterisation, and complete_vision() call in both executor and prompt-service paths - Structure tool: extraction skip for all-image-only prompts, source file path passthrough for rasterisation Frontend (Phase 7): - PromptCardItems: dynamic import hook for VisionModeSelector plugin Tests (Phase 10): - SDK: 46 tests (mode derivation, message builder, rasteriser, complete_vision) - Workers: 12 tests (structure tool vision skip, single-pass vision guard) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary by CodeRabbit
WalkthroughAdds end-to-end vision mode support for Prompt Studio. Per-prompt ChangesVision Mode Feature
Sequence Diagram(s)sequenceDiagram
participant StructureToolTask
participant LegacyExecutor
participant AnswerPromptService
participant SDKRasteriser
participant SDKVision
participant LLM
StructureToolTask->>LegacyExecutor: answer_params with source_file_path, extraction_inputs, source_of_truth
LegacyExecutor->>AnswerPromptService: construct_and_run_prompt(source_file_path, extraction_inputs, source_of_truth)
AnswerPromptService->>AnswerPromptService: derive_vision_mode(extraction_inputs, source_of_truth)
alt vision_mode != TEXT_ONLY
AnswerPromptService->>SDKRasteriser: rasterise_pages(pdf_bytes)
SDKRasteriser-->>AnswerPromptService: list[(page, png_bytes)]
AnswerPromptService->>SDKVision: build_vision_messages(text_context, page_images, prompt, mode)
SDKVision-->>AnswerPromptService: messages
AnswerPromptService->>LLM: complete_vision(messages, extract_json=True)
LLM-->>AnswerPromptService: LLMResponseCompat
else TEXT_ONLY
AnswerPromptService->>LLM: complete(prompt, ...)
LLM-->>AnswerPromptService: LLMResponseCompat
end
AnswerPromptService-->>LegacyExecutor: response text
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
for more information, see https://pre-commit.ci
Frontend Lint Report (Biome)✅ All checks passed! No linting or formatting issues found. |
|
Unstract test resultsPer-group results
Critical paths
|
|
| Filename | Overview |
|---|---|
| backend/prompt_studio/prompt_profile_manager_v2/serializers.py | Adds validate() to enforce extraction adapters when prompts need text; validation logic has a gap when attrs explicitly sends null for an adapter field during PATCH/PUT |
| workers/executor/executors/answer_prompt.py | Branches run_completion() into vision path; adds run_vision_completion() that reads PDF bytes, rasterises, builds messages, and calls complete_vision() |
| backend/workflow_manager/endpoint_v2/source.py | Injects vision signature into file content hash for cache discrimination; vision signature not applied to hash in the add_input_from_api_storage_to_volume fallback path |
| unstract/sdk1/src/unstract/sdk1/rasteriser.py | New module rasterising PDF pages to PNG bytes via pypdfium2+Pillow; duplicates MAX_VISION_PAGES definition from constants.py |
| unstract/sdk1/src/unstract/sdk1/llm.py | complete_vision() now has full parity with complete(): retry logic, extract_json, process_text callback, usage recording |
| workers/file_processing/structure_tool_task.py | Adds skip-extraction logic for image-only vision prompts; source_file_path hardcoded to SOURCE which matches source connector convention |
| backend/prompt_studio/prompt_studio_registry_v2/prompt_studio_registry_helper.py | Carries vision fields in per-prompt export; forces text_only for single-pass extraction mode with appropriate warning |
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant UI as Prompt Studio UI
participant BE as Backend (PromptStudioHelper)
participant W as Worker (LegacyExecutor)
participant AP as AnswerPromptService
participant R as rasteriser
participant V as vision (message builder)
participant LLM as LLM.complete_vision()
UI->>BE: Run prompt (extraction_inputs, source_of_truth)
BE->>BE: Build output dict incl. vision fields + source_file_path
BE->>W: Dispatch answer params via Celery
W->>AP: "run_completion(output, source_file_path=...)"
AP->>AP: derive_vision_mode(extraction_inputs, source_of_truth)
alt "vision_mode == text_only"
AP->>AP: run_completion() → LLM.complete()
else spatial_helper or source_of_truth
AP->>AP: run_vision_completion()
AP->>R: rasterise_pages(file_bytes, settings)
R-->>AP: [(page_num, png_bytes), ...]
AP->>V: build_vision_messages(system, text_ctx, images, prompt, mode)
V-->>AP: OpenAI-style multimodal messages
AP->>LLM: "complete_vision(messages, extract_json=...)"
LLM-->>AP: "{response: LLMResponseCompat, ...}"
AP-->>W: answer string
end
W-->>BE: answer
BE-->>UI: structured output
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant UI as Prompt Studio UI
participant BE as Backend (PromptStudioHelper)
participant W as Worker (LegacyExecutor)
participant AP as AnswerPromptService
participant R as rasteriser
participant V as vision (message builder)
participant LLM as LLM.complete_vision()
UI->>BE: Run prompt (extraction_inputs, source_of_truth)
BE->>BE: Build output dict incl. vision fields + source_file_path
BE->>W: Dispatch answer params via Celery
W->>AP: "run_completion(output, source_file_path=...)"
AP->>AP: derive_vision_mode(extraction_inputs, source_of_truth)
alt "vision_mode == text_only"
AP->>AP: run_completion() → LLM.complete()
else spatial_helper or source_of_truth
AP->>AP: run_vision_completion()
AP->>R: rasterise_pages(file_bytes, settings)
R-->>AP: [(page_num, png_bytes), ...]
AP->>V: build_vision_messages(system, text_ctx, images, prompt, mode)
V-->>AP: OpenAI-style multimodal messages
AP->>LLM: "complete_vision(messages, extract_json=...)"
LLM-->>AP: "{response: LLMResponseCompat, ...}"
AP-->>W: answer string
end
W-->>BE: answer
BE-->>UI: structured output
Comments Outside Diff (1)
-
backend/workflow_manager/endpoint_v2/source.py, line 1102-1123 (link)Vision signature not applied on the API-storage fallback path:
add_input_from_api_storage_to_volumereturns the raw content hash from_copy_file_to_destinationwithout mixing inself._vision_signature. The caller at line 1210–1212 overrides this withfile_hash.file_hashwhen it is truthy (the hash computed incopy_from_api_source, which does include the signature), but falls back to the unsigned hash whenfile_hash.file_hashis falsy. A workflow whose file arrives through this edge case would produce a cache key that doesn't discriminate between text-only and vision runs of the same file.Prompt To Fix With AI
This is a comment left during a code review. Path: backend/workflow_manager/endpoint_v2/source.py Line: 1102-1123 Comment: Vision signature not applied on the API-storage fallback path: `add_input_from_api_storage_to_volume` returns the raw content hash from `_copy_file_to_destination` without mixing in `self._vision_signature`. The caller at line 1210–1212 overrides this with `file_hash.file_hash` when it is truthy (the hash computed in `copy_from_api_source`, which does include the signature), but falls back to the unsigned hash when `file_hash.file_hash` is falsy. A workflow whose file arrives through this edge case would produce a cache key that doesn't discriminate between text-only and vision runs of the same file. How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
backend/prompt_studio/prompt_profile_manager_v2/serializers.py:50-56
Validation gap on explicit null in PATCH/PUT: when a client sends `{"vector_store": null}` the check `not attrs.get(field)` evaluates True (None is falsy), but `not getattr(instance, "vector_store_id", None)` evaluates False (the old FK is still on the instance), so the field is never added to `missing` and the null is saved silently — even when linked prompts still need text extraction. The guard needs to treat an inbound null as a removal, not as "not provided".
```suggestion
if needs_text:
missing = [
field
for field in _TEXT_EXTRACTION_FIELDS
if not (
# Use incoming value when explicitly provided, else fall
# back to whatever is already set on the instance.
attrs.get(field)
or (
field not in attrs
and instance is not None
and getattr(instance, f"{field}_id", None)
)
)
]
```
### Issue 2 of 3
unstract/sdk1/src/unstract/sdk1/rasteriser.py:17-22
`MAX_VISION_PAGES` is read from the environment a second time here, duplicating the definition already published in `constants.py`. The default parameter `max_pages: int = MAX_VISION_PAGES` bakes in the module-level value at import time; if these two definitions ever diverge (e.g. different defaults), callers relying on the constant from `constants.py` would get a different cap than the rasteriser's own default.
```suggestion
import pypdfium2 as pdfium
from PIL import Image, ImageFilter
from unstract.sdk1.constants import MAX_VISION_PAGES
logger = logging.getLogger(__name__)
```
### Issue 3 of 3
backend/workflow_manager/endpoint_v2/source.py:1102-1123
Vision signature not applied on the API-storage fallback path: `add_input_from_api_storage_to_volume` returns the raw content hash from `_copy_file_to_destination` without mixing in `self._vision_signature`. The caller at line 1210–1212 overrides this with `file_hash.file_hash` when it is truthy (the hash computed in `copy_from_api_source`, which does include the signature), but falls back to the unsigned hash when `file_hash.file_hash` is falsy. A workflow whose file arrives through this edge case would produce a cache key that doesn't discriminate between text-only and vision runs of the same file.
Reviews (1): Last reviewed commit: "Commit uv.lock changes" | Re-trigger Greptile
| if needs_text: | ||
| missing = [ | ||
| field | ||
| for field in _TEXT_EXTRACTION_FIELDS | ||
| if not attrs.get(field) | ||
| and (instance is None or not getattr(instance, f"{field}_id", None)) | ||
| ] |
There was a problem hiding this comment.
Validation gap on explicit null in PATCH/PUT: when a client sends
{"vector_store": null} the check not attrs.get(field) evaluates True (None is falsy), but not getattr(instance, "vector_store_id", None) evaluates False (the old FK is still on the instance), so the field is never added to missing and the null is saved silently — even when linked prompts still need text extraction. The guard needs to treat an inbound null as a removal, not as "not provided".
| if needs_text: | |
| missing = [ | |
| field | |
| for field in _TEXT_EXTRACTION_FIELDS | |
| if not attrs.get(field) | |
| and (instance is None or not getattr(instance, f"{field}_id", None)) | |
| ] | |
| if needs_text: | |
| missing = [ | |
| field | |
| for field in _TEXT_EXTRACTION_FIELDS | |
| if not ( | |
| # Use incoming value when explicitly provided, else fall | |
| # back to whatever is already set on the instance. | |
| attrs.get(field) | |
| or ( | |
| field not in attrs | |
| and instance is not None | |
| and getattr(instance, f"{field}_id", None) | |
| ) | |
| ) | |
| ] |
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/prompt_studio/prompt_profile_manager_v2/serializers.py
Line: 50-56
Comment:
Validation gap on explicit null in PATCH/PUT: when a client sends `{"vector_store": null}` the check `not attrs.get(field)` evaluates True (None is falsy), but `not getattr(instance, "vector_store_id", None)` evaluates False (the old FK is still on the instance), so the field is never added to `missing` and the null is saved silently — even when linked prompts still need text extraction. The guard needs to treat an inbound null as a removal, not as "not provided".
```suggestion
if needs_text:
missing = [
field
for field in _TEXT_EXTRACTION_FIELDS
if not (
# Use incoming value when explicitly provided, else fall
# back to whatever is already set on the instance.
attrs.get(field)
or (
field not in attrs
and instance is not None
and getattr(instance, f"{field}_id", None)
)
)
]
```
How can I resolve this? If you propose a fix, please make it concise.| import pypdfium2 as pdfium | ||
| from PIL import Image, ImageFilter | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| MAX_VISION_PAGES = int(os.environ.get("MAX_VISION_PAGES", "10")) |
There was a problem hiding this comment.
MAX_VISION_PAGES is read from the environment a second time here, duplicating the definition already published in constants.py. The default parameter max_pages: int = MAX_VISION_PAGES bakes in the module-level value at import time; if these two definitions ever diverge (e.g. different defaults), callers relying on the constant from constants.py would get a different cap than the rasteriser's own default.
| import pypdfium2 as pdfium | |
| from PIL import Image, ImageFilter | |
| logger = logging.getLogger(__name__) | |
| MAX_VISION_PAGES = int(os.environ.get("MAX_VISION_PAGES", "10")) | |
| import pypdfium2 as pdfium | |
| from PIL import Image, ImageFilter | |
| from unstract.sdk1.constants import MAX_VISION_PAGES | |
| logger = logging.getLogger(__name__) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: unstract/sdk1/src/unstract/sdk1/rasteriser.py
Line: 17-22
Comment:
`MAX_VISION_PAGES` is read from the environment a second time here, duplicating the definition already published in `constants.py`. The default parameter `max_pages: int = MAX_VISION_PAGES` bakes in the module-level value at import time; if these two definitions ever diverge (e.g. different defaults), callers relying on the constant from `constants.py` would get a different cap than the rasteriser's own default.
```suggestion
import pypdfium2 as pdfium
from PIL import Image, ImageFilter
from unstract.sdk1.constants import MAX_VISION_PAGES
logger = logging.getLogger(__name__)
```
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (1)
workers/executor/executors/answer_prompt.py (1)
433-433: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valuePrefer
logger.exceptionin these except blocks.Both handlers already sit inside
exceptclauses with an active exception;logger.exception(...)captures the traceback, which is valuable for diagnosing rasterisation/VLM failures. Optionally also apply{e!s}at Line 429 to satisfy RUF010.Also applies to: 441-441
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@workers/executor/executors/answer_prompt.py` at line 433, The error handling in the answer prompt exception blocks currently uses logger.error inside active except clauses, so update the relevant handlers in answer_prompt.py to use logger.exception instead to automatically include the traceback for rasterisation/VLM failures. Also, if the message at the nearby error formatting site uses a raw exception object, switch it to explicit string coercion with e!s to satisfy the lint rule. Keep the change localized to the exception handlers around the existing logger calls.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/prompt_studio/prompt_studio_core_v2/prompt_studio_helper.py`:
- Around line 2016-2017: In `_fetch_response`, `SOURCE_FILE_PATH` is being set
from the mutated `doc_path` instead of the original source document path. Update
the payload construction to use `file_path` for `TSPKeys.SOURCE_FILE_PATH`
(while keeping `TSPKeys.FILE_PATH` as-is), matching the behavior in
`build_fetch_response_payload`, `build_bulk_fetch_response_payload`, and
`build_single_pass_payload` so `run_vision_completion` can rasterize the
original PDF correctly.
In `@unstract/sdk1/src/unstract/sdk1/rasteriser.py`:
- Around line 25-35: Reject invalid render settings at construction time by
validating RenderSettings so dpi and max_dimension must both be positive,
instead of letting pdfium/Pillow fail later. Add the checks in the
RenderSettings dataclass lifecycle (or an equivalent constructor hook) and
ensure any callers in the rendering flow, including the rasteriser code that
uses RenderSettings.scale and downstream size calculations, surface a clear
validation error before rendering starts.
- Line 22: Validate the MAX_VISION_PAGES setting defensively in rasteriser.py
instead of converting it directly at import time; a bad or non-numeric env value
should not crash module import. Update the MAX_VISION_PAGES initialization to
parse the env var safely, fall back to a positive default on parse failure, and
clamp zero/negative values to that fallback. Apply the same positive-default
handling anywhere MAX_VISION_PAGES is consumed in the rasterisation/truncation
logic so the page-slicing behavior stays correct.
- Around line 89-99: Normalize the PDF loading failure in rasteriser.py so the
documented ValueError is always raised from the PdfDocument construction path.
In the function that creates pdfium.PdfDocument(file_bytes), catch the
library-specific PdfiumError around that call and re-raise ValueError with the
existing invalid-PDF message, keeping the empty-bytes check and RenderSettings
handling unchanged.
In `@unstract/sdk1/tests/test_complete_vision.py`:
- Around line 38-42: The helper _make_litellm_response currently annotates
content as str, but it is called with None to model an empty/refused completion.
Update the signature of _make_litellm_response to accept str | None for content,
and keep the response-building logic in sync with that nullable value so the
test case using content=None is type-correct.
In `@workers/executor/executors/legacy_executor.py`:
- Line 1352: Remove the unused source_file_path local from _handle_answer_prompt
in LegacyExecutor, since it is assigned from params but never referenced in that
method; keep the existing flow that passes the value through
context.executor_params to _execute_single_prompt, and leave the surrounding
prompt-handling logic unchanged.
---
Nitpick comments:
In `@workers/executor/executors/answer_prompt.py`:
- Line 433: The error handling in the answer prompt exception blocks currently
uses logger.error inside active except clauses, so update the relevant handlers
in answer_prompt.py to use logger.exception instead to automatically include the
traceback for rasterisation/VLM failures. Also, if the message at the nearby
error formatting site uses a raw exception object, switch it to explicit string
coercion with e!s to satisfy the lint rule. Keep the change localized to the
exception handlers around the existing logger calls.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3a1457c9-2fc2-4c54-a34d-a379a1a2ef15
⛔ Files ignored due to path filters (5)
backend/uv.lockis excluded by!**/*.lockplatform-service/uv.lockis excluded by!**/*.lockunstract/filesystem/uv.lockis excluded by!**/*.lockunstract/sdk1/uv.lockis excluded by!**/*.lockworkers/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (28)
backend/prompt_studio/prompt_profile_manager_v2/migrations/0006_make_extraction_adapters_nullable.pybackend/prompt_studio/prompt_profile_manager_v2/models.pybackend/prompt_studio/prompt_profile_manager_v2/serializers.pybackend/prompt_studio/prompt_studio_core_v2/constants.pybackend/prompt_studio/prompt_studio_core_v2/prompt_studio_helper.pybackend/prompt_studio/prompt_studio_registry_v2/constants.pybackend/prompt_studio/prompt_studio_registry_v2/prompt_studio_registry_helper.pybackend/prompt_studio/prompt_studio_v2/migrations/0015_toolstudioprompt_extraction_inputs_and_more.pybackend/prompt_studio/prompt_studio_v2/models.pybackend/workflow_manager/endpoint_v2/source.pybackend/workflow_manager/workflow_v2/file_history_helper.pybackend/workflow_manager/workflow_v2/models/file_history.pyfrontend/src/components/custom-tools/prompt-card/PromptCardItems.jsxunstract/sdk1/pyproject.tomlunstract/sdk1/src/unstract/sdk1/constants.pyunstract/sdk1/src/unstract/sdk1/llm.pyunstract/sdk1/src/unstract/sdk1/rasteriser.pyunstract/sdk1/src/unstract/sdk1/vision.pyunstract/sdk1/tests/test_complete_vision.pyunstract/sdk1/tests/test_derive_vision_mode.pyunstract/sdk1/tests/test_rasteriser.pyunstract/sdk1/tests/test_vision_messages.pyworkers/executor/executors/answer_prompt.pyworkers/executor/executors/constants.pyworkers/executor/executors/legacy_executor.pyworkers/file_processing/structure_tool_task.pyworkers/tests/test_single_pass_vision_guard.pyworkers/tests/test_structure_tool_task.py
| TSPKeys.FILE_PATH: doc_path, | ||
| TSPKeys.SOURCE_FILE_PATH: doc_path, |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
SOURCE_FILE_PATH points to the extracted text file, not the source document.
In _fetch_response, doc_path is reassigned at Line 1865 to the extract .txt path (and again to the summarize path at Line 1898), while the original source file is retained in file_path. Setting TSPKeys.SOURCE_FILE_PATH: doc_path therefore passes the extracted-text path downstream. run_vision_completion uses SOURCE_FILE_PATH to read and rasterise the original PDF, so vision-mode prompts on this IDE single-prompt path will fail to rasterise (no pages / read error). Every sibling builder (build_fetch_response_payload Line 885, build_bulk_fetch_response_payload Line 1076, build_single_pass_payload Line 1238) correctly uses the original source path here.
🐛 Proposed fix
TSPKeys.FILE_PATH: doc_path,
- TSPKeys.SOURCE_FILE_PATH: doc_path,
+ TSPKeys.SOURCE_FILE_PATH: file_path,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| TSPKeys.FILE_PATH: doc_path, | |
| TSPKeys.SOURCE_FILE_PATH: doc_path, | |
| TSPKeys.FILE_PATH: doc_path, | |
| TSPKeys.SOURCE_FILE_PATH: file_path, |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/prompt_studio/prompt_studio_core_v2/prompt_studio_helper.py` around
lines 2016 - 2017, In `_fetch_response`, `SOURCE_FILE_PATH` is being set from
the mutated `doc_path` instead of the original source document path. Update the
payload construction to use `file_path` for `TSPKeys.SOURCE_FILE_PATH` (while
keeping `TSPKeys.FILE_PATH` as-is), matching the behavior in
`build_fetch_response_payload`, `build_bulk_fetch_response_payload`, and
`build_single_pass_payload` so `run_vision_completion` can rasterize the
original PDF correctly.
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| MAX_VISION_PAGES = int(os.environ.get("MAX_VISION_PAGES", "10")) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Validate MAX_VISION_PAGES before using it as a default.
A non-numeric env value will crash module import here, and 0/negative values later produce broken truncation semantics ([:0] returns no pages, [:-1] drops only the last page). This should be parsed defensively and clamped to a positive fallback so a bad deploy-time config does not take down vision processing.
Proposed fix
-MAX_VISION_PAGES = int(os.environ.get("MAX_VISION_PAGES", "10"))
+def _read_max_vision_pages() -> int:
+ raw_value = os.environ.get("MAX_VISION_PAGES", "10")
+ try:
+ value = int(raw_value)
+ except ValueError:
+ logger.warning(
+ "Invalid MAX_VISION_PAGES=%r; falling back to 10",
+ raw_value,
+ )
+ return 10
+
+ if value <= 0:
+ logger.warning(
+ "Non-positive MAX_VISION_PAGES=%d; falling back to 10",
+ value,
+ )
+ return 10
+
+ return value
+
+
+MAX_VISION_PAGES = _read_max_vision_pages()Also applies to: 75-76, 120-127
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@unstract/sdk1/src/unstract/sdk1/rasteriser.py` at line 22, Validate the
MAX_VISION_PAGES setting defensively in rasteriser.py instead of converting it
directly at import time; a bad or non-numeric env value should not crash module
import. Update the MAX_VISION_PAGES initialization to parse the env var safely,
fall back to a positive default on parse failure, and clamp zero/negative values
to that fallback. Apply the same positive-default handling anywhere
MAX_VISION_PAGES is consumed in the rasterisation/truncation logic so the
page-slicing behavior stays correct.
| @dataclass | ||
| class RenderSettings: | ||
| """Configuration for PDF page rendering and preprocessing.""" | ||
|
|
||
| dpi: int = 150 | ||
| max_dimension: int = 1568 | ||
|
|
||
| @property | ||
| def scale(self) -> float: | ||
| """Convert DPI to pypdfium2 scale factor (PDF points are 72 DPI).""" | ||
| return self.dpi / 72.0 |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Reject non-positive render settings up front.
dpi <= 0 and max_dimension <= 0 currently fail much later inside pdfium/Pillow with opaque errors. Since RenderSettings is part of the new SDK surface, it should enforce a valid range at construction time.
Proposed fix
`@dataclass`
class RenderSettings:
"""Configuration for PDF page rendering and preprocessing."""
dpi: int = 150
max_dimension: int = 1568
+
+ def __post_init__(self) -> None:
+ if self.dpi <= 0:
+ raise ValueError("dpi must be > 0")
+ if self.max_dimension <= 0:
+ raise ValueError("max_dimension must be > 0")Also applies to: 63-66, 132-143
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@unstract/sdk1/src/unstract/sdk1/rasteriser.py` around lines 25 - 35, Reject
invalid render settings at construction time by validating RenderSettings so dpi
and max_dimension must both be positive, instead of letting pdfium/Pillow fail
later. Add the checks in the RenderSettings dataclass lifecycle (or an
equivalent constructor hook) and ensure any callers in the rendering flow,
including the rasteriser code that uses RenderSettings.scale and downstream size
calculations, surface a clear validation error before rendering starts.
| Raises: | ||
| ValueError: If file_bytes is empty or not a valid PDF. | ||
| """ | ||
| if not file_bytes: | ||
| raise ValueError("file_bytes is empty") | ||
|
|
||
| if settings is None: | ||
| settings = RenderSettings() | ||
|
|
||
| pdf = pdfium.PdfDocument(file_bytes) | ||
| try: |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf 'FILE:\n'
git ls-files 'unstract/sdk1/src/unstract/sdk1/rasteriser.py'
printf '\nCONTEXT (lines 1-220):\n'
cat -n unstract/sdk1/src/unstract/sdk1/rasteriser.py | sed -n '1,220p'
printf '\nSEARCH PdfiumError / PdfDocument:\n'
rg -n "PdfiumError|PdfDocument" unstract/sdk1/src/unstract/sdk1/rasteriser.py unstract/sdk1/src -SRepository: Zipstack/unstract
Length of output: 1947
🌐 Web query:
pypdfium2 PdfDocument load failure PdfiumError documentation
💡 Result:
In pypdfium2, the PdfDocument class raises a PdfiumError when a PDF fails to load [1][2]. This exception is a subclass of RuntimeError that indicates a failure reported by the underlying PDFium library [1][3]. Key details regarding PdfiumError and document loading: 1. Exception Structure: Since version 5.0.0, the PdfiumError exception is annotated with an err_code attribute [4][5]. This allows for programmatic handling of specific error subtypes, replacing the previous, less stable method of parsing error strings [5]. 2. Error Codes: The library provides an ErrorToStr dictionary that maps PDFium error constants to human-readable strings [1][3]. Common error codes include: - 2: File access error - 3: Data format error - 4: Incorrect password error - 5: Unsupported security scheme error - 6: Page not found or content error [1] 3. Handling: When using PdfDocument, you should wrap the initialization in a try-except block to catch PdfiumError [1][2]. If you are working with encrypted files, an Incorrect password error (code 4) is frequently raised if an invalid or missing password is provided [5]. 4. Related Exceptions: - FileNotFoundError is raised separately if the provided file path is invalid or non-existent, rather than a PdfiumError [1][2]. - PdfiumWarning is issued for failures related to XFA forms, which may accompany a PdfiumError or occur during other operations [1][4]. For more precise error identification, you can inspect the err_code attribute of the caught PdfiumError instance [1].
Citations:
- 1: https://pypdfium2.readthedocs.io/en/stable/python_api.html
- 2: https://pypdfium2.readthedocs.io/en/stable/_modules/pypdfium2/_helpers/document.html
- 3: https://pypdfium2.readthedocs.io/en/v4/python_api.html
- 4: https://pypdfium2.readthedocs.io/en/stable/changelog.html
- 5: expose more details in PdfiumError pypdfium2-team/pypdfium2#308
Normalize PDF load failures to ValueError. pdfium.PdfDocument(...) raises PdfiumError on invalid input, so callers will see a library-specific exception instead of the SDK’s documented ValueError. Catch it here and re-raise ValueError("file_bytes is not a valid PDF").
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@unstract/sdk1/src/unstract/sdk1/rasteriser.py` around lines 89 - 99,
Normalize the PDF loading failure in rasteriser.py so the documented ValueError
is always raised from the PdfDocument construction path. In the function that
creates pdfium.PdfDocument(file_bytes), catch the library-specific PdfiumError
around that call and re-raise ValueError with the existing invalid-PDF message,
keeping the empty-bytes check and RenderSettings handling unchanged.
| def _make_litellm_response( | ||
| content: str = "extracted value", | ||
| finish_reason: str = "stop", | ||
| usage: dict | None = None, | ||
| ) -> dict: |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Widen content type to str | None.
_make_litellm_response is called with content=None (line 164) to simulate a refused/empty response, but the parameter is annotated content: str. This is the SonarCloud type mismatch flagged on line 164.
🔧 Proposed fix
def _make_litellm_response(
- content: str = "extracted value",
+ content: str | None = "extracted value",
finish_reason: str = "stop",
usage: dict | None = None,
) -> dict:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _make_litellm_response( | |
| content: str = "extracted value", | |
| finish_reason: str = "stop", | |
| usage: dict | None = None, | |
| ) -> dict: | |
| def _make_litellm_response( | |
| content: str | None = "extracted value", | |
| finish_reason: str = "stop", | |
| usage: dict | None = None, | |
| ) -> dict: |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@unstract/sdk1/tests/test_complete_vision.py` around lines 38 - 42, The helper
_make_litellm_response currently annotates content as str, but it is called with
None to model an empty/refused completion. Update the signature of
_make_litellm_response to accept str | None for content, and keep the
response-building logic in sync with that nullable value so the test case using
content=None is type-correct.
Source: Linters/SAST tools
| tool_id: str = params.get(PSKeys.TOOL_ID, "") | ||
| run_id: str = context.run_id | ||
| file_path = params.get(PSKeys.FILE_PATH) | ||
| source_file_path: str = params.get(PSKeys.SOURCE_FILE_PATH, "") |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Remove the unused source_file_path local in _handle_answer_prompt.
This local is assigned but never referenced within _handle_answer_prompt (the value reaches _execute_single_prompt independently via context.executor_params at Line 1658/1663). It's dead code and matches the SonarCloud warning.
🧹 Proposed cleanup
file_path = params.get(PSKeys.FILE_PATH)
- source_file_path: str = params.get(PSKeys.SOURCE_FILE_PATH, "")
doc_name = str(params.get(PSKeys.FILE_NAME, ""))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| source_file_path: str = params.get(PSKeys.SOURCE_FILE_PATH, "") |
🧰 Tools
🪛 GitHub Check: SonarCloud Code Analysis
[warning] 1352-1352: Remove the unused local variable "source_file_path".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@workers/executor/executors/legacy_executor.py` at line 1352, Remove the
unused source_file_path local from _handle_answer_prompt in LegacyExecutor,
since it is assigned from params but never referenced in that method; keep the
existing flow that passes the value through context.executor_params to
_execute_single_prompt, and leave the surrounding prompt-handling logic
unchanged.
Source: Linters/SAST tools



What
extraction_inputsandsource_of_truth) derive a vision mode that enriches the answer-step LLM context with page imagestext_only(default, today's behavior),spatial_helper(text primary + images as layout aids),source_of_truth(images primary, text as hint)Why
How
SDK Foundation (Phase 1 + 3):
rasteriser.py: Renders PDF pages to preprocessed PNG bytes in-memory via pypdfium2 + Pillow (mirrors agentic table's proven preprocessing pipeline)complete_vision()method onLLMclass with fullcomplete()parity (retry, extract_json, process_text, usage recording, error wrapping)vision.py: Message builder assembling OpenAI-style multimodal messages with mode-aware content orderingderive_vision_mode()pure function mapping two UI fields to three modesBackend Models & Serializers (Phase 2):
ToolStudioPrompt: Addedextraction_inputsandsource_of_truthfields (default:text)ProfileManager: Madevector_store,embedding_model,x2textFKs nullable with app-layer validation (image-only prompts don't need extraction adapters)Answer Path Integration (Phase 4):
run_completion()— vision modes callcomplete_vision()with rasterised page imagesRegistry Export (Phase 5):
textmode whensingle_pass_extraction_mode=TrueCache Key Discriminator (Phase 6):
cache_keyandprovider_file_uuidlookup pathsFrontend (Phase 7):
PromptCardItems.jsxfor the cloud-sideVisionModeSelectorpluginDeployed Workflow Path (Phase 8):
structure_tool_task.py: Extraction skip when all prompts are image-only, source file path in answer paramslegacy_executor.py: Fixedsource_file_pathunpacking in_execute_single_prompt()Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
text(today's behavior)derive_vision_mode("text", "text")returnstext_only, which follows the existing code path exactlytext_only(version check)VisionModeSelectoris dynamically imported — if the plugin is absent (OSS), the hook silently resolves tonulland nothing rendersRelevant Docs
Related Issues or PRs
UN-3639-vlm-support(VisionModeSelector plugin, SinglePassToggleSwitch warning, single-pass executor guard)Dependencies Versions / Env Variables
MAX_VISION_PAGES(default:10) — caps the number of pages rasterised per VLM callNotes on Testing
test_derive_vision_mode.py— 8 tests: all 4 derivation cases + enum constantstest_vision_messages.py— 10 tests: content ordering, base64 encoding, page labels, empty/null inputs, error casestest_rasteriser.py— 16 tests: settings, preprocessing, page filtering, max_pages truncation, RGBA conversion, cleanup guaranteestest_complete_vision.py— 8 tests: response format, message passthrough, usage recording, extract_json, error wrappingtest_structure_tool_task.py— 7 vision skip teststest_single_pass_vision_guard.py— 5 guard testsScreenshots
N/A — frontend changes are a dynamic import hook only (plugin renders in cloud repo)
Checklist
I have read and understood the Contribution Guidelines.