Skip to content

UN-3639 [FEAT] Add VLM/multimodal LLM support for per-prompt vision modes#2131

Open
harini-venkataraman wants to merge 5 commits into
mainfrom
UN-3639-vlm-support
Open

UN-3639 [FEAT] Add VLM/multimodal LLM support for per-prompt vision modes#2131
harini-venkataraman wants to merge 5 commits into
mainfrom
UN-3639-vlm-support

Conversation

@harini-venkataraman

Copy link
Copy Markdown
Contributor

What

  • Add VLM/multimodal LLM support as an opt-in, per-prompt capability in Prompt Studio and deployed workflows
  • Two new per-prompt fields (extraction_inputs and source_of_truth) derive a vision mode that enriches the answer-step LLM context with page images
  • Three vision modes: text_only (default, today's behavior), spatial_helper (text primary + images as layout aids), source_of_truth (images primary, text as hint)

Why

  • Unstract is currently text-centric: LLMWhisperer extracts text, text goes to LLM, structured data comes out
  • Many documents contain visual layout information (tables, forms, charts) that text extraction alone cannot capture
  • VLM-capable models (Claude 3.5 Sonnet, GPT-4o, Gemini) can extract structured data directly from page images, improving accuracy for layout-sensitive documents
  • This is additive and opt-in — existing text-only workflows are completely unaffected

How

SDK Foundation (Phase 1 + 3):

  • New rasteriser.py: Renders PDF pages to preprocessed PNG bytes in-memory via pypdfium2 + Pillow (mirrors agentic table's proven preprocessing pipeline)
  • New complete_vision() method on LLM class with full complete() parity (retry, extract_json, process_text, usage recording, error wrapping)
  • New vision.py: Message builder assembling OpenAI-style multimodal messages with mode-aware content ordering
  • derive_vision_mode() pure function mapping two UI fields to three modes

Backend Models & Serializers (Phase 2):

  • ToolStudioPrompt: Added extraction_inputs and source_of_truth fields (default: text)
  • ProfileManager: Made vector_store, embedding_model, x2text FKs nullable with app-layer validation (image-only prompts don't need extraction adapters)
  • Serializers updated to accept/validate new fields

Answer Path Integration (Phase 4):

  • Branching in run_completion() — vision modes call complete_vision() with rasterised page images
  • Highlight/postamble suppressed for vision modes (no OCR line metadata to ground against)
  • Source file path plumbed through for rasterisation

Registry Export (Phase 5):

  • Vision fields carried in per-prompt output dicts for deployed workflows
  • Single-pass export-time override: forces text mode when single_pass_extraction_mode=True
  • Metadata versioning for backward compatibility

Cache Key Discriminator (Phase 6):

  • Vision signature injected into cache key hash to prevent collisions between text-only and vision runs
  • Applied consistently in both cache_key and provider_file_uuid lookup paths

Frontend (Phase 7):

  • Dynamic import hook in PromptCardItems.jsx for the cloud-side VisionModeSelector plugin

Deployed Workflow Path (Phase 8):

  • structure_tool_task.py: Extraction skip when all prompts are image-only, source file path in answer params
  • legacy_executor.py: Fixed source_file_path unpacking in _execute_single_prompt()

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

  • No — all changes are additive and backward-compatible:
    • New model fields default to text (today's behavior)
    • derive_vision_mode("text", "text") returns text_only, which follows the existing code path exactly
    • Nullable FK migration is additive — existing profiles with populated FKs are unaffected; app-layer validation ensures they remain required for text-using prompts
    • Cache key: when no vision prompts exist, the vision signature is empty, producing the same hash as before
    • Registry export: old exports without vision fields are consumed as text_only (version check)
    • Frontend: VisionModeSelector is dynamically imported — if the plugin is absent (OSS), the hook silently resolves to null and nothing renders

Relevant Docs

Related Issues or PRs

  • UN-3639
  • Cloud PR: UN-3639-vlm-support (VisionModeSelector plugin, SinglePassToggleSwitch warning, single-pass executor guard)

Dependencies Versions / Env Variables

  • New optional env: MAX_VISION_PAGES (default: 10) — caps the number of pages rasterised per VLM call
  • No new package dependencies (pypdfium2 and Pillow are existing transitive dependencies; explicitly pinned in sdk1 pyproject.toml)

Notes on Testing

  • SDK unit tests (46 tests):
    • test_derive_vision_mode.py — 8 tests: all 4 derivation cases + enum constants
    • test_vision_messages.py — 10 tests: content ordering, base64 encoding, page labels, empty/null inputs, error cases
    • test_rasteriser.py — 16 tests: settings, preprocessing, page filtering, max_pages truncation, RGBA conversion, cleanup guarantees
    • test_complete_vision.py — 8 tests: response format, message passthrough, usage recording, extract_json, error wrapping
  • Workers tests (12 new tests):
    • test_structure_tool_task.py — 7 vision skip tests
    • test_single_pass_vision_guard.py — 5 guard tests
  • Full SDK suite: 425/429 pass (4 pre-existing async failures)
  • Full workers suite: 55/55 pass

Screenshots

N/A — frontend changes are a dynamic import hook only (plugin renders in cloud repo)

Checklist

I have read and understood the Contribution Guidelines.

harini-venkataraman and others added 3 commits June 30, 2026 20:56
…odes

Add opt-in vision capability to the answer-step LLM context, enabling
VLM/multimodal models to receive page images alongside extracted text.

SDK (Phase 1 + 3):
- Rasteriser module: PDF pages to preprocessed PNG bytes via pypdfium2
- complete_vision() method with full complete() parity (retry, extract_json,
  usage recording, error wrapping)
- Vision message builder: OpenAI-style multimodal messages with mode-aware
  ordering (spatial_helper vs source_of_truth)
- VisionMode/ExtractionInputs constants and derive_vision_mode() pure function

Backend (Phase 2 + 5 + 6):
- Prompt model: extraction_inputs and source_of_truth fields with migration
- Profile model: nullable vector_store/embedding/x2text FKs with app-layer
  validation and migration
- Registry export: vision fields in per-prompt output dicts, single-pass
  export-time override to force text mode
- Cache key discriminator: vision signature prevents cache collisions between
  text-only and vision runs

Workers (Phase 4 + 8):
- Answer path branching: vision mode derivation, rasterisation, and
  complete_vision() call in both executor and prompt-service paths
- Structure tool: extraction skip for all-image-only prompts, source file
  path passthrough for rasterisation

Frontend (Phase 7):
- PromptCardItems: dynamic import hook for VisionModeSelector plugin

Tests (Phase 10):
- SDK: 46 tests (mode derivation, message builder, rasteriser, complete_vision)
- Workers: 12 tests (structure tool vision skip, single-pass vision guard)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

  • New Features

    • Added vision-mode support for prompts, including new options for text, image, or both inputs.
    • Enabled multimodal processing with page images and source-document context.
    • Added support for rasterizing PDF pages for vision-based completions.
  • Bug Fixes

    • Improved handling of optional extraction-related settings so image-only workflows no longer require unused fields.
    • Updated hashing and caching behavior to better distinguish text-only and vision-enabled runs.
    • Added validation and fallback behavior to reduce errors in vision-mode prompt execution.

Walkthrough

Adds end-to-end vision mode support for Prompt Studio. Per-prompt extraction_inputs and source_of_truth fields are introduced in the data model; profile adapter FKs are made nullable for image-only setups. The SDK gains a PDF rasteriser, multimodal message builder, and vision constants. The executor pipeline threads source_file_path through to a new run_vision_completion path. File content hashes incorporate a vision-mode signature, and a VisionModeSelector plugin is conditionally rendered in the frontend prompt card.

Changes

Vision Mode Feature

Layer / File(s) Summary
ToolStudioPrompt and ProfileManager model changes
backend/prompt_studio/prompt_studio_v2/models.py, backend/prompt_studio/prompt_studio_v2/migrations/0015_toolstudioprompt_extraction_inputs_and_more.py, backend/prompt_studio/prompt_profile_manager_v2/models.py, backend/prompt_studio/prompt_profile_manager_v2/migrations/0006_make_extraction_adapters_nullable.py
Adds ExtractionInput/SourceOfTruth enums and extraction_inputs/source_of_truth fields to ToolStudioPrompt; makes ProfileManager.embedding_model and x2text nullable; creates matching migrations.
ProfileManager serializer conditional validation
backend/prompt_studio/prompt_profile_manager_v2/serializers.py, backend/prompt_studio/prompt_studio_registry_v2/prompt_studio_registry_helper.py
Adds _TEXT_EXTRACTION_FIELDS and ProfileManagerSerializer.validate() requiring extraction adapter fields only when linked prompts need text extraction; updates export helper to handle null adapter FKs and conditionally override vision fields in single-pass mode.
SDK vision constants, rasteriser, and message builder
unstract/sdk1/pyproject.toml, unstract/sdk1/src/unstract/sdk1/constants.py, unstract/sdk1/src/unstract/sdk1/rasteriser.py, unstract/sdk1/src/unstract/sdk1/vision.py
Adds VisionMode/ExtractionInputs/SourceOfTruthValues and derive_vision_mode(); implements RenderSettings and rasterise_pages for PDF-to-PNG conversion via pypdfium2; implements build_vision_messages for OpenAI-style multimodal payloads.
LLM.complete_vision parity
unstract/sdk1/src/unstract/sdk1/llm.py
Routes complete_vision through call_with_retry, applies _post_process_response with extract_json/process_text, wraps into LLMResponseCompat, attaches raw response, and records usage.
SDK unit tests
unstract/sdk1/tests/test_derive_vision_mode.py, unstract/sdk1/tests/test_rasteriser.py, unstract/sdk1/tests/test_vision_messages.py, unstract/sdk1/tests/test_complete_vision.py
Tests for derive_vision_mode constants and mapping, rasterise_pages edge cases, build_vision_messages modes and error paths, and LLM.complete_vision retry/post-processing/error behavior.
Vision field constants and prompt studio payload wiring
backend/prompt_studio/prompt_studio_core_v2/constants.py, backend/prompt_studio/prompt_studio_registry_v2/constants.py, workers/executor/executors/constants.py, backend/prompt_studio/prompt_studio_core_v2/prompt_studio_helper.py
Adds EXTRACTION_INPUTS/SOURCE_OF_TRUTH/SOURCE_FILE_PATH constants; threads these fields into all prompt output and executor payload builders.
Executor vision completion path
workers/executor/executors/answer_prompt.py, workers/executor/executors/legacy_executor.py
Adds source_file_path to construct_and_run_prompt, derives vision_mode per prompt, short-circuits to run_vision_completion for non-TEXT_ONLY modes (file read → rasterise → build messages → complete_vision); threads source_file_path through legacy_executor.
Structure tool task vision skip and source_file_path wiring
workers/file_processing/structure_tool_task.py
Adds _should_skip_extraction_for_vision to skip indexing when all prompts are image-only; computes source_file_path from the immutable SOURCE file and passes it to answer_params.
Vision-aware file content hashing
backend/workflow_manager/endpoint_v2/source.py, backend/workflow_manager/workflow_v2/file_history_helper.py, backend/workflow_manager/workflow_v2/models/file_history.py
Adds _compute_vision_signature querying prompt registry for non-text extraction inputs; incorporates the signature into SHA-256 hashes across all three file-ingestion paths; updates FileHistory.cache_key db comment.
VisionModeSelector in PromptCardItems
frontend/src/components/custom-tools/prompt-card/PromptCardItems.jsx
Dynamically imports the VisionModeSelector plugin and conditionally renders it in the prompt card, wired to promptDetails/handleChange/enforceType with appropriate disabled conditions.
Worker tests for vision skip and single-pass guard
workers/tests/test_structure_tool_task.py, workers/tests/test_single_pass_vision_guard.py
Tests _should_skip_extraction_for_vision across all extraction modes; tests SinglePassExtractionExecutor vision guard blocking image/both prompts while passing text-only prompts.

Sequence Diagram(s)

sequenceDiagram
    participant StructureToolTask
    participant LegacyExecutor
    participant AnswerPromptService
    participant SDKRasteriser
    participant SDKVision
    participant LLM

    StructureToolTask->>LegacyExecutor: answer_params with source_file_path, extraction_inputs, source_of_truth
    LegacyExecutor->>AnswerPromptService: construct_and_run_prompt(source_file_path, extraction_inputs, source_of_truth)
    AnswerPromptService->>AnswerPromptService: derive_vision_mode(extraction_inputs, source_of_truth)
    alt vision_mode != TEXT_ONLY
        AnswerPromptService->>SDKRasteriser: rasterise_pages(pdf_bytes)
        SDKRasteriser-->>AnswerPromptService: list[(page, png_bytes)]
        AnswerPromptService->>SDKVision: build_vision_messages(text_context, page_images, prompt, mode)
        SDKVision-->>AnswerPromptService: messages
        AnswerPromptService->>LLM: complete_vision(messages, extract_json=True)
        LLM-->>AnswerPromptService: LLMResponseCompat
    else TEXT_ONLY
        AnswerPromptService->>LLM: complete(prompt, ...)
        LLM-->>AnswerPromptService: LLMResponseCompat
    end
    AnswerPromptService-->>LegacyExecutor: response text
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and accurately summarizes the main change: adding per-prompt VLM vision modes.
Description check ✅ Passed The description covers most required sections and key implementation details, though Database Migrations and Env Config are not separated exactly as the template requests.
Docstring Coverage ✅ Passed Docstring coverage is 91.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch UN-3639-vlm-support

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@github-actions

Copy link
Copy Markdown
Contributor

Frontend Lint Report (Biome)

All checks passed! No linting or formatting issues found.

@sonarqubecloud

Copy link
Copy Markdown

@github-actions

Copy link
Copy Markdown
Contributor

Unstract test results

Per-group results

Status Group Tier Passed Failed Errors Skipped Duration (s)
unit-connectors unit 64 12 0 3 16.6
unit-core unit 0 0 4 0 1.2
unit-platform-service unit 9 0 1 0 1.3
unit-rig unit 53 0 0 0 3.3
unit-runner unit 11 0 0 0 3.0
unit-sdk1 unit 427 0 0 0 20.8
unit-tool-registry unit 0 0 1 0 1.3
unit-workers unit 0 0 0 0 17.7
TOTAL 564 12 6 3 65.2

Critical paths

⚠️ Critical paths not yet covered

  • auth-login — User can log in and obtain a session cookie. (entry: POST /api/v1/auth/login; declared coverage: no groups declared)
  • adapter-register-llm — Register and validate an LLM adapter. (entry: POST /api/v1/adapter/; declared coverage: no groups declared)
  • workflow-create-execute — Create a workflow, configure source+destination, execute, poll, fetch result. (entry: POST /api/v1/workflow/{id}/execute/; declared coverage: e2e-workflow)
  • api-deployment-run — Deploy a workflow as an API, POST a document, receive structured JSON. (entry: POST /deployment/api/{org}/{name}/; declared coverage: e2e-api-deployment)
  • prompt-studio-fetch-response — Prompt Studio: create project, add prompt, run single-pass, get response. (entry: POST /api/v1/prompt-studio/prompt-studio-tool/{id}/fetch_response/; declared coverage: e2e-prompt-studio)
  • pipeline-etl-execute — Run an ETL pipeline from source connector to destination. (entry: POST /api/v1/pipeline/{id}/execute/; declared coverage: no groups declared)
  • usage-token-tracking — Per-execution token usage is recorded and retrievable. (entry: GET /api/v1/usage/get_token_usage/; declared coverage: no groups declared)
  • workflow-execution-fan-out — Multi-file workflow execution fans out to file-processing workers and rejoins. (entry: internal: backend → rabbitmq → workers/file_processing; declared coverage: no groups declared)
  • callback-result-delivery — Async results are posted back via the callback worker. (entry: internal: workers/callback → backend /internal endpoints; declared coverage: no groups declared)
✅ Covered critical paths
  • tool-sandbox-exec — covered by unit-runner

@greptile-apps

greptile-apps Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds opt-in VLM/multimodal support to Prompt Studio and deployed workflows through three new vision modes (text_only, spatial_helper, source_of_truth), controlled by two new per-prompt fields. All changes are additive: new model fields default to text, derive_vision_mode("text","text") follows the existing code path exactly, and the cache key discriminator produces the same hash as before when no vision prompts exist.

  • SDK foundation (rasteriser.py, vision.py, llm.complete_vision()): in-memory PDF rasterisation, OpenAI-style multimodal message assembly, and full parity with complete() (retry, extract_json, process_text, usage recording).
  • Backend models & serializers: ToolStudioPrompt gains extraction_inputs/source_of_truth; ProfileManager extraction adapter FKs are made nullable with app-layer validation; registry export carries vision fields and forces text mode for single-pass workflows.
  • Answer path & deployed workflow: run_completion() branches into run_vision_completion() for non-text modes; structure_tool_task.py skips text extraction when all prompts are image-only; vision signature is injected into the file-content hash to prevent cache collisions.

Confidence Score: 3/5

Safe to merge for the text-only path; the image extraction path has a validation gap in the ProfileManager serializer that could allow an operator to null out required adapters through an explicit PATCH, which would be caught only at query time downstream.

The serializer's validate() method correctly guards new profile creation and most update flows, but falls through when a PATCH request explicitly sets an extraction adapter to null while linked prompts still expect text extraction. The old instance FK value gives a false negative in the missing-field check, so the null is persisted silently. This is a definite defect in the new validation code, not a pre-existing issue, and it lives in the path that protects production data integrity for the feature's core invariant.

backend/prompt_studio/prompt_profile_manager_v2/serializers.py — the validate() guard; unstract/sdk1/src/unstract/sdk1/rasteriser.py — duplicate MAX_VISION_PAGES env read

Important Files Changed

Filename Overview
backend/prompt_studio/prompt_profile_manager_v2/serializers.py Adds validate() to enforce extraction adapters when prompts need text; validation logic has a gap when attrs explicitly sends null for an adapter field during PATCH/PUT
workers/executor/executors/answer_prompt.py Branches run_completion() into vision path; adds run_vision_completion() that reads PDF bytes, rasterises, builds messages, and calls complete_vision()
backend/workflow_manager/endpoint_v2/source.py Injects vision signature into file content hash for cache discrimination; vision signature not applied to hash in the add_input_from_api_storage_to_volume fallback path
unstract/sdk1/src/unstract/sdk1/rasteriser.py New module rasterising PDF pages to PNG bytes via pypdfium2+Pillow; duplicates MAX_VISION_PAGES definition from constants.py
unstract/sdk1/src/unstract/sdk1/llm.py complete_vision() now has full parity with complete(): retry logic, extract_json, process_text callback, usage recording
workers/file_processing/structure_tool_task.py Adds skip-extraction logic for image-only vision prompts; source_file_path hardcoded to SOURCE which matches source connector convention
backend/prompt_studio/prompt_studio_registry_v2/prompt_studio_registry_helper.py Carries vision fields in per-prompt export; forces text_only for single-pass extraction mode with appropriate warning

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant UI as Prompt Studio UI
    participant BE as Backend (PromptStudioHelper)
    participant W as Worker (LegacyExecutor)
    participant AP as AnswerPromptService
    participant R as rasteriser
    participant V as vision (message builder)
    participant LLM as LLM.complete_vision()

    UI->>BE: Run prompt (extraction_inputs, source_of_truth)
    BE->>BE: Build output dict incl. vision fields + source_file_path
    BE->>W: Dispatch answer params via Celery
    W->>AP: "run_completion(output, source_file_path=...)"
    AP->>AP: derive_vision_mode(extraction_inputs, source_of_truth)
    alt "vision_mode == text_only"
        AP->>AP: run_completion() → LLM.complete()
    else spatial_helper or source_of_truth
        AP->>AP: run_vision_completion()
        AP->>R: rasterise_pages(file_bytes, settings)
        R-->>AP: [(page_num, png_bytes), ...]
        AP->>V: build_vision_messages(system, text_ctx, images, prompt, mode)
        V-->>AP: OpenAI-style multimodal messages
        AP->>LLM: "complete_vision(messages, extract_json=...)"
        LLM-->>AP: "{response: LLMResponseCompat, ...}"
        AP-->>W: answer string
    end
    W-->>BE: answer
    BE-->>UI: structured output
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant UI as Prompt Studio UI
    participant BE as Backend (PromptStudioHelper)
    participant W as Worker (LegacyExecutor)
    participant AP as AnswerPromptService
    participant R as rasteriser
    participant V as vision (message builder)
    participant LLM as LLM.complete_vision()

    UI->>BE: Run prompt (extraction_inputs, source_of_truth)
    BE->>BE: Build output dict incl. vision fields + source_file_path
    BE->>W: Dispatch answer params via Celery
    W->>AP: "run_completion(output, source_file_path=...)"
    AP->>AP: derive_vision_mode(extraction_inputs, source_of_truth)
    alt "vision_mode == text_only"
        AP->>AP: run_completion() → LLM.complete()
    else spatial_helper or source_of_truth
        AP->>AP: run_vision_completion()
        AP->>R: rasterise_pages(file_bytes, settings)
        R-->>AP: [(page_num, png_bytes), ...]
        AP->>V: build_vision_messages(system, text_ctx, images, prompt, mode)
        V-->>AP: OpenAI-style multimodal messages
        AP->>LLM: "complete_vision(messages, extract_json=...)"
        LLM-->>AP: "{response: LLMResponseCompat, ...}"
        AP-->>W: answer string
    end
    W-->>BE: answer
    BE-->>UI: structured output
Loading

Comments Outside Diff (1)

  1. backend/workflow_manager/endpoint_v2/source.py, line 1102-1123 (link)

    P2 Vision signature not applied on the API-storage fallback path: add_input_from_api_storage_to_volume returns the raw content hash from _copy_file_to_destination without mixing in self._vision_signature. The caller at line 1210–1212 overrides this with file_hash.file_hash when it is truthy (the hash computed in copy_from_api_source, which does include the signature), but falls back to the unsigned hash when file_hash.file_hash is falsy. A workflow whose file arrives through this edge case would produce a cache key that doesn't discriminate between text-only and vision runs of the same file.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: backend/workflow_manager/endpoint_v2/source.py
    Line: 1102-1123
    
    Comment:
    Vision signature not applied on the API-storage fallback path: `add_input_from_api_storage_to_volume` returns the raw content hash from `_copy_file_to_destination` without mixing in `self._vision_signature`. The caller at line 1210–1212 overrides this with `file_hash.file_hash` when it is truthy (the hash computed in `copy_from_api_source`, which does include the signature), but falls back to the unsigned hash when `file_hash.file_hash` is falsy. A workflow whose file arrives through this edge case would produce a cache key that doesn't discriminate between text-only and vision runs of the same file.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Claude Code

Fix All in Claude Code

Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
backend/prompt_studio/prompt_profile_manager_v2/serializers.py:50-56
Validation gap on explicit null in PATCH/PUT: when a client sends `{"vector_store": null}` the check `not attrs.get(field)` evaluates True (None is falsy), but `not getattr(instance, "vector_store_id", None)` evaluates False (the old FK is still on the instance), so the field is never added to `missing` and the null is saved silently — even when linked prompts still need text extraction. The guard needs to treat an inbound null as a removal, not as "not provided".

```suggestion
        if needs_text:
            missing = [
                field
                for field in _TEXT_EXTRACTION_FIELDS
                if not (
                    # Use incoming value when explicitly provided, else fall
                    # back to whatever is already set on the instance.
                    attrs.get(field)
                    or (
                        field not in attrs
                        and instance is not None
                        and getattr(instance, f"{field}_id", None)
                    )
                )
            ]
```

### Issue 2 of 3
unstract/sdk1/src/unstract/sdk1/rasteriser.py:17-22
`MAX_VISION_PAGES` is read from the environment a second time here, duplicating the definition already published in `constants.py`. The default parameter `max_pages: int = MAX_VISION_PAGES` bakes in the module-level value at import time; if these two definitions ever diverge (e.g. different defaults), callers relying on the constant from `constants.py` would get a different cap than the rasteriser's own default.

```suggestion
import pypdfium2 as pdfium
from PIL import Image, ImageFilter

from unstract.sdk1.constants import MAX_VISION_PAGES

logger = logging.getLogger(__name__)
```

### Issue 3 of 3
backend/workflow_manager/endpoint_v2/source.py:1102-1123
Vision signature not applied on the API-storage fallback path: `add_input_from_api_storage_to_volume` returns the raw content hash from `_copy_file_to_destination` without mixing in `self._vision_signature`. The caller at line 1210–1212 overrides this with `file_hash.file_hash` when it is truthy (the hash computed in `copy_from_api_source`, which does include the signature), but falls back to the unsigned hash when `file_hash.file_hash` is falsy. A workflow whose file arrives through this edge case would produce a cache key that doesn't discriminate between text-only and vision runs of the same file.

Reviews (1): Last reviewed commit: "Commit uv.lock changes" | Re-trigger Greptile

Comment on lines +50 to +56
if needs_text:
missing = [
field
for field in _TEXT_EXTRACTION_FIELDS
if not attrs.get(field)
and (instance is None or not getattr(instance, f"{field}_id", None))
]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Validation gap on explicit null in PATCH/PUT: when a client sends {"vector_store": null} the check not attrs.get(field) evaluates True (None is falsy), but not getattr(instance, "vector_store_id", None) evaluates False (the old FK is still on the instance), so the field is never added to missing and the null is saved silently — even when linked prompts still need text extraction. The guard needs to treat an inbound null as a removal, not as "not provided".

Suggested change
if needs_text:
missing = [
field
for field in _TEXT_EXTRACTION_FIELDS
if not attrs.get(field)
and (instance is None or not getattr(instance, f"{field}_id", None))
]
if needs_text:
missing = [
field
for field in _TEXT_EXTRACTION_FIELDS
if not (
# Use incoming value when explicitly provided, else fall
# back to whatever is already set on the instance.
attrs.get(field)
or (
field not in attrs
and instance is not None
and getattr(instance, f"{field}_id", None)
)
)
]
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/prompt_studio/prompt_profile_manager_v2/serializers.py
Line: 50-56

Comment:
Validation gap on explicit null in PATCH/PUT: when a client sends `{"vector_store": null}` the check `not attrs.get(field)` evaluates True (None is falsy), but `not getattr(instance, "vector_store_id", None)` evaluates False (the old FK is still on the instance), so the field is never added to `missing` and the null is saved silently — even when linked prompts still need text extraction. The guard needs to treat an inbound null as a removal, not as "not provided".

```suggestion
        if needs_text:
            missing = [
                field
                for field in _TEXT_EXTRACTION_FIELDS
                if not (
                    # Use incoming value when explicitly provided, else fall
                    # back to whatever is already set on the instance.
                    attrs.get(field)
                    or (
                        field not in attrs
                        and instance is not None
                        and getattr(instance, f"{field}_id", None)
                    )
                )
            ]
```

How can I resolve this? If you propose a fix, please make it concise.

Fix in Claude Code

Comment on lines +17 to +22
import pypdfium2 as pdfium
from PIL import Image, ImageFilter

logger = logging.getLogger(__name__)

MAX_VISION_PAGES = int(os.environ.get("MAX_VISION_PAGES", "10"))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 MAX_VISION_PAGES is read from the environment a second time here, duplicating the definition already published in constants.py. The default parameter max_pages: int = MAX_VISION_PAGES bakes in the module-level value at import time; if these two definitions ever diverge (e.g. different defaults), callers relying on the constant from constants.py would get a different cap than the rasteriser's own default.

Suggested change
import pypdfium2 as pdfium
from PIL import Image, ImageFilter
logger = logging.getLogger(__name__)
MAX_VISION_PAGES = int(os.environ.get("MAX_VISION_PAGES", "10"))
import pypdfium2 as pdfium
from PIL import Image, ImageFilter
from unstract.sdk1.constants import MAX_VISION_PAGES
logger = logging.getLogger(__name__)
Prompt To Fix With AI
This is a comment left during a code review.
Path: unstract/sdk1/src/unstract/sdk1/rasteriser.py
Line: 17-22

Comment:
`MAX_VISION_PAGES` is read from the environment a second time here, duplicating the definition already published in `constants.py`. The default parameter `max_pages: int = MAX_VISION_PAGES` bakes in the module-level value at import time; if these two definitions ever diverge (e.g. different defaults), callers relying on the constant from `constants.py` would get a different cap than the rasteriser's own default.

```suggestion
import pypdfium2 as pdfium
from PIL import Image, ImageFilter

from unstract.sdk1.constants import MAX_VISION_PAGES

logger = logging.getLogger(__name__)
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Claude Code

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
workers/executor/executors/answer_prompt.py (1)

433-433: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Prefer logger.exception in these except blocks.

Both handlers already sit inside except clauses with an active exception; logger.exception(...) captures the traceback, which is valuable for diagnosing rasterisation/VLM failures. Optionally also apply {e!s} at Line 429 to satisfy RUF010.

Also applies to: 441-441

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workers/executor/executors/answer_prompt.py` at line 433, The error handling
in the answer prompt exception blocks currently uses logger.error inside active
except clauses, so update the relevant handlers in answer_prompt.py to use
logger.exception instead to automatically include the traceback for
rasterisation/VLM failures. Also, if the message at the nearby error formatting
site uses a raw exception object, switch it to explicit string coercion with e!s
to satisfy the lint rule. Keep the change localized to the exception handlers
around the existing logger calls.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/prompt_studio/prompt_studio_core_v2/prompt_studio_helper.py`:
- Around line 2016-2017: In `_fetch_response`, `SOURCE_FILE_PATH` is being set
from the mutated `doc_path` instead of the original source document path. Update
the payload construction to use `file_path` for `TSPKeys.SOURCE_FILE_PATH`
(while keeping `TSPKeys.FILE_PATH` as-is), matching the behavior in
`build_fetch_response_payload`, `build_bulk_fetch_response_payload`, and
`build_single_pass_payload` so `run_vision_completion` can rasterize the
original PDF correctly.

In `@unstract/sdk1/src/unstract/sdk1/rasteriser.py`:
- Around line 25-35: Reject invalid render settings at construction time by
validating RenderSettings so dpi and max_dimension must both be positive,
instead of letting pdfium/Pillow fail later. Add the checks in the
RenderSettings dataclass lifecycle (or an equivalent constructor hook) and
ensure any callers in the rendering flow, including the rasteriser code that
uses RenderSettings.scale and downstream size calculations, surface a clear
validation error before rendering starts.
- Line 22: Validate the MAX_VISION_PAGES setting defensively in rasteriser.py
instead of converting it directly at import time; a bad or non-numeric env value
should not crash module import. Update the MAX_VISION_PAGES initialization to
parse the env var safely, fall back to a positive default on parse failure, and
clamp zero/negative values to that fallback. Apply the same positive-default
handling anywhere MAX_VISION_PAGES is consumed in the rasterisation/truncation
logic so the page-slicing behavior stays correct.
- Around line 89-99: Normalize the PDF loading failure in rasteriser.py so the
documented ValueError is always raised from the PdfDocument construction path.
In the function that creates pdfium.PdfDocument(file_bytes), catch the
library-specific PdfiumError around that call and re-raise ValueError with the
existing invalid-PDF message, keeping the empty-bytes check and RenderSettings
handling unchanged.

In `@unstract/sdk1/tests/test_complete_vision.py`:
- Around line 38-42: The helper _make_litellm_response currently annotates
content as str, but it is called with None to model an empty/refused completion.
Update the signature of _make_litellm_response to accept str | None for content,
and keep the response-building logic in sync with that nullable value so the
test case using content=None is type-correct.

In `@workers/executor/executors/legacy_executor.py`:
- Line 1352: Remove the unused source_file_path local from _handle_answer_prompt
in LegacyExecutor, since it is assigned from params but never referenced in that
method; keep the existing flow that passes the value through
context.executor_params to _execute_single_prompt, and leave the surrounding
prompt-handling logic unchanged.

---

Nitpick comments:
In `@workers/executor/executors/answer_prompt.py`:
- Line 433: The error handling in the answer prompt exception blocks currently
uses logger.error inside active except clauses, so update the relevant handlers
in answer_prompt.py to use logger.exception instead to automatically include the
traceback for rasterisation/VLM failures. Also, if the message at the nearby
error formatting site uses a raw exception object, switch it to explicit string
coercion with e!s to satisfy the lint rule. Keep the change localized to the
exception handlers around the existing logger calls.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3a1457c9-2fc2-4c54-a34d-a379a1a2ef15

📥 Commits

Reviewing files that changed from the base of the PR and between cc952bf and e449a0c.

⛔ Files ignored due to path filters (5)
  • backend/uv.lock is excluded by !**/*.lock
  • platform-service/uv.lock is excluded by !**/*.lock
  • unstract/filesystem/uv.lock is excluded by !**/*.lock
  • unstract/sdk1/uv.lock is excluded by !**/*.lock
  • workers/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (28)
  • backend/prompt_studio/prompt_profile_manager_v2/migrations/0006_make_extraction_adapters_nullable.py
  • backend/prompt_studio/prompt_profile_manager_v2/models.py
  • backend/prompt_studio/prompt_profile_manager_v2/serializers.py
  • backend/prompt_studio/prompt_studio_core_v2/constants.py
  • backend/prompt_studio/prompt_studio_core_v2/prompt_studio_helper.py
  • backend/prompt_studio/prompt_studio_registry_v2/constants.py
  • backend/prompt_studio/prompt_studio_registry_v2/prompt_studio_registry_helper.py
  • backend/prompt_studio/prompt_studio_v2/migrations/0015_toolstudioprompt_extraction_inputs_and_more.py
  • backend/prompt_studio/prompt_studio_v2/models.py
  • backend/workflow_manager/endpoint_v2/source.py
  • backend/workflow_manager/workflow_v2/file_history_helper.py
  • backend/workflow_manager/workflow_v2/models/file_history.py
  • frontend/src/components/custom-tools/prompt-card/PromptCardItems.jsx
  • unstract/sdk1/pyproject.toml
  • unstract/sdk1/src/unstract/sdk1/constants.py
  • unstract/sdk1/src/unstract/sdk1/llm.py
  • unstract/sdk1/src/unstract/sdk1/rasteriser.py
  • unstract/sdk1/src/unstract/sdk1/vision.py
  • unstract/sdk1/tests/test_complete_vision.py
  • unstract/sdk1/tests/test_derive_vision_mode.py
  • unstract/sdk1/tests/test_rasteriser.py
  • unstract/sdk1/tests/test_vision_messages.py
  • workers/executor/executors/answer_prompt.py
  • workers/executor/executors/constants.py
  • workers/executor/executors/legacy_executor.py
  • workers/file_processing/structure_tool_task.py
  • workers/tests/test_single_pass_vision_guard.py
  • workers/tests/test_structure_tool_task.py

Comment on lines 2016 to +2017
TSPKeys.FILE_PATH: doc_path,
TSPKeys.SOURCE_FILE_PATH: doc_path,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

SOURCE_FILE_PATH points to the extracted text file, not the source document.

In _fetch_response, doc_path is reassigned at Line 1865 to the extract .txt path (and again to the summarize path at Line 1898), while the original source file is retained in file_path. Setting TSPKeys.SOURCE_FILE_PATH: doc_path therefore passes the extracted-text path downstream. run_vision_completion uses SOURCE_FILE_PATH to read and rasterise the original PDF, so vision-mode prompts on this IDE single-prompt path will fail to rasterise (no pages / read error). Every sibling builder (build_fetch_response_payload Line 885, build_bulk_fetch_response_payload Line 1076, build_single_pass_payload Line 1238) correctly uses the original source path here.

🐛 Proposed fix
             TSPKeys.FILE_PATH: doc_path,
-            TSPKeys.SOURCE_FILE_PATH: doc_path,
+            TSPKeys.SOURCE_FILE_PATH: file_path,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
TSPKeys.FILE_PATH: doc_path,
TSPKeys.SOURCE_FILE_PATH: doc_path,
TSPKeys.FILE_PATH: doc_path,
TSPKeys.SOURCE_FILE_PATH: file_path,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/prompt_studio/prompt_studio_core_v2/prompt_studio_helper.py` around
lines 2016 - 2017, In `_fetch_response`, `SOURCE_FILE_PATH` is being set from
the mutated `doc_path` instead of the original source document path. Update the
payload construction to use `file_path` for `TSPKeys.SOURCE_FILE_PATH` (while
keeping `TSPKeys.FILE_PATH` as-is), matching the behavior in
`build_fetch_response_payload`, `build_bulk_fetch_response_payload`, and
`build_single_pass_payload` so `run_vision_completion` can rasterize the
original PDF correctly.


logger = logging.getLogger(__name__)

MAX_VISION_PAGES = int(os.environ.get("MAX_VISION_PAGES", "10"))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Validate MAX_VISION_PAGES before using it as a default.

A non-numeric env value will crash module import here, and 0/negative values later produce broken truncation semantics ([:0] returns no pages, [:-1] drops only the last page). This should be parsed defensively and clamped to a positive fallback so a bad deploy-time config does not take down vision processing.

Proposed fix
-MAX_VISION_PAGES = int(os.environ.get("MAX_VISION_PAGES", "10"))
+def _read_max_vision_pages() -> int:
+    raw_value = os.environ.get("MAX_VISION_PAGES", "10")
+    try:
+        value = int(raw_value)
+    except ValueError:
+        logger.warning(
+            "Invalid MAX_VISION_PAGES=%r; falling back to 10",
+            raw_value,
+        )
+        return 10
+
+    if value <= 0:
+        logger.warning(
+            "Non-positive MAX_VISION_PAGES=%d; falling back to 10",
+            value,
+        )
+        return 10
+
+    return value
+
+
+MAX_VISION_PAGES = _read_max_vision_pages()

Also applies to: 75-76, 120-127

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@unstract/sdk1/src/unstract/sdk1/rasteriser.py` at line 22, Validate the
MAX_VISION_PAGES setting defensively in rasteriser.py instead of converting it
directly at import time; a bad or non-numeric env value should not crash module
import. Update the MAX_VISION_PAGES initialization to parse the env var safely,
fall back to a positive default on parse failure, and clamp zero/negative values
to that fallback. Apply the same positive-default handling anywhere
MAX_VISION_PAGES is consumed in the rasterisation/truncation logic so the
page-slicing behavior stays correct.

Comment on lines +25 to +35
@dataclass
class RenderSettings:
"""Configuration for PDF page rendering and preprocessing."""

dpi: int = 150
max_dimension: int = 1568

@property
def scale(self) -> float:
"""Convert DPI to pypdfium2 scale factor (PDF points are 72 DPI)."""
return self.dpi / 72.0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reject non-positive render settings up front.

dpi <= 0 and max_dimension <= 0 currently fail much later inside pdfium/Pillow with opaque errors. Since RenderSettings is part of the new SDK surface, it should enforce a valid range at construction time.

Proposed fix
 `@dataclass`
 class RenderSettings:
     """Configuration for PDF page rendering and preprocessing."""
 
     dpi: int = 150
     max_dimension: int = 1568
+
+    def __post_init__(self) -> None:
+        if self.dpi <= 0:
+            raise ValueError("dpi must be > 0")
+        if self.max_dimension <= 0:
+            raise ValueError("max_dimension must be > 0")

Also applies to: 63-66, 132-143

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@unstract/sdk1/src/unstract/sdk1/rasteriser.py` around lines 25 - 35, Reject
invalid render settings at construction time by validating RenderSettings so dpi
and max_dimension must both be positive, instead of letting pdfium/Pillow fail
later. Add the checks in the RenderSettings dataclass lifecycle (or an
equivalent constructor hook) and ensure any callers in the rendering flow,
including the rasteriser code that uses RenderSettings.scale and downstream size
calculations, surface a clear validation error before rendering starts.

Comment on lines +89 to +99
Raises:
ValueError: If file_bytes is empty or not a valid PDF.
"""
if not file_bytes:
raise ValueError("file_bytes is empty")

if settings is None:
settings = RenderSettings()

pdf = pdfium.PdfDocument(file_bytes)
try:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
printf 'FILE:\n'
git ls-files 'unstract/sdk1/src/unstract/sdk1/rasteriser.py'
printf '\nCONTEXT (lines 1-220):\n'
cat -n unstract/sdk1/src/unstract/sdk1/rasteriser.py | sed -n '1,220p'
printf '\nSEARCH PdfiumError / PdfDocument:\n'
rg -n "PdfiumError|PdfDocument" unstract/sdk1/src/unstract/sdk1/rasteriser.py unstract/sdk1/src -S

Repository: Zipstack/unstract

Length of output: 1947


🌐 Web query:

pypdfium2 PdfDocument load failure PdfiumError documentation

💡 Result:

In pypdfium2, the PdfDocument class raises a PdfiumError when a PDF fails to load [1][2]. This exception is a subclass of RuntimeError that indicates a failure reported by the underlying PDFium library [1][3]. Key details regarding PdfiumError and document loading: 1. Exception Structure: Since version 5.0.0, the PdfiumError exception is annotated with an err_code attribute [4][5]. This allows for programmatic handling of specific error subtypes, replacing the previous, less stable method of parsing error strings [5]. 2. Error Codes: The library provides an ErrorToStr dictionary that maps PDFium error constants to human-readable strings [1][3]. Common error codes include: - 2: File access error - 3: Data format error - 4: Incorrect password error - 5: Unsupported security scheme error - 6: Page not found or content error [1] 3. Handling: When using PdfDocument, you should wrap the initialization in a try-except block to catch PdfiumError [1][2]. If you are working with encrypted files, an Incorrect password error (code 4) is frequently raised if an invalid or missing password is provided [5]. 4. Related Exceptions: - FileNotFoundError is raised separately if the provided file path is invalid or non-existent, rather than a PdfiumError [1][2]. - PdfiumWarning is issued for failures related to XFA forms, which may accompany a PdfiumError or occur during other operations [1][4]. For more precise error identification, you can inspect the err_code attribute of the caught PdfiumError instance [1].

Citations:


Normalize PDF load failures to ValueError. pdfium.PdfDocument(...) raises PdfiumError on invalid input, so callers will see a library-specific exception instead of the SDK’s documented ValueError. Catch it here and re-raise ValueError("file_bytes is not a valid PDF").

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@unstract/sdk1/src/unstract/sdk1/rasteriser.py` around lines 89 - 99,
Normalize the PDF loading failure in rasteriser.py so the documented ValueError
is always raised from the PdfDocument construction path. In the function that
creates pdfium.PdfDocument(file_bytes), catch the library-specific PdfiumError
around that call and re-raise ValueError with the existing invalid-PDF message,
keeping the empty-bytes check and RenderSettings handling unchanged.

Comment on lines +38 to +42
def _make_litellm_response(
content: str = "extracted value",
finish_reason: str = "stop",
usage: dict | None = None,
) -> dict:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Widen content type to str | None.

_make_litellm_response is called with content=None (line 164) to simulate a refused/empty response, but the parameter is annotated content: str. This is the SonarCloud type mismatch flagged on line 164.

🔧 Proposed fix
 def _make_litellm_response(
-    content: str = "extracted value",
+    content: str | None = "extracted value",
     finish_reason: str = "stop",
     usage: dict | None = None,
 ) -> dict:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _make_litellm_response(
content: str = "extracted value",
finish_reason: str = "stop",
usage: dict | None = None,
) -> dict:
def _make_litellm_response(
content: str | None = "extracted value",
finish_reason: str = "stop",
usage: dict | None = None,
) -> dict:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@unstract/sdk1/tests/test_complete_vision.py` around lines 38 - 42, The helper
_make_litellm_response currently annotates content as str, but it is called with
None to model an empty/refused completion. Update the signature of
_make_litellm_response to accept str | None for content, and keep the
response-building logic in sync with that nullable value so the test case using
content=None is type-correct.

Source: Linters/SAST tools

tool_id: str = params.get(PSKeys.TOOL_ID, "")
run_id: str = context.run_id
file_path = params.get(PSKeys.FILE_PATH)
source_file_path: str = params.get(PSKeys.SOURCE_FILE_PATH, "")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Remove the unused source_file_path local in _handle_answer_prompt.

This local is assigned but never referenced within _handle_answer_prompt (the value reaches _execute_single_prompt independently via context.executor_params at Line 1658/1663). It's dead code and matches the SonarCloud warning.

🧹 Proposed cleanup
         file_path = params.get(PSKeys.FILE_PATH)
-        source_file_path: str = params.get(PSKeys.SOURCE_FILE_PATH, "")
         doc_name = str(params.get(PSKeys.FILE_NAME, ""))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
source_file_path: str = params.get(PSKeys.SOURCE_FILE_PATH, "")
🧰 Tools
🪛 GitHub Check: SonarCloud Code Analysis

[warning] 1352-1352: Remove the unused local variable "source_file_path".

See more on https://sonarcloud.io/project/issues?id=Zipstack_unstract&issues=AZ8ZVOe03i2zxRL0UHrW&open=AZ8ZVOe03i2zxRL0UHrW&pullRequest=2131

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@workers/executor/executors/legacy_executor.py` at line 1352, Remove the
unused source_file_path local from _handle_answer_prompt in LegacyExecutor,
since it is assigned from params but never referenced in that method; keep the
existing flow that passes the value through context.executor_params to
_execute_single_prompt, and leave the surrounding prompt-handling logic
unchanged.

Source: Linters/SAST tools

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant